Profile of Setimes

Created on 2011.09.19
Made by Guest

A parallel corpus of news articles in the Balkan languages, originally extracted from http://www.setimes.com. The corpus is compiled by Nikola Ljubešić and is taken from http://www.nljubesic.net/resources/corpora/setimes, provided under the CC-BY-SA license.

10 languages, 45 bitexts
total number of files: 90
total number of tokens: 425.89M
total number of sentence fragments: 17.60M

Please cite the following article if you use any part of the corpus in your own work:

Jörg Tiedemann, 2009, News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces. In N. Nicolov and K. Bontcheva and G. Angelova and R. Mitkov (eds.) Recent Advances in Natural Language Processing (vol V), pages 237-248, John Benjamins, Amsterdam/Philadelphia