Navigating multilingual news collections using automatically extracted information
We present a text analysis tool set that allows analysts in various
fields to sift through large collections of multilingual news items quickly
and to find information that is of relevance to them. For a given document
collection, the tool set automatically clusters the texts into groups of
similar articles, extracts names of places, people and organisations, lists the
user-defined specialist terms found, links clusters and entities, and generates
hyperlinks. Through its daily news analysis operating on thousands of articles
per day, the tool also learns relationships between people and other entities.
The fully functional prototype system allows users to explore and navigate
multilingual document collections across languages and time.
Comment: This paper describes the main functionality of the JRC's
fully automatic news analysis system NewsExplorer, which is freely accessible,
currently in thirteen languages, at http://press.jrc.it/NewsExplorer/ . 8
pages
Massive migration from the steppe is a source for Indo-European languages in Europe
We generated genome-wide data from 69 Europeans who lived between 8,000-3,000
years ago by enriching ancient DNA libraries for a target set of almost four
hundred thousand polymorphisms. Enrichment of these positions decreases the
sequencing required for genome-wide ancient DNA analysis by a median of around
250-fold, allowing us to study an order of magnitude more individuals than
previous studies and to obtain new insights about the past. We show that the
populations of western and far eastern Europe followed opposite trajectories
between 8,000-5,000 years ago. At the beginning of the Neolithic period in
Europe, ~8,000-7,000 years ago, closely related groups of early farmers
appeared in Germany, Hungary, and Spain, different from indigenous
hunter-gatherers, whereas Russia was inhabited by a distinctive population of
hunter-gatherers with high affinity to a ~24,000-year-old Siberian. By
~6,000-5,000 years ago, a resurgence of hunter-gatherer ancestry had occurred
throughout much of Europe, but in Russia, the Yamnaya steppe herders of this
time were descended not only from the preceding eastern European
hunter-gatherers, but also from a population of Near Eastern ancestry. Western and
Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded
Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya,
documenting a massive migration into the heartland of Europe from its eastern
periphery. This steppe ancestry persisted in all sampled central Europeans
until at least ~3,000 years ago, and is ubiquitous in present-day Europeans.
These results provide support for the theory of a steppe origin of at least
some of the Indo-European languages of Europe.
Sentence Alignment as the Basis for Translation Memory Database
Sentence alignment is the basis for computer-assisted translation (CAT), terminology management, term extraction, word alignment and cross-linguistic information retrieval. Translation memory (TM), created through the sentence alignment process, is in turn the basis for further research into translation equivalences. Automatic sentence alignment of parallel texts faces two types of problems: robustness, and discrepancies between source and target texts in layout and omissions, both of which affect the accuracy of the alignment process. The aim of the paper is to present research on the sentence alignment process carried out on Croatian-English parallel texts (laws, regulations, acts and decisions) and implemented with the alignment tool WinAlign 7.5.0 from SDL Trados 2006 Professional. The alignment process and its impact on the creation of translation memories are presented through a comparison of translation memories that differ in the level of expert intervention in the setup of the alignment program and in the preparation of the source text for segmentation. Recommendations for further development using statistical analysis, automatic learning techniques and language knowledge are offered.
- …