Navigating multilingual news collections using automatically extracted information
We present a text analysis tool set that allows analysts in various
fields to sift through large collections of multilingual news items quickly
and to find information that is of relevance to them. For a given document
collection, the tool set automatically clusters the texts into groups of
similar articles, extracts names of places, people and organisations, lists the
user-defined specialist terms found, links clusters and entities, and generates
hyperlinks. Through its daily news analysis operating on thousands of articles
per day, the tool also learns relationships between people and other entities.
The fully functional prototype system allows users to explore and navigate
multilingual document collections across languages and time.
Comment: This paper describes the main functionality of the JRC's
fully automatic news analysis system NewsExplorer, which is freely accessible,
currently in thirteen languages, at http://press.jrc.it/NewsExplorer/ . 8
page
Inclination not force is sensed by plants during shoot gravitropism
Gravity perception plays a key role in how plants develop and adapt to environmental changes. However, more than a century after the pioneering work of Darwin, little is known about the sensing mechanism. Using a centrifugal device combined with growth kinematics imaging, we show that shoot gravitropic responses to steady levels of gravity in four representative angiosperm species are independent of gravity intensity. All gravitropic responses tested depend only on the angle of inclination from the direction of gravity. We thus demonstrate that shoot gravitropism is stimulated by sensing inclination, not gravitational force or acceleration as previously believed. This contrasts with the otolith system in the inner ear of vertebrates and explains the robustness of the control of growth direction by plants despite perturbations like wind shaking. Our results will help retarget the search for the molecular mechanism linking shifting statoliths to signal transduction.
Aeolian sand ripples: experimental study of saturated states
We report an experimental investigation of aeolian sand ripples, performed
both in a wind tunnel and on stoss slopes of dunes. Starting from a flat bed,
we can identify three regimes: appearance of an initial wavelength, coarsening
of the pattern and finally saturation of the ripples. We show that both initial
and final wavelengths, as well as the propagative speed of the ripples, are
linear functions of the wind velocity. Investigating the evolution of an
initially corrugated bed, we exhibit non-linear stable solutions for a finite
range of wavelengths, which demonstrates the existence of a saturation in
amplitude. These results contradict most of the models.
Comment: 4 pages, 5 figures, submitted to Phys. Rev. Lett. Title changed,
figures corrected and simplified, more field data included, text clarifie
Extracting and Learning Social Networks out of Multilingual News
Various kinds of social networks can be derived from the analysis of news articles. We present our experience in building social networks by extracting relationships between entities, all of which are derived automatically from multilingual news articles. Unqualified relationships between persons can be extracted through simple co-occurrence statistics. Qualified relationships can be extracted using linguistic patterns. Our highly redundant sources (50,000 daily articles in 40 languages) are used both to validate our algorithms and to strengthen pertinent relationships. Due to the amount of data we process, these social networks pose a complex challenge for useful visualization and navigation.
JRC.G.2-Support to external securit
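The co-occurrence idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cooccurrence_counts`, the input format (one list of already extracted person names per article) and the placeholder names are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count, for each unordered pair of persons, the number of articles
    in which both are mentioned (hypothetical helper, not from the paper)."""
    counts = Counter()
    for persons in articles:
        # Deduplicate and sort so each pair is counted once per article
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(persons)), 2):
            counts[pair] += 1
    return counts


# Toy input: each inner list holds the person names found in one article.
# The names are placeholders, not real data.
articles = [
    ["Person A", "Person B"],
    ["Person A", "Person B", "Person C"],
    ["Person C"],
]

counts = cooccurrence_counts(articles)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that co-occur in many articles would be candidates for an unqualified relationship; a real system would also need a threshold or significance measure, which this sketch leaves out.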
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages
We present a new, unique and freely available parallel corpus containing
European Union (EU) documents of a mostly legal nature. It is available in all 20
official EU languages, with additional documents being available in the languages
of the EU candidate countries. The corpus consists of almost 8,000 documents
per language, with an average size of nearly 9 million words per language.
Pair-wise paragraph alignment information produced by two different aligners
(Vanilla and HunAlign) is available for all 190+ language pair combinations.
Most texts have been manually classified according to the EUROVOC subject
domains so that the collection can also be used to train and test multi-label
classification algorithms and keyword-assignment software. The corpus is
encoded in XML, according to the Text Encoding Initiative Guidelines. Due to
the large number of parallel texts in many languages, the JRC-Acquis is
particularly suitable to carry out all types of cross-language research, as
well as to test and benchmark text analysis software across different languages
(for instance for alignment, sentence splitting and term extraction).
Comment: A multilingual textual resource with meta-data freely available for
download at http://langtech.jrc.it/JRC-Acquis.htm
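The figure of "190+" language pairs follows from simple counting: 20 languages yield 20 × 19 / 2 = 190 unordered pairs, and each additional language adds further pairs. The short sketch below checks this arithmetic; the language labels are placeholders and the helper `language_pairs` is an assumption for illustration, not part of the corpus tooling.

```python
from itertools import combinations
from math import comb


def language_pairs(languages):
    """Return all unordered pairs of distinct languages (hypothetical helper)."""
    return list(combinations(sorted(languages), 2))


# Placeholder labels standing in for the 20 official EU languages.
languages = [f"lang{i:02d}" for i in range(1, 21)]

pairs = language_pairs(languages)
print(len(pairs))   # number of unordered pairs for 20 languages
print(comb(20, 2))  # the same count via the binomial coefficient
```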