Clustering and Relational Ambiguity: from Text Data to Natural Data
Text data is often seen as "take-away" material: information with little noise
that is easy to process. The main questions are how to obtain the data and how
to transform them into a suitable document format. But data can be sensitive to
noise, often called ambiguity. Ambiguities have long been recognized, mainly
because polysemy is pervasive in language and context is required to remove
uncertainty. I claim in this paper that syntactic context is not sufficient to
improve interpretation. I also try to show, first, that noise can come from
natural data themselves, even when high technology is involved, and second, that
texts regarded as verified but meaningless can spoil the content of a corpus,
which may lead to contradictions and background noise.
Open Data Platform for Knowledge Access in Plant Health Domain: VESPA Mining
Important data are locked in old literature. It would be uneconomical to
produce these data again today, or to extract them without the help of text
mining technologies. Vespa is a text mining project whose aim is to extract
data on pest-crop interactions, to model and predict attacks on crops, and to
reduce the use of pesticides. Only a few previous attempts have proposed access
to agricultural information. Another original aspect of our work is that
documents are parsed according to their architecture.
- …