Network analysis of named entity co-occurrences in written texts
The use of methods borrowed from statistics and physics to analyze written
texts has allowed the discovery of unprecedented patterns of human behavior and
cognition by establishing links between model features and language structure.
While current models have been useful to unveil patterns via analysis of
syntactical and semantical networks, only a few works have probed the relevance
of investigating the structure arising from the relationship between relevant
entities such as characters, locations and organizations. In this study, we
represent entities appearing in the same context as a co-occurrence network,
where links are established according to a null model based on random, shuffled
texts. Computational simulations performed on novels revealed that the proposed
model displays interesting topological features, such as the small-world
property, characterized by high values of the clustering coefficient. The
effectiveness of our model was verified in a practical pattern recognition task
in real networks. When compared with traditional word adjacency networks, our
model displayed optimized results in identifying unknown references in texts.
Because the proposed representation plays a complementary role in
characterizing unstructured documents via topological analysis of named
entities, we believe that it could be useful to improve the characterization of
written texts (and related systems), especially if combined with traditional
approaches based on statistical and deeper paradigms.
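The co-occurrence construction described above can be sketched as follows. The context segmentation, the number of shuffles, and the significance rule (observed count above the shuffled average) are illustrative assumptions, not the paper's exact procedure:

```python
# Sketch: entity co-occurrence network with a shuffled-text null model.
# Contexts are lists of named entities appearing together (e.g. per sentence).
import random
from collections import Counter
from itertools import combinations

def cooccurrence_counts(contexts):
    """Count how often each entity pair shares a context."""
    counts = Counter()
    for ctx in contexts:
        for a, b in combinations(sorted(set(ctx)), 2):
            counts[(a, b)] += 1
    return counts

def null_model_counts(contexts, n_shuffles=200, seed=0):
    """Average pair counts over shuffled texts: entity order is randomized
    while context lengths are preserved."""
    rng = random.Random(seed)
    tokens = [e for ctx in contexts for e in ctx]
    lengths = [len(ctx) for ctx in contexts]
    totals = Counter()
    for _ in range(n_shuffles):
        rng.shuffle(tokens)
        shuffled, i = [], 0
        for n in lengths:
            shuffled.append(tokens[i:i + n])
            i += n
        totals.update(cooccurrence_counts(shuffled))
    return {pair: c / n_shuffles for pair, c in totals.items()}

def significant_edges(contexts, n_shuffles=200):
    """Keep pairs that co-occur more often than the shuffled baseline."""
    observed = cooccurrence_counts(contexts)
    expected = null_model_counts(contexts, n_shuffles)
    return {pair for pair, c in observed.items() if c > expected.get(pair, 0.0)}

def clustering_coefficient(edges):
    """Mean local clustering coefficient of the resulting network."""
    neigh = {}
    for a, b in edges:
        neigh.setdefault(a, set()).add(b)
        neigh.setdefault(b, set()).add(a)
    coeffs = []
    for node, ns in neigh.items():
        k = len(ns)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(ns, 2) if b in neigh.get(a, ()))
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs) if coeffs else 0.0
```

A high mean clustering coefficient on the significant edges, together with short path lengths, is what the small-world characterization refers to.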
Modeling Task Effects in Human Reading with Neural Attention
Humans read by making a sequence of fixations and saccades. They often skip
words, without apparent detriment to understanding. We offer a novel
explanation for skipping: readers optimize a tradeoff between performing a
language-related task and fixating as few words as possible. We propose a
neural architecture that combines an attention module (deciding whether to skip
words) and a task module (memorizing the input). We show that our model
predicts human skipping behavior, while also modeling reading times well, even
though it skips 40% of the input. A key prediction of our model is that
different reading tasks should result in different skipping behaviors. We
confirm this prediction in an eye-tracking experiment in which participants
answered questions about a text. We are able to capture these experimental
results with our model by replacing the memorization module with a task
module that performs neural question answering.
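The skipping tradeoff can be illustrated with a minimal sketch. The word scores, threshold, and penalty weight `lam` are hypothetical stand-ins for the learned attention and task modules, not the paper's architecture:

```python
# Toy illustration of the reading tradeoff: minimize task loss plus a
# penalty per fixated word, with an attention policy deciding what to skip.

def reading_cost(task_loss, n_fixations, lam=0.1):
    """Combined objective: task performance traded off against fixations."""
    return task_loss + lam * n_fixations

def skip_policy(word_scores, threshold=0.5):
    """Attention-module sketch: fixate a word only if its (hypothetical)
    task-relevance score exceeds a threshold; otherwise skip it."""
    return [s >= threshold for s in word_scores]

fixations = skip_policy([0.9, 0.2, 0.7, 0.1])  # fixate words 1 and 3
cost = reading_cost(task_loss=1.0, n_fixations=sum(fixations))
```

Under this objective, a task that tolerates more skipping (lower effective scores for most words) yields a different fixation pattern than one that requires dense reading, which is the task-effect prediction tested in the eye-tracking experiment.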
Open Data Platform for Knowledge Access in Plant Health Domain: VESPA Mining
Important data are locked away in older literature. It would be uneconomical to
produce these data again today, or to extract them without the help of text
mining technologies. Vespa is a text mining project whose aim is to extract
data on pest and crop interactions, to model and predict attacks on crops, and
to reduce the use of pesticides. Few previous attempts have addressed
agricultural information access. Another original aspect of our work is to
parse documents while taking the document architecture into account.
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which is
becoming a major application domain for IE.