6,731 research outputs found
Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams
Neogeography is the combination of user generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user generated text in wikis, forums, or SMSes. The extracted information should be integrated together to form a collective knowledge about certain domain. This structured information can be used further to help users from the same domain who want to get information using simple question answering system. The project intends to help workers communities in developing countries to share their knowledge, providing a simple and cheap way to contribute and get benefit using the available communication technology
MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-based agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.Comment: Accepted in K-CAP 2017: Knowledge Capture Conferenc
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referenced as entity linking,
this step is a fundamental component of many NLP tasks such as text
understanding, automatic summarization, semantic search or machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We here propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e.,~linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
relying on few parameters to be learned.
Our method does not require extensive feature engineering, nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain specific, textual resources that report on facts and events which have
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. This
is based on the semantic notions of events, participants and roles. We evaluate
quantitatively each of the key-steps of our approach and provide a graph-based
representation of the extracted knowledge, which allows to move between a Close
and a Distant Reading of the collection.Comment: 23 pages, 6 figure
- âŠ