Boosting Entity Linking Performance by Leveraging Unlabeled Documents
Modern entity linking systems rely on large collections of documents
specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose
an approach which exploits only naturally occurring information: unlabeled
documents and Wikipedia. Our approach consists of two stages. First, we
construct a high recall list of candidate entities for each mention in an
unlabeled document. Second, we use the candidate lists as weak supervision to
constrain our document-level entity linking model. The model treats entities as
latent variables and, when estimated on a collection of unlabeled texts,
learns to choose entities relying both on local context of each mention and on
coherence with other entities in the document. The resulting approach rivals
fully-supervised state-of-the-art systems on standard test sets. It also
approaches their performance in a very challenging setting: when tested on a
test set sampled from the data used to estimate the supervised systems. By
comparing to Wikipedia-only training of our model, we demonstrate that modeling
unlabeled documents is beneficial. Comment: ACL201
LODE: Linking Digital Humanities Content to the Web of Data
Numerous digital humanities projects maintain their data collections in the
form of text, images, and metadata. While data may be stored in many formats,
from plain text to XML to relational databases, the use of the resource
description framework (RDF) as a standardized representation has gained
considerable traction during the last five years. Almost every digital
humanities meeting has at least one session concerned with the topic of digital
humanities, RDF, and linked data. While most existing work in linked data has
focused on improving algorithms for entity matching, the aim of the
LinkedHumanities project is to build digital humanities tools that work "out of
the box," enabling their use by humanities scholars, computer scientists,
librarians, and information scientists alike. With this paper, we report on the
Linked Open Data Enhancer (LODE) framework developed as part of the
LinkedHumanities project. With LODE we help non-technical users enrich a
local RDF repository with high-quality data from the Linked Open Data cloud.
LODE links and enhances the local RDF repository without compromising the
quality of the data. In particular, LODE supports the user in the enhancement
and linking process by providing intuitive user-interfaces and by suggesting
high-quality linking candidates using tailored matching algorithms. We hope
that the LODE framework will be useful to digital humanities scholars,
complementing other digital humanities tools.
Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All
Collective entity disambiguation aims to jointly resolve multiple mentions by
linking them to their associated entities in a knowledge base. Previous works
are primarily based on the underlying assumption that entities within the same
document are highly related. However, the extent to which these mentioned
entities are actually connected in reality is rarely studied and therefore
raises interesting research questions. For the first time, we show that the
semantic relationships between the mentioned entities are in fact less dense
than expected. This could be attributed to several reasons such as noise, data
sparsity and knowledge base incompleteness. As a remedy, we introduce MINTREE,
a new tree-based objective for the entity disambiguation problem. The key
intuition behind MINTREE is the concept of coherence relaxation which utilizes
the weight of a minimum spanning tree to measure the coherence between
entities. Based on this new objective, we design a novel entity disambiguation
algorithm, which we call Pair-Linking. Instead of considering all the given
mentions, Pair-Linking iteratively selects a pair with the highest confidence
at each step for decision making. Via extensive experiments, we show that our
approach is not only more accurate but also surprisingly faster than many
state-of-the-art collective linking algorithms.
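The idea of scoring coherence by the weight of a minimum spanning tree can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mst_weight`, the `distance` callback, and the toy distance values are all assumptions made for illustration, and the paper's actual objective may define edge weights differently. The sketch only shows that a tree-based score sums the cheapest connecting edges instead of all pairwise edges.

```python
def mst_weight(nodes, distance):
    """Return the total weight of a minimum spanning tree over `nodes`.

    Uses Prim's algorithm. `distance(a, b)` is a hypothetical symmetric
    dissimilarity between two candidate entities (lower means more related).
    """
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    # best[v] holds the cheapest known edge from the growing tree to v.
    best = {v: distance(nodes[0], v) for v in nodes[1:]}
    total = 0.0
    while best:
        # Attach the node that is cheapest to reach from the current tree.
        v = min(best, key=best.get)
        total += best.pop(v)
        # The new tree node may offer cheaper edges to the remaining nodes.
        for u in best:
            d = distance(v, u)
            if d < best[u]:
                best[u] = d
    return total


# Made-up distances between three entities, for illustration only.
TOY_DISTANCE = {
    frozenset(("Paris", "France")): 1.0,
    frozenset(("Paris", "Texas")): 5.0,
    frozenset(("France", "Texas")): 2.0,
}


def toy_distance(a, b):
    return TOY_DISTANCE[frozenset((a, b))]


print(mst_weight(["Paris", "France", "Texas"], toy_distance))  # 3.0
```

In this toy case the tree keeps the edges Paris-France and France-Texas and ignores the weak Paris-Texas link, so one poorly related pair does not penalize the whole assignment the way a sum over all pairs would.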
Systems Biology Graphical Notation: Entity Relationship language Level 1
Standard graphical representations have played a crucial role in science and engineering throughout the last century. Without electrical symbolism, it is very likely that our industrial society would not have evolved at the same pace. Similarly, specialised notations such as the Feynman notation or process flow diagrams did a lot for the adoption of concepts in their own fields. With the advent of Systems Biology, and more recently of Synthetic Biology, the need for precise and unambiguous descriptions of biochemical interactions has become more pressing. While some ideas have been advanced over the last decade, with a few detailed proposals, no actual community standard has emerged. The Systems Biology Graphical Notation (SBGN) is a graphical representation crafted over several years by a community of biochemists, modellers and computer scientists. Three orthogonal and complementary languages have been created: the Process Descriptions, the Entity Relationships and the Activity Flows. Using these three idioms, a scientist can represent any network of biochemical interactions, which can then be interpreted in an unambiguous way. The set of symbols used is limited, and the grammar quite simple, to allow its usage in textbooks and its teaching directly in high schools. The first level of the SBGN Entity Relationship language has been publicly released. Shared by the communities of biochemists, genomicians, theoreticians and computational biologists, SBGN languages will foster efficient storage, exchange and reuse of information on signalling pathways, metabolic networks and gene regulatory maps.
Learning Relatedness Measures for Entity Linking
Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes errors in disambiguating entity-linking.
The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. Second, we propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.