Framing Named Entity Linking Error Types
Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of
large open source Knowledge Bases and the continuous focus on improving NEL performance have led to the creation of automated
benchmark solutions during the last decade. The benchmarking of NEL systems offers a valuable approach to understand a NEL
system’s performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error
causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this
taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.
Keywords: Named Entity Linking, Linked Data Quality, Corpora, Evaluation, Error Analysis
Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty
Entity Linking (EL) is the task of automatically identifying entity mentions
in a piece of text and resolving them to a corresponding entity in a reference
knowledge base like Wikipedia. There is a large number of EL tools available
for different types of documents and domains, yet EL remains a challenging task
where the lack of precision on particularly ambiguous mentions often spoils the
usefulness of automated disambiguation results in real applications. A priori
approximations of the difficulty to link a particular entity mention can
facilitate flagging of critical cases as part of semi-automated EL systems,
while detecting latent factors that affect the EL performance, like
corpus-specific features, can provide insights on how to improve a system based
on the special characteristics of the underlying corpus. In this paper, we
first introduce a consensus-based method to generate difficulty labels for
entity mentions on arbitrary corpora. The difficulty labels are then exploited
as training data for a supervised classification task able to predict the EL
difficulty of entity mentions using a variety of features. Experiments over a
corpus of news articles show that EL difficulty can be estimated with high
accuracy, revealing also latent features that affect EL performance. Finally,
evaluation results demonstrate the effectiveness of the proposed method to
inform semi-automated EL pipelines.
Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP
Symposium On Applied Computing (SAC 2019)
Neural End-to-End Learning for Computational Argumentation Mining
We investigate neural techniques for end-to-end computational argumentation
mining (AM). We frame AM both as a token-based dependency parsing and as a
token-based sequence tagging problem, including a multi-task learning setup.
Contrary to models that operate on the argument component level, we find that
framing AM as dependency parsing leads to subpar performance results. In
contrast, less complex (local) tagging models based on BiLSTMs perform robustly
across classification scenarios, being able to catch long-range dependencies
inherent to the AM problem. Moreover, we find that jointly learning 'natural'
subtasks in a multi-task learning setup improves performance.
Comment: To be published at ACL 201
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
We describe an effort to annotate a corpus of natural language instructions
consisting of 622 wet lab protocols to facilitate automatic or semi-automatic
conversion of protocols into a machine-readable format and benefit biological
research. Experimental results demonstrate the utility of our corpus for
developing machine learning approaches to shallow semantic parsing of
instructional texts. We make our annotated Wet Lab Protocol Corpus available to
the research community.
Visual Event Cueing in Linked Spatiotemporal Data
The media disseminates a large amount of information daily pertaining to political events, social movements, and societal conflicts. Media pertaining to these topics, no matter the format of publication used, are framed in a particular way. Framing is used not just to guide audiences to desired beliefs, but also to fuel societal change or to legitimize or delegitimize social movements. For this reason, tools that can help to clarify when changes in social discourse occur and identify their causes are of great use. This thesis presents a visual analytics framework that allows for the exploration and visualization of changes that occur in social climate with respect to space and time. Focusing on the links between data from the Armed Conflict Location and Event Data Project (ACLED) and a streaming RSS news data set, users can be cued into interesting events, enabling them to form and explore hypotheses. This visual analytics framework also focuses on improving intervention detection, allowing users to hypothesize about correlations between events and happiness levels, and supports collaborative analysis.
Dissertation/Thesis: Masters Thesis, Computer Science 201
A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems
Existing evaluations of entity linking systems often say little about how the
system is going to perform for a particular application. There are four
fundamental reasons for this: many benchmarks focus on named entities; it is
hard to define which other entities to include; there are ambiguities in entity
recognition and entity linking; many benchmarks have errors or artifacts that
invite overfitting or lead to evaluation results of limited meaningfulness.
We provide a more meaningful and fair in-depth evaluation of a variety of
existing end-to-end entity linkers. We characterize the strengths and
weaknesses of these linkers and how well the results from the respective
publications can be reproduced. Our evaluation is based on several widely used
benchmarks, which exhibit the problems mentioned above to various degrees, as
well as on two new benchmarks, which address these problems.