Framing Named Entity Linking Error Types
Named Entity Linking (NEL) and relation extraction form the backbone of Knowledge Base Population tasks. The recent rise of large open-source Knowledge Bases and the continuous focus on improving NEL performance have led to the creation of automated benchmark solutions during the last decade. Benchmarking NEL systems offers a valuable approach to understanding a NEL system's performance quantitatively. However, an in-depth qualitative analysis that helps improve NEL methods by identifying error causes usually requires a more thorough error analysis. This paper proposes a taxonomy to frame common errors and applies this taxonomy in a survey study to assess the performance of four well-known Named Entity Linking systems on three recent gold standards.
Keywords: Named Entity Linking, Linked Data Quality, Corpora, Evaluation, Error Analysis
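As an illustration of how such a taxonomy can be operationalized, the following Python sketch classifies individual gold annotations into coarse error types by comparing them against system output. The categories shown are illustrative assumptions, not the taxonomy proposed in the paper.

```python
# Illustrative sketch: tagging NEL errors by comparing system output against
# a gold standard. Error categories here are hypothetical examples.
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int       # character offset where the mention begins
    end: int         # character offset where the mention ends (exclusive)
    entity_uri: str  # linked Knowledge Base entity

def classify_error(gold: Annotation, predictions: list[Annotation]) -> str:
    """Assign a coarse error type to one gold annotation."""
    for pred in predictions:
        if (pred.start, pred.end) == (gold.start, gold.end):
            # exact span: either correct or linked to the wrong KB entity
            return "correct" if pred.entity_uri == gold.entity_uri else "wrong_link"
    for pred in predictions:
        if pred.start < gold.end and gold.start < pred.end:
            return "boundary_error"   # overlapping but inexact span
    return "missing_annotation"       # mention not detected at all
```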
Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty
Entity Linking (EL) is the task of automatically identifying entity mentions
in a piece of text and resolving them to a corresponding entity in a reference
knowledge base like Wikipedia. There is a large number of EL tools available
for different types of documents and domains, yet EL remains a challenging task
where the lack of precision on particularly ambiguous mentions often spoils the
usefulness of automated disambiguation results in real applications. A priori
estimates of how difficult a particular entity mention will be to link can
facilitate the flagging of critical cases in semi-automated EL systems,
while detecting latent factors that affect the EL performance, like
corpus-specific features, can provide insights on how to improve a system based
on the special characteristics of the underlying corpus. In this paper, we
first introduce a consensus-based method to generate difficulty labels for
entity mentions on arbitrary corpora. The difficulty labels are then exploited
as training data for a supervised classification task able to predict the EL
difficulty of entity mentions using a variety of features. Experiments over a
corpus of news articles show that EL difficulty can be estimated with high
accuracy, revealing also latent features that affect EL performance. Finally,
evaluation results demonstrate the effectiveness of the proposed method to
inform semi-automated EL pipelines.
Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019).
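The consensus idea can be sketched in a few lines: a mention's difficulty label is derived from how many independent EL systems link it correctly. The threshold and the two-class split below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of consensus-based difficulty labeling: a mention is "easy"
# if at least half of the EL systems agree with the gold link.
def difficulty_label(system_links: list[str], gold_link: str) -> str:
    """system_links: the entity each EL system chose for one mention."""
    agreement = sum(link == gold_link for link in system_links)
    return "easy" if agreement / len(system_links) >= 0.5 else "difficult"

print(difficulty_label(
    ["dbr:Paris", "dbr:Paris", "dbr:Paris_(Texas)", "dbr:Paris"],
    gold_link="dbr:Paris"))   # -> "easy" (3 of 4 systems agree with gold)

# The resulting labels can then serve as training data for a supervised
# classifier that predicts difficulty from mention and context features.
```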
Named Entity Recognition -- Is there a glass ceiling?
Recent developments in Named Entity Recognition (NER) have resulted in better
and better models. However, is there a glass ceiling? Do we know which types of
errors are still hard or even impossible to correct? In this paper, we present
a detailed analysis of the types of errors in state-of-the-art machine learning
(ML) methods. Our study reveals the weak and strong points of the Stanford,
CMU, FLAIR, ELMO and BERT models, as well as their shared limitations. We also
introduce new techniques for improving annotation, for training processes and
for checking a model's quality and stability. Presented results are based on
the CoNLL 2003 data set for the English language. A new enriched semantic
annotation of errors for this data set and new diagnostic data sets are
attached in the supplementary materials.
Comment: Accepted to CoNLL 2019.
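One building block of such an error analysis is comparing model outputs entity by entity rather than token by token. The sketch below converts CoNLL-style BIO tags into entity spans for that purpose; the paper's diagnostic categories go well beyond this plain span comparison.

```python
# Sketch: extract entity spans from BIO tags so that model predictions can
# be compared against gold annotations at the entity level.
def bio_to_spans(tags: list[str]) -> set[tuple[int, int, str]]:
    """Convert BIO tags to (start, end, type) spans; end is exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.add((start, i, etype))
                start = None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans

gold = bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"])
pred = bio_to_spans(["B-PER", "O", "O", "B-LOC"])
print(gold - pred)  # entities the model got wrong: {(0, 2, 'PER')}
```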
Name Variants for Improving Entity Discovery and Linking
Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation like abbreviations, aliases, hypocorisms, multilingualism or partial matches. Each entity type can also have specific rules for name variances: people names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population, since name variances are frequently used in all kinds of textual content.
This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.
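The algorithmic side of variant generation can be illustrated with a small sketch for person names; the rules shown (title stripping, initials, inversion) are examples of the kind of per-type heuristics described, not the framework's full rule set.

```python
# Hedged sketch of algorithmic name-variant generation for person names.
TITLES = {"dr", "prof", "mr", "mrs", "ms"}

def person_name_variants(name: str) -> set[str]:
    tokens = [t for t in name.replace(".", "").split()
              if t.lower() not in TITLES]      # drop honorific titles
    variants = {" ".join(tokens)}
    if len(tokens) >= 2:
        first, last = tokens[0], tokens[-1]
        variants.add(last)                     # surname only
        variants.add(f"{first[0]}. {last}")    # initial + surname
        variants.add(f"{last}, {first}")       # inverted form
    return variants

print(person_name_variants("Dr. Angela Merkel"))
# {'Angela Merkel', 'Merkel', 'A. Merkel', 'Merkel, Angela'}
```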
A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems
Existing evaluations of entity linking systems often say little about how the
system is going to perform for a particular application. There are four
fundamental reasons for this: many benchmarks focus on named entities; it is
hard to define which other entities to include; there are ambiguities in entity
recognition and entity linking; many benchmarks have errors or artifacts that
invite overfitting or lead to evaluation results of limited meaningfulness.
We provide a more meaningful and fair in-depth evaluation of a variety of
existing end-to-end entity linkers. We characterize the strengths and
weaknesses of these linkers and how well the results from the respective
publications can be reproduced. Our evaluation is based on several widely used
benchmarks, which exhibit the problems mentioned above to various degrees, as
well as on two new benchmarks, which address these problems.
On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance
Rigorous evaluations and analyses of evaluation results are key to improving Named Entity Linking systems. Nevertheless, most current evaluation tools focus on benchmarking and comparative evaluations. They therefore only provide aggregated statistics such as precision, recall and F1-measure to assess system performance, and offer no means for conducting detailed analyses down to the level of individual annotations.
This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools.
We present three use cases to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.
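The contrast between aggregated statistics and drill-down analysis can be made concrete with a minimal sketch: alongside precision, recall and F1, it keeps a per-annotation verdict of the kind a tool like Orbis could surface. The data structures are assumptions for illustration.

```python
# Minimal sketch: aggregate metrics plus per-annotation verdicts.
# Annotations are represented as hashable tuples, e.g. (start, end, entity_uri).
def evaluate(gold: set, predicted: set):
    verdicts = (
        [("true_positive", a) for a in gold & predicted] +
        [("missed", a) for a in gold - predicted] +      # false negatives
        [("spurious", a) for a in predicted - gold]      # false positives
    )
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Aggregate numbers answer "how good?"; verdicts answer "what went wrong?"
    return {"precision": precision, "recall": recall, "f1": f1}, verdicts
```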