Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

Author: Beelen Kaspar
Kamps Jaap
Marx Maarten
Olieman Alex
van Lange Milan
Publication date: 01/01/2017

Over the last decade we have made great progress in entity linking (EL) systems, but performance may vary depending on the context and, arguably, there are even principled limitations preventing a "perfect" EL system. This also suggests that there may be applications for which current "imperfect" EL is already very useful, and makes finding the "right" application as important as building the "right" EL system. We investigate the Digital Humanities use case, where scholars spend a considerable amount of time selecting relevant source texts. We developed WideNet, a semantically enhanced search tool which leverages the strengths of (imperfect) EL without getting in the way of its expert users. We evaluate this tool in two historical case studies aiming to collect a set of references to historical periods in parliamentary debates from the last two decades; the first targeted the Dutch Golden Age, and the second World War II. The case studies conclude with a critical reflection on the utility of WideNet for this kind of research, after which we outline how such a real-world application can help to improve EL technology in general.

Comment: Accepted for presentation at SEMANTiCS '1

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing

Author: Bhargav S.
Kanoulas E.
Provatorova V.
Vakulenko S.
Publication date: 01/01/2021

Entity disambiguation (ED) is the last step of entity linking (EL), when candidate entities are reranked according to the context they appear in. All datasets for training and evaluating models for EL consist of convenience samples, such as news articles and tweets, that propagate the prior probability bias of the entity distribution towards more frequently occurring entities. It was previously shown that the performance of the EL systems on such datasets is overestimated since it is possible to obtain higher accuracy scores by merely learning the prior. To provide a more adequate evaluation benchmark, we introduce the ShadowLink dataset, which includes 16K short text snippets annotated with entity mentions. We evaluate and report the performance of popular EL systems on the ShadowLink benchmark. The results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation, demonstrating the effects of prior probability bias and entity overshadowing.

arXiv.org e-Print Archive

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Ontologies across disciplines

Author: Nickles Matthias
Pease Adam
Schalley Andrea C.
Zaefferer Dietmar
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2007

Open Access LMU

Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with Linguistic Evaluation

Author: Avramidis Eleftherios
Burchardt Aljoscha
Helcl Jindrich
Macketanz Vivien
Srivastava Ankit
Publication venue: Walter de Gruyter GmbH
Publication date: 26/06/2017

Edinburgh Research Explorer

The role of knowledge in determining identity of long-tail entities

Author: Hovy Eduard
Ilievski Filip
Schlobach Stefan
Vossen Piek
Xie Qizhe
Publication venue: Elsevier BV
Publication date: 01/03/2020

The NIL entities do not have an accessible representation, which means that their identity cannot be established through traditional disambiguation. Consequently, they have received little attention in entity linking systems and tasks so far. Given the non-redundancy of knowledge on NIL entities, the lack of frequency priors, their potentially extreme ambiguity, and numerousness, they form an extreme class of long-tail entities and pose a great challenge for state-of-the-art systems. In this paper, we investigate the role of knowledge when establishing the identity of NIL entities mentioned in text. What kind of knowledge can be applied to establish the identity of NILs? Can we potentially link to them at a later point? How to capture implicit knowledge and fill knowledge gaps in communication? We formulate and test hypotheses to provide insights to these questions. Due to the unavailability of instance-level knowledge, we propose to enrich the locally extracted information with profiling models that rely on background knowledge in Wikidata. We describe and implement two profiling machines based on state-of-the-art neural models. We evaluate their intrinsic behavior and their impact on the task of determining identity of NIL entities.

VU Research Portal

Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities

Author: Beelen K.
Kamps J.
Marx M.
Olieman A.
van Lange M.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2017

International Migration, Integration and Social Cohesion online publications

Workshop on Extracting and Using Constructions in Computational Linguistics

Author: Knutsson Ola
Sahlgren Magnus
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2010

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Crosstalk and the spectrum of biological global broadcasts: Toward generalization of the Baars consciousness model across physiological subsystems

Author: Rodrick Wallace
Publication date: 17/02/2012

Once cognitive biological phenomena are recognized as necessarily having 'dual' information sources, it is easy to show that the information theory chain rule implies isolating coresident information sources from crosstalk requires more metabolic free energy than permitting correlation. This provides conditions for an evolutionary exaptation leading to dynamic global broadcasts of interacting cognitive biological processes analogous to, but slower than, consciousness, itself included within the paradigm. The argument is closely analogous to the well-studied exaptation of noise to trigger stochastic resonance amplification in physiological systems.

Nature Precedings