Good Applications for Crummy Entity Linkers? The Case of Corpus Selection in Digital Humanities
Over the last decade we have made great progress in entity linking (EL)
systems, but performance may vary depending on the context and, arguably, there
are even principled limitations preventing a "perfect" EL system. This also
suggests that there may be applications for which current "imperfect" EL is
already very useful, and makes finding the "right" application as important as
building the "right" EL system. We investigate the Digital Humanities use case,
where scholars spend a considerable amount of time selecting relevant source
texts. We developed WideNet, a semantically enhanced search tool which
leverages the strengths of (imperfect) EL without getting in the way of its
expert users. We evaluate this tool in two historical case-studies aiming to
collect a set of references to historical periods in parliamentary debates from
the last two decades; the first targeted the Dutch Golden Age, and the second
World War II. The case-studies conclude with a critical reflection on the
utility of WideNet for this kind of research, after which we outline how such a
real-world application can help to improve EL technology in general.
Comment: Accepted for presentation at SEMANTiCS '1
Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing
Entity disambiguation (ED) is the last step of entity linking (EL), when
candidate entities are reranked according to the context they appear in. All
datasets for training and evaluating models for EL consist of convenience
samples, such as news articles and tweets, that propagate the prior probability
bias of the entity distribution towards more frequently occurring entities. It
was previously shown that the performance of the EL systems on such datasets is
overestimated since it is possible to obtain higher accuracy scores by merely
learning the prior. To provide a more adequate evaluation benchmark, we
introduce the ShadowLink dataset, which includes 16K short text snippets
annotated with entity mentions. We evaluate and report the performance of
popular EL systems on the ShadowLink benchmark. The results show a considerable
difference in accuracy between more and less common entities for all of the EL
systems under evaluation, demonstrating the effects of prior probability bias
and entity overshadowing.
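To make the prior probability bias concrete, here is a minimal sketch of a linker that only learns the prior: it always returns the entity most often linked to a mention string and ignores context. This is an illustration, not the ShadowLink evaluation code. The function names, the toy annotations, and the entity identifiers (`Q_country`, `Q_river`) are all hypothetical.

```python
from collections import Counter, defaultdict


def build_prior(annotations):
    """Count how often each mention string is linked to each entity."""
    counts = defaultdict(Counter)
    for mention, entity in annotations:
        counts[mention.lower()][entity] += 1
    return counts


def link_by_prior(mention, prior):
    """Return the most frequent entity for a mention, ignoring all context."""
    candidates = prior.get(mention.lower())
    if not candidates:
        return None  # mention never seen in training
    return candidates.most_common(1)[0][0]


def accuracy(examples, prior):
    """Fraction of (mention, gold_entity) pairs the prior-only linker gets right."""
    if not examples:
        return 0.0
    hits = sum(link_by_prior(m, prior) == gold for m, gold in examples)
    return hits / len(examples)


# Made-up training data: the frequent sense dominates the rare one 9 to 1.
TRAIN = [("Jordan", "Q_country")] * 9 + [("Jordan", "Q_river")]
prior = build_prior(TRAIN)

# Made-up test mentions, split by whether the gold entity is the common sense.
COMMON = [("Jordan", "Q_country"), ("Jordan", "Q_country")]
OVERSHADOWED = [("Jordan", "Q_river"), ("Jordan", "Q_river")]

common_acc = accuracy(COMMON, prior)
shadow_acc = accuracy(OVERSHADOWED, prior)
print(f"common: {common_acc:.2f}, overshadowed: {shadow_acc:.2f}")
```

On a test set dominated by the frequent sense this baseline looks accurate, yet it never resolves the overshadowed entity. That gap between common and less common entities is the kind of effect the abstract describes.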
The role of knowledge in determining identity of long-tail entities
NIL entities do not have an accessible representation, which means that their identity cannot be established through traditional disambiguation. Consequently, they have received little attention in entity linking systems and tasks so far. Given the non-redundancy of knowledge on NIL entities, the lack of frequency priors, their potentially extreme ambiguity, and their numerousness, they form an extreme class of long-tail entities and pose a great challenge for state-of-the-art systems. In this paper, we investigate the role of knowledge when establishing the identity of NIL entities mentioned in text. What kind of knowledge can be applied to establish the identity of NILs? Can we potentially link to them at a later point? How can we capture implicit knowledge and fill knowledge gaps in communication? We formulate and test hypotheses to provide insights into these questions. Due to the unavailability of instance-level knowledge, we propose to enrich the locally extracted information with profiling models that rely on background knowledge in Wikidata. We describe and implement two profiling machines based on state-of-the-art neural models. We evaluate their intrinsic behavior and their impact on the task of determining the identity of NIL entities.
Crosstalk and the spectrum of biological global broadcasts: Toward generalization of the Baars consciousness model across physiological subsystems
Once cognitive biological phenomena are recognized as necessarily having 'dual' information sources, it is easy to show that the information theory chain rule implies that isolating coresident information sources from crosstalk requires more metabolic free energy than permitting correlation. This provides conditions for an evolutionary exaptation leading to dynamic global broadcasts of interacting cognitive biological processes analogous to, but slower than, consciousness, itself included within the paradigm. The argument is closely analogous to the well-studied exaptation of noise to trigger stochastic resonance amplification in physiological systems.
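For reference, the chain rule invoked above can be written in its standard form for two information sources. The symbols $X$ and $Y$ are assumptions introduced here for illustration, since the abstract does not name its sources. How the paper maps this inequality onto metabolic free energy is not given in this listing.

```latex
% Chain rule for joint Shannon entropy, and the bound it implies
H(X, Y) = H(X) + H(Y \mid X) \le H(X) + H(Y),
% with equality if and only if X and Y are statistically independent.
```

Read this way, correlated (crosstalking) sources carry joint uncertainty no greater than fully isolated ones, which matches the direction of the abstract's claim.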