147 research outputs found

    Linking named entities to Wikipedia

    Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities, just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case, Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to; or, if the KB does not contain the correct entry, return NIL. Entity linking systems can be complex, so we present a framework for analysing their different components. We use it to analyse three seminal systems, evaluated on a common dataset, and show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research. We report on our submissions to the entity linking shared task in 2010, 2011 and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities. We model syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We generalise from apposition to examine local descriptions specified close to the mention. We add local description to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. This not only makes for a more precise match but also lets us model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis seeks to link textual entity mentions to knowledge bases. Linking is important for any task where external world knowledge is used, and resolving ambiguity is fundamental to advancing research into these problems.

    Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories

    The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity. We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories. Comment: Accepted as *SEM 2018 Long Paper (co-located with NAACL 2018), 9 pages

    Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications

    In this paper, we describe our effort to create a new corpus for the evaluation of detecting and linking so-called survey variables in social science publications (e.g., "Do you believe in Heaven?"). The task is to recognize survey variable mentions in a given text, disambiguate them, and link them to the corresponding variable within a knowledge base. Since there are generally hundreds of candidates to link to, and because the mentions can take a wide variety of forms, this is a challenging task within NLP. The contribution of our work is the first gold standard corpus for the variable detection and linking task. We describe the annotation guidelines and the annotation process. The produced corpus is multilingual (German and English) and includes manually curated word and phrase alignments. Moreover, it includes text samples that could not be assigned to any variables, denoted as negative examples. Based on the new dataset, we conduct an evaluation of several state-of-the-art text classification and textual similarity methods. The annotated corpus is made available along with an open-source baseline system for variable mention identification and linking.

    Inducing Implicit Arguments via Cross-document Alignment: A Framework and its Applications

    Natural language texts frequently contain related information in different positions in discourse. As human readers, we can recognize such information across sentence boundaries and correctly infer relations between them. Given this inference capability, we understand texts that describe complex dependencies even if central aspects are not repeated in every sentence. In linguistics, certain omissions of redundant information are known under the term ellipsis and have been studied as cohesive devices in discourse (Halliday and Hasan, 1976). For computational approaches to semantic processing, such cohesive devices are problematic because methods are traditionally applied on the sentence level and barely take surrounding context into account. In this dissertation, we investigate omission phenomena on the level of predicate-argument structures. In particular, we examine instances of structures involving arguments that are not locally realized but inferable from context. The goal of this work is to automatically acquire and process such instances, which we also refer to as implicit arguments, to improve natural language processing applications. Our main contribution is a framework that identifies implicit arguments by aligning and comparing predicate-argument structures across pairs of comparable texts. As part of this framework, we develop a novel graph-based clustering approach, which detects corresponding predicate-argument structures using pairwise similarity metrics. To find discourse antecedents of implicit arguments, we further design a heuristic method that utilizes automatic annotations from various linguistic pre-processing tools. We empirically validate the utility of automatically induced instances of implicit arguments and discourse antecedents in three extrinsic evaluation scenarios. In the first scenario, we show that our induced pairs of arguments and antecedents can successfully be applied to improve a pre-existing model for linking implicit arguments in discourse. In two further evaluation settings, we show that induced instances of implicit arguments, together with their aligned explicit counterparts, can be used as training material for a novel model of local coherence. Given discourse-level and semantic features, this model can predict whether a specific argument should be explicitly realized to establish local coherence or whether it is inferable and hence redundant in context.