
    Linking named entities to Wikipedia

    Natural language is fraught with ambiguity, including in name reference: a name in text can refer to multiple entities, just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to or, if the KB does not contain the correct entry, to return NIL. Entity linking systems can be complex, and we present a framework for analysing their components. We use it to analyse three seminal systems, evaluated on a common dataset, and show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research, and we report on our submissions to its entity linking shared task in 2010, 2011 and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities, and model its syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We then generalise from apposition to local descriptions specified close to the mention. We add local descriptions to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, it also lets us model failure to match. Local descriptions help disambiguate entities, further improving our linker. The work in this thesis seeks to link textual entity mentions to knowledge bases. Linking is important for any task that uses external world knowledge, and resolving ambiguity is fundamental to advancing research into these problems.

    Unsupervised entity linking using graph-based semantic similarity

    Human-written text makes up a large proportion of shared information resources such as the World Wide Web (WWW). Social networks, news, learning resources and Knowledge Bases (KBs) are just a few examples of sources rich in textual data that is used by both human and machine readers. Human language is highly ambiguous, meaning that a short span of text (such as a word or phrase) can be interpreted semantically in different ways. A language processor should select the best interpretation according to the context in which each word or phrase appears. For human readers, the brain is quite proficient at interpreting textual data, and human language developed in a way that reflects the innate ability provided by the brain’s neural networks. Even so, there are still cases in which disambiguating text remains a hard challenge for human readers. For machine readers, developing this ability has been a long-term challenge for natural language processing and machine learning. Different interpretations can change the topic and target of a text, and they can have serious consequences in critical domains that need high precision. Correctly resolving ambiguous words is therefore crucial. Two tasks have been developed to tackle this: Word Sense Disambiguation (WSD), which infers the sense (i.e. meaning) of a word that has multiple meanings, and Entity Linking (EL), also called Named Entity Disambiguation (NED), Named Entity Recognition and Disambiguation (NERD) or Named Entity Normalization (NEN), which finds the correct referent of Named Entity (NE) mentions occurring in documents. Solutions to these problems affect other areas of language processing, such as discourse, search engine relevance, anaphora resolution, coherence, and inference.
This document summarizes the work towards developing an unsupervised Entity Linking (EL) system that uses graph-based semantic similarity to disambiguate Named Entity (NE) mentions occurring in a target document. The EL task is highly challenging because each entity can usually be referred to by several NE mentions (synonymy) and a single NE mention may denote distinct entities (polysemy). Much effort is therefore needed to tackle these challenges. Our EL system disambiguates NE mentions in several steps, and for each step we have proposed, implemented, and evaluated several approaches. We evaluated our EL system in the TAC-KBP English EL evaluation framework, in which the system input consists of a set of queries, each containing a query name (the target NE mention) along with the start and end offsets of that mention in the target document. The output is either an NE entry id in a reference Knowledge Base (KB) or a Not-in-KB (NIL) id when the system cannot find an appropriate entry for the query. The thesis concludes with a detailed analysis of the evaluation results from several perspectives. To disambiguate a query name, we apply a graph-based semantic similarity approach that extracts the network of semantic knowledge present in the content of the target document.
Postprint (published version)
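The query and answer format described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the inclusive end offset, the plain "NIL" string, the toy KB ids, and above all the exact-name lookup, which merely stands in for the thesis's graph-based disambiguation and is not the author's method.

```python
from dataclasses import dataclass
from typing import Dict

NIL = "NIL"  # assumed placeholder for a Not-in-KB answer


@dataclass(frozen=True)
class Query:
    """One EL query: a target mention and where it sits in a document."""
    query_id: str
    name: str    # query name (target NE mention)
    doc_id: str
    start: int   # offset of the first character of the mention
    end: int     # offset of the last character (inclusive, by assumption)


def mention_text(query: Query, doc_text: str) -> str:
    """Return the span of the document that the offsets point to."""
    return doc_text[query.start:query.end + 1]


def link(query: Query, doc_text: str, kb_names: Dict[str, str]) -> str:
    """Return a KB entry id for the query, or NIL if none is found.

    Toy baseline only: a case-insensitive exact match on the mention string.
    """
    surface = mention_text(query, doc_text)
    if surface != query.name:
        raise ValueError("offsets do not match the query name")
    return kb_names.get(surface.lower(), NIL)


# Hypothetical document, KB and queries.
doc = "Paris is the capital of France."
kb = {"paris": "E0001"}
in_kb = Query("Q1", "Paris", "D1", 0, 4)
not_in_kb = Query("Q2", "France", "D1", 24, 29)

answers = {q.query_id: link(q, doc, kb) for q in (in_kb, not_in_kb)}
```

In this sketch the first query resolves to the toy entry id and the second falls back to NIL because the toy KB has no matching entry. A real system would replace the dictionary lookup with candidate generation and context-based ranking.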