    Topic modeling for entity linking using keyphrase

    This paper proposes an Entity Linking system that applies topic-modeling-based ranking. Our novel approach supplies the model with new relevant elements: keyphrases related to the queries, gathered from a large Wikipedia-based knowledge resource. Peer Reviewed. Postprint (author’s final draft).

    Automated Proof Reading of Clinical Notes

    Unsupervised entity linking using graph-based semantic similarity

    Human-written text now makes up a large share of shared information resources such as the World Wide Web (WWW). Social networks, news, learning resources, and Knowledge Bases (KBs) are just a few examples of sources rich in textual data that is used by both human and machine readers. Human language is highly ambiguous: a short piece of text (such as a word or phrase) can be interpreted semantically in different ways. A language processor should detect the best interpretation given the context in which each word or phrase appears. Human readers are quite proficient at interpreting textual data, since human language developed in a way that reflects the innate abilities of the brain’s neural networks. Even so, there are moments when text disambiguation remains a hard challenge for human readers. For machine readers, developing this ability through natural language processing and machine learning has been a long-term challenge. Different interpretations can change the topics and targets a text is understood to address, and such differences can have serious consequences in critical domains that demand high precision. Correctly inferring the meaning of ambiguous words is therefore crucial. Two tasks have been developed to tackle this: Word Sense Disambiguation (WSD), which infers the sense (i.e. meaning) of a word that has multiple meanings, and Entity Linking (EL) (also called Named Entity Disambiguation, NED; Named Entity Recognition and Disambiguation, NERD; or Named Entity Normalization, NEN), which determines the correct referent of Named Entity (NE) mentions occurring in documents. Solving these problems benefits other computer-based language tasks, such as discourse analysis, search engine relevance, anaphora resolution, coherence, and inference.
This document summarizes work toward an unsupervised Entity Linking (EL) system that uses graph-based semantic similarity to disambiguate Named Entity (NE) mentions occurring in a target document. The EL task is highly challenging because an entity can usually be referred to by several NE mentions (synonymy), and a single NE mention may denote distinct entities (polysemy). Our EL system disambiguates NE mentions in several steps; for each step, we proposed, implemented, and evaluated several approaches. We evaluated the system in the TAC-KBP English EL evaluation framework, in which the input is a set of queries, each containing a query name (the target NE mention) along with the start and end offsets of that mention in the target document. The output is either the id of an NE entry in a reference Knowledge Base (KB) or a Not-in-KB (NIL) id when the system cannot find an appropriate entry for the query. To disambiguate a query name, we apply a graph-based semantic similarity approach that extracts the network of semantic knowledge present in the content of the target document. Finally, we analyze our results from several perspectives. Postprint (published version).
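The query and answer format described in this abstract can be sketched as follows. This is only an illustration of the input and output shape, not the authors' system: the names `ELQuery` and `link`, the toy knowledge base, the inclusive end offset, and the exact-match lookup are all assumptions, and the lookup merely stands in for the graph-based disambiguation step the abstract mentions.

```python
from dataclasses import dataclass
from typing import Dict

NIL = "NIL"  # answer returned when no knowledge-base entry fits the query


@dataclass(frozen=True)
class ELQuery:
    """One query: a mention name plus its location in a target document."""
    query_id: str
    name: str    # query name, i.e. the target NE mention
    doc_id: str  # identifier of the target document
    start: int   # offset of the first character of the mention
    end: int     # offset of the last character (assumed inclusive)


def link(query: ELQuery, documents: Dict[str, str], kb: Dict[str, str]) -> str:
    """Return a KB entry id for the query, or NIL if none is found.

    Hypothetical placeholder: a real system would rank candidate entries
    using the document context; here we only do a case-insensitive lookup.
    """
    text = documents[query.doc_id]
    mention = text[query.start:query.end + 1]
    if mention != query.name:
        raise ValueError(f"offsets do not match query name for {query.query_id}")
    return kb.get(mention.lower(), NIL)


if __name__ == "__main__":
    docs = {"d1": "Paris is the capital of France."}
    toy_kb = {"paris": "E0001"}  # hypothetical KB: surface form -> entry id
    start = docs["d1"].find("Paris")
    q = ELQuery("q1", "Paris", "d1", start, start + len("Paris") - 1)
    print(q.query_id, link(q, docs, toy_kb))
```

Keeping the offsets in the query lets a system read the surrounding context of the mention, which is what a context-based disambiguation step would need.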

    NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

    Biomedical researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies. It can also be customized to fit the needs of different scenarios. Ontology Recommender 2.0 combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available. Comment: 29 pages, 8 figures, 11 tables
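One simple way to combine the four criteria named in this abstract into a single ranking is a normalized weighted sum. The abstract does not give a formula, so the sketch below is an assumption throughout: the function names, the weighted-sum form, the example weights, and the example scores are all hypothetical and do not describe the NCBO service or its API.

```python
from typing import Dict, List

# The four criteria listed in the abstract.
CRITERIA = ("coverage", "acceptance", "detail", "specialization")


def aggregate_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion scores in [0, 1] into one score in [0, 1].

    Assumed form: a weighted sum normalized by the total weight.
    """
    missing = [c for c in CRITERIA if c not in scores or c not in weights]
    if missing:
        raise KeyError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[c] * scores[c] for c in CRITERIA) / total_weight


def rank(ontologies: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """Return ontology names ordered from best to worst aggregate score."""
    return sorted(
        ontologies,
        key=lambda name: aggregate_score(ontologies[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical weights and scores, chosen only for illustration.
    weights = {"coverage": 2.0, "acceptance": 1.0, "detail": 1.0, "specialization": 1.0}
    candidates = {
        "ONTO_A": {"coverage": 0.9, "acceptance": 0.4, "detail": 0.5, "specialization": 0.3},
        "ONTO_B": {"coverage": 0.5, "acceptance": 0.8, "detail": 0.6, "specialization": 0.7},
    }
    for name in rank(candidates, weights):
        print(name, round(aggregate_score(candidates[name], weights), 3))
```

Exposing the weights as a parameter mirrors the abstract's remark that the recommender can be customized for different scenarios.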