
    MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

    Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 datasets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages. Comment: Accepted in K-CAP 2017: Knowledge Capture Conference.
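
    The abstract only names the two ingredients MAG combines, so the following is a minimal, hypothetical sketch of that general recipe: retrieve candidate entities for each mention from a knowledge-base label index, then rank them jointly with a plain PageRank over a candidate relatedness graph. All function names, data structures and scoring choices are illustrative assumptions, not MAG's actual implementation.

```python
# Hypothetical sketch of retrieval-plus-graph entity linking; everything here
# is an illustrative assumption, not MAG's actual code.
from collections import defaultdict

def retrieve_candidates(mention, label_index):
    """Look up KB entities whose surface labels match the mention."""
    return label_index.get(mention.lower(), [])

def build_candidate_graph(candidates_per_mention, kb_edges):
    """Connect candidates of different mentions that are related in the KB."""
    graph = defaultdict(set)
    flat = [c for cands in candidates_per_mention.values() for c in cands]
    for a in flat:
        for b in flat:
            if a != b and (a, b) in kb_edges:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank; better-connected candidates receive higher scores."""
    nodes = list(graph)
    if not nodes:
        return {}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
               + damping * sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            for n in nodes
        }
    return rank

def link(mentions, label_index, kb_edges):
    """Deterministically pick the highest-ranked candidate for each mention."""
    cands = {m: retrieve_candidates(m, label_index) for m in mentions}
    scores = pagerank(build_candidate_graph(cands, kb_edges))
    return {m: max(c, key=lambda e: scores.get(e, 0.0), default=None)
            for m, c in cands.items()}
```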

    The Effect of Gender in the Publication Patterns in Mathematics

    Despite the increasing number of women graduating in mathematics, a systemic gender imbalance persists, signified by a pronounced gender gap in the distribution of active researchers and professors. Especially at the level of university faculty, women mathematicians continue to be drastically underrepresented, decades after the first affirmative action measures were put in place. A solid publication record is of paramount importance for securing permanent positions. Thus, the question arises whether the publication patterns of men and women mathematicians differ in a significant way. Making use of the zbMATH database, one of the most comprehensive metadata sources on mathematical publications, we analyze the scholarly output of ~150,000 mathematicians from the past four decades whose gender we algorithmically inferred. We focus on development over time, collaboration through coauthorships, presumed journal quality and distribution of research topics -- factors known to have a strong impact on job prospects. We report significant differences between genders which may put women at a disadvantage when pursuing an academic career in mathematics. Comment: 24 pages, 12 figures.
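
    As an illustration of the kind of aggregation such a study involves, here is a small, hypothetical pandas sketch that summarises output and collaboration by inferred gender and decade. The column names are assumptions, not the authors' zbMATH schema or pipeline.

```python
# Illustrative sketch (assumed column names, not the authors' actual pipeline):
# summarise publication output and collaboration by inferred gender and decade.
import pandas as pd

def publication_patterns(df: pd.DataFrame) -> pd.DataFrame:
    """Expects one row per (publication, author) with columns
    'author_id', 'inferred_gender', 'year' and 'n_coauthors'."""
    df = df.assign(decade=(df["year"] // 10) * 10)
    return (
        df.groupby(["inferred_gender", "decade"])
          .agg(papers=("author_id", "size"),
               authors=("author_id", "nunique"),
               mean_coauthors=("n_coauthors", "mean"))
          .assign(papers_per_author=lambda t: t["papers"] / t["authors"])
    )
```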

    Provenance in Open Data Entity-Centric Aggregation

    An increasing number of web services require combining data from several data providers into an aggregated database. Usually this aggregation is based on the linked data approach. The entity-centric model, on the other hand, is a promising data model that outperforms the linked data approach because it solves the lack of explicit semantics and the semantic heterogeneity problems. However, current open data, available on the web as raw datasets, cannot be used in the entity-centric model until an import process extracts the data elements and inserts them correctly into the aggregated entity-centric database. It is essential to certify the quality of these imported data elements, especially the background knowledge part which acts as input to semantic computations, because the quality of this part directly affects the quality of the web services built on top of it. Furthermore, the aggregation of entities and their attribute values from different sources raises three problems: the need to trace the source of each element, the need to trace the links between entities which can be considered equivalent, and the need to handle possible conflicts between values imported from different data sources. In this thesis, we introduce a new model to certify the quality of a background knowledge base which separates linguistic and language-independent elements. We also present a pipeline to import entities from open data repositories, adding the missing implicit semantics and eliminating the semantic heterogeneity. Finally, we show how to trace the source of attribute values coming from different data providers; how to choose a strategy for handling possible conflicts between these values; and how to keep the links between identical entities which represent the same real-world entity.
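
    The provenance and conflict-handling part of the abstract lends itself to a small sketch: each attribute value keeps the provider it came from, and a pluggable strategy resolves conflicts between providers. The classes and strategies below are illustrative assumptions, not the thesis's actual model.

```python
# Minimal sketch of provenance-aware attribute aggregation: each value keeps
# its source, and a pluggable strategy resolves conflicts. Names and strategies
# are illustrative assumptions only.
from dataclasses import dataclass
from collections import Counter
from typing import Callable, List

@dataclass(frozen=True)
class SourcedValue:
    value: str
    source: str          # which data provider contributed the value
    retrieved_at: str    # ISO timestamp, for "most recent wins" strategies

def majority_vote(values: List[SourcedValue]) -> str:
    """Pick the value asserted by the most providers."""
    return Counter(v.value for v in values).most_common(1)[0][0]

def most_recent(values: List[SourcedValue]) -> str:
    """Pick the value with the latest retrieval timestamp."""
    return max(values, key=lambda v: v.retrieved_at).value

def resolve(values: List[SourcedValue],
            strategy: Callable[[List[SourcedValue]], str] = majority_vote) -> str:
    return strategy(values)

# Example: three providers disagree on an entity's population attribute.
values = [
    SourcedValue("3550000", "provider-A", "2019-01-10"),
    SourcedValue("3600000", "provider-B", "2020-06-01"),
    SourcedValue("3550000", "provider-C", "2018-03-15"),
]
print(resolve(values))                        # majority vote -> "3550000"
print(resolve(values, strategy=most_recent))  # latest source -> "3600000"
```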

    The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages

    This paper gives a general description of the ideas behind the Parallel Meaning Bank, a framework that aims to provide an easy way to annotate compositional semantics for texts written in languages other than English. The annotation procedure is semi-automatic and comprises seven layers of linguistic information: segmentation, symbolisation, semantic tagging, word sense disambiguation, syntactic structure, thematic role labelling, and co-reference. New languages can be added to the meaning bank as long as the documents are based on translations from English, but they also introduce interesting new challenges for the linguistic assumptions underlying the Parallel Meaning Bank. Comment: 13 pages, 5 figures, 1 table.
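
    A rough sketch of one way the seven annotation layers listed above could be carried per sentence is given below; all field names and example values are illustrative assumptions, not the Parallel Meaning Bank's actual schema.

```python
# Rough sketch of carrying the seven annotation layers per sentence;
# field names and example values are illustrative, not the PMB's schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TokenAnnotation:
    surface: str                          # segmentation
    symbol: str                           # symbolisation (normalised form)
    sem_tag: str                          # semantic tag
    sense: Optional[str] = None           # word sense disambiguation
    thematic_role: Optional[str] = None   # thematic role labelling

@dataclass
class SentenceAnnotation:
    tokens: List[TokenAnnotation]
    syntax: str                                # syntactic structure (bracketed)
    coreference_chains: List[List[int]] = field(default_factory=list)

sentence = SentenceAnnotation(
    tokens=[
        TokenAnnotation("Max", "max", "PER", thematic_role="Agent"),
        TokenAnnotation("smiled", "smile", "EXS", sense="smile.v.01"),
    ],
    syntax="(S (NP Max) (VP smiled))",
    coreference_chains=[[0]],
)
```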

    Knowledge-based Biomedical Data Science 2019

    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as in approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains such as Traditional Chinese Medicine and biodiversity. Comment: Manuscript, 43 pages with 3 tables; supplemental material, 43 pages with 3 tables.

    Linking named entities to Wikipedia

    Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to, or, if the KB does not contain the correct entry, return NIL. Entity linking systems can be complex, so we present a framework for analysing their different components, which we use to analyse three seminal systems evaluated on a common dataset, showing the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research, and we report on our submissions to the entity linking shared task in 2010, 2011 and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities, and model its syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We generalise from apposition to examine local descriptions specified close to the mention, adding them to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, but we are also able to model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis seeks to link textual entity mentions to knowledge bases. Linking is important for any task where external world knowledge is used, and resolving ambiguity is fundamental to advancing research into these problems.
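
    The basic NEL setup the thesis builds on can be sketched as follows: generate Wikipedia candidates for a mention, score them against the local context, and return NIL when no candidate clears a threshold. The toy scoring function and data structures are assumptions for illustration, not the thesis's linker.

```python
# Hedged sketch of generic named entity linking with NIL handling; the scoring
# and data structures are illustrative assumptions only.
from typing import Dict, List, Optional

NIL = None

def score(candidate_desc: str, context: str) -> float:
    """Toy context match: fraction of context words found in the candidate text."""
    ctx = set(context.lower().split())
    cand = set(candidate_desc.lower().split())
    return len(ctx & cand) / max(len(ctx), 1)

def link_mention(mention: str,
                 context: str,
                 kb: Dict[str, List[str]],      # surface form -> candidate titles
                 descriptions: Dict[str, str],  # title -> article text
                 threshold: float = 0.1) -> Optional[str]:
    candidates = kb.get(mention, [])
    if not candidates:
        return NIL
    best = max(candidates, key=lambda c: score(descriptions.get(c, ""), context))
    return best if score(descriptions.get(best, ""), context) >= threshold else NIL
```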

    Coreference Resolution in FreeLing 4.0

    This paper presents the integration of RelaxCor into FreeLing. RelaxCor is a coreference resolution system based on constraint satisfaction that ranked second in the CoNLL-2011 shared task. FreeLing is an open-source library for NLP with more than fifteen years of history and a widespread user community. We describe the difficulties found in porting RelaxCor from a shared-task scenario to a production environment, as well as the solutions devised. We present two strategies for this integration and a rough evaluation of the obtained results.
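
    As a very rough illustration of constraint-based coreference in the spirit of the system described above, the sketch below scores mention pairs with weighted constraints and clusters them greedily. The constraints, weights and types are invented for illustration and are neither RelaxCor's rule set nor FreeLing's API.

```python
# Illustrative sketch of constraint-weighted mention-pair scoring followed by
# greedy clustering; not RelaxCor's actual algorithm or FreeLing's API.
from dataclasses import dataclass
from typing import List

@dataclass
class Mention:
    text: str
    head: str
    gender: str   # "m", "f" or "n"
    number: str   # "sg" or "pl"

CONSTRAINTS = [
    (lambda a, b: a.head.lower() == b.head.lower(),  2.0),   # same head word
    (lambda a, b: a.gender == b.gender,              0.5),
    (lambda a, b: a.number == b.number,              0.5),
    (lambda a, b: a.gender != b.gender,             -2.0),   # incompatible gender
]

def compatibility(a: Mention, b: Mention) -> float:
    """Sum the weights of all constraints satisfied by the mention pair."""
    return sum(w for test, w in CONSTRAINTS if test(a, b))

def cluster(mentions: List[Mention], threshold: float = 1.0) -> List[List[Mention]]:
    """Greedily attach each mention to the most compatible earlier chain."""
    chains: List[List[Mention]] = []
    for m in mentions:
        best, best_score = None, threshold
        for chain in chains:
            score = max(compatibility(m, prev) for prev in chain)
            if score > best_score:
                best, best_score = chain, score
        if best is not None:
            best.append(m)
        else:
            chains.append([m])
    return chains
```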