MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach
Entity linking has recently been the subject of a significant body of
research. Currently, the best performing approaches rely on trained
mono-lingual models. Porting these approaches to other languages is
consequently a difficult endeavor as it requires corresponding training data
and retraining of the models. We address this drawback by presenting a novel
multilingual, knowledge-base agnostic and deterministic approach to entity
linking, dubbed MAG. MAG is based on a combination of context-based retrieval
on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data
sets and in 7 languages. Our results show that the best approach trained on
English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse
on datasets in other languages. MAG, on the other hand, achieves
state-of-the-art performance on English datasets and reaches a micro F-measure
that is up to 0.6 higher than that of PBOH on non-English languages.
Comment: Accepted in K-CAP 2017: Knowledge Capture Conference
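The abstract describes MAG as combining context-based retrieval over structured knowledge bases with graph algorithms. As a rough, illustrative sketch of that general recipe (not MAG's actual implementation; all data and names are invented), a deterministic linker can retrieve candidates by surface form and prefer the candidate most connected to the document's other candidates:

```python
# Illustrative sketch of deterministic entity linking in the spirit of MAG:
# (1) retrieve candidate KB entities for each mention by surface-form lookup,
# (2) score candidates by their connectivity to the other candidates in the
#     document (a crude, deterministic coherence measure). Toy data only.

KB = {  # toy knowledge base: entity -> set of related entities
    "Paris_(city)": {"France", "Seine"},
    "Paris_Hilton": {"Hilton_Hotels", "USA"},
    "France": {"Paris_(city)", "Seine"},
    "Seine": {"Paris_(city)", "France"},
}

SURFACE_FORMS = {  # surface form -> candidate entities
    "Paris": ["Paris_(city)", "Paris_Hilton"],
    "France": ["France"],
    "Seine": ["Seine"],
}

def link(mentions):
    # Collect all candidates for all mentions in the document.
    candidates = {m: SURFACE_FORMS.get(m, []) for m in mentions}
    pool = {c for cs in candidates.values() for c in cs}

    def score(c):
        # How many of the document's other candidates is c linked to?
        return len(KB.get(c, set()) & pool)

    return {m: max(cs, key=score) if cs else None
            for m, cs in candidates.items()}

print(link(["Paris", "France", "Seine"]))
# "Paris" resolves to Paris_(city): it shares KB links with France and Seine.
```

Because both steps are pure lookups and counts, the output is deterministic and no per-language training data is needed, which is the property the abstract emphasises.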
The Effect of Gender in the Publication Patterns in Mathematics
Despite the increasing number of women graduating in mathematics, a systemic
gender imbalance persists and is signified by a pronounced gender gap in the
distribution of active researchers and professors. Especially at the level of
university faculty, women mathematicians remain drastically
underrepresented, decades after the first affirmative action measures were
put into place. A solid publication record is of paramount importance for
securing permanent positions. Thus, the question arises whether the publication
patterns of men and women mathematicians differ in a significant way. Making
use of the zbMATH database, one of the most comprehensive metadata sources on
mathematical publications, we analyze the scholarly output of ~150,000
mathematicians from the past four decades whose gender we algorithmically
inferred. We focus on development over time, collaboration through
coauthorships, presumed journal quality and distribution of research topics --
factors known to have a strong impact on job perspectives. We report
significant differences between genders which may put women at a disadvantage
when pursuing an academic career in mathematics.
Comment: 24 pages, 12 figures
Provenance in Open Data Entity-Centric Aggregation
An increasing number of web services these days require combining data from several data providers into an aggregated database. Usually this aggregation is based on the linked data approach. The entity-centric model, on the other hand, is a promising data model that outperforms the linked data approach because it solves the lack of explicit semantics and the semantic heterogeneity problems. However, current open data, available on the web as raw datasets, cannot be used in the entity-centric model before an import process extracts the data elements and inserts them correctly into the aggregated entity-centric database. It is essential to certify the quality of these imported data elements, especially the background knowledge part, which serves as input to semantic computations, because the quality of this part directly affects the quality of the web services built on top of it. Furthermore, the aggregation of entities and their attribute values from different sources raises three problems: the need to trace the source of each element, the need to trace the links between entities which can be considered equivalent, and the need to handle possible conflicts between values imported from various data sources. In this thesis, we introduce a new model to certify the quality of a background knowledge base which separates linguistic and language-independent elements. We also present a pipeline to import entities from open data repositories, add the missing implicit semantics, and eliminate the semantic heterogeneity. Finally, we show how to trace the source of attribute values coming from different data providers; how to choose a strategy for handling possible conflicts between these values; and how to keep the links between identical entities which represent the same real-world entity.
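The three aggregation problems the abstract names (tracing sources, linking equivalent entities, resolving value conflicts) can be illustrated with a small provenance-aware data structure. This is a hypothetical sketch, not the thesis's actual model; the class names, provider names, and resolution strategy are invented for the example:

```python
# Provenance-aware attribute aggregation: every value keeps its source, and a
# pluggable strategy resolves conflicts when several providers supply
# different values for the same attribute. Illustrative toy code.

from collections import defaultdict

class Entity:
    def __init__(self, entity_id):
        self.entity_id = entity_id
        # attribute -> list of (value, source) pairs; provenance is preserved
        self.attributes = defaultdict(list)

    def add(self, attribute, value, source):
        self.attributes[attribute].append((value, source))

    def resolve(self, attribute, strategy):
        """Pick one value among conflicting ones; raw pairs stay inspectable."""
        return strategy(self.attributes[attribute])

def prefer_source(ranking):
    """One possible strategy: trust an explicit ranking of data providers."""
    order = {src: i for i, src in enumerate(ranking)}
    def strategy(pairs):
        return min(pairs, key=lambda p: order.get(p[1], len(order)))[0]
    return strategy

e = Entity("Q90")
e.add("population", 2_165_423, source="provider_a")
e.add("population", 2_102_650, source="provider_b")
print(e.resolve("population", prefer_source(["provider_b", "provider_a"])))
```

Keeping the (value, source) pairs rather than a single merged value is what makes the conflict-handling strategy swappable after import, which mirrors the separation the abstract argues for.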
The Parallel Meaning Bank: A Framework for Semantically Annotating Multiple Languages
This paper gives a general description of the ideas behind the Parallel
Meaning Bank, a framework with the aim to provide an easy way to annotate
compositional semantics for texts written in languages other than English. The
annotation procedure is semi-automatic, and comprises seven layers of
linguistic information: segmentation, symbolisation, semantic tagging, word
sense disambiguation, syntactic structure, thematic role labelling, and
co-reference. New languages can be added to the meaning bank as long as the
documents are based on translations from English, but they also introduce
interesting new challenges for the linguistic assumptions underlying the
Parallel Meaning Bank.
Comment: 13 pages, 5 figures, 1 table
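The seven annotation layers listed in the abstract can be pictured as parallel per-token records. The layer names below come from the abstract; the record layout and all example values are invented for illustration and are not the Parallel Meaning Bank's actual representation:

```python
# Hypothetical per-token record carrying the seven annotation layers the
# abstract lists, for the token "loves". Example values are made up.

token = {
    "segmentation":  "loves",       # token boundary from segmentation
    "symbolisation": "love",        # normalised symbol / lemma
    "semantic_tag":  "EVENT",       # coarse semantic tag (invented label)
    "word_sense":    "love.v.01",   # word sense disambiguation result
    "syntax":        r"(S\NP)/NP",  # CCG-style syntactic category
    "thematic_role": None,          # roles attach to arguments, not the verb
    "coreference":   None,          # verbs have no antecedent
}
print(token["word_sense"])
```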
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Traditional Chinese Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; supplemental material 43 pages with 3 tables
Linking named entities to Wikipedia
Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities, just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to; or, if the KB does not contain the correct entry, to return NIL. Entity linking systems can be complex, and we present a framework for analysing their different components, which we use to analyse three seminal systems evaluated on a common dataset; we show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research, and we report on our submissions to the entity linking shared task in 2010, 2011 and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities, and model its syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. We then generalise from apposition to local descriptions specified close to the mention. We add local description to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, we are also able to model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis seeks to link textual entity mentions to knowledge bases. Linking is important for any task where external world knowledge is used, and resolving ambiguity is fundamental to advancing research into these problems.
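The NEL task as defined in the abstract (return the KB entry for a mention, or NIL when none fits) can be shown in miniature. The context-overlap matching below is a stand-in for the thesis's description-based matching, and all entity names and data are invented for the example:

```python
# Toy named entity linking with NIL handling: score each candidate KB entry
# by word overlap with the mention's local context; return "NIL" if no
# candidate clears the threshold. Illustrative data only.

KB = {  # entry -> words from its (toy) description
    "David_Murray_(saxophonist)": {"jazz", "saxophone", "quartet"},
    "David_Murray_(guitarist)":   {"guitar", "iron", "maiden", "metal"},
}

def link(candidates, context_words, threshold=1):
    best, best_overlap = "NIL", 0
    for entry in candidates:
        overlap = len(KB[entry] & context_words)
        if overlap > best_overlap:
            best, best_overlap = entry, overlap
    # Modelling failure to match: below the threshold, we say NIL rather
    # than guess, as the task definition above requires.
    return best if best_overlap >= threshold else "NIL"

cands = list(KB)
print(link(cands, {"the", "jazz", "saxophone", "player"}))  # saxophonist
print(link(cands, {"unrelated", "words"}))                  # NIL
```

The explicit NIL branch is the point of the sketch: a linker that always returns its best candidate cannot model failure to match, which the abstract identifies as a key capability.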
Coreference Resolution in Freeling 4.0
This paper presents the integration of RelaxCor into FreeLing. RelaxCor is a coreference resolution system based on constraint satisfaction that ranked second in the CoNLL-2011 shared task. FreeLing is an open-source library for NLP with more than fifteen years of existence and a widespread user community. We present the difficulties found in porting RelaxCor from a shared-task scenario to a production environment, as well as the solutions devised. We present two strategies for this integration and a rough evaluation of the obtained results.
Peer reviewed. Postprint (published version).