780 research outputs found

Named Entity Extraction and Disambiguation: The Reinforcement Effect.

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011

Named entity extraction and disambiguation have received much attention in recent years. Typical fields addressing these topics are information retrieval, natural language processing, and the semantic web. Although these topics are highly dependent, almost no existing works examine this dependency. The aim of this paper is to examine the dependency and show how each affects the other. We conducted experiments with a set of descriptions of holiday homes, with the aim of extracting and disambiguating toponyms as a representative example of named entities. We experimented with three approaches for disambiguation in order to infer the country of the holiday home. We examined how the effectiveness of extraction influences the effectiveness of disambiguation, and reciprocally, how filtering out ambiguous names (an activity that depends on the disambiguation process) improves the effectiveness of extraction. Since this, in turn, may improve the effectiveness of disambiguation again, it shows that extraction and disambiguation may reinforce each other.

CiteSeerX

Maastricht University Research Portal

University of Twente Research Information

Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation

Author: A Jimeno
A Jimeno-Yepes
A Schwartz
A Yeh
Alan R Aronson
Antonio J Jimeno-Yepes
B McInnes
B McInnes
Bridget T McInnes
C Leacock
C Manning
G Leroy
H Liu
H Liu
H Liu
J Fan
L Hirschman
M Stevenson
M Weeber
P Pezik
R Leaman
S Gaudan
S Humphrey
T Pedersen
WA Gale
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Evaluation of Word Sense Disambiguation (WSD) methods in the biomedical domain is difficult because the available resources are either too small or too focused on specific types of entities (e.g. diseases or genes). We present a method that can be used to automatically develop a WSD test collection using the Unified Medical Language System (UMLS) Metathesaurus and the manual MeSH indexing of MEDLINE. We demonstrate the use of this method by developing such a data set, called MSH WSD. Methods: In our method, the Metathesaurus is first screened to identify ambiguous terms whose possible senses consist of two or more MeSH headings. We then use each ambiguous term and its corresponding MeSH heading to extract MEDLINE citations where the term and only one of the MeSH headings co-occur. The term found in the MEDLINE citation is automatically assigned the UMLS CUI linked to the MeSH heading. Each instance has been assigned a UMLS Concept Unique Identifier (CUI). We compare the characteristics of the MSH WSD data set to the previously existing NLM WSD data set. Results: The resulting MSH WSD data set consists of 106 ambiguous abbreviations, 88 ambiguous terms and 9 which are a combination of both, for a total of 203 ambiguous entities. For each ambiguous term/abbreviation, the data set contains a maximum of 100 instances per sense obtained from MEDLINE. We evaluated the reliability of the MSH WSD data set using existing knowledge-based methods and compared their performance to the results previously obtained by these algorithms on the pre-existing data set, NLM WSD. We show that the knowledge-based methods achieve different results but keep their relative performance except for the Journal Descriptor Indexing (JDI) method, whose performance is below the other methods. Conclusions: The MSH WSD data set allows the evaluation of WSD algorithms in the biomedical domain. Compared to previously existing data sets, MSH WSD contains a larger number of biomedical terms/abbreviations and covers the largest set of UMLS Semantic Types. Furthermore, the MSH WSD data set has been generated automatically reusing already existing annotations and, therefore, can be regenerated from subsequent UMLS versions.
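The selection rule described in the Methods part of this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_wsd_instances`, the input layout (a term-to-senses mapping and a list of citation dictionaries), the placeholder identifiers `CUI_A` and `CUI_B`, and the naive substring match are all assumptions made for illustration. Only the rule itself (keep a citation when the term co-occurs with exactly one of its candidate MeSH headings, cap at 100 instances per sense) comes from the abstract.

```python
from collections import defaultdict


def build_wsd_instances(term_senses, citations, max_per_sense=100):
    """Collect labelled instances for ambiguous terms.

    term_senses: {term: {mesh_heading: cui}} (hypothetical layout)
    citations:   list of {"pmid": ..., "text": str, "mesh": set of headings}
    Returns {term: {cui: [pmid, ...]}}.
    """
    dataset = defaultdict(lambda: defaultdict(list))
    for term, senses in term_senses.items():
        if len(senses) < 2:
            continue  # a term with fewer than two MeSH senses is not ambiguous
        for citation in citations:
            # Naive substring match; a real system would tokenize the text.
            if term.lower() not in citation["text"].lower():
                continue
            matched = [h for h in senses if h in citation["mesh"]]
            if len(matched) != 1:
                continue  # keep only citations indexed with exactly one sense
            cui = senses[matched[0]]
            if len(dataset[term][cui]) < max_per_sense:
                dataset[term][cui].append(citation["pmid"])
    return {term: dict(by_cui) for term, by_cui in dataset.items()}


# Toy data; the CUIs are placeholders, not real UMLS identifiers.
toy_senses = {"cold": {"Common Cold": "CUI_A", "Cold Temperature": "CUI_B"}}
toy_citations = [
    {"pmid": 1, "text": "Patients with a cold were enrolled.", "mesh": {"Common Cold"}},
    {"pmid": 2, "text": "Cold exposure in mice.", "mesh": {"Cold Temperature"}},
    {"pmid": 3, "text": "Cold and cold stress.", "mesh": {"Common Cold", "Cold Temperature"}},
    {"pmid": 4, "text": "Fever in children.", "mesh": {"Common Cold"}},
]
toy_result = build_wsd_instances(toy_senses, toy_citations)
```

In the toy data, citation 3 is dropped because both headings are present, and citation 4 is dropped because the term does not occur in its text.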

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A word sense disambiguation corpus for Urdu

Author: Nawab Rao Muhammad Adeel
Rayson Paul
Saeed Ali
Stevenson Mark
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2019

The aim of word sense disambiguation (WSD) is to correctly identify the meaning of a word in context. All natural languages exhibit word sense ambiguities and these are often hard to resolve automatically. Consequently, WSD is considered an important problem in natural language processing (NLP). Standard evaluation resources are needed to develop, evaluate and compare WSD methods. A range of initiatives have led to the development of benchmark WSD corpora for a wide range of languages from various language families. However, there is a lack of benchmark WSD corpora for South Asian languages including Urdu, despite there being over 300 million Urdu speakers and a large amount of Urdu digital text available online. To address that gap, this study describes a novel benchmark corpus for the Urdu Lexical Sample WSD task. This corpus contains 50 target words (30 nouns, 11 adjectives, and 9 verbs). A standard, manually crafted dictionary called Urdu Lughat is used as a sense inventory. Four baseline WSD approaches were applied to the corpus. The results show that the best performance was obtained using a simple Bag of Words approach. To encourage NLP research on the Urdu language, the corpus is freely available to the research community.

Lancaster E-Prints

A word sense disambiguation corpus for Urdu

Author: A Daud
A McEnery
A Naseer
AI Arieff
Ali Saeed
BD Prasad
E McKean
H Schütze
J Jiang
JP Gee
M Abid
M Anand Kumar
M Sharjeel
M Sokolova
Mark Stevenson
N Mishra
NS Altman
P Edmonds
Paul Rayson
R Lior
R Navigli
Rao Muhammad Adeel Nawab
S Landes
SN Khan
SZ Arif
T Sreeganesh
UD Board
WN Francis
WS McCulloch
Publication venue: Springer Science and Business Media LLC
Publication date: 15/09/2019

Crossref

White Rose Research Online

Japanese all-words WSD system using the Kyoto Text Analysis ToolKit

Author: Komiya Kanako
Mori Shinsuke
Sasaki Minoru
Shinnou Hiroyuki
Publication venue: the National University (Philippines)
Publication date: 01/01/2017

Waseda University Repository

A Learning-Based Approach for Biomedical Word Sense Disambiguation

Author: Al-Mubaid Hisham
Gungu Sandeep
Publication venue: The Scientific World Journal
Publication date: 01/01/2012

In the biomedical domain, word sense ambiguity is a widespread problem, yet the bioinformatics research effort devoted to it has not been commensurate, leaving room for further development. This paper presents and evaluates a learning-based approach for sense disambiguation within the biomedical domain. The main limitation of supervised methods is the need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging techniques, together with the plethora of knowledge sources such as ontologies and text literature in the biomedical domain, will help lessen this limitation. The proposed method utilizes the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method has been evaluated with the benchmark dataset NLM-WSD with various settings and in biomedical entity species disambiguation. The evaluation results showed that the approach is very competitive and outperforms recently reported results of other published techniques.
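The abstract does not give the exact formulation of the mutual-information model, so the following is only a rough sketch of one common reading: score each sense by the summed pointwise mutual information (PMI) between the context words and that sense, estimated from labelled instances. The function names `train_pmi` and `disambiguate`, the toy sense labels, and the choice to score unseen word-sense pairs as zero are assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter


def train_pmi(instances):
    """Estimate PMI(word, sense) from labelled instances.

    instances: list of (context_words, sense) pairs.
    Returns (pmi, senses), where pmi maps (word, sense) to
    log( P(word, sense) / (P(word) * P(sense)) ), with probabilities
    estimated as fractions of instances.
    """
    n = len(instances)
    sense_count = Counter()
    word_count = Counter()
    joint_count = Counter()
    for words, sense in instances:
        sense_count[sense] += 1
        for word in set(words):  # count each word once per instance
            word_count[word] += 1
            joint_count[(word, sense)] += 1
    pmi = {
        (word, sense): math.log((count * n) / (word_count[word] * sense_count[sense]))
        for (word, sense), count in joint_count.items()
    }
    return pmi, sorted(sense_count)


def disambiguate(context_words, pmi, senses):
    """Pick the sense with the highest summed PMI over the context words.

    Assumption: word-sense pairs never seen in training contribute 0.
    """
    scores = {
        sense: sum(pmi.get((word, sense), 0.0) for word in set(context_words))
        for sense in senses
    }
    return max(senses, key=lambda sense: scores[sense])


# Toy training data for the ambiguous word "cold"; labels are hypothetical.
toy_instances = [
    (["virus", "fever", "nasal"], "illness"),
    (["fever", "cough"], "illness"),
    (["temperature", "ice"], "temperature"),
    (["ice", "winter", "temperature"], "temperature"),
]
toy_pmi, toy_senses_list = train_pmi(toy_instances)
predicted = disambiguate(["fever", "cough"], toy_pmi, toy_senses_list)
```

Summing PMI values is a simple way to combine evidence from several context words; a full system would also need smoothing and a policy for ties, which this sketch leaves out.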

Crossref

Directory of Open Access Journals

PubMed Central