
4 research outputs found

The viability of automatic indexing for biomedical literature

Author: Sen B.A.
Publication venue: ISHIMR
Publication date: 01/09/2011

Automatic indexing is evaluated as an aid to, or replacement for, manual indexing of biomedical literature. Manual indexing is costly and labour-intensive, and technological innovations have the potential to increase efficiency and reduce costs. The British Library produces a bibliographic database of allied and complementary medicine (AMED). This study compares articles that have been indexed manually for AMED with the same documents submitted to an automated indexing tool. The tool selected was Helping Interdisciplinary Vocabulary Engineering (HIVE), a project jointly funded by the University of North Carolina and the National Evolutionary Synthesis Center, North Carolina. A random sample of 100 records was drawn from a total of 1059 articles, and each manually indexed document was compared with the results returned by HIVE. Data were analysed using SPSS. The results showed that HIVE does not provide a suitable replacement for the skills of a human indexer. Continued development of automatic indexing tools is recommended.
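
One common way to quantify agreement between manual and automatic indexing is set-based precision and recall of the tool's suggested terms against the human indexer's terms. The abstract does not say which measures were computed in SPSS, so this is a minimal sketch under that assumption, not the study's method. The function `compare_index_terms` and the sample terms are hypothetical, and terms are matched as exact lowercase strings.

```python
from typing import Iterable


def compare_index_terms(manual: Iterable[str], automatic: Iterable[str]) -> dict[str, float]:
    """Compare one document's manual index terms with automatically suggested ones.

    Assumption: terms match only if they are identical after trimming and
    lowercasing; a real evaluation might also credit broader/narrower terms.
    """
    manual_set = {t.strip().lower() for t in manual}
    auto_set = {t.strip().lower() for t in automatic}
    matched = manual_set & auto_set

    # Precision: share of automatic terms the human indexer also assigned.
    precision = len(matched) / len(auto_set) if auto_set else 0.0
    # Recall: share of human-assigned terms the tool recovered.
    recall = len(matched) / len(manual_set) if manual_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical terms for a single record, not taken from AMED or HIVE output.
    manual = ["Acupuncture", "Low back pain", "Randomized controlled trials"]
    hive = ["acupuncture", "pain", "therapy", "low back pain"]
    print(compare_index_terms(manual, hive))
```

Per-document scores like these could then be averaged over the sampled records.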

White Rose Research Online

Knowledge-based biomedical word sense disambiguation: comparison of approaches

Author: A Aronson
A Jimeno-Yepes
Alan R Aronson
Antonio J Jimeno-Yepes
B McInnes
C Leacock
D Alexopoulou
D Demner-Fushman
D Rebholz-Schuhmann
E Agirre
F Vasilescu
G Leroy
J Mork
M Joshi
M Lesk
M Schuemie
M Stevenson
M Weeber
S Gaudan
S Humphrey
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Word sense disambiguation (WSD) algorithms attempt to select the correct sense of ambiguous terms in text. Resources such as the UMLS provide a reference thesaurus for annotating the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain.

Methods: We present research on existing knowledge-based WSD approaches, which complement studies of statistical learning. We compare four approaches that rely on the UMLS Metathesaurus as the source of knowledge. The first approach measures the overlap between the context of the ambiguous word and a representation of each candidate sense built from its definitions, synonyms and related terms. The second approach collects training data for each candidate sense by building queries from monosemous synonyms and related terms and using them to retrieve MEDLINE citations; a machine learning classifier is then trained on this corpus. The third approach is a graph-based method that exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD, ranking nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to concepts in the Metathesaurus: the context of the ambiguous word and the semantic types of the candidate concepts are mapped to Journal Descriptors, and these mappings are compared to choose among the candidate concepts.

Results: We report the accuracy of the different methods on the WSD test collection available from the NLM.

Conclusions: We found that the last approach achieves better results than the other methods. The graph-based approach, which uses the structure of the Metathesaurus network to estimate the relevance of Metathesaurus concepts, does not perform well compared with the first two methods. In addition, combining methods improves performance over the individual approaches. However, performance is still below that of statistical learning trained on manually produced data and below the most frequent sense baseline. Finally, we propose several directions for improving the existing methods and for making the Metathesaurus more effective for WSD.
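
The first approach is in the spirit of Lesk-style overlap disambiguation. Here is a minimal sketch of that idea, assuming each candidate sense already has a hand-written profile text standing in for the UMLS-derived representation of definitions, synonyms and related terms. The function names, sense IDs and stopword list are hypothetical, and the paper's actual representation and weighting may differ.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "or", "the", "to", "was", "with"}


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on non-alphanumerics, drop stopwords, count tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def overlap_score(context: Counter, profile: Counter) -> int:
    """Number of shared tokens (Counter '&' keeps the minimum count of each)."""
    return sum((context & profile).values())


def disambiguate(context: str, senses: dict[str, str]) -> str:
    """Pick the sense whose profile overlaps most with the context.

    `senses` maps a hypothetical sense ID to profile text. Ties go to the
    first sense in dict order, because max() keeps the first maximum it sees.
    """
    ctx = bag_of_words(context)
    return max(senses, key=lambda sense_id: overlap_score(ctx, bag_of_words(senses[sense_id])))


if __name__ == "__main__":
    senses = {
        "SENSE_LOW_TEMPERATURE": "low temperature; feeling of being cold; chill; freezing weather",
        "SENSE_COMMON_COLD": "common cold; viral infection of the upper respiratory tract; runny nose; sore throat",
    }
    context = "The patient presented with a sore throat and runny nose after a viral infection."
    print(disambiguate(context, senses))  # SENSE_COMMON_COLD
```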

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

UMLS Content Views Appropriate for NLP Processing of the Biomedical Literature vs. Clinical Text

Author: Alan R. Aronson
Dina Demner-Fushman
James G. Mork
Sonya E. Shooshan
Publication date: 31/03/2010

Identification of medical terms in free text is a first step in Natural Language Processing (NLP) tasks such as automatic indexing of the biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed for these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. Continuing our exploration of automatic approaches to creating subsets (UMLS content views) that can support NLP processing of both the biomedical literature and clinical text, we found that suppressing highly ambiguous terms in the conservative AutoFilter data view can replace manual filtering for literature applications, and that suppressing two-character mappings in the same data view achieves acceptable performance in clinical applications.
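
The two suppression rules can be sketched as a filter over (term, concept) mappings. This is a hypothetical illustration, not the AutoFilter implementation. The function name, the `max_concepts` ambiguity threshold and the concept IDs are assumptions, and the paper's definition of "highly ambiguous" may differ.

```python
from collections import defaultdict
from typing import Iterable


def filter_content_view(
    entries: Iterable[tuple[str, str]],
    max_concepts: int = 3,
    drop_two_char: bool = False,
) -> list[tuple[str, str]]:
    """Keep (term, concept_id) mappings that pass two suppression rules.

    Rule 1: drop terms that map to more than `max_concepts` distinct concepts
            (a stand-in for "highly ambiguous"; the threshold is an assumption).
    Rule 2 (optional): drop mappings whose term string is exactly two characters.
    """
    entries = list(entries)  # materialise: the pairs are iterated twice below
    concepts_per_term: dict[str, set[str]] = defaultdict(set)
    for term, concept_id in entries:
        concepts_per_term[term.lower()].add(concept_id)

    kept = []
    for term, concept_id in entries:
        if len(concepts_per_term[term.lower()]) > max_concepts:
            continue  # highly ambiguous term: suppressed
        if drop_two_char and len(term.strip()) == 2:
            continue  # two-character mapping: suppressed
        kept.append((term, concept_id))
    return kept


if __name__ == "__main__":
    # Hypothetical term-to-concept mappings; IDs are placeholders, not real CUIs.
    entries = [
        ("cold", "C_A"), ("cold", "C_B"), ("cold", "C_C"), ("cold", "C_D"),
        ("MS", "C_E"), ("MS", "C_F"),
        ("aspirin", "C_G"),
    ]
    print(filter_content_view(entries))                      # [('MS', 'C_E'), ('MS', 'C_F'), ('aspirin', 'C_G')]
    print(filter_content_view(entries, drop_two_char=True))  # [('aspirin', 'C_G')]
```

Keeping the two rules as separate switches mirrors the abstract's split between a literature view and a clinical view built on the same data view.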

CiteSeerX

Elsevier - Publisher Connector

Learning Clinical Data Representations for Machine Learning

Author: Sulieman Lina Mahmoud
Publication venue: VANDERBILT