Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts

Author: A Jimeno-Yepes
Alan R Aronson
Alberto Díaz
Antonio J Jimeno-Yepes
AR Aronson
B McInnes
BT McInnes
C Leacock
CY Lin
E Agirre
F Martínez
F Vasilescu
G Erkan
I Mani
J Carrillo de Albornoz
J Gómez
J Kupiec
L Hunter
L Plaza
Laura Plaza
LH Reeve
M Apidianaki
M Fiszman
M Jaoua
M Joshi
M Lesk
M Schuemie
M Stevenson
M Weeber
R Barzilay
R Mihalcea
S Brin
S Teufel
SD Afantenos
SE Shooshan
SM Humphrey
TC Rindflesch
Z Shi
Publication venue: Springer Science and Business Media LLC

ShotgunWSD: An unsupervised algorithm for global word sense disambiguation inspired by DNA sequencing

Author: Butnaru Andrei M.
Hristea Florentina
Ionescu Radu Tudor
Publication date: 01/01/2017

In this paper, we present a novel unsupervised algorithm for word sense disambiguation (WSD) at the document level. Our algorithm is inspired by a widely used approach in the field of genetics for whole genome sequencing, known as the Shotgun sequencing technique. The proposed WSD algorithm is based on three main steps. First, a brute-force WSD algorithm is applied to short context windows (up to 10 words) selected from the document in order to generate a short list of likely sense configurations for each window. In the second step, these local sense configurations are assembled into longer composite configurations based on suffix and prefix matching. The resulting configurations are ranked by their length, and the sense of each word is chosen based on a voting scheme that considers only the top k configurations in which the word appears. We compare our algorithm with other state-of-the-art unsupervised WSD algorithms and demonstrate better performance, sometimes by a very large margin. We also show that our algorithm can yield better performance than the Most Common Sense (MCS) baseline on one data set. Moreover, our algorithm has a very small number of parameters, is robust to parameter tuning, and, unlike other bio-inspired methods, it gives a deterministic solution (it does not involve random choices).

Comment: In Proceedings of EACL 201

arXiv.org e-Print Archive

Crossref
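The three ShotgunWSD steps described above can be sketched in Python as follows. This is a minimal, simplified sketch and not the authors' implementation: it assumes a toy sense inventory where each sense is written as `word#domain` and two senses count as related (score 1) only when their domain tags match. The function names (`local_configs`, `assemble`, `vote`, `shotgun_wsd`), the parameters `n` (window size), `c` (configurations kept per window), `min_overlap` and `k`, and the relatedness measure are all hypothetical.

```python
from itertools import product


def relatedness(s1: str, s2: str) -> float:
    # Hypothetical toy measure: senses are related when their domain tags match.
    return 1.0 if s1.split("#")[1] == s2.split("#")[1] else 0.0


def score(senses):
    # Sum of pairwise relatedness over all sense pairs in one configuration.
    return sum(relatedness(a, b) for i, a in enumerate(senses) for b in senses[i + 1:])


def local_configs(inventory, n=3, c=2):
    """Step 1: brute-force every sense combination in each window of n words; keep the top c."""
    n = min(n, len(inventory))
    configs = set()
    for start in range(len(inventory) - n + 1):
        window = inventory[start:start + n]
        ranked = sorted(product(*window), key=lambda cfg: (-score(cfg), cfg))
        configs.update((start, cfg) for cfg in ranked[:c])
    return configs  # set of (start_position, tuple_of_senses)


def assemble(configs, min_overlap=1):
    """Step 2: join a and b when a's suffix equals b's prefix at the same word positions."""
    pool = set(configs)
    while True:
        new = set()
        for i, a in pool:
            for j, b in pool:
                overlap = i + len(a) - j  # word positions shared by a and b
                if (j > i and overlap >= min_overlap and j + len(b) > i + len(a)
                        and a[j - i:] == b[:overlap]):
                    merged = (i, a + b[overlap:])
                    if merged not in pool:
                        new.add(merged)
        if not new:  # merged configurations only grow, so this loop terminates
            return pool
        pool |= new


def vote(configs, num_words, k=3):
    """Step 3: rank by length; the top k configurations covering a word vote on its sense."""
    ranked = sorted(configs, key=lambda cfg: (-len(cfg[1]), cfg[0], cfg[1]))
    chosen = []
    for pos in range(num_words):
        covering = [senses[pos - start] for start, senses in ranked
                    if start <= pos < start + len(senses)][:k]
        votes = {}
        for sense in covering:
            votes[sense] = votes.get(sense, 0) + 1
        # On ties, max() keeps the sense first voted by the highest-ranked configuration.
        chosen.append(max(votes, key=votes.get))
    return chosen


def shotgun_wsd(inventory, n=3, c=2, min_overlap=1, k=3):
    return vote(assemble(local_configs(inventory, n, c), min_overlap), len(inventory), k)


# Hypothetical toy document: candidate senses for "bank loan rate deposit".
toy_document = [
    ["bank#finance", "bank#geo"],
    ["loan#finance"],
    ["rate#finance", "rate#speed"],
    ["deposit#finance", "deposit#geo"],
]
print(shotgun_wsd(toy_document))
```

Because every step sorts with explicit tie-breaking keys, repeated runs return the same senses, which mirrors the deterministic property claimed in the abstract. Real use would swap in a genuine sense inventory and relatedness measure, and windows of up to 10 words make the brute-force step far more expensive than in this toy.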

Text summarization in the biomedical domain: A systematic review of recent research

Author: Bian Jiantao
Del Fiol Guilherme
Fiszman Marcelo
Jonnalagadda Siddhartha
Mishra Rashmi
Mostafa Javed
Weir Charlene R.
Publication date: 01/01/2014

The amount of information for clinicians and clinical researchers is growing exponentially. Text summarization reduces information as an attempt to enable users to find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recent published research on summarization of textual documents in the biomedical domain

Elsevier - Publisher Connector

Crossref

PubMed Central

Carolina Digital Repository

Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts

Author: Aronson AR
Diaz A
Jimeno-Yepes AJ
Plaza L
Publication venue: Springer Science and Business Media LLC
Publication date: 26/08/2011

BACKGROUND: Word sense disambiguation (WSD) attempts to solve lexical ambiguities by identifying the correct meaning of a word based on its context. WSD has been demonstrated to be an important step in knowledge-based approaches to automatic summarization. However, the correlation between the accuracy of the WSD methods and the summarization performance has never been studied.

RESULTS: We present three existing knowledge-based WSD approaches and a graph-based summarizer. Both the WSD approaches and the summarizer employ the Unified Medical Language System (UMLS) Metathesaurus as the knowledge source. We first evaluate WSD directly, by comparing the prediction of the WSD methods to two reference sets: the NLM WSD dataset and the MSH WSD collection. We next apply the different WSD methods as part of the summarizer, to map documents onto concepts in the UMLS Metathesaurus, and evaluate the summaries that are generated. The results obtained by the different methods in both evaluations are studied and compared.

CONCLUSIONS: It has been found that the use of WSD techniques has a positive impact on the results of our graph-based summarizer, and that, when both the WSD and summarization tasks are assessed over large and homogeneous evaluation collections, there exists a correlation between the overall results of the WSD and summarization tasks. Furthermore, the best WSD algorithm in the first task tends also to be the best one in the second. However, we also found that the improvement achieved by the summarizer is not directly correlated with the WSD performance. The most likely reason is that the errors in disambiguation are not equally important but depend on the relative salience of the different concepts in the document to be summarized.

University of Melbourne Institutional Repository
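One common way to quantify a correlation between WSD results and summarization results is Spearman's rank correlation, which asks whether methods ranked higher on one task are also ranked higher on the other. The sketch below assumes one score per WSD method for each task (for example, accuracy on a WSD reference set and a summary-quality score such as ROUGE). The choice of statistic, the function names and the numbers in the usage line are hypothetical illustrations, not the authors' procedure or results.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            result[idx] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the (tie-averaged) ranks.
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for three WSD methods (illustrative only, not from the paper).
wsd_accuracy = [0.70, 0.78, 0.74]
summary_score = [0.30, 0.33, 0.31]
print(spearman(wsd_accuracy, summary_score))
```

With only a few methods, rho is coarse, and, as the abstract notes, a high overall correlation can coexist with summarization gains that do not track WSD accuracy, because disambiguation errors on salient concepts matter more than errors on peripheral ones.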

Studying the correlation between different word sense disambiguation methods and summarization effectiveness in biomedical texts

Author: Aronson Alan R
Díaz Alberto
Jimeno-Yepes Antonio J
Plaza Laura
Publication venue: Springer Science and Business Media LLC
Publication date: 01/08/2011

Abstract

Background: Word sense disambiguation (WSD) attempts to solve lexical ambiguities by identifying the correct meaning of a word based on its context. WSD has been demonstrated to be an important step in knowledge-based approaches to automatic summarization. However, the correlation between the accuracy of the WSD methods and the summarization performance has never been studied.

Results: We present three existing knowledge-based WSD approaches and a graph-based summarizer. Both the WSD approaches and the summarizer employ the Unified Medical Language System (UMLS) Metathesaurus as the knowledge source. We first evaluate WSD directly, by comparing the prediction of the WSD methods to two reference sets: the NLM WSD dataset and the MSH WSD collection. We next apply the different WSD methods as part of the summarizer, to map documents onto concepts in the UMLS Metathesaurus, and evaluate the summaries that are generated. The results obtained by the different methods in both evaluations are studied and compared.

Conclusions: It has been found that the use of WSD techniques has a positive impact on the results of our graph-based summarizer, and that, when both the WSD and summarization tasks are assessed over large and homogeneous evaluation collections, there exists a correlation between the overall results of the WSD and summarization tasks. Furthermore, the best WSD algorithm in the first task tends also to be the best one in the second. However, we also found that the improvement achieved by the summarizer is not directly correlated with the WSD performance. The most likely reason is that the errors in disambiguation are not equally important but depend on the relative salience of the different concepts in the document to be summarized.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central