Taxonomic corpus-based concept summary generation for document annotation.
Semantic annotation is an enabling technology which links documents to concepts that unambiguously describe their content. Annotation improves access to document contents for both humans and software agents. However, the annotation process is a challenging task, as annotators often have to select from thousands of potentially relevant concepts from controlled vocabularies. The best approaches to assist in this task rely on reusing the annotations of an annotated corpus. In the absence of a pre-annotated corpus, alternative approaches suffer due to insufficient descriptive texts for concepts in most vocabularies. In this paper, we propose an unsupervised method for recommending document annotations based on generating node descriptors from an external corpus. We exploit knowledge of the taxonomic structure of a thesaurus to ensure that effective descriptors (concept summaries) are generated for concepts. Our evaluation on recommending annotations shows that the content that we generate effectively represents the concepts. Also, our approach outperforms those which rely on information from a thesaurus alone and is comparable with supervised approaches.
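The abstract does not spell out how node descriptors are generated, but the general idea of summarizing a concept from an external corpus with the help of its taxonomic neighbors can be sketched roughly as follows (the function name, the narrower-term expansion, and the simple substring matching are illustrative assumptions, not the authors' method):

```python
def build_descriptor(concept, narrower_terms, corpus_sentences, max_sents=5):
    """Build a rough concept summary by collecting corpus sentences that
    mention the concept's label or the labels of its taxonomically narrower
    terms. Labels are assumed to be lowercase strings."""
    labels = {concept} | narrower_terms.get(concept, set())
    # Keep sentences mentioning the concept or any of its narrower terms.
    picked = [s for s in corpus_sentences if any(l in s.lower() for l in labels)]
    return " ".join(picked[:max_sents])
```

In a real system the matching would need proper term recognition rather than substring tests, but the sketch shows how the thesaurus structure compensates for a concept's own sparse description.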
Yet Another Ranking Function for Automatic Multiword Term Extraction
Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure combines linguistic and statistical information. The second measure is graph-based, allowing assessment of the importance of a multiword term within a domain. Existing measures often solve some of the problems related to term extraction (but not all of them), e.g., noise, silence, low frequency, large corpora, and the complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low frequency issue. We show that the two proposed measures outperform precision results previously reported for automatic multiword extraction by comparing them with state-of-the-art reference measures.
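The abstract does not define its graph-based measure, but graph-based term ranking is often built on a PageRank-style iteration over a term co-occurrence graph, where a candidate term is important if it co-occurs with other important terms. A minimal sketch of that generic idea (an assumption for illustration, not the paper's actual measure):

```python
def pagerank(graph, damping=0.85, iters=50):
    """Rank nodes of a directed graph (dict: node -> set of out-neighbors)
    by a simple power-iteration PageRank. Assumes every node has at least
    one out-neighbor (no dangling nodes)."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Each in-neighbor m shares its rank equally among its out-edges.
            incoming = sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank
```

For term extraction, nodes would be candidate terms (or their component words) and edges would encode co-occurrence within a window or sentence.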
A framework for evaluating automatic indexing or classification in the context of retrieval
Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources, and enhancing consistency. While some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The paper reviews and discusses issues with existing evaluation approaches, such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on indexing and classification evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard; evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow; and evaluating indexing quality indirectly through analyzing retrieval performance.
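Comparison with a gold standard, the first evaluation approach listed above, typically reduces to set-overlap metrics over the automatically assigned versus the gold-standard subject terms; a minimal illustrative sketch (the metric choice is a common convention, not this paper's framework):

```python
def indexing_quality(assigned, gold):
    """Precision, recall, and F1 of assigned subject terms against a
    gold-standard set of subject terms for one document."""
    assigned, gold = set(assigned), set(gold)
    tp = len(assigned & gold)  # correctly assigned subjects
    precision = tp / len(assigned) if assigned else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The paper's point is precisely that such a single gold-standard comparison is insufficient on its own, which is why the framework adds workflow-based and retrieval-based evaluation.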
Medical Document Indexing and Retrieval: AMTEx vs. NLM MMTx
AMTEx is a medical document indexing method, specifically designed for the automatic indexing of documents in large medical collections, such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for term extraction, the C/NC-value method. The performance evaluation of two AMTEx configurations is measured against the current state-of-the-art, the MMTx method, in indexing and retrieval tasks in three experiments. In the first, a subset of the MEDLINE (PMC) full document corpus was used for the indexing task. In the second and third, a subset of MEDLINE (OHSUMED) abstracts was used for indexing and retrieval, respectively. The experimental results demonstrate that AMTEx achieves better precision in all tasks, in 20-50% of the processing time compared to MMTx.
The AMTEx approach in the medical document indexing and retrieval application
AMTEx is a medical document indexing method, specifically designed for the automatic indexing of documents in large medical collections, such as MEDLINE, the premier bibliographic database of the US National Library of Medicine (NLM). AMTEx combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for extraction of terminology, the C/NC-value method. The performance evaluation of two AMTEx configurations is measured against the current state-of-the-art, the MetaMap Transfer (MMTx) method, in four experiments, using two types of corpora: a subset of the MEDLINE (PMC) full document corpus and a subset of MEDLINE (OHSUMED) abstracts, for each of the indexing and retrieval tasks, respectively. The experimental results demonstrate that AMTEx performs better in indexing in 20-50% of the processing time compared to MMTx, while for the retrieval task, AMTEx performs better in the full text (PMC) corpus.
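The C/NC-value method referenced in both AMTEx abstracts is well documented in the term-extraction literature: the core C-value score weights a candidate multiword term's frequency by its length and discounts occurrences that are nested inside longer candidate terms. A minimal sketch of just the C-value part (omitting the NC-value context weighting; the data-structure choices are ours):

```python
import math

def c_value(term_freq, nested_in):
    """Core C-value scores for candidate multiword terms.

    term_freq: dict mapping a term (tuple of words) to its corpus frequency.
    nested_in: dict mapping a term to the list of longer candidate terms
               that contain it.
    """
    scores = {}
    for term, freq in term_freq.items():
        # log2 of term length; floored at 2 so single words don't score 0
        # (a common variant -- the classic method targets multiword terms).
        length_weight = math.log2(max(len(term), 2))
        containers = nested_in.get(term, [])
        if not containers:
            scores[term] = length_weight * freq
        else:
            # Discount the average frequency of the containing terms.
            nested_freq = sum(term_freq[t] for t in containers)
            scores[term] = length_weight * (freq - nested_freq / len(containers))
    return scores
```

Candidates that mostly occur nested inside a longer term (e.g. "floating point" inside "floating point unit") are penalized, which is the method's main guard against fragments of longer terms.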
MedSearch: A retrieval system for medical information based on semantic similarity
MedSearch is a complete retrieval system for Medline, the premier bibliographic database of the U.S. National Library of Medicine (NLM). MedSearch implements SSRM, a novel information retrieval method for discovering similarities between documents containing semantically similar but not necessarily lexically similar terms.
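SSRM's exact formulation is not given in this abstract, but a document similarity that credits semantically related rather than only identical terms can be sketched as a generalized weighted sum over term pairs (a simplified assumption for illustration, not necessarily SSRM itself):

```python
def semantic_sim(doc1, doc2, term_sim):
    """Similarity of two documents given as dicts of term -> weight.

    Unlike a plain cosine over exact term matches, every cross-document
    term pair contributes, scaled by term_sim(t1, t2) in [0, 1], so
    semantically similar but lexically different terms still count.
    """
    num = sum(w1 * w2 * term_sim(t1, t2)
              for t1, w1 in doc1.items() for t2, w2 in doc2.items())
    den = sum(w1 * w2 for w1 in doc1.values() for w2 in doc2.values())
    return num / den if den else 0.0
```

The term-level similarity `term_sim` would typically come from a thesaurus such as MeSH (e.g. derived from path distance in the hierarchy), which is what ties this retrieval model back to the indexing work above.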