
6 research outputs found

Terminology Extraction and Term Ranking for Standardizing Term Banks

Authors: Jody Foo, Magnus Merkel
Publication date: 23/05/2007

Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. 349-354

DSpace at Tartu University Library

Conference Program

Publication date: 23/05/2007

DSpace at Tartu University Library

Publication date: 29/05/2007

DSpace at Tartu University Library

Creating a medical dictionary using word alignment: The influence of sources and resources

Authors: FJ Och, Hans Åhlfeldt, Håkan Petersson, ID Melamed, J Foo, L Ahrenberg, LR Dice, M Merkel, Magnus Merkel, Mikael Nyström, MT Pazienza, Nordic Medico-Statistical Committee, P Tapanainen, PF Brown, Socialstyrelsen, WA Gale, World Health Organization
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract

Background: Automatic word alignment of parallel texts with the same content in different languages is used, among other things, to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and types of resources influence the quality.

Methods: We automatically word aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word aligned, and which types of resources we used in the alignment process, in order to explore the influence of the different terminology systems and resources on recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary.

Results: The results indicate that more resources and resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms.

Conclusion: More resources give better results in the automatic word alignment, but some resources give only small improvements. The most important type of resource is training resources, and the most general resources were generated from ICD-10.
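
The abstract reports alignment quality in terms of recall and precision. As a minimal sketch of how these two measures are commonly computed for word alignment, assuming that predicted links are compared as exact matches against a manually aligned gold standard (the function name, the example term pairs and the exact-match criterion are illustrative assumptions, not details taken from the paper):

```python
def precision_recall(predicted_links, gold_links):
    """Return (precision, recall) of predicted alignment links.

    Assumption: a predicted link counts as correct only if the same
    (source, target) pair occurs in the gold standard.
    """
    predicted, gold = set(predicted_links), set(gold_links)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical English-Swedish links, for illustration only.
gold = {("lung", "lunga"), ("liver", "lever"), ("kidney", "njure"),
        ("skin", "hud"), ("bone", "ben")}
predicted = {("lung", "lunga"), ("liver", "lever"), ("kidney", "njure"),
             ("skin", "ben")}

precision, recall = precision_recall(predicted, gold)
```

Here three of the four predicted links are in the gold standard, so precision is 3/4 and recall is 3/5. An evaluation that also credits partially correct links would need a different matching rule.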

Publikationer från Linköpings universitet

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Terminology extraction and term ranking for standardizing term banks

Authors: Jody Foo, Magnus Merkel
Publication venue: Tartu, Estonia : University of Tartu
Publication date: 01/01/2007

This paper presents how word alignment techniques could be used for building standardized term banks. It is shown that time and effort could be saved by a relatively simple evaluation metric based on frequency data from term pairs, and on source and target distributions inside the alignment results. The proposed Q-value metric is shown to outperform other tested metrics such as Dice's coefficient and simple pair frequency.
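
The two baseline metrics named in the abstract can be sketched as follows. This assumes the alignment output is a flat list of (source term, target term) links and that Dice's coefficient is computed over link counts, 2·f(s,t) / (f(s) + f(t)); the function name and the example data are hypothetical. The Q-value itself is not defined in this abstract, so it is not reproduced here.

```python
from collections import Counter


def score_term_pairs(aligned_pairs):
    """Score each distinct (source, target) pair by pair frequency and Dice."""
    pair_freq = Counter(aligned_pairs)
    source_freq = Counter(s for s, _ in aligned_pairs)
    target_freq = Counter(t for _, t in aligned_pairs)
    scores = {}
    for (s, t), f in pair_freq.items():
        # Dice's coefficient: shared count relative to the two marginal counts.
        dice = 2 * f / (source_freq[s] + target_freq[t])
        scores[(s, t)] = {"pair_freq": f, "dice": dice}
    return scores


# Hypothetical alignment links, for illustration only.
links = [("liver", "lever"), ("liver", "lever"),
         ("liver", "levern"), ("lung", "lunga")]
scores = score_term_pairs(links)

# Rank candidate pairs by Dice, highest first.
ranked = sorted(scores, key=lambda pair: scores[pair]["dice"], reverse=True)
```

With this data, ("lung", "lunga") gets a Dice score of 1.0, ("liver", "lever") 0.8, and ("liver", "levern") 0.5, whereas ranking by pair frequency alone would put ("liver", "lever") first.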

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

DSpace at Tartu University Library

Foo J: Terminology extraction and term ranking for standardizing term banks: May 25-26; Tartu

Author: Magnus Merkel

CiteSeerX