Search CORE

6 research outputs found

Improving Statistical Machine Translation in the Medical Domain using the Unified Medical Language system

Author: Eck Matthias
Vogel Stephan
Waibel Alex
Publication venue: Association for Computational Linguistics
Publication date: 03/01/2024

Processing texts from the medical domain is an important task for natural language processing. This paper investigates the usefulness of a large medical database (the Unified Medical Language System) for the translation of dialogues between doctors and patients using a statistical machine translation system. We are able to show that extracting a large dictionary and using semantic type information to generalize the training data significantly improve translation performance.

KITopen

Adaptation of machine translation for multilingual information retrieval in the medical domain

Author: Dušek Ondřej
Goeuriot Lorraine
Hajič Jan
Hlaváčová Jaroslava
Jones Gareth J.F.
Kelly Liadh
Leveling Johannes
Mareček David
Novák Michal
Pecina Pavel
Popel Martin
Rosa Rudolf
Tamchyna Aleš
Urešová Zdeňka
Publication venue: Elsevier BV
Publication date: 01/01/2014

Objective. We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR. Methods and Data. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech–English, German–English, and French–English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets. Results. The search query translation results achieved in our experiments are outstanding – our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech–English, from 23.03 to 40.82 for German–English, and from 32.67 to 40.82 for French–English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French–English. For Czech–English and German–English, the increased MT quality does not lead to better IR results. Conclusions. Most of the MT techniques employed in our experiments improve MT of medical search queries. Especially the intelligent training data selection proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance – better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.
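
The abstract names BM25 as the retrieval model and mentions using multiple translation variants for query expansion. As a rough illustration only, the following minimal sketch scores tokenized documents with a textbook Okapi BM25 formula and expands a query by concatenating two translation variants. The function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the toy documents are assumptions made for this example; they are not taken from the paper or from its Lucene configuration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy collection (invented for illustration).
docs = [
    "chest pain treatment".split(),
    "headache treatment and causes".split(),
    "travel insurance".split(),
]
# Query expansion: concatenate two hypothetical translation variants of one query.
query = "chest pain".split() + "thoracic pain".split()
scores = bm25_scores(query, docs)
```

Concatenating variants is the simplest form of expansion: a term shared by several variants (here `pain`) is counted once per occurrence, so it carries more weight than a term found in only one variant.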

Crossref

Hal - Université Grenoble Alpes

Irish Universities

DCU Online Research Access Service

Biblio at Institute of Formal and Applied Linguistics

Improving statistical machine translation in the medical domain using the unified medical language system

Author: Alex Waibel
Matthias Eck
Stephan Vogel
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2004

CiteSeerX

Crossref

Developing Deployable Spoken Language Translation Systems given Limited Resources

Author: Eck Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2008

Approaches are presented that support the deployment of spoken language translation systems. Newly developed methods allow low-cost portability to new language pairs. Proposed translation model pruning techniques achieve high translation performance even in low-memory situations. The named entity and specialty vocabulary coverage, particularly on small and mobile devices, is targeted to an individual user by translation model personalization.

KITopen

A Quadruple-Based Text Analysis System for History and Philosophy of Science

Publication date: 01/01/2014

Computational tools in the digital humanities often either work on the macro-scale, enabling researchers to analyze huge amounts of data, or on the micro-scale, supporting scholars in the interpretation and analysis of individual documents. The proposed research system that was developed in the context of this dissertation ("Quadriga System") works to bridge these two extremes by offering tools to support close reading and interpretation of texts, while at the same time providing a means for collaboration and data collection that could lead to analyses based on big datasets. In the field of history of science, researchers usually use unstructured data such as texts or images. To computationally analyze such data, it first has to be transformed into a machine-understandable format. The Quadriga System is based on the idea of representing texts as graphs of contextualized triples (or quadruples). Those graphs (or networks) can then be mathematically analyzed and visualized. This dissertation describes two projects that use the Quadriga System for the analysis and exploration of texts and the creation of social networks. Furthermore, a model for digital humanities education is proposed that brings together students from the humanities and computer science in order to develop user-oriented, innovative tools, methods, and infrastructures.

Dissertation/Thesis, Doctoral Dissertation, Biology 201
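
The abstract describes representing texts as graphs of contextualized triples (quadruples) that can then be analyzed as networks. The sketch below is one possible reading of that idea, not the Quadriga System's actual data model: the `Quadruple` type, its field names, the helper functions, and the sample statements are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class Quadruple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # assumed here to identify the source text of the statement

def build_graph(quads):
    """Adjacency list: subject -> list of (predicate, object, context) edges."""
    graph = defaultdict(list)
    for q in quads:
        graph[q.subject].append((q.predicate, q.obj, q.context))
    return graph

def degree(graph):
    """Count how many edges touch each node, ignoring edge direction."""
    deg = defaultdict(int)
    for subject, edges in graph.items():
        for _predicate, obj, _context in edges:
            deg[subject] += 1
            deg[obj] += 1
    return dict(deg)

# Invented statements, each tagged with the (hypothetical) text it came from.
quads = [
    Quadruple("Person A", "corresponded with", "Person B", "letter_1"),
    Quadruple("Person B", "taught", "Person C", "letter_2"),
]
graph = build_graph(quads)
node_degree = degree(graph)
```

Keeping the context on every edge means an analysis result, such as a highly connected node, can be traced back to the texts that support it.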

ASU Digital Repository