Improving the translation environment for professional translators
When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
Rude waiter but mouthwatering pastries! An exploratory study into Dutch aspect-based sentiment analysis
The fine-grained task of automatically detecting all sentiment expressions within a given document and the aspects to which they refer is known as aspect-based sentiment analysis. In this paper we present the first full aspect-based sentiment analysis pipeline for Dutch and apply it to customer reviews. To this purpose, we collected reviews from two different domains, i.e. restaurant and smartphone reviews. Both corpora have been manually annotated using newly developed guidelines that comply with standard practices in the field. For our experimental pipeline we treat aspect-based sentiment analysis as a task consisting of three main subtasks which have to be tackled incrementally: aspect term extraction, aspect category classification and polarity classification. First experiments on our Dutch restaurant corpus reveal that this is indeed a feasible approach that yields promising results.
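The three incremental subtasks named in the abstract can be sketched as a toy pipeline. The Dutch lexicons and the rule-based polarity step below are invented placeholders for illustration, not the trained components evaluated in the paper:

```python
# Toy three-stage aspect-based sentiment analysis pipeline.
# Lexicons are illustrative placeholders (Dutch: ober = waiter,
# gebak = pastry, onbeleefd = rude, overheerlijk = mouthwatering).

ASPECT_TERMS = {"ober": "SERVICE", "gebak": "FOOD"}
NEGATIVE_WORDS = {"onbeleefd"}
POSITIVE_WORDS = {"overheerlijk"}

def extract_aspect_terms(tokens):
    """Subtask 1: find tokens that denote an aspect of the reviewed item."""
    return [t for t in tokens if t in ASPECT_TERMS]

def classify_category(term):
    """Subtask 2: map each aspect term to a coarse aspect category."""
    return ASPECT_TERMS[term]

def classify_polarity(tokens):
    """Subtask 3: assign a sentiment polarity to the expression."""
    if any(t in NEGATIVE_WORDS for t in tokens):
        return "negative"
    if any(t in POSITIVE_WORDS for t in tokens):
        return "positive"
    return "neutral"

def analyse(sentence):
    """Run the three subtasks incrementally on one review sentence."""
    tokens = sentence.lower().rstrip("!.").split()
    return [(t, classify_category(t), classify_polarity(tokens))
            for t in extract_aspect_terms(tokens)]
```

In the paper each stage is a learned classifier; the incremental structure (terms, then categories, then polarity) is the point of the sketch.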
Adapting a general parser to a sublanguage
In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is the use of terminology (identification and analysis of terms) in order to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined, among which text normalization, lexicon and morpho-guessing module extensions, and grammar rule adaptation. We compare the parsing results before and after these adaptations.
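The core idea of the terminology step, collapsing recognised multiword terms into single tokens so the parser faces a shorter, less ambiguous input, can be illustrated as follows; the term list and sentence are invented examples, not the paper's biology resources:

```python
# Illustrative term-based simplification before parsing: each known
# multiword term is rewritten as one underscore-joined token.

TERMS = ["gene expression", "transcription factor"]

def normalize_terms(sentence, terms=TERMS):
    """Replace each known multiword term with a single token.
    Longer terms are substituted first so they are not split
    by a shorter overlapping term."""
    for term in sorted(terms, key=len, reverse=True):
        sentence = sentence.replace(term, term.replace(" ", "_"))
    return sentence
```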
Terminology Extraction for and from Communications in Multi-disciplinary Domains
Terminology extraction generally refers to methods and systems for identifying term candidates in a uni-disciplinary and uni-lingual environment such as engineering, the medical, physical and geological sciences, or administration, business and leisure. However, as human enterprises become more and more complex, it has become increasingly important for teams in one discipline to collaborate with others who not only work in a non-cognate discipline but also speak a different language. Disaster mitigation and recovery, and conflict resolution, are amongst the areas where there is a requirement to use standardised multilingual terminology for communication. This paper presents a feasibility study conducted to build terminology (and an ontology) in the domain of disaster management, as part of the broader work conducted for the EU project Slándáil (FP7 607691). We have evaluated CiCui (from the Chinese name 词萃, which translates to "words gathered"), a corpus-based text analytic system that combines frequency, collocation and linguistic analyses to extract candidate terms from corpora comprised of domain texts from diverse sources. CiCui was assessed against four terminology extraction systems and the initial results show that it has above-average precision in extracting terms.
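The frequency component of such a corpus-based extractor can be sketched as a toy bigram counter; CiCui itself also applies collocation and linguistic filters that this sketch omits, and the stopword list and corpus below are invented:

```python
from collections import Counter

# Toy frequency-based term-candidate extraction: count bigrams whose
# words are not stopwords, and keep those recurring across the corpus.

STOPWORDS = {"the", "of", "and", "a", "in", "to"}

def candidate_terms(texts, min_freq=2):
    """Return bigrams (as strings) that contain no stopword and occur
    at least min_freq times in the corpus."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            if a not in STOPWORDS and b not in STOPWORDS:
                counts[(a, b)] += 1
    return [" ".join(bg) for bg, c in counts.items() if c >= min_freq]
```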
Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration
Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system in which we combine query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords in its special phonogram alphabet (katakana). Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which maps words unlisted in the base-word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve system performance.
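The word-by-word compound translation with probabilistic disambiguation described above can be sketched as follows. The dictionary entries and probabilities are invented for illustration, and the fallback marker merely stands in for the paper's transliteration step:

```python
# Toy word-by-word compound translation. Each base word maps to a list
# of (English, probability) candidates; we pick the most probable one.
# Entries and probabilities are invented placeholders.

BASE_DICT = {
    "情報": [("information", 0.9), ("intelligence", 0.1)],
    "検索": [("retrieval", 0.7), ("search", 0.3)],
}

def translate_compound(morphemes):
    """Translate a compound word morpheme by morpheme, choosing the
    most probable English equivalent of each listed base word.
    Unlisted words get a placeholder for the transliteration step."""
    out = []
    for m in morphemes:
        candidates = BASE_DICT.get(m)
        if candidates is None:
            out.append("<transliterate:%s>" % m)
        else:
            out.append(max(candidates, key=lambda c: c[1])[0])
    return " ".join(out)
```

A fuller model would score whole translated compounds jointly rather than picking each word independently; the per-word maximum is only the simplest version of the idea.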
Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters
OBJECTIVES:
The secondary use of medical data contained in electronic medical records, such as hospital discharge letters, is a valuable resource for the improvement of clinical care (e.g. in terms of medication safety) or for research purposes. However, the automated processing and analysis of medical free text still poses a huge challenge to available natural language processing (NLP) systems. The aim of this study was to implement a knowledge-based best of breed approach, combining a terminology server with integrated ontology, an NLP pipeline and a rules engine.
METHODS:
We tested the performance of this approach in a use case. The clinical event of interest was the particular drug-disease interaction "proton-pump inhibitor [PPI] use and osteoporosis". Cases were to be identified based on free text digital discharge letters as source of information. Automated detection was validated against a gold standard.
RESULTS:
Precision of recognition of osteoporosis was 94.19%, and recall was 97.45%. PPIs were detected with 100% precision and 97.97% recall. The F-score for the detection of the given drug-disease interaction was 96.13%.
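The F-score is the harmonic mean of precision and recall; the precision and recall behind the combined 96.13% figure are not reported, but the per-entity figures above can be checked directly:

```python
# F-score as the harmonic mean of precision and recall.

def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(94.19, 97.45), 2))   # osteoporosis: 95.79
print(round(f1(100.0, 97.97), 2))   # PPI: 98.97
```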
CONCLUSION:
We showed that our approach of combining an NLP pipeline, a terminology server, and a rules engine for the automated detection of clinical events such as drug-disease interactions from free text digital hospital discharge letters is effective. There is great potential for implementation in clinical and research contexts, as this approach enables the analysis of very large numbers of medical free text documents within a short time period.
A linguistically-driven methodology for detecting impending and unfolding emergencies from social media messages
Natural disasters have demonstrated the crucial role of social media before, during and after emergencies
(Haddow & Haddow 2013). Within our EU project Slándáil, we aim to ethically improve the use of social media in enhancing the response of disaster-related agencies. To this end, we have collected corpora of social and formal media to study the newsroom communication of emergency management organisations in English and Italian. Currently, emergency management agencies in English-speaking countries use social media to varying extents and degrees, whereas the Italian national Protezione Civile only uses Twitter at the moment. Our method is developed with a view to identifying communicative strategies and detecting sentiment in order to distinguish warnings from actual disasters and major from minor disasters. Our linguistic analysis relies on human annotators to classify alert/warning messages versus emergency response and mitigation ones, based on the terminology used and the sentiment expressed. Results of the linguistic analysis are then used to train an application that tags messages and detects disaster- and/or emergency-related terminology and emotive language, in order to simulate human rating and forward information to an emergency management system.
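The message-tagging idea, separating alerts/warnings from unfolding emergencies by their terminology, can be sketched with a keyword rule; the word lists below are invented placeholders, not the project's annotated lexicons or trained model:

```python
# Toy terminology-based message classifier: warnings use forecast-like
# vocabulary, unfolding emergencies use impact vocabulary.
# Keyword sets are invented placeholders.

WARNING_TERMS = {"warning", "alert", "expected", "forecast"}
EMERGENCY_TERMS = {"collapsed", "flooded", "trapped", "evacuating"}

def classify_message(text):
    """Tag one social media message by the terminology it contains."""
    tokens = set(text.lower().split())
    if tokens & EMERGENCY_TERMS:
        return "emergency_response"
    if tokens & WARNING_TERMS:
        return "alert_warning"
    return "other"
```

The project's approach additionally uses sentiment and emotive language as signals; the sketch shows only the terminology dimension.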