Dutch parallel corpus: a multilingual annotated corpus

Author: Desmet Piet
Macken Lieve
Paulussen Hans
Rura Lidia
Trushkina Julia
Vandeweghe Willy
Publication date: 01/01/2007

Ghent University Academic Bibliography

Dutch parallel corpus: a balanced parallel corpus for Dutch-English and Dutch-French

Author: FJ Och
G Sutter De
G Vanderbauwhede
Isabelle Delaere
L Macken
Lieve Macken
M Kay
M Simard
MP Marcus
P Keirsbilck Van
PF Brown
R Moore
W Daelemans
WA Gale
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

status: published

Lirias

Crossref

Springer - Publisher Connector

Ghent University Academic Bibliography

Parallel Corpora in translator education

Author: Ruiz Yepes Guadalupe
Publication date: 01/01/2011

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

Multi-word expression-sensitive word alignment

Author: Graham Yvette
Maldonado Guerra Alfredo
Okita Tsuyoshi
Way Andy
Publication venue: Coling 2010 Organizing Committee
Publication date: 01/01/2010

This paper presents a new word alignment method which incorporates knowledge about Bilingual Multi-Word Expressions (BMWEs). Our method of word alignment first extracts such BMWEs in a bidirectional way for a given corpus and then starts conventional word alignment, considering the properties of BMWEs in their grouping as well as their alignment links. We give partial annotation of alignment links as prior knowledge to the word alignment process; by replacing the maximum likelihood estimate in the M-step of the IBM Models with the Maximum A Posteriori (MAP) estimate, prior knowledge about BMWEs is embedded in the prior in this MAP estimate. In our experiments, we saw an improvement of 0.77 BLEU points absolute in JP–EN. Except for one case, our method gave better results than the method using only BMWEs grouping. Even though this paper does not directly address the issues in Cross-Lingual Information Retrieval (CLIR), it discusses an approach of direct relevance to the field. This approach could be viewed as the opposite of current trends in CLIR on semantic space that incorporate a notion of order in the bag-of-words model (e.g. co-occurrences).

Irish Universities

DCU Online Research Access Service

Dutch parallel corpus: a multifunctional and multilingual corpus

Author: Desmet Piet
Macken Lieve
Paulussen Hans
Trushkina Julia
Vandeweghe Willy
Publication date: 01/01/2006

Ghent University Academic Bibliography

In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora

Author: Hoste Veronique
Lefever Els
Rigouts Terryn Ayla
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a great need for well-documented, manually validated datasets, especially in the rising field of multilingual term extraction from comparable corpora, which presents a unique new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not just suited for evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation.

Ghent University Academic Bibliography

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological one. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography