Search CORE

12 research outputs found

Formemes in English-Czech Deep Syntactic MT

Authors: Dušek Ondřej, Majliš Martin, Mareček David, Novák Michal, Popel Martin, Žabokrtský Zdeněk
Publication date: 01/01/2012

One of the most notable recent improvements to the TectoMT English-to-Czech translation system is a systematic and theoretically supported revision of formemes, i.e., the annotation of morpho-syntactic features of content words in deep dependency syntactic structures based on the Prague tectogrammatics theory. Our modifications aim at reducing data sparsity, increasing consistency across languages, and widening the usage area of this markup. Formemes can be used not only in MT but also in various other NLP tasks.

Biblio at Institute of Formal and Applied Linguistics

New Language Pairs in TectoMT

Authors: Dušek Ondřej, Gomes Luís, Novák Michal, Popel Martin, Rosa Rudolf
Publication date: 01/01/2015

The TectoMT tree-to-tree machine translation system has been updated this year to make retraining for additional translation directions easier. We use multilingual standards for morphology and syntax annotation, together with language-independent base rules. We also include a simple, non-parametric way of combining the outputs of TectoMT's transfer models.

Biblio at Institute of Formal and Applied Linguistics

SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

Authors: Agirre Eneko, Branco António, Gaudio Rosa, Gomes Luís, Labaka Gorka, Neale Steven, Oele Dieke, Osenova Petya, Popel Martin, Querido Andreia, Rendeiro Nuno, Rodrigues João, Silva João, Simov Kiril, van Noord Gertjan
Publication date: 01/01/2016

This paper describes the 12 systems submitted to the WMT16 IT-task, covering six languages: Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All of these systems were developed within the QTLeap project and follow a common strategy. For each language, two systems were submitted: a phrase-based MT system built with Moses, and a system exploiting deep language engineering approaches, which for every language except Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Biblio at Institute of Formal and Applied Linguistics

Dissertations of the University of Groningen

Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation

Authors: Dušek Ondřej, Fučíková Eva, Hajič Jan, Popel Martin, Urešová Zdeňka, Šindlerová Jana
Publication date: 01/01/2015

We present a system for verbal Word Sense Disambiguation (WSD) that can exploit additional information from parallel texts and lexicons. It extends our previous WSD method, which gave promising results but used only monolingual features. In the follow-up work described here, we explored two additional ideas: using English-Czech bilingual resources (as features only; the task itself remains a monolingual WSD task), and a 'hybrid' approach that adds features extracted both from a parallel corpus and from manually aligned bilingual valency lexicon entries, which contain subcategorization information. Although not all types of features proved useful, both ideas led to significant improvements for both languages explored.

Biblio at Institute of Formal and Applied Linguistics

Traducción automática basada en tectogramática para inglés-español e inglés-euskara (Tectogrammar-based machine translation for English-Spanish and English-Basque)

Authors: Agirre Bengoa Eneko, Alegría Loinaz Iñaki, Aranberri Nora, Díaz de Ilarraza Sánchez Arantza, Jauregi Oneka, Labaka Intxauspe Gorka
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2016

We present the first attempt to build machine translation systems for the English-Spanish and English-Basque language pairs following the tectogrammar approach. Starting from the existing English-Czech system, we describe the language-specific tools added in the analysis and synthesis steps, and the resources for bilingual transfer. Evaluation shows the potential of these systems for new languages and domains. The research leading to these results has received funding from FP7-ICT-2013-10-610516 (QTLeap project, qtleap.eu).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

English-to-Czech MT: Large Data and Beyond

Author: Bojar Ondřej
Publication date: 06/12/2018

CU Digital Repository

Automated Translation with Interlingual Word Representations

Author: Oele Dieke Merel
Publication venue: Rijksuniversiteit Groningen
Publication date: 01/01/2018

In this thesis, we investigate the use of translation systems that employ a transfer phase with interlingual representations of words. In this way, we approach the problem of lexical ambiguity in machine translation systems as two separate tasks: word sense disambiguation and lexical selection. First, the words in the source language are disambiguated according to their meaning, resulting in interlingual representations of words. Next, a lexical selection module selects the most suitable word in the target language. We give a detailed description of the development and evaluation of translation systems for Dutch-English, which provides the background for the experiments in the second and third parts of this thesis. We then describe a method for determining the meaning of words. It is similar to the classic Lesk algorithm in that it relies on the idea that words shared between the context of a word and its definition provide information about its meaning. Instead, however, we use word and sense vectors to compute the similarity between the definition of a sense and the context of a word. We also apply our method to locating and interpreting puns. Finally, we present a model for lexical choice that selects lemmas given the abstract representations of words. We do this by converting the grammatical trees into hidden Markov trees, which allows the optimal combination of lemmas and their context to be computed.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Typologically robust statistical machine translation: Understanding and exploiting differences and similarities between languages in machine translation

Author: Daiber J.
Publication date: 01/01/2018

International Migration, Integration and Social Cohesion online publications