466 research outputs found

    Terminology Integration in Statistical Machine Translation

    Get PDF
    Elektroniskā versija nesatur pielikumusPromocijas darbs apraksta autora izpētītas metodes un izstrādātus rīkus divvalodu terminoloģijas integrācijai statistiskās mašīntulkošanas sistēmās. Autors darbā piedāvā inovatīvas metodes terminu integrācijai SMT sistēmu trenēšanas fāzē (ar statiskas integrācijas palīdzību) un tulkošanas fāzē (ar dinamiskas integrācijas palīdzību). Darbā uzmanība pievērsta ne tikai metodēm terminu integrācijai SMT, bet arī metodēm valodas resursu, kas nepieciešami dažādu uzdevumu veikšanai terminu integrācijas SMT darbplūsmās, ieguvei. Piedāvātās metodes ir novērtētas automātiskas un manuālas novērtēšanas eksperimentos. Iegūtie rezultāti parāda, ka statiskās un dinamiskās integrācijas metodes ļauj būtiski uzlabot tulkošanas kvalitāti. Darbā aprakstītie rezultāti ir aprobēti vairākos pētniecības projektos un ieviesti praktiskos risinājumos. Atslēgvārdi: statistiskā mašīntulkošana, terminoloģija, starpvalodu informācijas izvilkšanaThe doctoral thesis describes methods and tools researched and developed by the author for bilingual terminology integration into statistical machine translation systems. The author presents novel methods for terminology integration in SMT systems during training (through static integration) and during translation (through dynamic integration). The work focusses not only on the SMT integration techniques, but also on methods for acquisition of linguistic resources that are necessary for different tasks involved in workflows for terminology integration in SMT systems. The proposed methods have been evaluated using automatic and manual evaluation methods. The results show that both static and dynamic integration methods allow increasing translation quality. The thesis describes also areas where the methods have been approbated in practice. Keywords: statistical machine translation, terminology, cross-lingual information extractio

    Zināšanās bāzētu un korpusā bāzētu metožu kombinētā izmantošanas mašīntulkošanā

    Get PDF
    ANOTĀCIJA. Mašīntulkošanas (MT) sistēmas tiek būvētas izmantojot dažādas metodes (zināšanās un korpusā bāzētas). Zināšanās bāzēta MT tulko tekstu, izmantojot cilvēka rakstītus likumus. Korpusā bāzēta MT izmanto no tulkojumu piemēriem automātiski izgūtus modeļus. Abām metodēm ir gan priekšrocības, gan trūkumi. Šajā darbā tiek meklēta kombināta metode MT kvalitātes uzlabošanai, kombinējot abas metodes. Darbā tiek pētīta metožu piemērotība latviešu valodai, kas ir maza, morfoloģiski bagāta valoda ar ierobežotiem resursiem. Tiek analizētas esošās metodes un tiek piedāvātas vairākas kombinētās metodes. Metodes ir realizētas un novērtētas, izmantojot gan automātiskas, gan cilvēka novērtēšanas metodes. Faktorēta statistiskā MT ar zināšanās balstītu morfoloģisko analizatoru ir piedāvāta kā perspektīvākā. Darbā aprakstīts arī metodes praktiskais pielietojums. Atslēgas vārdi: mašīntulkošana (MT), zināšanās balstīta MT, korpusā balstīta MT, kombinēta metodeABSTRACT. Machine Translation (MT) systems are built using different methods (knowledge-based and corpus-based). Knowledge-based MT translates text using human created rules. Corpus-based MT uses models which are automatically built from translation examples. Both methods have their advantages and disadvantages. This work aims to find a combined method to improve the MT quality combining both methods. An applicability of the methods for Latvian (a small, morphologically rich, under-resourced language) is researched. The existing MT methods have been analyzed and several combined methods have been proposed. Methods have been implemented and evaluated using an automatic and human evaluation. The factored statistical MT with a rule-based morphological analyzer is proposed to be the most promising. The practical application of methods is described. Keywords: Machine Translation (MT), Rule-based MT, Statistical MT, Combined approac

    Conference Program

    Get PDF
    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), xii-xvii. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    Conference Program

    Get PDF
    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. xiii-xviii

    Contents

    Get PDF
    Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007. Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit. University of Tartu, Tartu, 2007. ISBN 978-9985-4-0513-0 (online) ISBN 978-9985-4-0514-7 (CD-ROM) pp. iii-viii

    Contents

    Get PDF
    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), iii-vii. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    Improving the translation environment for professional translators

    Get PDF
    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

    Extracting bilingual terms from the Web

    Get PDF
    In this paper we make two contributions. First, we describe a multi-component system called BiTES (Bilingual Term Extraction System) designed to automatically gather domain-specific bilingual term pairs from Web data. BiTES components consist of data gathering tools, domain classifiers, monolingual text extraction systems and bilingual term aligners. BiTES is readily extendable to new language pairs and has been successfully used to gather bilingual terminology for 24 language pairs, including English and all official EU languages, save Irish. Second, we describe a novel set of methods for evaluating the main components of BiTES and present the results of our evaluation for six language pairs. Results show that the BiTES approach can be used to successfully harvest quality bilingual term pairs from the Web. Our evaluation method delivers significant insights about the strengths and weaknesses of our techniques. It can be straightforwardly reused to evaluate other bilingual term extraction systems and makes a novel contribution to the study of how to evaluate bilingual terminology extraction systems

    Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

    Get PDF
    Peer reviewe
    corecore