CLARIN in Latvia: current situation and future perspectives
Proceedings of the NODALIDA 2009 workshop
Nordic Perspectives on the CLARIN Infrastructure of Language Resources.
Editors: Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard,
Eiríkur Rögnvaldsson and Koenraad de Smedt.
NEALT Proceedings Series, Vol. 5 (2009), 33-37.
© 2009 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/9207
English-Latvian SMT: knowledge or data?
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 242-245.
© 2009 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/9206
Pattern-based English-Latvian Toponym Translation
Proceedings of the 17th Nordic Conference of Computational Linguistics
NODALIDA 2009.
Editors: Kristiina Jokinen and Eckhard Bick.
NEALT Proceedings Series, Vol. 4 (2009), 41-47.
© 2009 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/9206
Portable extraction of partially structured facts from the web
A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful, partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it was ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information. For example, this partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for the utility of the extracted facts in enhancing image captions.
From Terminology Database to Platform for Terminology Services
Proceedings of the Workshop
CHAT 2011: Creation, Harmonization and Application of Terminology Resources.
Editors: Tatiana Gornostay and Andrejs Vasiļjevs.
NEALT Proceedings Series, Vol. 12 (2011), 16-21.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16956
Preface
Proceedings of the 18th Nordic Conference of Computational Linguistics
NODALIDA 2011.
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), viii-ix.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/16955
Angļu-latviešu statistiskās mašīntulkošanas sistēmas izveide: metodes, resursi un pirmie rezultāti
Development of an English-Latvian Statistical Machine Translation System: Methods, Resources and First Results
Summary
This paper presents research and development of English-Latvian Statistical Machine Translation (SMT) prototypes for the legal domain. Several methods were investigated, namely phrase-based models and factored models. Translation quality was evaluated using an automated metric (BLEU score) and human evaluation. In automatic evaluation, the best score (46.44 BLEU points) was achieved by the factored model trained on the JRC Acquis corpus (version 3.0), which was also judged the best in human evaluation. In addition, an error analysis of the SMT output was performed. It showed that although the output of the best prototype was of reasonable quality, it contained several frequent errors: incorrect forms, missing words and wrong word order. For future work, tree-based SMT and hybrid systems are proposed.
Comprehension Assistant for Languages of Baltic States
Proceedings of the 16th Nordic Conference
of Computational Linguistics NODALIDA-2007.
Editors: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek and Mare Koit.
University of Tartu, Tartu, 2007.
ISBN 978-9985-4-0513-0 (online)
ISBN 978-9985-4-0514-7 (CD-ROM)
pp. 167-174
META-NORD: Baltic and Nordic Branch of the European Open Linguistic Infrastructure
Proceedings of the NODALIDA 2011 Workshop
Visibility and Availability of LT Resources.
Editors: Sjur Nørstebø Moshagen and Per Langgård.
NEALT Proceedings Series, Vol. 13 (2011), 18–22.
© 2011 The editors and contributors.
Published by
Northern European Association for Language
Technology (NEALT)
http://omilia.uio.no/nealt .
Electronically published at
Tartu University Library (Estonia)
http://hdl.handle.net/10062/1697
Deep dive machine translation
Machine Translation (MT) is one of the oldest language technologies, having
been researched for more than 70 years. However, it is only during the last decade
that it has been widely accepted by the general public, to the point where in many
cases it has become an indispensable tool for the global community, supporting communication
between nations and lowering language barriers. Still, there remain major
gaps in the technology that need addressing before it can be successfully applied
in under-resourced settings, understand context, and use world knowledge. This
chapter provides an overview of the current state-of-the-art in the field of MT, offers
technical and scientific forecasting for 2030, and provides recommendations for the
advancement of MT as a critical technology if the goal of digital language equality
in Europe is to be achieved.