78 research outputs found

Automatic Extraction of Translation Resources from Parallel Corpora

Author: Simões Alberto Manuel
Publication date: 08/10/2009

Repositório Comum

NATools -- A Statistical Word Aligner Workbench

Authors: Almeida José João; Simões Alberto Manuel
Publication date: 25/09/2008

Repositório Comum

Combinatory Examples Extraction for Machine Translation

Authors: Almeida José João; Simões Alberto Manuel
Publication date: 06/11/2008

One of the bottlenecks of example-based machine translation (EBMT) is automatically amassing quantities of good examples. In our work on EBMT, we are investigating how far one can go by extracting examples from parallel corpora, using Probabilistic Translation Dictionaries to obtain example segmentation points. The success of EBMT depends heavily on the quality and quantity of the examples, but also on their length. We therefore give special importance to methods that extract examples of different sizes from the same translation unit. In this article we show that it is possible to extract quantities of examples from parallel corpora using only probabilistic translation dictionaries extracted from the same corpora.

Repositório Comum

Expansión de wordnets mediante unidades pluriverbales extraídas de corpus paralelos

Authors: Gómez Guinovart Xavier; Simões Alberto Manuel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2020

In this paper we present a method for enlarging wordnets, focusing on multi-word terms and using data from parallel corpora. Our approach is validated using the Galician and Portuguese wordnets. The multi-word candidates obtained in this experiment were manually validated, yielding an accuracy of 73.2% for Galician and 75.5% for Portuguese.

This research has been carried out thanks to the project DeepReading (RTI2018-096846-B-C21), supported by the Ministry of Science, Innovation and Universities of the Spanish Government and the European Fund for Regional Development (MCIU/AEI/FEDER), and was partially funded by Portuguese national funds (PIDDAC) through the FCT – Fundação para a Ciência e Tecnologia and FCT/MCTES under the scope of the project UIDB/05549/2020.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas