Extracting Portuguese-Spanish word translations from aligned parallel texts

Ribeiro, António; Lopes, Gabriel Pereira; Mexia, João Tiago Praça Nunes

research

oai:rua.ua.es:10045/1873

Authors: António Ribeiro, Gabriel Pereira Lopes, João Tiago Praça Nunes Mexia
Publication date: 1 September 2000
Publisher: Sociedad Española para el Procesamiento del Lenguaje Natural

Abstract

This paper describes a method for extracting Portuguese–Spanish word translation equivalents from aligned parallel texts. The method uses the standard log-likelihood statistic to measure the similarity between two words. Parallel texts are aligned using a simple method that extends previous work by Pascale Fung & Kathleen McKeown and by Melamed. Unlike that earlier work, the method in this paper does not use statistically unsupported heuristics to filter reliable correspondence points. Instead, it uses confidence bands of linear regressions to provide the statistical support those authors could not claim. The points of the linear regression line are generated from the positions of homograph words that occur with the same frequency in parallel text segments. With this alignment method, we are able to extract word translation equivalents (about 90 of the best 100 are correct equivalents).

This research was partially supported by a grant from Fundação para a Ciência e Tecnologia / Praxis XXI.
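To make the word-similarity step in the abstract concrete, the following is a minimal sketch of how a log-likelihood score for a Portuguese-Spanish word pair could be computed from aligned segments. It assumes the statistic is the log-likelihood ratio (G²) over a 2x2 contingency table and that co-occurrence is counted once per aligned segment pair; the abstract does not state either detail. The names `log_likelihood` and `score_pairs` and the toy segments are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def _xlogx(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: segments containing both words
    k12: segments containing only the source word
    k21: segments containing only the target word
    k22: segments containing neither word
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(n) - rows - cols)


def score_pairs(segments):
    """Score every co-occurring word pair in a list of aligned segments.

    segments: list of (source_tokens, target_tokens) tuples.
    Returns a dict mapping (source_word, target_word) to its G^2 score.
    Pairs that co-occur less often than chance predicts are skipped
    (an assumption of this sketch, since G^2 alone ignores direction).
    """
    n = len(segments)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in segments:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update((s, t) for s in src_set for t in tgt_set)

    scores = {}
    for (s, t), k11 in pair_count.items():
        k12 = src_count[s] - k11
        k21 = tgt_count[t] - k11
        k22 = n - k11 - k12 - k21
        if k11 * k22 > k12 * k21:  # keep positively associated pairs only
            scores[(s, t)] = log_likelihood(k11, k12, k21, k22)
    return scores


# Toy aligned segments (invented for illustration only).
toy_segments = [
    (["o", "livro"], ["el", "libro"]),
    (["o", "gato"], ["el", "gato"]),
    (["a", "casa"], ["la", "casa"]),
    (["a", "mesa"], ["la", "mesa"]),
]
toy_scores = score_pairs(toy_segments)
ranked = sorted(toy_scores.items(), key=lambda item: item[1], reverse=True)
for pair, score in ranked[:5]:
    print(pair, round(score, 3))
```

Ranking candidate pairs by this score and keeping the highest ones mirrors the abstract's evaluation of "the best 100" equivalents, although the paper's exact counting and thresholds may differ from this sketch.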

Repositorio Institucional de la Universidad de Alicante

Last time updated on 13/09/2013

This paper was published in Repositorio Institucional de la Universidad de Alicante.
