Sociedad Española para el Procesamiento del Lenguaje Natural
Abstract
This paper describes a method
for extracting Portuguese–Spanish word
translation equivalents from aligned parallel
texts. The method uses the standard
log-likelihood statistic to measure the similarity
between two words. Parallel texts are
aligned using a simple method that extends
previous work by Pascale Fung & Kathleen
McKeown and by Melamed. Unlike that work, the
method in this paper does not rely on statistically
unsupported heuristics to select reliable
correspondence points. Instead, it uses
confidence bands of linear regressions, which
provide the statistical support those authors
could not claim. The points used to fit the
regression line are generated from the positions
of homograph words that occur with
the same frequency in parallel text segments.
With this alignment method, we are
able to extract word translation equivalents
(about 90 of the best 100 are correct
equivalents).

This research was partially supported by a grant
from Fundação para a Ciência e Tecnologia /
Praxis XXI.
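
As an illustration of the similarity measure named in the abstract, the following is a minimal sketch of the log-likelihood ratio statistic (G²) computed from a 2x2 contingency table of aligned segments. The abstract does not specify how the counts are collected, so the segment-level counting, the function names `log_likelihood` and `score_pair`, and the toy Portuguese and Spanish segments are all assumptions made for this example, not the authors' implementation.

```python
import math


def log_likelihood(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of segment counts.

    k11: aligned segments containing both words
    k12: segments containing only the source word
    k21: segments containing only the target word
    k22: segments containing neither word
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    # Equivalent to 2 * sum(O * ln(O / E)) over the four cells.
    return 2.0 * (cells - rows - cols + xlogx_sum(n))


def score_pair(segments, src_word, tgt_word):
    """Score a candidate pair over (source_tokens, target_tokens) segments.

    Counting co-occurrence per aligned segment is an assumption of this sketch.
    """
    k11 = k12 = k21 = k22 = 0
    for src_tokens, tgt_tokens in segments:
        in_src = src_word in src_tokens
        in_tgt = tgt_word in tgt_tokens
        if in_src and in_tgt:
            k11 += 1
        elif in_src:
            k12 += 1
        elif in_tgt:
            k21 += 1
        else:
            k22 += 1
    return log_likelihood(k11, k12, k21, k22)


# Toy aligned segments (hypothetical data, for illustration only).
segments = [
    (["o", "livro"], ["el", "libro"]),
    (["a", "casa"], ["la", "casa"]),
    (["o", "livro", "novo"], ["el", "libro", "nuevo"]),
    (["a", "rua"], ["la", "calle"]),
]
good = score_pair(segments, "livro", "libro")
bad = score_pair(segments, "livro", "casa")
```

On this toy data the pair that always co-occurs (`livro`, `libro`) scores higher than the pair that never does. Note that G² measures departure from independence in either direction, so a practical ranking would also check that the observed co-occurrence count exceeds the expected one.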