Extracting Portuguese-Spanish word translations from aligned parallel texts
This paper describes a method
for extracting Portuguese–Spanish word
translation equivalents from aligned parallel
texts. This method uses the standard
log-likelihood statistic to measure the similarity
between two words. Parallel texts are
aligned using a simple method that extends
previous work by Pascale Fung and Kathleen
McKeown and by Melamed. In contrast to that work, the
method in this paper does not use statistically
unsupported heuristics to filter reliable
correspondence points. Instead, it provides
the statistical support those authors could
not claim by using confidence bands of linear
regressions. The points of the linear regression
line are generated from the positions
of homograph words which occur with
the same frequency in parallel text segments.
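
The filtering idea described so far can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `candidate_points` and `filter_points`, the pairing of the i-th occurrence of a homograph in one text with its i-th occurrence in the other, the use of an ordinary least-squares prediction band, and the fixed multiplier `t_crit=2.0` (standing in for a Student's t quantile) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def candidate_points(src_tokens, tgt_tokens):
    """Pair positions of homographs that occur equally often in both texts.

    Assumption: the i-th occurrence in the source is paired with the
    i-th occurrence in the target.
    """
    src_freq, tgt_freq = Counter(src_tokens), Counter(tgt_tokens)
    shared = {w for w in src_freq if src_freq[w] == tgt_freq.get(w)}
    src_pos, tgt_pos = defaultdict(list), defaultdict(list)
    for i, w in enumerate(src_tokens):
        if w in shared:
            src_pos[w].append(i)
    for j, w in enumerate(tgt_tokens):
        if w in shared:
            tgt_pos[w].append(j)
    return sorted((x, y) for w in shared
                  for x, y in zip(src_pos[w], tgt_pos[w]))


def filter_points(points, t_crit=2.0):
    """Keep the points that lie inside a band around the fitted line.

    Assumption: a prediction band with a fixed multiplier t_crit is used;
    the paper's exact band construction may differ.
    """
    n = len(points)
    if n < 3:
        return list(points)  # too few points to estimate a band
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return list(points)  # all x equal: no line can be fitted
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    s = math.sqrt(sse / (n - 2))  # residual standard error
    kept = []
    for x, y in points:
        half_width = t_crit * s * math.sqrt(
            1 + 1 / n + (x - mean_x) ** 2 / sxx)
        if abs(y - (intercept + slope * x)) <= half_width:
            kept.append((x, y))
    return kept
```

In this sketch a point far from the regression line, such as a homograph matched to the wrong occurrence, falls outside the band and is dropped, while points near the line survive as candidate correspondence points.
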
With this alignment method, we are
able to extract word translation equivalents
(about 90 of the best 100 are correct
equivalents).

This research was partially supported by a grant
from Fundação para a Ciência e Tecnologia /
Praxis XXI.
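
As an illustration of the log-likelihood statistic mentioned in the abstract, the sketch below computes the G² score from a 2x2 contingency table of aligned segments. The abstract does not say how the counts are collected, so counting segment-level co-occurrence, along with the names `cooccurrence_counts` and `log_likelihood`, is an assumption made for this example.

```python
import math


def cooccurrence_counts(src_segments, tgt_segments, src_word, tgt_word):
    """Count aligned segment pairs by presence of each word.

    Returns (both, source only, target only, neither).
    Assumption: segments are given as aligned lists of token lists.
    """
    k11 = k12 = k21 = k22 = 0
    for src, tgt in zip(src_segments, tgt_segments):
        in_src, in_tgt = src_word in src, tgt_word in tgt
        if in_src and in_tgt:
            k11 += 1
        elif in_src:
            k12 += 1
        elif in_tgt:
            k21 += 1
        else:
            k22 += 1
    return k11, k12, k21, k22


def log_likelihood(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2
```

A higher score means the two words co-occur across aligned segments more consistently than chance would predict. Note that G² measures the strength of association, not its direction, so in practice one would also check that the words co-occur more often than expected before treating a pair as a candidate equivalent.
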