
Portuguese-English word alignment: Some experiments

Abstract

In this paper we describe some studies of Portuguese-English word alignment, focusing on (i) measuring the importance of the coupling between dictionaries and corpus; (ii) assessing the relevance of using syntactic information (POS and lemma) rather than just word forms; and (iii) taking into account the direction of translation. We first provide some motivation for the studies and insist on separating type alignment from token alignment. We then briefly describe the resources employed, namely the EuroParl and COMPARA corpora and the NATools alignment tools, and introduce some measures to evaluate the two kinds of dictionaries obtained. We then present the results of several experiments, comparing the sizes, overlap, translation fertility and alignment density of the several bilingual resources built. We also report preliminary data on the quality of the resulting dictionaries or alignment results.

This work was done in the scope of the Linguateca project, contract no. 339/1.3/C/NAC, jointly funded by the Portuguese government and the European Union. We thank Jose Joao Dias de Almeida for relevant comments during the development of these tools.
