Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH
Series: OASIcs - OpenAccess Series in Informatics, ISSN 2190-6807, vol. 29
Abstract
In this document we describe the process of aligning two standard monolingual dictionaries: a Portuguese language dictionary and a Galician synonym dictionary. The main goal of the project is to provide an online dictionary that shows, in parallel, definitions and synonyms in Portuguese and Galician for a given word written in either Portuguese or Galician.
These two languages are very close to each other, which is the main reason we expect this approach to be viable. The main obstacle is the lack of a good, freely available translation dictionary between them, in particular one that covers lexicons of more than one hundred thousand distinct words.
To address this, we defined a substitution-based translation function that achieves an F1 score of 0.88 on a manually verified dictionary of nine thousand words. Using the same translation function to align the Portuguese and Galician dictionaries, we aligned almost 50% of the dictionary lexicon (more than eighty thousand words).
This work was partially supported by Grant TIN2012-38584-C06-04 from the Ministry of Economy and Competitiveness of the Spanish Government, for the project “Adquisición de escenarios de conocimiento a través de la lectura de textos: Desarrollo y aplicación de recursos para el procesamiento lingüístico del gallego (SKATeR-UVIGO)”; by the Xunta de Galicia through the “Rede de Lexicografía (Relex)” (Grant CN 2012/290) and the “Rede de Tecnoloxías e análise dos datos lingüísticos” (Grant CN 2012/179); and by the Per-Fide project (grant reference no. PTDC/CLEL-LI/108948/2008, from the Portuguese Foundation for Science and Technology, co-funded by the European Regional Development Fund).
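The idea of a substitution-based translation function evaluated by F1 can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the abstract gives neither the actual rule set nor the exact way precision and recall were counted, and the rewrite rules, the lexicon filter, the toy word lists, and the helper names (`apply_rules`, `translate`, `evaluate`) are all hypothetical. Here a candidate is kept only if it occurs in a Galician word list, precision is measured over the words that received a candidate, and recall over all gold entries.

```python
from typing import Optional

# Hypothetical, illustrative Portuguese -> Galician rewrite rules (NOT the
# authors' rule set). Rules are applied in order, each replacing every
# occurrence, so longer patterns come before shorter ones they overlap with
# (e.g. "ções"/"ção" before "ç").
RULES: list[tuple[str, str]] = [
    ("ções", "cións"),  # canções -> cancións
    ("ção", "ción"),    # nação -> nación
    ("ç", "z"),         # caça -> caza
    ("lh", "ll"),       # filho -> fillo
    ("nh", "ñ"),        # ninho -> niño
    ("ss", "s"),        # passar -> pasar
]


def apply_rules(word: str) -> str:
    """Rewrite a Portuguese word by applying every substitution rule in order."""
    for source, target in RULES:
        word = word.replace(source, target)
    return word


def translate(word: str, lexicon: set[str]) -> Optional[str]:
    """Return the rewritten word if it is a known Galician word, else None.

    The lexicon check is an assumed filter that rejects candidates
    that are not attested Galician words.
    """
    candidate = apply_rules(word)
    return candidate if candidate in lexicon else None


def evaluate(gold: dict[str, str], lexicon: set[str]) -> tuple[float, float, float]:
    """Compute precision, recall and F1 of `translate` against a gold dictionary.

    Precision: correct candidates / words that received a candidate.
    Recall: correct candidates / all gold entries.
    """
    proposed = correct = 0
    for pt_word, gl_word in gold.items():
        guess = translate(pt_word, lexicon)
        if guess is None:
            continue  # no candidate proposed for this word
        proposed += 1
        if guess == gl_word:
            correct += 1
    precision = correct / proposed if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy, hand-made data for illustration only.
GOLD = {
    "nação": "nación",
    "filho": "fillo",
    "ninho": "niño",
    "passar": "pasar",
    "cadeira": "cadeira",
    "janela": "xanela",      # no rule covers j -> x, so no candidate
    "senha": "contrasinal",  # rules yield "seña", a real but wrong word
}
LEXICON = {"nación", "fillo", "niño", "pasar", "cadeira", "xanela", "seña", "contrasinal"}

if __name__ == "__main__":
    print(evaluate(GOLD, LEXICON))
```

On this toy data, six of the seven words receive a candidate and five of those are correct, so precision is 5/6, recall is 5/7, and F1 is 10/13 (about 0.77). The "senha" case shows why lexicon membership alone is not enough: a rule can land on an existing Galician word ("seña") that is not the intended translation.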