7 research outputs found

    Matxin: moving towards language independence

    This paper describes some of the issues found when adapting and extending the Matxin free-software machine translation system to other language pairs. It sketches out some of the characteristics of Matxin and offers possible solutions to these issues. This research was supported in part by the Spanish Ministry of Education and Science (OpenMT, TIN2006-15307-C03-01).

    TXALA un analizador libre de dependencias para el castellano

    In this demo we present the first version of Txala, a dependency parser for Spanish developed under the LGPL license. The parser is part of the development of a free-software platform for machine translation. Because syntactic parsers of this kind are lacking for Spanish, the tool is needed for the progress of NLP in Spanish. This research was partially funded by the Ministry of Industry, Tourism and Trade (PROFIT FIT-340101-2004-3).

    Etiquetado semiautomático del rasgo semántico de animicidad para su uso en un sistema de traducción automática

    The ambiguity related to the use of movement and localisation declension cases in Basque is a serious problem in the morphological generation phase of machine translation. We present the approach we have developed to resolve this ambiguity. Choosing the appropriate suffix requires knowing the [±animate] semantic feature of the lemma to be declined. The lexical database used does not contain this information, and adding it manually to the 28,000 nouns it contains would be very costly. For this reason, we carried out two experiments to obtain the [±animate] feature from other resources. First, we tried to acquire the required knowledge automatically from corpora, but the results were not good. Second, after a minimal manual tagging and using semantic relations between words extracted from the definitions of a monolingual dictionary, we tagged more than half of the words in real texts with the [±animate] feature. This work was funded by the University of the Basque Country (UPV 141.226-G19/99).

    Matxin-Informatika, version of Matxin translation system adapted to the computer science domain

    We present Matxin-Informatika, a version of the Matxin machine translation system (from Spanish to Basque) that has been adapted to the computer science domain using a bilingual corpus and dictionary resources. This version will be used for a manual post-editing task in a collaborative environment. The corpus obtained from that task will then serve to improve the translator further by means of statistical post-editing.

    An open-source shallow-transfer machine translation engine for the Romance languages of Spain

    We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (the main ones being Spanish, Catalan and Galician). The work is part of a larger government-funded project that includes non-Romance languages such as Basque and involves both universities and linguistic technology companies. The machine translation architecture uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state based chunking for structural transfer. It is largely based upon that of systems already developed by the Transducens group at the Universitat d'Alacant, such as interNOSTRUM (Spanish-Catalan) and Traductor Universia (Spanish-Portuguese). The possible scope of the project, however, is wider, since it will be possible to use the resulting machine translation system with new pairs of languages; to that end, the project also aims at proposing standard formats to encode the linguistic data needed. This paper briefly describes the machine translation engine, the formats it uses for linguistic data, and the compilers that convert these data into an efficient format used by the engine. Work funded by projects FIT-340101-2004-3 (Spanish Ministry of Industry, Commerce and Tourism) and TIC2003-08681-C02-01 (Spanish Ministry of Science and Technology). Felipe Sánchez-Martínez is supported by the Spanish Ministry of Science and Education and the European Social Fund through grant BES-2004-4711.

    Evaluation of a rule-based machine translation system or why BLEU is only useful for what it is meant to be used

    Matxin is a rule-based machine translation system that translates into Basque. We evaluated it with the HTER metric, which measures post-editing cost, and concluded that an editor would need to change 4 out of every 10 words to correct the system's output. We compared the quality of Matxin's translations with that of a corpus-based system, and the results show that Matxin performs significantly better. Given the widespread use of BLEU, we also examined the BLEU scores of both systems, and we conclude that this metric is effective neither for measuring the absolute quality of a system nor for comparing systems based on different strategies. This research was supported by the Ministry of Education and Science through the projects OpenMT: Open Source Machine Translation using hybrid methods (TIN2006-15307-C03-01) and Ricoterm-3 (HUM2007-65966-CO2-02).