    Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach

    This paper defines a method for lexicon extraction in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits morpheme-level translation equivalences. It can generate translations for a large variety of morphologically constructed words and can also generate 'fertile' translations. We show that fertile translations increase the overall quality of the extracted lexicon for English to French translation.
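As a rough illustration of morpheme-level compositional translation, the sketch below segments a source word into known morphemes, combines their target-side equivalents, and flags a candidate as fertile when it has more words than the single source word. The morpheme table, the trailing-hyphen convention for bound forms, and the function names are assumptions made for this example, not the paper's resources or implementation. Reordering and function words, which a real French rendering would need, are left out.

```python
from itertools import product

# Hypothetical English-to-French morpheme equivalences (illustrative only).
# A trailing hyphen marks a bound form that attaches to the next element.
MORPHEME_TABLE = {
    "cyto": ["cyto-", "cellule"],
    "toxic": ["toxique"],
}


def segment(word, table):
    """Greedy longest-match segmentation; returns None if a span is unknown."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in table:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # no known morpheme starts at position i
    return parts


def assemble(choice):
    """Glue bound forms onto the following element; keep free words apart."""
    out, prefix = [], ""
    for form in choice:
        if form.endswith("-"):
            prefix += form[:-1]
        else:
            out.append(prefix + form)
            prefix = ""
    if prefix:
        out.append(prefix)
    return " ".join(out)


def candidates(word, table=MORPHEME_TABLE):
    """Return (translation, is_fertile) pairs for a single source word."""
    parts = segment(word, table)
    if parts is None:
        return []
    results = []
    for choice in product(*(table[p] for p in parts)):
        translation = assemble(choice)
        # Fertile: the target has more words than the one-word source.
        results.append((translation, len(translation.split()) > 1))
    return results


print(candidates("cytotoxic"))
```

In a fuller system the generated candidates would still need to be filtered or ranked, for instance by checking whether they occur in the target-language corpus.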

    Automatic medical term generation for a low-resource language: translation of SNOMED CT into Basque

    211 p. (Basque), 148 p. (English). In this thesis, we developed and evaluated systems for automatically translating terms into Basque. We took SNOMED CT, an ontology that encompasses a broad clinical terminology, as our starting point and developed a system called EuSnomed to manage its translation into Basque. EuSnomed implements a four-step algorithm to obtain Basque equivalents of terms. The first step uses lexical resources to assign Basque equivalents directly to SNOMED CT terms; among others, we used the Euskalterm terminology bank, the Zientzia eta Teknologiaren Hiztegi Entziklopedikoa (Encyclopaedic Dictionary of Science and Technology), and the Giza Anatomiako Atlasa (Atlas of Human Anatomy). For the second step, we developed the NeoTerm system, which translates English neoclassical terms into Basque using equivalences between neoclassical affixes and transliteration rules. For the third step, we developed the KabiTerm system, which translates English complex terms into Basque: it uses the structures of the nested terms that appear inside complex terms to generate the corresponding Basque structures and thereby compose the complex terms. In the last step, we adapted Matxin, a rule-based machine translation system, to the health sciences domain, creating MatxinMed. To do so we prepared Matxin for domain adaptation and, among other things, extended its dictionary so that it can translate health science texts. The four steps were evaluated using different methods. The first two steps were evaluated with a small group of experts, while the systems of the last two steps were evaluated within the Medbaluatoia campaign, carried out thanks to the Basque health sciences community.
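The four steps could be pictured as an ordered cascade in which each term is handled by the first step that produces an equivalent. The abstract does not say that the steps are combined as a strict fallback chain, so that ordering logic is an assumption of this sketch. The step functions below are toy stand-ins with hypothetical names, and the single lexicon entry is only there to make the example run.

```python
from typing import Callable, List, Optional, Tuple

# Each step maps an English term to a Basque equivalent, or None on failure.
Step = Tuple[str, Callable[[str], Optional[str]]]

# Toy stand-ins for the four steps; none of this is EuSnomed's actual code.
TOY_LEXICON = {"heart": "bihotz"}  # illustrative dictionary entry


def lexical_lookup(term: str) -> Optional[str]:
    return TOY_LEXICON.get(term.lower())


def neoclassical_stub(term: str) -> Optional[str]:
    return None  # placeholder for affix equivalences and transliteration rules


def nested_term_stub(term: str) -> Optional[str]:
    return None  # placeholder for translation via nested-term structures


def rule_based_mt_stub(term: str) -> Optional[str]:
    return f"<MT:{term}>"  # placeholder: a real MT system would translate here


STEPS: List[Step] = [
    ("lexicon", lexical_lookup),
    ("neoclassical", neoclassical_stub),
    ("nested", nested_term_stub),
    ("mt", rule_based_mt_stub),
]


def translate_term(term: str, steps: List[Step] = STEPS) -> Optional[Tuple[str, str]]:
    """Return (step name, translation) from the first step that succeeds."""
    for name, step in steps:
        result = step(term)
        if result is not None:
            return name, result
    return None


print(translate_term("heart"))
print(translate_term("unknown term"))
```

Recording which step produced each equivalent makes it possible to evaluate the steps separately, as the thesis does.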

    On the cross-linguistic equivalence of sentir(e) in Romance languages: a contrastive study in semantics

    Recent linguistic studies on perception have focused mainly on verbs referring to the dominant visual and auditory modalities (e.g., English see/look and hear/listen) and have largely ignored the minor verbs. The present paper seeks to fill this gap by comparing the complex semantics of the cognate verbs sentir(e) in three Romance languages, namely Spanish, French and Italian. Because the objective study of semantics is a problematic issue, we pay special attention to methodological problems and opt for a combined corpus approach involving both a translation corpus and comparable data. Evidence from both corpora indicates that, although the rich polysemy of the three verbs partly coincides, each individual verb has undergone semantic specializations that differentiate the morphological cognates.

    A rule-based translation from written Spanish to Spanish Sign Language glosses

    This is the author’s version of a work that was accepted for publication in Computer Speech and Language. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computer Speech and Language, 28, 3 (2015) DOI: 10.1016/j.csl.2013.10.003

    One of the aims of Assistive Technologies is to help people with disabilities to communicate with others and to provide means of access to information. As an aid to Deaf people, we present in this work a production-quality rule-based machine translation system from Spanish to Spanish Sign Language (LSE) glosses, which is a necessary precursor to building a full machine translation system that eventually produces animation output. The system implements a transfer-based architecture from the syntactic functions of dependency analyses. A sketch of LSE is also presented. Several topics regarding translation to sign languages are addressed: the lexical gap, the bootstrapping of a bilingual lexicon, the generation of word order for topic-oriented languages, and the treatment of classifier predicates and classifier names. The system has been evaluated with an open-domain testbed, achieving a BLEU (BiLingual Evaluation Understudy) score of 0.30 and a TER (Translation Error Rate) of 42%. These results show consistent improvements over a statistical machine translation baseline, and some improvements over the same system preserving the word order in the source sentence. Finally, the linguistic analysis of errors has identified some differences due to a certain degree of structural variation in LSE.

    Comparing collocations in translated and learner language

    This paper compares the use of collocations by Italian learners writing in and translating into English, conceptualising the two tasks as different modes of constrained language production and adopting Halverson’s (2017) Revised Gravitational Pull hypothesis as a theoretical model. A particular focus is placed on identifying a method for comparing datasets containing translations and essays, assembled opportunistically and varying in size and structure. The study shows that lexical association scores for dependency-defined word pairs are significantly higher in translations than in essays. A qualitative analysis of a subset of collocations shared by both modes and unique to either mode shows that the former set features more collocations with direct cross-linguistic links (connectivity), and that the source/first language seems to affect both modes similarly. We tentatively conclude that second/target language salience effects are more visible in translation than in second language use, while connectivity and source language salience affect both modes of bilingual processing similarly, regardless of the mediation variable.

    The Circle of Meaning: From Translation to Paraphrasing and Back

    The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this goal of more obvious importance than for the tasks of machine translation and paraphrase generation. Preserving meaning between the input and the output is paramount for both, the monolingual vs bilingual distinction notwithstanding. In this thesis, I present a novel, symbiotic relationship between these two tasks that I term the "circle of meaning". Today's statistical machine translation (SMT) systems require high quality human translations for parameter tuning, in addition to large bi-texts for learning the translation units. This parameter tuning usually involves generating translations at different points in the parameter space and obtaining feedback against human-authored reference translations as to how good the translations are. This feedback then dictates what point in the parameter space should be explored next. To measure this feedback, it is generally considered wise to have multiple (usually 4) reference translations to avoid unfair penalization of translation hypotheses, which could easily happen given the large number of ways in which a sentence can be translated from one language to another. However, this reliance on multiple reference translations creates a problem since they are labor intensive and expensive to obtain. Therefore, most current MT datasets only contain a single reference. This leads to the problem of reference sparsity, the primary open problem that I address in this dissertation and one that has a serious effect on the SMT parameter tuning process. Bannard and Callison-Burch (2005) were the first to provide a practical connection between phrase-based statistical machine translation and paraphrase generation. However, their technique is restricted to generating phrasal paraphrases. I build upon their approach and augment a phrasal paraphrase extractor into a sentential paraphraser with extremely broad coverage. The novelty in this augmentation lies in the further strengthening of the connection between statistical machine translation and paraphrase generation; whereas Bannard and Callison-Burch only relied on SMT machinery to extract phrasal paraphrase rules and stopped there, I take it a few steps further and build a full English-to-English SMT system. This system can, as expected, "translate" any English input sentence into a new English sentence with the same degree of meaning preservation that exists in a bilingual SMT system. In fact, being a state-of-the-art SMT system, it is able to generate n-best "translations" for any given input sentence. This sentential paraphraser, built almost entirely from existing SMT machinery, represents the first 180 degrees of the circle of meaning. To complete the circle, I describe a novel connection in the other direction. I claim that the sentential paraphraser, once built in this fashion, can provide a solution to the reference sparsity problem and, hence, be used to improve the performance of a bilingual SMT system. I discuss two different instantiations of the sentential paraphraser and show several results that provide empirical validation for this connection.
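To see why a single reference can penalize a valid hypothesis, consider clipped unigram precision, one ingredient of BLEU-style scoring. The sketch below is not the thesis's tuning setup: the function name and the example sentences are invented for illustration, and the second reference merely stands in for what a sentential paraphraser might supply.

```python
from collections import Counter


def clipped_unigram_precision(hypothesis, references):
    """Fraction of hypothesis tokens matched in the references.

    Each token's count is clipped by its maximum count in any one reference.
    """
    hyp_counts = Counter(hypothesis.split())
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for word, count in Counter(ref.split()).items():
            max_ref[word] = max(max_ref[word], count)
    matched = sum(min(count, max_ref[word]) for word, count in hyp_counts.items())
    return matched / total


hypothesis = "the cat sat on the mat"
human_reference = "a cat was sitting on the mat"
extra_reference = "the cat sat on a rug"  # stand-in for a paraphrased reference

single = clipped_unigram_precision(hypothesis, [human_reference])
multi = clipped_unigram_precision(hypothesis, [human_reference, extra_reference])
print(single, multi)
```

With only the human reference, the word "sat" goes unmatched even though it is a reasonable choice; adding a second reference that contains it raises the score for the same hypothesis.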

    Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation

    In this paper we describe the methodology and evaluation of the expansion of Galnet, the Galician wordnet, using a multilingual Bible through lexical alignment and semantic annotation. For this experiment we used the Galician, Portuguese, Spanish, Catalan and English versions of the Bible. They were annotated with part-of-speech tags and WordNet senses using FreeLing. The resulting synsets were aligned, and new variants for the Galician language were extracted. In a manual evaluation, the approach reached an accuracy of 96.8%.

    Traducción automática basada en tectogramática para inglés-español e inglés-euskara

    We present the first attempt to build machine translation systems for the English-Spanish and English-Basque language pairs following the tectogrammar approach. Based on the existing English-Czech system, we describe the language-specific tools added in the analysis and synthesis steps, and the resources for bilingual transfer. Evaluation shows the potential of these systems to adapt to new languages and domains. The research leading to these results has received funding from FP7-ICT-2013-10-610516 (QTLeap project, qtleap.eu).

    Ebaluatoia: crowd evaluation of English-Basque machine translation

    This dissertation reports on Ebaluatoia, a crowd-based, large-scale English-Basque machine translation evaluation campaign. This initiative aimed to compare the translation quality of five machine translation systems: two statistical systems, a rule-based system and a hybrid system developed within the IXA group, and an external system, Google Translate. We have established a ranking of the systems under study and performed qualitative analyses to guide further research. In particular, we have carried out an initial subset evaluation, a structural analysis of the source sentences and an error analysis of the translations to help identify where we should place future analysis effort.