
    XL-AMR: Enabling Cross-Lingual AMR Parsing with Transfer Learning Techniques

    Abstract Meaning Representation (AMR) is a popular formalism of natural language that represents the meaning of a sentence as a semantic graph. It is agnostic about how to derive meanings from strings and for this reason it lends itself well to the encoding of semantics across languages. However, cross-lingual AMR parsing is a hard task: training data are scarce in languages other than English, and the existing English AMR parsers are not directly suited to use in a cross-lingual setting. In this work we tackle these two problems so as to enable cross-lingual AMR parsing: we explore different transfer learning techniques for producing automatic AMR annotations across languages and develop a cross-lingual AMR parser, XL-AMR. This parser can be trained on the produced data and does not rely on AMR aligners or source-copy mechanisms, as is commonly the case in English AMR parsing. The results of XL-AMR significantly surpass those previously reported for Chinese, German, Italian and Spanish. Finally, we provide a qualitative analysis which sheds light on the suitability of AMR across languages. We release XL-AMR at github.com/SapienzaNLP/xl-amr.
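One of the transfer techniques the abstract alludes to, producing automatic AMR annotations across languages, can be pictured as annotation projection: gold English AMR graphs are paired with machine-translated target-language sentences to yield silver training data. The sketch below is a minimal illustration under that assumption; `translate` is a hypothetical stand-in for a real MT system, and the toy lexicon and AMR triples are invented for the example.

```python
def translate(sentence: str, target_lang: str) -> str:
    """Hypothetical MT call; a real NMT system would go here."""
    toy_lexicon = {("The boy wants to go", "it"): "Il ragazzo vuole andare"}
    return toy_lexicon.get((sentence, target_lang), sentence)

def project_annotations(english_corpus, target_lang):
    """Build silver (target sentence, AMR graph) training pairs.

    Each English sentence keeps its gold AMR graph, on the assumption that
    AMR is largely language-agnostic, so the same graph can annotate the
    translated sentence as well.
    """
    silver = []
    for sentence, amr_graph in english_corpus:
        silver.append((translate(sentence, target_lang), amr_graph))
    return silver

# A gold English example: the AMR graph encoded as (node, role, node) triples.
gold = [("The boy wants to go",
         [("w", "instance", "want-01"),
          ("w", "ARG0", "b"),
          ("b", "instance", "boy"),
          ("w", "ARG1", "g"),
          ("g", "instance", "go-02")])]

silver_it = project_annotations(gold, "it")
print(silver_it[0][0])  # "Il ragazzo vuole andare"
```

The projected pairs are noisy ("silver") rather than gold, which is why the parser itself must be robust to the produced data.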

    A minimal transfer conception for Verbmobil

    In this paper we introduce the transfer conception MinT, currently being developed for the prototype of the face-to-face translation system Verbmobil. The acronym MinT stands for Minimal Transfer. MinT is a semantics-oriented transfer model based on some central ideas of the MRS-based approach outlined in [Copestake et al., 1995] and the Shake-and-Bake approach to machine translation sketched in [Whitelock, 1992]. The central idea of minimal transfer is to relate the source- and target-language semantic descriptions at a maximally abstract level, without falling back into the well-known problems of the interlingua approach. Minimal transfer simultaneously decreases the number of transfer rules and leaves a maximal set of options for lexicalization and grammaticalization up to the generator. In sum, MinT can be characterized as a semantics-oriented, unification-based and lexicalist transfer model. Its main knowledge base consists of transfer statements, which provide the correspondences between underspecified semantic predicates of the source and target language. Transfer statements comprise both bilingual and monolingual correspondences. Bilingual correspondences, on the one hand, establish the equivalence between sets of semantic predicates of the source and target languages. They are formulated in a strictly declarative way and can be applied bidirectionally. In order to resolve translational ambiguities, the roles and instances of a predicate are typed with fine-grained sorts supplied by an elaborated sort hierarchy. Monolingual correspondences, on the other hand, provide a solution to divergences in the logical structure of the languages involved. The idea is to allow the transfer component to initiate further compositional processes if this is motivated by the contrastive situation. Thus, the input structure is transformed into a logically equivalent semantic representation that is shared by the target language.
This way, all contrastive knowledge is contained in the transfer component, which allows strict modularity of analysis and generation.
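The bilingual correspondences described above, declarative equivalences between sets of semantic predicates that can be applied in either direction, can be sketched as follows. This is only a toy illustration of the idea, not MinT's actual rule format; the German and English predicate names are invented for the example.

```python
# Each correspondence pairs a set of source-language predicates with a set
# of target-language predicates. Declaring them once and reading them in
# either direction is what makes the rules bidirectional.
CORRESPONDENCES = [
    (frozenset({"termin", "vereinbaren"}), frozenset({"appointment", "arrange"})),
    (frozenset({"gern"}), frozenset({"like"})),
]

def transfer(predicates, direction="de->en"):
    """Apply the bilingual correspondences to a set of semantic predicates."""
    result = set(predicates)
    for de_side, en_side in CORRESPONDENCES:
        src, tgt = (de_side, en_side) if direction == "de->en" else (en_side, de_side)
        if src <= result:                  # the whole predicate set must match
            result = (result - src) | tgt  # replace the set as one unit
    return result

print(transfer({"termin", "vereinbaren", "gern"}))
# Because the rules are declarative, the same table translates back:
print(transfer(transfer({"termin", "vereinbaren"}), direction="en->de"))
```

Note that rules fire on whole predicate sets, not single predicates, which is how set-to-set equivalences (rather than word-to-word mappings) are expressed.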

    Abstraction and underspecification in semantic transfer

    This paper introduces the semantic transfer approach MinT (Minimal Transfer), which has been developed in the speech-to-speech MT system VERBMOBIL. As a unification-based and lexicalist semantic transfer model, it relies on some central ideas of the MRS-based transfer approach outlined in [Copestake et al., 1995]. It differs, however, from the latter in certain aspects: in MinT, the idea of abstraction and underspecification is worked out in much more detail and has been applied to a variety of translation phenomena. MinT relates SL and TL semantic descriptions at a maximally abstract level, which simultaneously decreases the number of transfer rules and leaves a considerable range of options for lexicalization and grammaticalization up to the generator. To preserve ambiguities that hold across the languages involved, MinT processes underspecified semantic representations.

    Localisation Training in Spain and Beyond: Towards a Consensus on Content and Approach

    Since localisation emerged in the 1980s as an activity linked to the software industry, its evolution has gone hand in hand with technological advances. In the globalised market of the 21st century, an ever-increasing range of digital products must be localised. While academic institutions are aware of how the increasing demand for localisation is affecting the translation industry, there is no consensus regarding what localisation courses and modules should cover or how they should be taught. This article reports the findings of a survey-based study that adopted a descriptive-interpretive methodology to collect both quantitative and qualitative data from a group of 16 localisation trainers teaching on undergraduate translation programmes at Spanish universities. To contextualise and sharpen the focus of the survey, a literature review on localiser education was carried out. The results of both the survey and the literature review reinforce the finding of an earlier unpublished study by the same authors that localisation training is keeping pace with technological evolution, despite its limited presence in translation studies curricula. In addition, respondents noted that one of their main challenges is finding authentic teaching materials, and recommended closer collaboration between academia and the localisation industry.

    Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement

    Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success of many NLP applications such as statistical machine translation (MT), construction of bilingual lexicons, word-sense disambiguation, and projection of resources between languages. With the availability of large parallel texts, statistical word alignment systems have proven to be quite successful on many language pairs. However, these systems are still faced with several challenges due to the complexity of the word alignment problem, lack of enough training data, difficulty learning statistics correctly, translation divergences, and lack of a means for incremental incorporation of linguistic knowledge. This thesis presents two new frameworks to improve existing word alignments using supervised learning techniques. In the first framework, two rule-based approaches are introduced. The first approach, Divergence Unraveling for Statistical MT (DUSTer), specifically targets translation divergences and corrects the alignment links related to them using a set of manually-crafted, linguistically-motivated rules. In the second approach, Alignment Link Projection (ALP), the rules are generated automatically by adapting transformation-based error-driven learning to the word alignment problem. By conditioning the rules on initial alignment and linguistic properties of the words, ALP manages to categorize the errors of the initial system and correct them. The second framework, Multi-Align, is an alignment combination framework based on classifier ensembles. The thesis presents a neural-network based implementation of Multi-Align, called NeurAlign. By treating individual alignments as classifiers, NeurAlign builds an additional model to learn how to combine the input alignments effectively. 
The evaluations show that the proposed techniques yield significant improvements (up to 40% relative error reduction) over existing word alignment systems on four different language pairs, even with limited manually annotated data. Moreover, all three systems allow easy integration of linguistic knowledge into statistical models without requiring large modifications to existing systems. Finally, the improvements are analyzed using various measures, including the impact of improved word alignments in an external application, phrase-based MT.
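The Multi-Align idea of treating individual aligners as classifiers and combining their outputs can be illustrated with a minimal ensemble baseline. NeurAlign, as the abstract states, learns this combination with a neural model; the majority vote below is only a simplified sketch of the combination step, and the aligner outputs are invented for the example.

```python
from collections import Counter

def combine_alignments(alignments, threshold=None):
    """Majority-vote combination of word alignment link sets.

    `alignments` is a list of sets of (src_index, tgt_index) links, one set
    per input aligner. A link is kept if at least `threshold` aligners
    propose it (default: strict majority).
    """
    if threshold is None:
        threshold = len(alignments) // 2 + 1
    votes = Counter(link for links in alignments for link in links)
    return {link for link, count in votes.items() if count >= threshold}

# Three hypothetical aligners on a 3-word sentence pair:
a1 = {(0, 0), (1, 1), (2, 2)}
a2 = {(0, 0), (1, 2), (2, 2)}
a3 = {(0, 0), (1, 1)}
print(sorted(combine_alignments([a1, a2, a3])))  # [(0, 0), (1, 1), (2, 2)]
```

A learned combiner improves on this by conditioning each decision on features of the words and of the competing alignments rather than on raw vote counts alone.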

    Intersemiotic mismatches in memes: a study of machine translation output from English into Portuguese

    The aim of this work is to present recent findings on the use of Google Translate output in multimodal contexts. The development and evaluation of machine translation usually focus on the linguistic component, yet there has been little manual exploration of text-image relations in multimodal documents. This study therefore describes some text-image relations in English-language memes machine-translated into Portuguese. The methodology involved selecting and analysing 100 memes, collected from Instagram and Facebook pages, and examining their intersemiotic relations both in English (as the source text) and in Portuguese (as the target text). Of the memes analysed, 73% yielded correct translations, 17% contained errors without any kind of intersemiotic mismatch, and only 10% showed a linguistic deviation that altered the relation between the meme's text and image. Within these 10% of mismatches, examples emerged involving, for instance, i) incorrect words and an additive relation; and ii) an unknown word and homospatiality. Finally, the results show that the machine translation of some memes, those whose text and image share greater semantic congruence, presents a higher number of mismatches than that of memes where this is not the case.