23 research outputs found

    Parse tree based machine translation for less-used languages

    Get PDF
    The article describes a method that enhances translation performance of language pairs with a less used source language and a widely used target language. We propose a method that enables the use of parse tree based statistical translation algorithms for language pairs with a less used source language and a widely used target language. Automatic part of speech (POS) tagging algorithms have become accurate to the extent of efficient use in many tasks. Most of these methods are quite easily implementable in most world languages. The method is divided in two partsthe first part constructs alignments between POS tags of source sentences and induced parse trees of target language. The second part searches through trained data and selects the best candidates for target sentences, the translations. The method was not fully implemented due to time constraintsthe training part was implemented and incorporated into a functional translation systemthe inclusion of a word alignment model into the translation part was not implemented. The empirical evaluation addressing the quality of trained data was carried out on a full implementation of the presented training algorithms and the results confirm the employability of the method

    N-gram-based statistical machine translation versus syntax augmented machine translation: comparison and system combination

    Get PDF
    In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximumentropy framework. We provide a stepby- step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.Postprint (published version

    Statistical Machine Translation Using the Self-Organizing Map

    Get PDF
    The paper describes a contextual environment using the Self-Organizing Map, which can model a semantic agent (SOMAgent) that learns the correct meaning of a word used in context in order to deal with specific phenomena such as ambiguity, and to generate more precise alignments that can improve the first choice of the Statistical Machine Translation system giving linguistic knowledge

    Systematic evaluation of machine translation methods for image and video annotation

    Get PDF
    In this study, we present a systematic evaluation of machine translation methods applied to the image annotation problem. We used the well-studied Corel data set and the broadcast news videos used by TRECVID 2003 as our dataset. We experimented with different models of machine translation with different parameters. The results showed that the simplest model produces the best performance. Based on this experience, we also proposed a new method, based on cross-lingual information retrieval techniques, and obtained a better retrieval performance. © Springer-Verlag Berlin Heidelberg 2005

    Tree-to-string alignment template for statistical machine translation

    Full text link

    Approaches to Machine Translation: A Review

    Get PDF
    Translation is the transfer of the meaning of a text from one language to another. It is a means of sharing information across languages and therefore essential for addressing information inequalities. The work of translation was originally carried out by human translators and its limitations led to the development of machine translators. Machine Translation is a subfield of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. There are different approaches to machine translation. This paper reviews the two major approaches (single vs. hybrid) to machine translation and provides critique of existing machine translation systems with their merits and demerits. Several application areas of machine translation and various methods used in evaluating them were also discussed. Our conclusion from the reviewed literatures is that a single approach to machine translation fails to achieve satisfactory performance resulting in lower quality and fluency of the output. On the other hand, a hybrid approach combines the strength of two or more approaches to improve the overall quality and fluency of the translation

    Bimorphisms and synchronous grammars

    Get PDF
    We tend to think of the study of language as proceeding by characterizing the strings and structures of a language, and we think of natural language processing as using those structures to build systems of utility in manipulating the language. But many language-related problems are more fruitfully viewed as requiring the specification of a relation between two languages, rather than the specification of a single language. We provide a synthesis and extension of work that unifies two approaches to such language relations: the automata-theoretic approach based on tree transducers that transform trees to their counterparts in the relation, and the grammatical approach based on synchronous grammars that derive pairs of trees in the relation. In particular, we characterize synchronous tree-substitution grammars and synchronous tree-adjoining grammars in terms of bimorphisms, which have previously been used to characterize tree transducers. In the process, we provide new approaches to formalizing the various concepts: a metanotation for describing varieties of tree automata and transducers in equational terms; a rigorous formalization of tree-adjoining and tree-substitution grammars and their synchronous counterparts, using trees over ranked alphabets; and generalizations of tree-adjoining grammar allowing multiple adjunction.Engineering and Applied Science

    Getting Past the Language Gap: Innovations in Machine Translation

    Get PDF
    In this chapter, we will be reviewing state of the art machine translation systems, and will discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods had been introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, the advances in the related field of computational linguistics, making off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable the working of Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT

    A Formal Model of Ambiguity and its Applications in Machine Translation

    Get PDF
    Systems that process natural language must cope with and resolve ambiguity. In this dissertation, a model of language processing is advocated in which multiple inputs and multiple analyses of inputs are considered concurrently and a single analysis is only a last resort. Compared to conventional models, this approach can be understood as replacing single-element inputs and outputs with weighted sets of inputs and outputs. Although processing components must deal with sets (rather than individual elements), constraints are imposed on the elements of these sets, and the representations from existing models may be reused. However, to deal efficiently with large (or infinite) sets, compact representations of sets that share structure between elements, such as weighted finite-state transducers and synchronous context-free grammars, are necessary. These representations and algorithms for manipulating them are discussed in depth in depth. To establish the effectiveness and tractability of the proposed processing model, it is applied to several problems in machine translation. Starting with spoken language translation, it is shown that translating a set of transcription hypotheses yields better translations compared to a baseline in which a single (1-best) transcription hypothesis is selected and then translated, independent of the translation model formalism used. More subtle forms of ambiguity that arise even in text-only translation (such as decisions conventionally made during system development about how to preprocess text) are then discussed, and it is shown that the ambiguity-preserving paradigm can be employed in these cases as well, again leading to improved translation quality. A model for supervised learning that learns from training data where sets (rather than single elements) of correct labels are provided for each training instance and use it to learn a model of compound word segmentation is also introduced, which is used as a preprocessing step in machine translation
    corecore