155,658 research outputs found

    Hybrid rule-based - example-based MT: feeding Apertium with sub-sentential translation units

    This paper describes a hybrid machine translation (MT) approach that integrates bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In the integration of bilingual chunks, special care has been taken not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for translation from English to Spanish, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used.
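
Step (i) above can be illustrated with a minimal sketch. The abstract does not state how "best coverage" is scored, so the objective used here (cover as many tokens as possible, preferring fewer chunks on ties) is an assumption, and the function name `best_coverage` and the toy chunk list are hypothetical.

```python
def best_coverage(tokens, chunks):
    """Return non-overlapping (start, end) spans of known chunks covering the most tokens.

    Assumed objective (not taken from the paper): maximise the number of
    covered tokens, breaking ties in favour of fewer chunks.
    """
    chunk_set = {tuple(c) for c in chunks}
    max_len = max((len(c) for c in chunk_set), default=0)
    n = len(tokens)
    # best[i] = (covered tokens, -chunks used) for the prefix tokens[:i]
    best = [(0, 0)] * (n + 1)
    # back[i] = start index of the chunk ending at i, or None if token i-1 is uncovered
    back = [None] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]  # default: leave token i-1 to the rule-based system
        for length in range(1, min(max_len, i) + 1):
            start = i - length
            if tuple(tokens[start:i]) in chunk_set:
                covered, neg_count = best[start]
                candidate = (covered + length, neg_count - 1)
                if candidate > best[i]:
                    best[i] = candidate
                    back[i] = start
    # Walk the back-pointers from the end of the sentence to recover the spans.
    spans = []
    i = n
    while i > 0:
        if back[i] is None:
            i -= 1
        else:
            spans.append((back[i], i))
            i = back[i]
    spans.reverse()
    return spans


tokens = "the red car is very fast".split()
chunks = [("the", "red"), ("red", "car", "is"), ("very", "fast"), ("the",)]
print(best_coverage(tokens, chunks))
```

Tokens outside the returned spans would be left to the regular Apertium pipeline, as in step (ii); choosing among alternative target-side translations of each chunk, step (iii), is not modelled here.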

    Solving headswitching translation cases in LFG-DOT

    It has been shown that LFG-MT (Kaplan et al., 1989) has difficulties with headswitching data (Sadler et al., 1989, 1990; Sadler & Thompson, 1991). We revisit these arguments in this paper. Despite attempts at solving these problematic constructions using approaches based on linear logic (Van Genabith et al., 1998) and restriction (Kaplan & Wedekind, 1993), we point out further problems which are introduced. We then show how LFG-DOP (Bod & Kaplan, 1998) can be extended to serve as a novel hybrid model for MT, LFG-DOT (Way, 1999, 2001), which promises to improve upon the DOT model of translation (Poutsma, 1998, 2000) as well as LFG-MT. LFG-DOT improves the robustness of LFG-MT through the use of the LFG-DOP Discard operator, which produces generalized fragments by discarding certain f-structure features. LFG-DOT can, therefore, deal with ill-formed or previously unseen input where LFG-MT cannot. Finally, we demonstrate that LFG-DOT can cope with translational phenomena that prove problematic for other LFG-based models of translation.

    Cross-Lingual Adaptation using Structural Correspondence Learning

    Cross-lingual adaptation, a special case of domain adaptation, refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce cross-lingual feature correspondences. From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language. The main advantages of this approach over other approaches are its resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as the source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual correspondences.

    Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques

    Two of the most popular machine translation (MT) paradigms are rule-based MT (RBMT) and corpus-based MT, which includes statistical MT (SMT). When parallel corpora are scarce, RBMT becomes particularly attractive. This is the case for the Chinese-Spanish language pair. This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system that takes advantage of available resources such as parallel corpora, which are used to extract dictionaries and lexical and structural transfer rules. The final system is freely available online and open source. Although performance lags behind standard SMT systems on an in-domain test set, the results show that the RBMT’s coverage is competitive and that it outperforms the SMT system on an out-of-domain test set. This RBMT system is available to the general public, can be further enhanced, and opens up the possibility of creating future hybrid MT systems. Peer reviewed. Postprint (author's final draft).

    Influence tests I: ideal composite hypothesis tests, and causal semimeasures

    Ratios of universal enumerable semimeasures corresponding to hypotheses are investigated as a solution for statistical composite hypothesis testing if an unbounded amount of computation time can be assumed. Influence testing for discrete time series is defined using generalized structural equations. Several ideal tests are introduced, and it is argued that when Halting information is transmitted, in some cases, instantaneous cause and consequence can be inferred where this is not possible classically. The approach is contrasted with Bayesian definitions of influence, where it is left open whether all Bayesian causal associations of universal semimeasures are equal within a constant. Finally, the approach is also contrasted with existing engineering procedures for influence and theoretical definitions of causation. Comment: 29 pages, 3 figures, draft