
    A Formal Model of Ambiguity and its Applications in Machine Translation

    Systems that process natural language must cope with and resolve ambiguity. In this dissertation, a model of language processing is advocated in which multiple inputs and multiple analyses of inputs are considered concurrently and a single analysis is only a last resort. Compared to conventional models, this approach can be understood as replacing single-element inputs and outputs with weighted sets of inputs and outputs. Although processing components must deal with sets (rather than individual elements), constraints are imposed on the elements of these sets, and the representations from existing models may be reused. However, to deal efficiently with large (or infinite) sets, compact representations that share structure between elements, such as weighted finite-state transducers and synchronous context-free grammars, are necessary. These representations, and algorithms for manipulating them, are discussed in depth. To establish the effectiveness and tractability of the proposed processing model, it is applied to several problems in machine translation. Starting with spoken language translation, it is shown that translating a set of transcription hypotheses yields better translations than a baseline in which a single (1-best) transcription hypothesis is selected and then translated, independent of the translation model formalism used. More subtle forms of ambiguity that arise even in text-only translation (such as decisions conventionally made during system development about how to preprocess text) are then discussed, and it is shown that the ambiguity-preserving paradigm can be employed in these cases as well, again leading to improved translation quality. Finally, a model for supervised learning is introduced that learns from training data in which sets (rather than single elements) of correct labels are provided for each training instance; it is used to learn a model of compound word segmentation that serves as a preprocessing step in machine translation.
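    To make the ambiguity-preserving view concrete, the following minimal sketch (Python, with hypothetical names; it is not the dissertation's code) represents a weighted set of transcription hypotheses as an acyclic word lattice, the kind of compact, structure-sharing representation described above, and enumerates the weighted sentences it encodes rather than committing to a single 1-best string.

        from collections import defaultdict

        class WordLattice:
            """Acyclic weighted lattice: each edge carries a word and a probability,
            so a whole set of hypotheses is stored while sharing common structure."""

            def __init__(self, start, final):
                self.start = start
                self.final = final
                self.edges = defaultdict(list)  # state -> [(word, prob, next_state)]

            def add_edge(self, src, word, prob, dst):
                self.edges[src].append((word, prob, dst))

            def paths(self):
                """Enumerate every (sentence, weight) pair encoded by the lattice."""
                def walk(state, words, weight):
                    if state == self.final:
                        yield " ".join(words), weight
                        return
                    for word, prob, nxt in self.edges[state]:
                        yield from walk(nxt, words + [word], weight * prob)
                yield from walk(self.start, [], 1.0)

        # Two ASR hypotheses sharing the word "recognize": a downstream translator
        # would receive this whole weighted set instead of only the 0.6 path.
        lattice = WordLattice(start=0, final=3)
        lattice.add_edge(0, "recognize", 1.0, 1)
        lattice.add_edge(1, "speech", 0.6, 3)
        lattice.add_edge(1, "a", 0.4, 2)
        lattice.add_edge(2, "beach", 1.0, 3)

        for sentence, weight in sorted(lattice.paths(), key=lambda p: -p[1]):
            print(f"{weight:.2f}  {sentence}")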

    Substring-based Machine Translation

    Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words, to help cope with the data-sparsity issues that occur when words are simply divided according to white space. In this paper, we take a different approach: rather than dividing lexical processing and translation into two steps, we view translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracy on a par with word-based translation in the character-based framework is a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed by Neubig et al. (2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that character-based translation in the traditional phrase-based framework sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves results comparable to word-based translation for two distant language pairs.
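    As a hedged illustration of the substring view (toy data and hypothetical helper names, not the paper's aligner), the sketch below treats a sentence pair as raw character strings and reads substring translation pairs directly off a given many-to-many alignment of character spans, so that phenomena normally handled by separate morphological analysis fall out of the same transduction.

        # Toy data and hypothetical names; this is not the paper's aligner.
        def substring_pairs(src, tgt, alignment):
            """alignment: list of ((src_start, src_end), (tgt_start, tgt_end)) spans,
            end-exclusive and over characters rather than words."""
            return [(src[i:j], tgt[k:l]) for (i, j), (k, l) in alignment]

        # Finnish -> English toy example: the case suffix "ssa" aligns to the
        # preposition "in" with no separate morphological analysis step.
        src = "talossa"          # "talo" (house) + "ssa" (inessive case marker)
        tgt = "in the house"
        alignment = [((0, 4), (7, 12)),   # "talo" <-> "house"
                     ((4, 7), (0, 2))]    # "ssa"  <-> "in"

        for s, t in substring_pairs(src, tgt, alignment):
            print(f"{s!r} <-> {t!r}")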

    Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

    We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time $O(n^{\omega d})$, where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS -- the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to get a recognition algorithm for general binary LCFRS with running time $O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than $2.38$. Our result provides another proof for the best known result for parsing mildly context-sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time $O(n^{4.76})$. It also shows that inversion transduction grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
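    The following minimal sketch illustrates the parsing-as-matrix-multiplication idea that the paper generalizes to LCFRS; it shows only the classical context-free case (a toy grammar in Chomsky normal form, hypothetical names, not the paper's LCFRS algorithm), where for a rule A -> B C the boolean product of B's and C's span matrices is exactly the CKY span combination.

        # Toy grammar and hypothetical names; this is the classical CFG case, not
        # the paper's LCFRS algorithm.
        import numpy as np

        def recognize(words, lexical, binary, start="S"):
            """lexical: [(A, word)]; binary: [(A, B, C)] rules in Chomsky normal form.
            chart[A][i, j] is True iff A derives words[i:j]."""
            n = len(words)
            nonterminals = {a for a, _ in lexical} | {x for rule in binary for x in rule}
            chart = {a: np.zeros((n + 1, n + 1), dtype=bool) for a in nonterminals}
            for i, w in enumerate(words):                 # terminal rules A -> w
                for a, term in lexical:
                    if term == w:
                        chart[a][i, i + 1] = True
            for _ in range(n):                            # n rounds reach the fixpoint
                for a, b, c in binary:                    # rule A -> B C: combine spans
                    product = chart[b].astype(np.int64) @ chart[c].astype(np.int64)
                    chart[a] |= product > 0
            return bool(chart[start][0, n])

        lexical = [("NP", "she"), ("V", "saw"), ("NP", "stars")]
        binary = [("S", "NP", "VP"), ("VP", "V", "NP")]
        print(recognize(["she", "saw", "stars"], lexical, binary))  # True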

    Adjunction in hierarchical phrase-based translation


    Parsing Inside-Out

    The inside-outside probabilities are typically used for reestimating Probabilistic Context-Free Grammars (PCFGs), just as the forward-backward probabilities are typically used for reestimating HMMs. I show several novel uses, including improving parser accuracy by matching parsing algorithms to evaluation criteria; speeding up DOP parsing by a factor of 500; and thresholding PCFG parsing to run 30 times faster at a given accuracy level. I also give an elegant, state-of-the-art grammar formalism, which can be used to compute inside-outside probabilities, and a parser description formalism, which makes it easy to derive inside-outside formulas and many others. (Ph.D. thesis, 257 pages, 40 PostScript figures.)
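    As a small illustration of the quantities the thesis builds on, here is a minimal sketch (toy grammar, hypothetical names, not the thesis's formalism) of the inside probabilities for a PCFG in Chomsky normal form, computed bottom-up as in CKY; the outside pass and the reestimation and thresholding uses are omitted.

        # Toy grammar and hypothetical names; not the thesis's formalism.
        from collections import defaultdict

        def inside_probs(words, lexical, binary):
            """lexical: {(A, word): prob}; binary: {(A, B, C): prob}, grammar in CNF.
            Returns inside[(A, i, j)] = P(A =>* words[i:j])."""
            n = len(words)
            inside = defaultdict(float)
            for i, w in enumerate(words):                   # terminal rules A -> w
                for (a, term), p in lexical.items():
                    if term == w:
                        inside[a, i, i + 1] += p
            for span in range(2, n + 1):                    # bottom-up over span lengths
                for i in range(n - span + 1):
                    j = i + span
                    for (a, b, c), p in binary.items():     # rule A -> B C
                        for k in range(i + 1, j):           # split point
                            inside[a, i, j] += p * inside[b, i, k] * inside[c, k, j]
            return inside

        lexical = {("NP", "she"): 0.5, ("NP", "stars"): 0.5, ("V", "saw"): 1.0}
        binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
        words = ["she", "saw", "stars"]
        inside = inside_probs(words, lexical, binary)
        print(inside["S", 0, len(words)])   # sentence probability: 0.25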

    Rich Linguistic Structure from Large-Scale Web Data

    The past two decades have shown an unexpected effectiveness of Web-scale data in natural language processing. Even the simplest models, when paired with unprecedented amounts of unstructured and unlabeled Web data, have been shown to outperform sophisticated ones. It has been argued that the effectiveness of Web-scale data has undermined the necessity of sophisticated modeling or laborious data set curation. In this thesis, we argue for and illustrate an alternative view: that Web-scale data not only serves to improve the performance of simple models, but can also allow the use of qualitatively more sophisticated models that would not be deployable otherwise, leading to even further performance gains.

    Proceedings of the 4th DIKU-IST Joint Workshop on the Foundations of Software
