33 research outputs found

    Clause restructuring for statistical machine translation

    Composition of Tree Series Transformations

    Tree series transformations computed by bottom-up and top-down tree series transducers are called bottom-up and top-down tree series transformations, respectively. (Functional) compositions of such transformations are investigated. It turns out that the class of bottom-up tree series transformations over a commutative and complete semiring is closed under left-composition with linear bottom-up tree series transformations and right-composition with boolean deterministic bottom-up tree series transformations. Moreover, it is shown that the class of top-down tree series transformations over a commutative and complete semiring is closed under right-composition with linear, nondeleting top-down tree series transformations. Finally, the composition of a boolean, deterministic, total top-down tree series transformation with a linear top-down tree series transformation is shown to be a top-down tree series transformation.

    Survey: Weighted extended top-down tree transducers, part I: basics and expressive power

    Weighted extended top-down tree transducers (transducteurs généralisés descendants [Arnold, Dauchet: Bi-transductions de forêts. ICALP'76. Edinburgh University Press, 1976]) received renewed interest in the field of Natural Language Processing, where they are used in syntax-based machine translation. This survey presents the foundations for a theoretical analysis of weighted extended top-down tree transducers. In particular, it discusses essentially complete semirings, which are a novel concept that can be used to lift incomparability results from the unweighted case to the weighted case even in the presence of infinite sums. In addition, several equivalent ways to define weighted extended top-down tree transducers are presented, and the individual benefits of each presentation are shown on a small result.

    Consistency of Probabilistic Context-Free Grammars

    We present an algorithm for deciding whether an arbitrary proper probabilistic context-free grammar is consistent, i.e., whether the probability that a derivation terminates is one. Our procedure has time complexity $\mathcal{O}(n^3)$ in the unit-cost model of computation. Moreover, we develop a novel characterization of consistent probabilistic context-free grammars. A simple corollary of our result is that training methods for probabilistic context-free grammars that are based on maximum-likelihood estimation always yield consistent grammars.
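
To make the notion of consistency concrete, here is a minimal sketch for one toy grammar, S -> S S with probability p and S -> a with probability 1 - p. The grammar and the function name `termination_probability` are assumptions chosen for illustration. The termination probability of this grammar is the least non-negative solution of x = p*x^2 + (1 - p), and the sketch approximates it by fixed-point iteration from 0. This is a numerical illustration of the definition only, not the cubic-time decision procedure the abstract describes.

```python
def termination_probability(p: float, iterations: int = 10_000) -> float:
    """Approximate the termination probability of the toy grammar
    S -> S S (probability p) | a (probability 1 - p).

    The value is the least fixed point of x = p*x**2 + (1 - p);
    iterating from 0 approaches it from below.
    """
    x = 0.0
    for _ in range(iterations):
        x = p * x * x + (1.0 - p)
    return x


if __name__ == "__main__":
    # p = 0.25: derivations terminate with probability (close to) one.
    print(termination_probability(0.25))
    # p = 0.75: the least fixed point is (1 - p) / p, which is below one.
    print(termination_probability(0.75))
```

For this grammar the least fixed point is min(1, (1 - p)/p), so the grammar is consistent exactly when p is at most 1/2. Convergence of the iteration is slow near p = 1/2.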

    Tantangan dan Peluang pada Question Generation (Challenges and Opportunities in Question Generation)

    In this paper, we survey the current state of the art in question generation (QG). QG is the task of generating reasonable questions from a natural-language text or sentence. We examine the conceptual outline of question generation, which consists of three categories: syntax-based, semantic-based, and template-based. Question generation systems in the syntactic category often use semantic elements and vice versa, while template-based systems use multiple levels of syntactic and/or semantic information. The final result of this survey is a review of challenges and opportunities for future research: (a) challenges concerning lexical-semantic and syntactic issues, (b) the use of the Vauquois triangle and shallow parsers as alternatives, and (c) syntactic representation with phrase structure trees. Keywords: question generation, lexical, syntax, sentence transformation, Vauquois triangle.

    Proactive Synthesis of Recursive Tree-to-String Functions from Examples

    Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has been recently deployed in a widely used spreadsheet software product. In this paper we contribute to foundations of such techniques and present a complete algorithm for synthesis of a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential top-down tree-to-string transducers with a single state (1STS). The first problem we consider is learning a tree-to-string transducer from any set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NP-complete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition that each subtree of a tree in the input/output example set is also part of the input/output examples. Because coming up with relevant input/output examples may be difficult for the user while creating hard constraint problems for the synthesizer, we also study a more automated active learning scenario in which the algorithm chooses the inputs for which the user provides the outputs. Our algorithm asks a worst-case linear number of queries as a function of the size of the algebraic data type definition to determine a unique transducer. To construct our algorithms we present two new results on formal languages. First, we define a class of word equations, called sequential word equations, for which we prove that satisfiability can be solved in deterministic polynomial time. 
This is in contrast to general word equations, for which the best known complexity upper bound is linear space. Second, we close a long-standing open problem about the asymptotic size of test sets for context-free languages. A test set of a language of words L is a subset T of L such that any two word homomorphisms equivalent on T are also equivalent on L. We prove that it is possible to build test sets of cubic size for context-free languages, matching for the first time the lower bound found 20 years ago.
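
As a rough illustration of the kind of function the abstract calls a 1STS, the sketch below renders a tree to a string with a single recursive function. It assumes that "sequential" means each constructor of arity k emits k + 1 constant strings interleaved with the outputs of its children, in order and each used exactly once. The `Tree` type, the `RULES` table, and `render` are hypothetical names, and nothing here reproduces the paper's learning algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """A value of an algebraic data type: a constructor applied to children."""
    constructor: str
    children: tuple[Tree, ...] = ()


# Hypothetical rule table: a constructor of arity k maps to k + 1 constant
# strings w0..wk, giving the output w0 + f(c1) + w1 + ... + f(ck) + wk.
RULES: dict[str, list[str]] = {
    "Nil": ["[]"],
    "Cons": ["(", " :: ", ")"],
    "A": ["a"],
    "B": ["b"],
}


def render(tree: Tree, rules: dict[str, list[str]] = RULES) -> str:
    """Apply the single-state, sequential tree-to-string rules recursively."""
    pieces = rules[tree.constructor]
    if len(pieces) != len(tree.children) + 1:
        raise ValueError(f"rule for {tree.constructor!r} does not match its arity")
    out = [pieces[0]]
    for child, piece in zip(tree.children, pieces[1:]):
        out.append(render(child, rules))
        out.append(piece)
    return "".join(out)


if __name__ == "__main__":
    example = Tree("Cons", (Tree("A"), Tree("Cons", (Tree("B"), Tree("Nil")))))
    print(render(example))  # prints "(a :: (b :: []))"
```

In this representation, learning a transducer from examples amounts to finding the constant strings in `RULES` such that `render` reproduces every given input/output pair.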