    If the Current Clique Algorithms are Optimal, so is Valiant's Parser

    The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega<2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3{n})$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}|\cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}|=\Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
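
To make the recognition problem concrete, here is a minimal sketch of the textbook cubic-time dynamic program (CYK style). It is the combinatorial baseline, not Valiant's matrix-multiplication parser. It assumes the grammar is already in Chomsky normal form; the function name `cyk_recognize`, the rule encoding, and the small balanced-parentheses grammar are illustrative assumptions, not taken from the paper.

```python
def cyk_recognize(word, unary_rules, binary_rules, start="S"):
    """Return True if `word` can be derived from `start`.

    unary_rules:  dict terminal -> set of nonterminals A with A -> terminal
    binary_rules: dict (B, C)   -> set of nonterminals A with A -> B C
    (Hypothetical encoding chosen for this sketch.)
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][j] holds every nonterminal that derives word[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        table[i][i + 1] = set(unary_rules.get(ch, ()))
    # Three nested loops over (span, start, split point): cubic in n.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= binary_rules.get((b, c), set())
    return start in table[0][n]


# Example grammar (an assumption): non-empty balanced parentheses in CNF.
#   S -> S S | L R | L X,   X -> S R,   L -> "(",   R -> ")"
DYCK_UNARY = {"(": {"L"}, ")": {"R"}}
DYCK_BINARY = {
    ("S", "S"): {"S"},
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "R"): {"X"},
}

print(cyk_recognize("(())()", DYCK_UNARY, DYCK_BINARY))  # True
print(cyk_recognize("(()", DYCK_UNARY, DYCK_BINARY))     # False
```

For a constant-size grammar the inner two loops cost constant time per split point, so the running time is governed by the $O(n^3)$ choices of `(i, k, j)`.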

    Approximate text generation from non-hierarchical representations in a declarative framework

    This thesis is on Natural Language Generation. It describes a linguistic realisation system that translates the semantic information encoded in a conceptual graph into an English language sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and an approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem where one is not pre-committed to a choice of the syntactically prominent elements in the initial semantics. We show clearly how the semantic structure is declaratively related to a linguistically motivated syntactic representation: we use D-Tree Grammars, which stem from work on Tree-Adjoining Grammars. The declarative specification of the mapping between semantics and syntax allows for different processing strategies to be exploited. A number of generation strategies have been considered: a pure top-down strategy and a chart-based generation technique which allows partially successful computations to be reused in other branches of the search space. Having a generator with increased paraphrasing power as a consequence of using non-hierarchical input and approximate matching raises the issue of whether certain 'better' paraphrases can be generated before others. We investigate preference-based processing in the context of generation.

    A prototype system for machine translation from English to South African Sign Language using synchronous tree adjoining grammars

    Thesis (MSc)--University of Stellenbosch, 2007. Machine translation, especially machine translation for sign languages, remains an active research area. Sign language machine translation presents unique challenges to the whole machine translation process. In this thesis a prototype machine translation system is presented. This system is designed to translate English text into a gloss-based representation of South African Sign Language (SASL). In order to perform the machine translation, a transfer-based approach was taken. English text is parsed into an intermediate representation. Translation rules are then applied to this intermediate representation to transform it into an equivalent intermediate representation for the SASL glosses. For both these intermediate representations, a tree adjoining grammar (TAG) formalism is used. As part of the prototype machine translation system, a TAG parser was implemented. The translation rules used by the system were derived from a SASL phrase book. This phrase book was also used to create a small gloss-based SASL TAG grammar. Lastly, some additional tools for the editing of TAG trees were also added to the prototype system.

    Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

    We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time $O(n^{\omega d})$ where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS -- the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to get a recognition algorithm for general binary LCFRS with running time $O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than $2.38$. Our result provides another proof for the best known result for parsing mildly context sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time $O(n^{4.76})$. It also shows that inversion transduction grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
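
The two concrete exponents can be checked against the general bounds by simple arithmetic. The abstract does not state the contact rank of each formalism; the sketch below assumes $d = 2$ in both cases, which is the value consistent with the quoted figures when $\omega < 2.38$.

```latex
\[
  \omega d \;<\; 2.38 \times 2 \;=\; 4.76,
  \qquad
  \omega d + 1 \;<\; 2.38 \times 2 + 1 \;=\; 5.76 .
\]
```

Under that assumption, the first bound matches the $O(n^{4.76})$ figure for tree adjoining grammars and related formalisms, and the second matches the $O(n^{5.76})$ figure for inversion transduction grammars.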

    Principles and Implementation of Deductive Parsing

    We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars. Comment: 69 pages, includes full Prolog code.
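
The idea of one engine interpreting many deduction systems can be sketched as follows. This is not the authors' Prolog code: it is a small Python illustration in which a generic agenda-and-chart loop (`deduce`) is given the axioms, inference rule, and goal test of a CYK-style system (`cyk_system`). All names, the item encoding `(A, i, j)`, and the example grammar are assumptions made for this sketch.

```python
def deduce(axioms, infer, is_goal):
    """Generic agenda-driven engine: close the axioms under `infer`."""
    chart = set()
    agenda = list(axioms)
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue  # already derived
        chart.add(item)
        # Combine the new item with everything proved so far.
        agenda.extend(infer(item, chart))
    return any(is_goal(item) for item in chart)


def cyk_system(word, unary_rules, binary_rules, start="S"):
    """Deduction system for CNF grammars; item (A, i, j) means
    nonterminal A derives word[i:j]."""
    axioms = [(a, i, i + 1)
              for i, ch in enumerate(word)
              for a in unary_rules.get(ch, ())]

    def infer(item, chart):
        x, i, j = item
        for (y, k, l) in list(chart):
            if k == j:  # new item on the left: x over [i, j), y over [j, l)
                for a in binary_rules.get((x, y), ()):
                    yield (a, i, l)
            if l == i:  # new item on the right: y over [k, i), x over [i, j)
                for a in binary_rules.get((y, x), ()):
                    yield (a, k, j)

    def is_goal(item):
        return item == (start, 0, len(word))

    return axioms, infer, is_goal


def recognize(word, unary_rules, binary_rules):
    return deduce(*cyk_system(word, unary_rules, binary_rules))


# Example grammar (an assumption): non-empty balanced parentheses in CNF.
UNARY = {"(": {"L"}, ")": {"R"}}
BINARY = {("S", "S"): {"S"}, ("L", "R"): {"S"},
          ("L", "X"): {"S"}, ("S", "R"): {"X"}}

print(recognize("(())()", UNARY, BINARY))  # True
print(recognize("(()", UNARY, BINARY))     # False
```

Swapping in a different triple of axioms, inference rule, and goal test would reuse `deduce` unchanged, which is the design point the abstract describes.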