If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar $\mathcal{G}$
and a string $w$ of length $n$, decide if $w$ can be obtained from
$\mathcal{G}$. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$
time, where $\omega$ is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
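To make the recognition problem concrete, the sketch below is a textbook CYK recognizer, the classic cubic-time chart algorithm. It is not Valiant's parser, which reorganizes the chart computation so that fast Boolean matrix multiplication can do the work. The function name `cyk_recognize`, the grammar encoding, and the balanced-parentheses example grammar are illustrative assumptions; the grammar is assumed to be in Chomsky normal form without empty rules, and the running time is $O(|G| \cdot n^3)$ for a string of length $n$.

```python
def cyk_recognize(grammar, start, word):
    """Return True iff `word` is derivable from `start` (CYK, O(|G| * n^3))."""
    n = len(word)
    if n == 0:
        return False  # assumption: CNF without epsilon rules
    # table[i][j] holds the nonterminals that derive word[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, symbol in enumerate(word):
        for lhs, rhs in grammar:
            if rhs == (symbol,):          # lexical rule A -> a
                table[i][i + 1].add(lhs)
    binary = [(lhs, rhs) for lhs, rhs in grammar if len(rhs) == 2]
    for length in range(2, n + 1):        # span length
        for i in range(n - length + 1):   # span start
            j = i + length
            for k in range(i + 1, j):     # split point
                for lhs, (b, c) in binary:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n]

# Hypothetical CNF grammar for non-empty balanced parentheses.
PAREN_GRAMMAR = [
    ("S", ("L", "R")), ("S", ("L", "T")), ("S", ("S", "S")),
    ("T", ("S", "R")), ("L", ("(",)), ("R", (")",)),
]

print(cyk_recognize(PAREN_GRAMMAR, "S", "(())()"))  # True
```

The three nested loops over span length, start, and split point are the cubic core; for a fixed grammar, matrix-multiplication-based parsers aim to replace exactly this part.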
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time $O(|\mathcal{G}| \cdot n^{3-\varepsilon})$ can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be $|\mathcal{G}| = \Omega(n^6)$. Nothing was known for the more relevant
case of constant-size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant-size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if
there are $k$ nodes that form a clique.
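For orientation, the hypothesis behind this reduction concerns how fast $k$-Clique can be decided. A brute-force procedure, sketched below under the assumption that the graph is given as an adjacency dictionary (`has_k_clique` is a hypothetical name), inspects all $\binom{n}{k}$ node subsets, so it takes roughly $n^k$ time; the standard algorithm based on fast matrix multiplication improves this to about $n^{\omega k/3}$ when $k$ is divisible by 3.

```python
from itertools import combinations

def has_k_clique(adj, k):
    """Brute-force k-Clique decision over all k-subsets of nodes.

    `adj` maps each node to the set of its neighbours (undirected graph).
    """
    for nodes in combinations(list(adj), k):
        # every pair in the subset must be adjacent
        if all(v in adj[u] for u, v in combinations(nodes, 2)):
            return True
    return False

# Hypothetical example: a triangle {1, 2, 3} plus a pendant node 4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(has_k_clique(graph, 3), has_k_clique(graph, 4))  # True False
```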
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic-time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14).
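RNA Folding is named above without a definition. In a simplified formulation (an assumption here, in the style of Nussinov's algorithm), the task is to find the largest set of non-crossing pairs of complementary bases, A with U and C with G, in an RNA string. The cubic-time interval dynamic program below, under the hypothetical name `max_pairs`, illustrates the kind of cubic algorithm at stake; it omits practical constraints such as minimum hairpin-loop length and energy models.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def max_pairs(rna):
    """Max number of non-crossing complementary pairs in `rna` (O(n^3))."""
    n = len(rna)
    # best[i][j] = optimum for the substring rna[i:j]
    best = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            score = best[i + 1][j]            # leave rna[i] unpaired
            for k in range(i + 1, j):         # or pair rna[i] with rna[k]
                if (rna[i], rna[k]) in PAIRS:
                    score = max(score, 1 + best[i + 1][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n]

print(max_pairs("GGCC"))  # 2
```

Pairing position `i` with `k` splits the interval into an inside part and an outside part that cannot interact, which is what makes the interval recurrence valid for non-crossing pairs.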
Approximate text generation from non-hierarchical representations in a declarative framework
This thesis is on Natural Language Generation. It describes a linguistic realisation
system that translates the semantic information encoded in a conceptual graph into an
English language sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) and an approximate matching between semantic structures allows us to investigate a more general version of the sentence generation problem
where one is not pre-committed to a choice of the syntactically prominent elements in
the initial semantics. We show clearly how the semantic structure is declaratively related to a linguistically motivated syntactic representation; we use D-Tree Grammars,
which stem from work on Tree-Adjoining Grammars. The declarative specification of
the mapping between semantics and syntax allows for different processing strategies
to be exploited. A number of generation strategies have been considered: a pure top-down strategy and a chart-based generation technique which allows partially successful
computations to be reused in other branches of the search space. Having a generator
with increased paraphrasing power as a consequence of using non-hierarchical input
and approximate matching raises the question of whether certain 'better' paraphrases can be
generated before others. We investigate preference-based processing in the context of
generation.
A prototype system for machine translation from English to South African Sign Language using synchronous tree adjoining grammars
Thesis (MSc)--University of Stellenbosch, 2007.
ENGLISH ABSTRACT: Machine translation, especially machine translation for sign languages, remains an active research
area. Sign language machine translation presents unique challenges to the whole machine translation
process. In this thesis a prototype machine translation system is presented. This system is
designed to translate English text into a gloss-based representation of South African Sign Language
(SASL).
To perform the machine translation, a transfer-based approach was taken. English
text is parsed into an intermediate representation. Translation rules are then applied to this
intermediate representation to transform it into an equivalent intermediate representation for the
SASL glosses. For both of these intermediate representations, a tree adjoining grammar (TAG)
formalism is used. As part of the prototype machine translation system, a TAG parser was
implemented.
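Since both intermediate representations rely on the TAG formalism, a small illustration of its characteristic operation may help. Adjunction splices an auxiliary tree, whose root and foot node share a label, into a tree at a node with that label; the subtree previously below that node is reattached under the foot. The `Node` class, `adjoin`, `yield_of`, and the English example trees below are hypothetical and not taken from the thesis, and the sketch ignores substitution, adjoining constraints, and the synchronous pairing of trees.

```python
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    foot: bool = False  # marks the foot node of an auxiliary tree

def find_foot(tree):
    """Return the foot node of `tree`, or None if it has none."""
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None

def adjoin(target, auxiliary):
    """Adjoin a copy of `auxiliary` at `target`, modifying `target` in place."""
    aux = deepcopy(auxiliary)
    foot = find_foot(aux)
    if foot is None or aux.label != target.label or foot.label != target.label:
        raise ValueError("auxiliary tree does not match the target node")
    foot.children = target.children   # excised subtree goes under the foot
    foot.foot = False
    target.children = aux.children     # target now plays the auxiliary root

def yield_of(tree):
    """Left-to-right leaf labels of `tree`."""
    if not tree.children:
        return [tree.label]
    return [w for child in tree.children for w in yield_of(child)]

initial = Node("S", [Node("NP", [Node("John")]),
                     Node("VP", [Node("V", [Node("sleeps")])])])
auxiliary = Node("VP", [Node("VP", foot=True), Node("ADV", [Node("soundly")])])
adjoin(initial.children[1], auxiliary)
print(" ".join(yield_of(initial)))  # John sleeps soundly
```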
The translation rules used by the system were derived from a SASL phrase book. This phrase
book was also used to create a small gloss-based SASL TAG grammar. Lastly, some additional
tools for editing TAG trees were also added to the prototype system.
AFRIKAANS SUMMARY: Machine translation, especially machine translation for sign languages, remains an active research field. Machine translation
for sign languages presents unique challenges to the entire machine translation process. In this thesis
we present a prototype machine translation system. This system is designed to translate English text
into a gloss-based representation of South African Sign Language (SASL).
Our translation system uses a transfer approach to machine translation. English
text is parsed into an intermediate form. Translation rules are applied to this intermediate
form to transform it into an equivalent intermediate form for the SASL glosses. For both
of these intermediate forms, tree adjoining grammars (TAGs) are used. As part of the
prototype machine translation system, a TAG parser was implemented.
The translation rules used by the system were derived from a SASL phrase book. This
phrase book was also used to develop a small TAG for SASL glosses. Finally, additional
utilities for editing TAG trees were developed.
Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication
We describe a matrix multiplication recognition algorithm for a subset of
binary linear context-free rewriting systems (LCFRS) with running time
$O(n^{\omega d})$, where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS:
the maximal number of combination and non-combination points that appear in the
grammar rules. We also show that this algorithm can be used as a subroutine to
get a recognition algorithm for general binary LCFRS with running time
$O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than
$2.38$. Our result provides another proof for the best known result for parsing
mildly context-sensitive formalisms such as combinatory categorial grammars,
head grammars, linear indexed grammars, and tree adjoining grammars, which can
be parsed in time $O(n^{4.76})$. It also shows that inversion transduction
grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS
subsumes many other formalisms and types of grammars, for some of which we also
improve the asymptotic complexity of parsing.
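The primitive behind matrix-multiplication parsing can be shown for the simplest case, an ordinary binary CFG rule $A \to B\,C$ (binary LCFRS with fan-out 1). If Boolean matrices record which spans $(i, j)$ each nonterminal covers, then the spans of $A$ obtained by that rule are a Boolean matrix product. The helper names and the toy rules below are hypothetical. The naive product shown is itself cubic, since any speedup comes only from substituting a fast multiplication routine, and the sketch reproduces neither the paper's construction for higher fan-out nor the handling of dependencies between chart entries that a full recognizer needs.

```python
def bool_matmul(x, y):
    """Naive Boolean matrix product: z[i][j] = OR_k (x[i][k] AND y[k][j])."""
    size = len(x)
    return [[any(x[i][k] and y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def spans_matrix(n, spans):
    """(n+1) x (n+1) Boolean matrix with True at each span (i, j) in `spans`."""
    m = [[False] * (n + 1) for _ in range(n + 1)]
    for i, j in spans:
        m[i][j] = True
    return m

# Hypothetical example: word "aab" (n = 3), rules B -> a, C -> b, A -> B C.
n = 3
b_spans = spans_matrix(n, [(0, 1), (1, 2)])   # B covers each "a"
c_spans = spans_matrix(n, [(2, 3)])           # C covers the "b"
a_spans = bool_matmul(b_spans, c_spans)       # A covers (i, j) iff B(i, k) and C(k, j)
print([(i, j) for i in range(n + 1) for j in range(n + 1) if a_spans[i][j]])  # [(1, 3)]
```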
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of
parsing as deduction. Parsing algorithms can be represented directly as
deduction systems, and a single deduction engine can interpret such deduction
systems so as to implement the corresponding parser. The method generalizes
easily to parsers for augmented phrase structure formalisms, such as
definite-clause grammars and other logic grammar formalisms, and has been used
for rapid prototyping of parsing algorithms for a variety of formalisms
including variants of tree-adjoining grammars, categorial grammars, and
lexicalized context-free grammars.
Comment: 69 pages, includes full Prolog code
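The parsing-as-deduction idea can be sketched as a generic agenda-driven engine: it repeatedly takes an item from an agenda, adds it to a chart, and applies inference rules that combine it with items already in the chart, so that a particular parser is just a set of axioms, inference rules, and a goal item. The sketch below is in Python rather than Prolog, and the function names, the item encoding `(nonterminal, i, j)`, and the CKY-style rule set over a balanced-parentheses grammar are assumptions for illustration, not the paper's code.

```python
from collections import deque

def deduce(axioms, rules, goal):
    """Agenda-driven deduction: return True iff `goal` is derivable."""
    chart, agenda = set(), deque(axioms)
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue                      # already derived
        chart.add(item)
        for rule in rules:                # combine item with chart contents
            for new_item in rule(item, chart):
                if new_item not in chart:
                    agenda.append(new_item)
    return goal in chart

def cky_rules(binary_rules):
    """Inference rule for binary rules A -> B C over items (X, i, j)."""
    def complete(item, chart):
        x, i, j = item
        for y, h, k in list(chart):
            for a, b, c in binary_rules:
                if (b, c) == (x, y) and h == j:   # item is the left child
                    yield (a, i, k)
                if (b, c) == (y, x) and k == i:   # item is the right child
                    yield (a, h, j)
    return [complete]

BINARY = [("S", "L", "R"), ("S", "L", "T"), ("S", "S", "S"), ("T", "S", "R")]
LEXICON = {"(": ["L"], ")": ["R"]}

def recognize(word):
    """Axioms are lexical items; the goal is an S spanning the whole word."""
    axioms = [(a, i, i + 1) for i, ch in enumerate(word) for a in LEXICON.get(ch, [])]
    return deduce(axioms, cky_rules(BINARY), ("S", 0, len(word)))

print(recognize("(())()"))  # True
```

Scanning the whole chart inside `complete` is deliberately naive; a practical engine would index chart items by their boundary positions so that each combination step looks only at adjacent items.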