The Mystro system: A comprehensive translator toolkit
Mystro is a system that facilitates the construction of compilers, assemblers, code generators, query interpreters, and similar programs. It provides features to encourage the use of iterative enhancement. Mystro was developed in response to the needs of NASA Langley Research Center (LaRC) and enjoys a number of advantages over similar systems. There are other programs available that can be used in building translators. These typically build parser tables, usually supply the source of a parser and parts of a lexical analyzer, but provide little or no aid for code generation. In general, only the front end of the compiler is addressed. Mystro, on the other hand, emphasizes tools for both ends of a compiler.
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar $\mathcal{G}$
and a string $w$ of length $n$, decide if $w$ can be obtained from
$\mathcal{G}$. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
$O(n^{\omega})$ time, where $\omega < 2.373$ is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
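As a point of reference for the recognition problem stated above, here is a minimal sketch of the textbook cubic-time CYK recognizer. It is not Valiant's parser and uses no matrix multiplication; it only illustrates the input and output of the problem. The grammar encoding (two dictionaries for a grammar in Chomsky normal form), the function name, and the balanced-parentheses grammar are assumptions made for illustration.

```python
from itertools import product


def cyk_recognize(word, unary_rules, binary_rules, start="S"):
    """Return True if `word` is derivable from `start`.

    unary_rules:  dict terminal -> set of nonterminals A with A -> terminal
    binary_rules: dict (B, C)   -> set of nonterminals A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty string is not handled in this sketch
    # table[i][j] holds the nonterminals deriving word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary_rules.get(ch, ()))
    for span in range(2, n + 1):              # substring length
        for i in range(n - span + 1):         # substring start
            for split in range(1, span):      # length of the left part
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b, c in product(left, right):
                    table[i][span - 1] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical grammar for balanced parentheses in Chomsky normal form:
#   S -> L R | L X | S S      X -> S R      L -> "("      R -> ")"
UNARY = {"(": {"L"}, ")": {"R"}}
BINARY = {
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "S"): {"S"},
    ("S", "R"): {"X"},
}

print(cyk_recognize("(())", UNARY, BINARY))
print(cyk_recognize("(()", UNARY, BINARY))
```

The three nested loops over span, start, and split point are what give the cubic dependence on the string length for a fixed grammar.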
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time $O(|\mathcal{G}| \cdot n^{3-\varepsilon})$ can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be $|\mathcal{G}| = \Omega(n^6)$. Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if
there are $k$ that form a clique.
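To make the clique problem in the preceding paragraph concrete, the following is a brute-force sketch that checks every subset of the requested size. It is only a statement of the problem in code, not one of the "current clique algorithms" the title refers to; the function name and the edge-list encoding are assumptions made for illustration.

```python
from itertools import combinations


def has_k_clique(n, edges, k):
    """Brute force: test every k-subset of the nodes 0..n-1."""
    adjacent = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in adjacent for pair in combinations(nodes, 2))
        for nodes in combinations(range(n), k)
    )


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(has_k_clique(4, EDGES, 3))
print(has_k_clique(4, EDGES, 4))
```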
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14).
Automatic error recovery for LR parsers in theory and practice
This thesis argues the need for good syntax error handling schemes in language
translation systems such as compilers, and for the automatic incorporation of such schemes
into parser-generators. Syntax errors are studied in a theoretical framework and practical
methods for handling syntax errors are presented.
The theoretical framework consists of a model for syntax errors based on the concept of
a minimum prefix-defined error correction, a sentence obtainable from an erroneous string by
performing edit operations at prefix-defined (parser defined) errors. It is shown that for an
arbitrary context-free language, it is undecidable whether a better than arbitrary choice of edit
operations can be made at a prefix-defined error. For common programming languages, it is
shown that minimum-distance errors and prefix-defined errors do not necessarily coincide,
and that there exists an infinite number of programs that differ in a single symbol only; sets
of equivalent insertions are exhibited.
Two methods for syntax error recovery are presented. The methods are language
independent and suitable for automatic generation. The first method consists of two stages,
local repair followed if necessary by phrase-level repair. The second method consists of a
single stage in which a locally minimum-distance repair is computed. Both methods are
developed for use in the practical LR parser-generator yacc, requiring no additional
specifications from the user. A scheme for the automatic generation of diagnostic messages
in terms of the source input is presented. Performance of the methods in practice is evaluated
using a formal method based on minimum-distance and prefix-defined error correction. The
methods compare favourably with existing methods for error recovery.
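The notion of distance behind "minimum-distance" correction can be illustrated with a small sketch. It assumes that the edit operations are single-token insertion, deletion, and replacement at unit cost, which the abstract does not specify, and it measures the distance to one given corrected token sequence rather than searching the whole language for the nearest sentence, as the thesis's model would require. The function name and the token lists are hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of unit-cost insertions, deletions, and
    replacements needed to turn `source` into `target`."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        curr = [i]
        for j, t in enumerate(target, 1):
            curr.append(min(
                prev[j] + 1,             # delete s
                curr[j - 1] + 1,         # insert t
                prev[j - 1] + (s != t),  # replace s by t, or keep it
            ))
        prev = curr
    return prev[-1]


# Hypothetical erroneous input and one candidate repair (a missing ")").
erroneous = ["x", "=", "(", "1", "+", "2", ";"]
repaired = ["x", "=", "(", "1", "+", "2", ")", ";"]
print(edit_distance(erroneous, repaired))
```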