Some aspects of error correction of programming languages
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The thesis treats the problem of error correction in a context-free language and the design of an error-correcting parser for the BASIC language.
Two important things can be said about this thesis. First, it presents the problem of error correction in a context-free language and the existing results in the field. It reviews the concept of a context-free language as a model for a programming language, together with the definitions and results used later. A distance between two strings is defined and used to develop a “minimum distance error correcting parser”.
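The thesis's own distance may be defined differently. For orientation, the common choice for error correction counts unit-cost insertions, deletions and substitutions of single symbols (the Levenshtein distance). The sketch below computes that distance over token sequences. The function name `edit_distance` is illustrative and is not taken from the thesis.

```python
from collections.abc import Sequence


def edit_distance(x: Sequence[str], y: Sequence[str]) -> int:
    """Minimum number of unit-cost insertions, deletions and substitutions
    turning x into y (Levenshtein distance; assumed, not the thesis's definition)."""
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x to y[:j]
    for i, a in enumerate(x, start=1):
        cur = [i]  # distance from x[:i] to the empty prefix of y
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,             # drop a from x
                           cur[j - 1] + 1,          # add b to x
                           prev[j - 1] + (a != b)))  # substitute, free if equal
        prev = cur
    return prev[-1]


# Token-level use: one missing "=" in a BASIC LET statement is distance 1.
print(edit_distance(["LET", "X", "=", "1"], ["LET", "X", "1"]))
```

A minimum distance error correcting parser searches for a sentence of the language that minimizes this kind of distance to the input. The function above only scores a pair of fixed strings.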
Second, the thesis develops two global error correcting parsers. The first one is the top-down global error correcting parser, obtained by transforming Unger’s top-down parser into an error correcting one.
Then Graham and Rhodes's idea of condensing the context surrounding an error is extended, and a global simple precedence error-correcting parser is obtained by analysing the whole context of the error available in the input string.
These parsers and other known methods are then used to design and partially implement an error-correcting parser for BASIC.
Ministry of Learning and Education of Romania
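Unger's method, which the first parser above is built on, is a top-down recognizer. To decide whether a nonterminal derives a substring, it tries every way of cutting that substring into as many non-empty parts as the right-hand side has symbols. It then checks each part recursively.

The sketch below is only the plain recognizer. It does not include the error-correcting transformation the thesis develops. Several details are assumptions for illustration:

* The grammar `GRAMMAR` is hypothetical.
* The grammar is assumed to be ε-free and free of unit-rule cycles.
* Memoization (`lru_cache`) is added for efficiency.

```python
from functools import lru_cache

# Hypothetical epsilon-free grammar with no unit-rule cycles.
GRAMMAR = {
    "S": [["S", "+", "T"], ["T"]],
    "T": [["a"], ["(", "S", ")"]],
}


def partitions(n, k):
    """Yield every way to cut n tokens into k non-empty consecutive parts (as lengths)."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in partitions(n - first, k - 1):
            yield (first,) + rest


def unger_recognize(grammar, start, tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """True if `symbol` derives tokens[i:j]."""
        if symbol not in grammar:  # terminal: must match exactly one token
            return j - i == 1 and tokens[i] == symbol
        for rhs in grammar[symbol]:
            for lengths in partitions(j - i, len(rhs)):
                pos = i
                for sym, length in zip(rhs, lengths):
                    if not derives(sym, pos, pos + length):
                        break
                    pos += length
                else:  # every part of this partition matched
                    return True
        return False

    return len(tokens) > 0 and derives(start, 0, len(tokens))


print(unger_recognize(GRAMMAR, "S", "(a+a)+a"))
```

Because every part is non-empty, a left-recursive rule such as `S -> S + T` always recurses on a strictly shorter substring, so the recursion terminates.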
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar $\mathcal{G}$
and a string $w$ of length $n$, decide if $w$ can be obtained from
$\mathcal{G}$. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
$O(n^{\omega})$ time, where $\omega < 2.373$ is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
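The cubic baseline that Valiant's algorithm improves on is the CYK recognizer for grammars in Chomsky normal form. Roughly speaking, Valiant's parser fills the same kind of table of "nonterminals deriving each substring". It reorganizes the combination step into Boolean matrix products.

The sketch below is plain CYK, not Valiant's algorithm. The grammar `RULES` is a hypothetical CNF grammar for non-empty balanced parentheses.

```python
# Hypothetical CNF grammar for non-empty balanced parentheses:
#   S -> S S | L R | L X,   X -> S R,   L -> "(",   R -> ")"
RULES = [
    ("S", ("S", "S")), ("S", ("L", "R")), ("S", ("L", "X")),
    ("X", ("S", "R")), ("L", ("(",)), ("R", (")",)),
]


def cyk_recognize(rules, start, word):
    """Return True if `start` derives `word`; rules are (lhs, rhs) pairs in CNF."""
    n = len(word)
    if n == 0:
        return False
    # table[i][length] = set of nonterminals deriving word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(word):
        for lhs, rhs in rules:
            if rhs == (a,):  # terminal rule A -> a
                table[i][1].add(lhs)
    # O(n^3 * |rules|): for every substring, try every split point.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        table[i][length].add(lhs)
    return start in table[0][n]


print(cyk_recognize(RULES, "S", "(()())"))
```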
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time $O(|\mathcal{G}| \cdot n^{3-\varepsilon})$ can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be $|\mathcal{G}| = \Omega(n^6)$. Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if
there are $k$ nodes that form a clique.
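For a fixed $k$, the obvious algorithm for $k$-Clique checks all $\binom{n}{k}$ node subsets, which is $O(n^k)$ up to polynomial factors in $k$. This is the brute-force baseline against which faster clique algorithms are measured. The sketch below shows only that definition-level check. The paper's reduction from $k$-Clique to parsing is not reproduced, and the function name is illustrative.

```python
from itertools import combinations


def has_k_clique(n, edges, k):
    """Brute force: is there a set of k nodes (from 0..n-1) that are pairwise adjacent?"""
    adjacent = {frozenset(e) for e in edges}
    for nodes in combinations(range(n), k):
        if all(frozenset((u, v)) in adjacent for u, v in combinations(nodes, 2)):
            return True
    return False


# A triangle 0-1-2 with a pendant node 3.
print(has_k_clique(4, [(0, 1), (1, 2), (0, 2), (2, 3)], 3))
```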
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14).
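To make "cubic time" concrete for Dyck Language Edit Distance, here is a standard interval dynamic program for a single bracket type. It computes the minimum number of insertions, deletions and substitutions that turn a string over `(` and `)` into a balanced one.

This is a textbook-style sketch, not taken from the paper. The multi-bracket version and the paper's lower-bound argument are not shown.

```python
def dyck_edit_distance(s: str) -> int:
    """Min insertions/deletions/substitutions to make s balanced over '(' and ')'.
    Interval DP over all substrings: O(n^3) time, O(n^2) space."""
    n = len(s)

    def pair_cost(a, b):
        # Substitutions needed so that (a, b) becomes a matching "(" ... ")" pair.
        return (a != "(") + (b != ")")

    # d[i][j] = distance for the substring s[i:j]; empty substrings cost 0.
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Delete s[i], or insert a partner for it: either way cost 1.
            best = d[i + 1][j] + 1
            if length >= 2:
                # Match s[i] with s[j-1] as an outer pair.
                best = min(best, pair_cost(s[i], s[j - 1]) + d[i + 1][j - 1])
                # Split into two independently balanced pieces.
                for k in range(i + 1, j):
                    best = min(best, d[i][k] + d[k][j])
            d[i][j] = best
    return d[0][n]


print(dyck_edit_distance("))(("))
```

The triple loop over `length`, `i` and `k` is the cubic cost that, according to the abstract above, cannot be beaten substantially without a breakthrough for $k$-Clique.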
Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction
Error-tolerant recognition enables the recognition of strings that deviate
mildly from any string in the regular set recognized by the underlying finite
state recognizer. Such recognition has applications in error-tolerant
morphological processing, spelling correction, and approximate string matching
in information retrieval. After a description of the concepts and algorithms
involved, we give examples from two applications. In the context of
morphological analysis, error-tolerant recognition allows misspelled input word
forms to be corrected and morphologically analyzed concurrently. We present an
application of this to error-tolerant analysis of agglutinative morphology of
Turkish words. The algorithm can be applied to morphological analysis of any
language whose morphology is fully captured by a single (and possibly very
large) finite state transducer, regardless of the word formation processes and
morphographemic phenomena involved. In the context of spelling correction,
error-tolerant recognition can be used to enumerate correct candidate forms
from a given misspelled string within a certain edit distance. Again, it can be
applied to any language with a word list comprising all inflected forms, or
whose morphology is fully described by a finite state transducer. We present
experimental results for spelling correction for a number of languages. These
results indicate that such recognition works very efficiently for candidate
generation in spelling correction for many European languages such as English,
Dutch, French, German, Italian (and others) with very large word lists of root
and inflected forms (some containing well over 200,000 forms), generating all
candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a
SparcStation 10/41. For spelling correction in Turkish, error-tolerant …
Comment: Replaces 9504031. Gzipped, uuencoded PostScript file. To appear in
Computational Linguistics, Volume 22, No. 1, 1996. Also available as
ftp://ftp.cs.bilkent.edu.tr/pub/ko/clpaper9512.ps.
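The candidate-generation idea described above can be sketched with a word list stored as a trie, which is an acyclic finite state recognizer. A depth-first walk extends one edit-distance DP row per transition. It abandons a branch as soon as every entry in the row exceeds the threshold `t`, because no continuation can bring the distance back under `t`.

This is a simplified cut-off test in the spirit of the paper. The sketch does not claim to match the paper's exact formulation, and it does not handle cyclic automata or transducers. The names `build_trie` and `error_tolerant_lookup` are hypothetical.

```python
END = ""  # key marking the end of a word; cannot clash with a real character


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def error_tolerant_lookup(trie, word, t):
    """All (candidate, distance) pairs stored in the trie within edit distance t of word."""
    results = []

    def walk(node, prefix, row):
        # row[i] = edit distance between the current trie prefix and word[:i]
        if END in node and row[-1] <= t:
            results.append((prefix, row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for i in range(1, len(word) + 1):
                cost = 0 if word[i - 1] == ch else 1
                new_row.append(min(new_row[i - 1] + 1,   # word[i-1] left unmatched
                                   row[i] + 1,           # trie symbol ch left unmatched
                                   row[i - 1] + cost))   # align ch with word[i-1]
            # Cut-off: if the whole row exceeds t, no extension can come back within t.
            if min(new_row) <= t:
                walk(child, prefix + ch, new_row)

    walk(trie, "", list(range(len(word) + 1)))
    return sorted(results)


trie = build_trie(["cat", "car", "cart", "dog"])
print(error_tolerant_lookup(trie, "cat", 1))
```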
Learning bidimensional context dependent models using a context sensitive language
International Conference on Pattern Recognition (ICPR), 1996, Vienna (Austria)
Automatic generation of models from a set of positive and negative samples and a-priori knowledge (if available) is a crucial issue for pattern recognition applications. Grammatical inference can play an important role here, since it can be used to generate the set of model classes, where each class consists of the rules that generate the models. In this paper we present the process of learning context-dependent bidimensional objects from outdoor images as context-sensitive languages. We show how the process is conceived to overcome the problem of generalizing rules from a set of samples that differ slightly because of noisy pixels. The learned models can be used to identify objects in outdoor images irrespective of their size and of partial occlusions. Some results of the inference procedure are shown in the paper.
Peer Reviewed
Syntax Error Handling in Scannerless Generalized LR Parsers
This thesis reports on a master's project carried out as part of the one-year master's
programme 'Software Engineering'. The project concerns methods for improving the quality
of reporting and handling of syntax errors detected by a scannerless
generalized left-to-right rightmost (SGLR) parser, and was carried out at Centrum voor
Wiskunde en Informatica (CWI) in Amsterdam.
SGLR is a parsing algorithm developed as part of the Generic Language Technology
Project at SEN1, one of the themes at CWI. SGLR is based on the GLR
algorithm developed by Tomita.
SGLR parsers can handle arbitrary context-free grammars, which
enables grammar modularization. Because SGLR does not use a separate scanner,
layout and comments are also incorporated into the parse tree. This makes
SGLR a powerful tool for code analysis and code transformations. A drawback
is the way SGLR handles syntax errors.
When a syntax error is detected, the current implementation of SGLR halts the
parsing process and reports back to the user the point of error detection only.
The text at the point of error detection is not necessarily the text that has to
be changed to repair the error.
This thesis describes three kinds of information that could be reported to the
user, and how they could be derived from the parse process when an error is
detected. These are:
- The structure of the already parsed part of the input in the form of a partial
parse tree.
- A listing of expected symbols; those tokens or token sequences that are acceptable
instead of the erroneous text.
- The current parser state which could be translated into language dependent
informative messages.
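The three kinds of error information listed above can be read off an LR-style parser at the moment an error is detected:

* The trees on the stack form the partial parse.
* The terminals with a non-error action in the current state are the expected symbols.
* The state number is what a language-dependent message table would be keyed on.

The sketch below uses a single deterministic LR stack rather than SGLR's graph-structured stacks. It relies on a hand-built SLR table for the hypothetical toy grammar `E -> E "+" "n" | "n"`. `ACTION`, `GOTO` and `parse` are illustrative names, not part of SGLR or the Meta-Environment.

```python
# Hand-built SLR table for the hypothetical grammar  E -> E "+" "n" | "n".
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept",)},
    2: {"+": ("reduce", "E", 1), "$": ("reduce", "E", 1)},
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", "E", 3), "$": ("reduce", "E", 3)},
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    tokens = list(tokens) + ["$"]
    states = [0]
    trees = []  # partial parse trees, one per stack entry above state 0
    pos = 0
    while True:
        state, tok = states[-1], tokens[pos]
        act = ACTION[state].get(tok)
        if act is None:  # syntax error: report what the parser knows right now
            return {
                "ok": False,
                "position": pos,                     # point of error detection
                "found": tok,
                "expected": sorted(ACTION[state]),   # acceptable next symbols
                "state": state,                      # key for a message table
                "partial_trees": trees,              # structure parsed so far
            }
        if act[0] == "shift":
            states.append(act[1])
            trees.append(tok)
            pos += 1
        elif act[0] == "reduce":
            _, lhs, size = act
            children = trees[-size:]
            del states[-size:]
            del trees[-size:]
            trees.append((lhs, children))
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept
            return {"ok": True, "tree": trees[0]}


print(parse(["n", "+", "+", "n"]))
```

For the input `n + + n`, the error is detected at the second `+`. At that point `n` is the only expected symbol, and the already-parsed `E` and `+` are available as partial trees.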
Two ways of recovering from an error condition are also described. These are
non-correcting recovery methods that enable SGLR to always return a parse
tree that can be unparsed into the original input sentence.
- A method that halts parsing but incorporates the remainder of the input into
the parse tree.
- A method that resumes parsing by means of substring parsing.
During the course of the project the described approaches have been implemented
and incorporated into the implementation of SGLR used by the Meta-Environment,
some fully and some only as prototypes.
Automatic error recovery for LR parsers in theory and practice
This thesis argues the need for good syntax error handling schemes in language
translation systems such as compilers, and for the automatic incorporation of such schemes
into parser-generators. Syntax errors are studied in a theoretical framework and practical
methods for handling syntax errors are presented.
The theoretical framework consists of a model for syntax errors based on the concept of
a minimum prefix-defined error correction, a sentence obtainable from an erroneous string by
performing edit operations at prefix-defined (parser-defined) errors. It is shown that for an
arbitrary context-free language, it is undecidable whether a better-than-arbitrary choice of edit
operations can be made at a prefix-defined error. For common programming languages, it is
shown that minimum-distance errors and prefix-defined errors do not necessarily coincide,
and that there exists an infinite number of programs that differ in a single symbol only; sets
of equivalent insertions are exhibited.
Two methods for syntax error recovery are presented. The methods are language
independent and suitable for automatic generation. The first method consists of two stages,
local repair followed if necessary by phrase-level repair. The second method consists of a
single stage in which a locally minimum-distance repair is computed. Both methods are
developed for use in the practical LR parser-generator yacc, requiring no additional
specifications from the user. A scheme for the automatic generation of diagnostic messages
in terms of the source input is presented. Performance of the methods in practice is evaluated
using a formal method based on minimum-distance and prefix-defined error correction. The
methods compare favourably with existing methods for error recovery.
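A prefix-defined error is the first token at which the input stops being a prefix of some sentence. LR parsers detect errors exactly there because of their correct-prefix property. The sketch below shows a simplified single-symbol local repair at that point.

Which parts are assumptions:

* A tiny DFA for the hypothetical token language `n (+ n)*` stands in for an LR parser.
* The repair heuristic is hypothetical and is not either of the thesis's methods. It tries every deletion, substitution and insertion of one symbol at the error point. It prefers a repair that makes the whole input acceptable, then one that leaves the fewest tokens unparsed.
* `ALPHABET`, `DELTA` and `repair_once` are illustrative names.

```python
# Hypothetical toy language n (+ n)*, recognized by a DFA standing in for an LR parser.
ALPHABET = ["n", "+"]
DELTA = {(0, "n"): 1, (1, "+"): 2, (2, "n"): 1}
ACCEPTING = {1}


def advance(tokens):
    """Return (tokens consumed before the prefix-defined error, whole input accepted?)."""
    state = 0
    for i, tok in enumerate(tokens):
        nxt = DELTA.get((state, tok))
        if nxt is None:
            return i, False
        state = nxt
    return len(tokens), state in ACCEPTING


def repair_once(tokens):
    """Apply one single-symbol repair at the prefix-defined error point, if any."""
    consumed, accepted = advance(tokens)
    if accepted:
        return tokens, None
    p = consumed  # error point; equals len(tokens) if the input ends prematurely
    candidates = []
    if p < len(tokens):
        candidates.append((tokens[:p] + tokens[p + 1:], f"delete {tokens[p]!r} at {p}"))
        for a in ALPHABET:
            if a != tokens[p]:
                candidates.append((tokens[:p] + [a] + tokens[p + 1:],
                                   f"replace {tokens[p]!r} with {a!r} at {p}"))
    for a in ALPHABET:
        candidates.append((tokens[:p] + [a] + tokens[p:], f"insert {a!r} at {p}"))

    def score(candidate):
        c, ok = advance(candidate[0])
        return (ok, -(len(candidate[0]) - c))  # accepted first, then fewest unparsed tokens

    return max(candidates, key=score)  # ties go to the earliest candidate


print(repair_once(["n", "+", "+", "n"]))
```

Each candidate differs from the input by exactly one edit at the error point, so any accepted candidate is a minimum-distance correction among repairs made at that point. A closer correction made elsewhere in the input may still exist, which is why minimum-distance errors and prefix-defined errors need not coincide.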
Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing where omega <= 2.373 is the exponent of fast matrix multiplication, and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice.
To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance, with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear productions, k, used in the optimal derivation. Explicitly, we give an additive O(k^* gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + (n^3/gamma^2))), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard.
- …