885 research outputs found
Principles and Implementation of Deductive Parsing
We present a system for generating parsers based directly on the metaphor of
parsing as deduction. Parsing algorithms can be represented directly as
deduction systems, and a single deduction engine can interpret such deduction
systems so as to implement the corresponding parser. The method generalizes
easily to parsers for augmented phrase structure formalisms, such as
definite-clause grammars and other logic grammar formalisms, and has been used
for rapid prototyping of parsing algorithms for a variety of formalisms
including variants of tree-adjoining grammars, categorial grammars, and
lexicalized context-free grammars.Comment: 69 pages, includes full Prolog cod
An Abstract Machine for Unification Grammars
This work describes the design and implementation of an abstract machine,
Amalia, for the linguistic formalism ALE, which is based on typed feature
structures. This formalism is one of the most widely accepted in computational
linguistics and has been used for designing grammars in various linguistic
theories, most notably HPSG. Amalia is composed of data structures and a set of
instructions, augmented by a compiler from the grammatical formalism to the
abstract instructions, and a (portable) interpreter of the abstract
instructions. The effect of each instruction is defined using a low-level
language that can be executed on ordinary hardware.
The advantages of the abstract machine approach are twofold. From a
theoretical point of view, the abstract machine gives a well-defined
operational semantics to the grammatical formalism. This ensures that grammars
specified using our system are endowed with well defined meaning. It enables,
for example, to formally verify the correctness of a compiler for HPSG, given
an independent definition. From a practical point of view, Amalia is the first
system that employs a direct compilation scheme for unification grammars that
are based on typed feature structures. The use of amalia results in a much
improved performance over existing systems.
In order to test the machine on a realistic application, we have developed a
small-scale, HPSG-based grammar for a fragment of the Hebrew language, using
Amalia as the development platform. This is the first application of HPSG to a
Semitic language.Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks,
pst-node, psfig, fullname and a macros fil
Automatic Generation of Input Grammars Using Symbolic Execution
Invalid input often leads to unexpected behavior in a program and is behind a plethora of known and unknown vulnerabilities. To prevent improper input from being processed, the input needs to be validated before the rest of the program executes. Formal language theory facilitates the definition and recognition of proper inputs. We focus on the problem of defining valid input after the program has already been written. We construct a parser that infers the structure of inputs which avoid vulnerabilities while existing work focuses on inferring the structure of input the program anticipates. We present a tool that constructs an input language, given the program as input, using symbolic execution on symbolic arguments. This differs from existing work which tracks the execution of concrete inputs to infer a grammar. We test our tool on programs with known vulnerabilities, including programs in the GNU Coreutils library, and we demonstrate how the parser catches known invalid inputs. We conclude that the synthesis of the complete parser cannot be entirely automated due to limitations of symbolic execution tools and issues of computability. A more comprehensive parser must additionally be informed by examples and counterexamples of the input language
An Alternative Conception of Tree-Adjoining Derivation
The precise formulation of derivation for tree-adjoining grammars has
important ramifications for a wide variety of uses of the formalism, from
syntactic analysis to semantic interpretation and statistical language
modeling. We argue that the definition of tree-adjoining derivation must be
reformulated in order to manifest the proper linguistic dependencies in
derivations. The particular proposal is both precisely characterizable through
a definition of TAG derivations as equivalence classes of ordered derivation
trees, and computationally operational, by virtue of a compilation to linear
indexed grammars together with an efficient algorithm for recognition and
parsing according to the compiled grammar.Comment: 33 page
Description of the CUDF Format
This document contains several related specifications, together they describe
the document formats related to the solver competition which will be organized
by Mancoosi. In particular, this document describes: - DUDF (Distribution
Upgradeability Description Format), the document format to be used to submit
upgrade problem instances from user machines to a (distribution-specific)
database of upgrade problems; - CUDF (Common Upgradeability Description
Format), the document format used to encode upgrade problems, abstracting over
distribution-specific details. Solvers taking part in the competition will be
fed with input in CUDF format
- …