4,886 research outputs found
Pattern matching in compilers
In this thesis we develop tools for effective and flexible pattern matching.
We introduce a new pattern matching system called amethyst. Amethyst is not
only a generator of parsers of programming languages, but can also serve as an
alternative to tools for matching regular expressions.
Our framework also produces dynamic parsers. Its intended use is in the
context of IDE (accurate syntax highlighting and error detection on the fly).
Amethyst offers pattern matching of general data structures. This makes it a
useful tool for implementing compiler optimizations such as constant folding,
instruction scheduling, and dataflow analysis in general.
The parsers produced are essentially top-down parsers. Linear time complexity
is obtained by introducing the novel notion of structured grammars and
regularized regular expressions. Amethyst uses techniques known from compiler
optimizations to produce effective parsers.Comment: master thesi
Dependency parsing of Turkish
The suitability of different parsing methods for different languages is an important topic in
syntactic parsing. Especially lesser-studied languages, typologically different from the languages
for which methods have originally been developed, poses interesting challenges in this respect.
This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative
free constituent order language that can be seen as the representative of a wider class
of languages of similar type. Our investigations show that morphological structure plays an
essential role in finding syntactic relations in such a language. In particular, we show that
employing sublexical representations called inflectional groups, rather than word forms, as the
basic parsing units improves parsing accuracy. We compare two different parsing methods, one
based on a probabilistic model with beam search, the other based on discriminative classifiers and
a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless
of parsing method.We examine the impact of morphological and lexical information in detail and
show that, properly used, this kind of information can improve parsing accuracy substantially.
Applying the techniques presented in this article, we achieve the highest reported accuracy for
parsing the Turkish Treebank
Opportunities for a Truffle-based Golo Interpreter
Golo is a simple dynamically-typed language for the Java Virtual Machine.
Initially implemented as a ahead-of-time compiler to JVM bytecode, it leverages
invokedy-namic and JSR 292 method handles to implement a reasonably efficient
runtime. Truffle is emerging as a framework for building interpreters for JVM
languages with self-specializing AST nodes. Combined with the Graal compiler,
Truffle offers a simple path towards writing efficient interpreters while
keeping the engineering efforts balanced. The Golo project is interested in
experimenting with a Truffle interpreter in the future, as it would provides
interesting comparison elements between invokedynamic versus Truffle for
building a language runtime
Attempto - From Specifications in Controlled Natural Language towards Executable Specifications
Deriving formal specifications from informal requirements is difficult since
one has to take into account the disparate conceptual worlds of the application
domain and of software development. To bridge the conceptual gap we propose
controlled natural language as a textual view on formal specifications in
logic. The specification language Attempto Controlled English (ACE) is a subset
of natural language that can be accurately and efficiently processed by a
computer, but is expressive enough to allow natural usage. The Attempto system
translates specifications in ACE into discourse representation structures and
into Prolog. The resulting knowledge base can be queried in ACE for
verification, and it can be executed for simulation, prototyping and validation
of the specification.Comment: 15 pages, compressed, uuencoded Postscript, to be presented at EMISA
Workshop 'Naturlichsprachlicher Entwurf von Informationssystemen -
Grundlagen, Methoden, Werkzeuge, Anwendungen', May 28-30, 1996, Ev. Akademie
Tutzin
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Robust Grammatical Analysis for Spoken Dialogue Systems
We argue that grammatical analysis is a viable alternative to concept
spotting for processing spoken input in a practical spoken dialogue system. We
discuss the structure of the grammar, and a model for robust parsing which
combines linguistic sources of information and statistical sources of
information. We discuss test results suggesting that grammatical processing
allows fast and accurate processing of spoken input.Comment: Accepted for JNL
- …