Derivative Based Extended Regular Expression Matching Supporting Intersection, Complement and Lookarounds
Regular expressions are widely used in software. Various regular expression
engines support different combinations of extensions to classical regular
constructs such as Kleene star, concatenation, nondeterministic choice (union
in terms of match semantics). The extensions include e.g. anchors, lookarounds,
counters, backreferences. The properties of combinations of such extensions
have been the subject of active recent research.
In the current paper we present a symbolic derivatives based approach to
finding matches to regular expressions that, in addition to the classical
regular constructs, also support complement, intersection and lookarounds (both
negative and positive lookaheads and lookbacks). We present the theory of
computing symbolic derivatives and determining nullability given an input
string, and show that such a combination of extensions yields a match
semantics that corresponds to an effective Boolean algebra, which in turn opens
up possibilities of applying various Boolean logic rewrite rules to optimize
the search for matches.
In addition to the theoretical framework we present an implementation of the
combination of extensions to demonstrate the efficacy of the approach,
accompanied by practical examples.
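As a rough illustration of the derivative idea behind this abstract, the following sketch uses classical Brzozowski derivatives over single characters, extended with intersection and complement. It is not the paper's symbolic algorithm and leaves out lookarounds and all rewrite-rule optimizations; every class and function name here is hypothetical.

```python
from dataclasses import dataclass

# Regex syntax tree. All names are hypothetical, chosen for this sketch only.
@dataclass(frozen=True)
class Empty:            # matches no string
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr:              # matches the single character c
    c: str

@dataclass(frozen=True)
class Cat:              # concatenation
    l: object
    r: object

@dataclass(frozen=True)
class Or:               # union
    l: object
    r: object

@dataclass(frozen=True)
class And:              # intersection
    l: object
    r: object

@dataclass(frozen=True)
class Star:             # Kleene star
    r: object

@dataclass(frozen=True)
class Not:              # complement
    r: object

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.l) and nullable(r.r)
    if isinstance(r, Or):
        return nullable(r.l) or nullable(r.r)
    if isinstance(r, Not):
        return not nullable(r.r)
    raise TypeError(r)

def deriv(r, a):
    """Derivative of r with respect to character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Cat):
        d = Cat(deriv(r.l, a), r.r)
        return Or(d, deriv(r.r, a)) if nullable(r.l) else d
    if isinstance(r, Star):
        return Cat(deriv(r.r, a), r)
    if isinstance(r, Or):
        return Or(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, And):
        return And(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, Not):
        return Not(deriv(r.r, a))
    raise TypeError(r)

def matches(r, s):
    """Whole-string match: derive by each character, then test nullability."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)

# Strings over {a, b} that are not exactly "ab".
not_ab = And(Star(Or(Chr("a"), Chr("b"))), Not(Cat(Chr("a"), Chr("b"))))
```

Because the Boolean operators commute with derivation, intersection and complement need no special machinery here. The sketch performs no simplification, so terms grow with input length; a practical engine would normalize them.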
Value-Function Approximations for Partially Observable Markov Decision Processes
Partially observable Markov decision processes (POMDPs) provide an elegant
mathematical framework for modeling complex decision and planning problems in
stochastic domains in which states of the system are observable only
indirectly, via a set of imperfect or noisy observations. The modeling
advantage of POMDPs, however, comes at a price -- exact methods for solving
them are computationally very expensive and thus applicable in practice only to
very simple problems. We focus on efficient approximation (heuristic) methods
that attempt to alleviate the computational problem and trade off accuracy for
speed. We have two objectives here. First, we survey various approximation
methods, analyze their properties and relations and provide some new insights
into their differences. Second, we present a number of new approximation
methods and novel refinements of existing techniques. The theoretical results
are supported by experiments on a problem from the agent navigation domain.
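The abstract does not name specific methods, so the sketch below only shows the standard Bayesian belief update that POMDP solvers build on when states are seen through noisy observations. The table layout (`T[a][s][s2]` for transitions, `O[a][s2][o]` for observations) and the function name are assumptions made for this example.

```python
def belief_update(belief, T, O, action, obs):
    """Return the posterior belief after taking `action` and seeing `obs`.

    belief[s]    : current probability of state s
    T[a][s][s2]  : probability of moving from s to s2 under action a (assumed layout)
    O[a][s2][o]  : probability of observing o in s2 after action a (assumed layout)
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]

# Two states, one action that leaves the state unchanged, and a noisy sensor.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.8, 0.2], [0.2, 0.8]]]
posterior = belief_update([0.5, 0.5], T, O, action=0, obs=0)
```

Exact value functions are defined over this continuous belief space, which is one reason exact solution methods become expensive.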
LL(1) Parsing with Derivatives and Zippers
In this paper, we present an efficient, functional, and formally verified
parsing algorithm for LL(1) context-free expressions based on the concept of
derivatives of formal languages. Parsing with derivatives is an elegant parsing
technique, which, in the general case, suffers from cubic worst-case time
complexity and slow performance in practice. We specialise the parsing with
derivatives algorithm to LL(1) context-free expressions, where alternatives can
be chosen given a single token of lookahead. We formalise the notion of LL(1)
expressions and show how to efficiently check the LL(1) property. Next, we
present a novel linear-time parsing with derivatives algorithm for LL(1)
expressions operating on a zipper-inspired data structure. We prove the
algorithm correct in Coq and present an implementation as a parser combinators
framework in Scala, with enumeration and pretty printing capabilities.
Comment: Appeared at PLDI'20 under the title "Zippy LL(1) Parsing with
Derivatives"
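To make "alternatives can be chosen given a single token of lookahead" concrete, here is a simplified check on non-recursive expressions. It flags a choice when both branches accept the empty input or when their first-token sets overlap. This is only part of an LL(1) check (conditions on sequences are omitted), it is not the paper's formalisation, and all names are hypothetical.

```python
from dataclasses import dataclass

# Expression tree for this sketch only; names are hypothetical.
@dataclass(frozen=True)
class Eps:              # accepts the empty token sequence
    pass

@dataclass(frozen=True)
class Tok:              # accepts one token of the given kind
    kind: str

@dataclass(frozen=True)
class Seq:              # l followed by r
    l: object
    r: object

@dataclass(frozen=True)
class Alt:              # either l or r
    l: object
    r: object

def nullable(e):
    """True if e accepts the empty token sequence."""
    if isinstance(e, Eps):
        return True
    if isinstance(e, Tok):
        return False
    if isinstance(e, Seq):
        return nullable(e.l) and nullable(e.r)
    return nullable(e.l) or nullable(e.r)       # Alt

def first(e):
    """Set of token kinds that can start a sequence accepted by e."""
    if isinstance(e, Eps):
        return set()
    if isinstance(e, Tok):
        return {e.kind}
    if isinstance(e, Seq):
        return first(e.l) | (first(e.r) if nullable(e.l) else set())
    return first(e.l) | first(e.r)              # Alt

def alt_conflicts(e):
    """List the Alt nodes whose branches one token of lookahead cannot separate."""
    if isinstance(e, (Eps, Tok)):
        return []
    found = alt_conflicts(e.l) + alt_conflicts(e.r)
    if isinstance(e, Alt):
        both_nullable = nullable(e.l) and nullable(e.r)
        if both_nullable or (first(e.l) & first(e.r)):
            found.append(e)
    return found

ok = Alt(Seq(Tok("if"), Tok("id")), Tok("id"))    # branches start differently
bad = Alt(Seq(Tok("id"), Tok("+")), Tok("id"))    # both branches start with "id"
```

Here `alt_conflicts(ok)` is empty, while `alt_conflicts(bad)` reports the top-level choice.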
flap: A Deterministic Parser with Fused Lexing
Lexers and parsers are typically defined separately and connected by a token
stream. This separate definition is important for modularity and reduces the
potential for parsing ambiguity. However, materializing tokens as data
structures and case-switching on tokens comes with a cost. We show how to fuse
separately-defined lexers and parsers, drastically improving performance
without compromising modularity or increasing ambiguity. We propose a
deterministic variant of Greibach Normal Form that ensures deterministic
parsing with a single token of lookahead and makes fusion strikingly simple,
and prove that normalizing context-free expressions into the deterministic
normal form is semantics-preserving. Our staged parser combinator library,
flap, provides a standard interface, but generates specialized token-free code
that runs two to six times faster than ocamlyacc on a range of benchmarks.
Comment: PLDI 2023 with appendi
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass
and optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering as many symbols as is required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot.