355 research outputs found
Regular Expressions and Transducers over Alphabet-invariant and User-defined Labels
We are interested in regular expressions and transducers that represent word
relations in an alphabet-invariant way---for example, the set of all word pairs
u,v where v is a prefix of u independently of what the alphabet is. Current
software systems of formal language objects do not have a mechanism to define
such objects. We define transducers in which transition labels involve what we
call set specifications, some of which are alphabet invariant. In fact, we give
a more broad definition of automata-type objects, called labelled graphs, where
each transition label can be any string, as long as that string represents a
subset of a certain monoid. Then, the behaviour of the labelled graph is a
subset of that monoid. We do the same for regular expressions. We obtain
extensions of a few classic algorithmic constructions on ordinary regular
expressions and transducers at the broad level of labelled graphs and in such a
way that the computational efficiency of the extended constructions is not
sacrificed. For regular expressions with set specs we obtain the corresponding
partial derivative automata. For transducers with set specs we obtain further
algorithms that can be applied to questions about independent regular
languages, in particular the witness version of the independent property
satisfaction question
Liveness of Randomised Parameterised Systems under Arbitrary Schedulers (Technical Report)
We consider the problem of verifying liveness for systems with a finite, but
unbounded, number of processes, commonly known as parameterised systems.
Typical examples of such systems include distributed protocols (e.g. for the
dining philosopher problem). Unlike the case of verifying safety, proving
liveness is still considered extremely challenging, especially in the presence
of randomness in the system. In this paper we consider liveness under arbitrary
(including unfair) schedulers, which is often considered a desirable property
in the literature of self-stabilising systems. We introduce an automatic method
of proving liveness for randomised parameterised systems under arbitrary
schedulers. Viewing liveness as a two-player reachability game (between
Scheduler and Process), our method is a CEGAR approach that synthesises a
progress relation for Process that can be symbolically represented as a
finite-state automaton. The method is incremental and exploits both
Angluin-style L*-learning and SAT-solvers. Our experiments show that our
algorithm is able to prove liveness automatically for well-known randomised
distributed protocols, including Lehmann-Rabin Randomised Dining Philosopher
Protocol and randomised self-stabilising protocols (such as the Israeli-Jalfon
Protocol). To the best of our knowledge, this is the first fully-automatic
method that can prove liveness for randomised protocols.Comment: Full version of CAV'16 pape
Stream Processing using Grammars and Regular Expressions
In this dissertation we study regular expression based parsing and the use of
grammatical specifications for the synthesis of fast, streaming
string-processing programs.
In the first part we develop two linear-time algorithms for regular
expression based parsing with Perl-style greedy disambiguation. The first
algorithm operates in two passes in a semi-streaming fashion, using a constant
amount of working memory and an auxiliary tape storage which is written in the
first pass and consumed by the second. The second algorithm is a single-pass
and optimally streaming algorithm which outputs as much of the parse tree as is
semantically possible based on the input prefix read so far, and resorts to
buffering as many symbols as is required to resolve the next choice. Optimality
is obtained by performing a PSPACE-complete pre-analysis on the regular
expression.
In the second part we present Kleenex, a language for expressing
high-performance streaming string processing programs as regular grammars with
embedded semantic actions, and its compilation to streaming string transducers
with worst-case linear-time performance. Its underlying theory is based on
transducer decomposition into oracle and action machines, and a finite-state
specialization of the streaming parsing algorithm presented in the first part.
In the second part we also develop a new linear-time streaming parsing
algorithm for parsing expression grammars (PEG) which generalizes the regular
grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm
reformulated using least fixed points and evaluated using an instance of the
chaotic iteration scheme by Cousot and Cousot
Verifying Programs with Dynamic 1-Selector-Linked Structures in Regular Model Checking
International audienceWe address the problem of automatic verification of programs with dynamic data structures. We consider the case of sequential, non-recursive programs manipulating 1-selector-linked structures such as traditional linked lists (possibly sharing their tails) and circular lists. We propose an automata-based approach for a symbolic verification of such programs using the regular model checking framework. Given a program, the configurations of the memory are systematically encoded as words over a suitable finite alphabet, potentially infinite sets of configurations are represented by finite-state automata, and statements of the program are automatically translated into finite-state transducers defining regular relations between configurations. Then, abstract regular model checking techniques are applied in order to automatically check safety properties concerning the shape of the computed configurations or relating the input and output configurations. For that, we introduce new techniques for the computation of abstractions of the set of reachable configurations, and to refine these abstractions if spurious counterexamples are detected. Finally, we present experimental results showing the applicability of the approach and its efficiency
Recommended from our members
Symbolic Model Learning: New Algorithms and Applications
In this thesis, we study algorithms which can be used to extract, or learn, formal mathematical models from software systems and then using these models to test whether the given software systems satisfy certain security properties such as robustness against code injection attacks. Specifically, we focus on studying learning algorithms for automata and transducers and the symbolic extensions of these models, namely symbolic finite automata (SFAs). In a high level, this thesis contributes the following results:
1. In the first part of the thesis, we present a unified treatment of many common variations of the seminal L* algorithm for learning deterministic finite automata (DFAs) as a congruence learning algorithm for the underlying Nerode congruence which forms the basis of automata theory. Under this formulation the basic data structures used by different variations are unified as different ways to implement the Nerode congruence using queries.
2. Next, building on the new formulation of L*-style algorithms we proceed to develop new algorithms for learning transducer models. Firstly, we present the first algorithm for learning deterministic partial transducers. Furthermore, we extend my algorithm into non-deterministic models by introducing a novel, generalized congruence relation over string transformations which is able to capture a subclass of string transformations with regular lookahead. We demonstrate that this class is able to capture many practical string transformation from the domain of string sanitizers in Web applications.
3. Classical learning algorithms for automata and transducers operate over finite alphabets and have a query complexity that scales linearly with the size of the alphabet. However, in practice, this dependence on the alphabet size hinders the performance of the algorithms. To address this issue, we develop the MAT* algorithm for learning symbolic finite state automata (SFAs) which operate over infinite alphabets. In practice, the MAT* learning algorithm allow us to plug custom transition learning algorithms which will efficiently infer the predicates in the transitions of the SFA without querying the whole alphabet set.
4. Finally, we use our learning algorithm toolbox as the basis for the development of a set of black-box testing algorithms. More specifically, we present Grammar Oriented Filter Auditing (GOFA), a novel technique which allows one to utilize my learning algorithms to evaluate the robustness of a string sanitizer or filter against a set of attack strings given as a context-free grammar. Furthermore, because such grammars are many times unavailable, we developed sfadiff a differential testing technique based on symbolic automata learning which can be used in order to perform differential testing of two different parser implementations using SFA learning algorithms and we demonstrate how our algorithm can be used to develop program fingerprints. We evaluate our algorithms against state-of-the-art Web Application Firewalls and discover over 15 previously unknown vulnerabilities which result in evading the firewalls and performing code injection attacks in the backend Web application. Finally, we show how our learning algorithms can uncover vulnerabilities which are missed by other black-box methods such as fuzzing and grammar-based testing
On the Spectrum of Hecke Type Operators related to some Fractal Groups
We give the first example of a connected 4-regular graph whose Laplace
operator's spectrum is a Cantor set, as well as several other computations of
spectra following a common ``finite approximation'' method. These spectra are
simple transforms of the Julia sets associated to some quadratic maps. The
graphs involved are Schreier graphs of fractal groups of intermediate growth,
and are also ``substitutional graphs''. We also formulate our results in terms
of Hecke type operators related to some irreducible quasi-regular
representations of fractal groups and in terms of the Markovian operator
associated to noncommutative dynamical systems via which these fractal groups
were originally defined. In the computations we performed, the self-similarity
of the groups is reflected in the self-similarity of some operators; they are
approximated by finite counterparts whose spectrum is computed by an ad hoc
factorization process.Comment: 1 color figure, 2 color diagrams, many figure
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper we present a survey of logics for unranked trees
Programming Using Automata and Transducers
Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures
Chain-Free String Constraints (Technical Report)
We address the satisfiability problem for string constraints that combine
relational constraints represented by transducers, word equations, and string
length constraints. This problem is undecidable in general. Therefore, we
propose a new decidable fragment of string constraints, called weakly chaining
string constraints, for which we show that the satisfiability problem is
decidable. This fragment pushes the borders of decidability of string
constraints by generalising the existing straight-line as well as the acyclic
fragment of the string logic. We have developed a prototype implementation of
our new decision procedure, and integrated it into in an existing framework
that uses CEGAR with under-approximation of string constraints based on
flattening. Our experimental results show the competitiveness and accuracy of
the new framework
- …