A Theory of Formal Synthesis via Inductive Learning
Formal synthesis is the process of generating a program satisfying a
high-level formal specification. In recent times, effective formal synthesis
methods have been proposed based on the use of inductive learning. We refer to
this class of methods that learn programs from examples as formal inductive
synthesis. In this paper, we present a theoretical framework for formal
inductive synthesis. We discuss how formal inductive synthesis differs from
traditional machine learning. We then describe oracle-guided inductive
synthesis (OGIS), a framework that captures a family of synthesizers that
operate by iteratively querying an oracle. An instance of OGIS that has had
much practical impact is counterexample-guided inductive synthesis (CEGIS). We
present a theoretical characterization of CEGIS for learning any program that
computes a recursive language. In particular, we analyze the relative power of
CEGIS variants where the types of counterexamples generated by the oracle
vary. We also consider the impact of bounded versus unbounded memory
available to the learning algorithm. In the special case where the universe of
candidate programs is finite, we relate the speed of convergence to the notion
of teaching dimension studied in machine learning theory. Altogether, the
results of the paper take a first step towards a theoretical foundation for the
emerging field of formal inductive synthesis.
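The teaching dimension mentioned in this abstract can be illustrated on a toy finite class. The following is a minimal brute-force sketch, not the paper's construction: it assumes concepts are finite sets over a small finite domain, and the names `teaching_dimension`, `domain`, and `concepts` are hypothetical.

```python
from itertools import combinations

def teaching_dimension(target, concepts, domain):
    """Size of a smallest set of labeled points that singles out `target`.

    Assumption for this sketch: concepts are subsets of a finite `domain`,
    and a point x is labeled by whether x is in `target`.
    """
    others = [c for c in concepts if c != target]
    for k in range(len(domain) + 1):
        for sample in combinations(domain, k):
            # The sample teaches `target` if every other concept
            # disagrees with `target` on at least one sampled point.
            if all(any((x in c) != (x in target) for x in sample) for c in others):
                return k
    return None  # only reached if two distinct concepts agree on all of `domain`

# Toy class: the empty set and the three singletons over {0, 1, 2}.
domain = [0, 1, 2]
concepts = [frozenset(), frozenset({0}), frozenset({1}), frozenset({2})]

td_empty = teaching_dimension(frozenset(), concepts, domain)
td_single = teaching_dimension(frozenset({0}), concepts, domain)
```

In this toy class a singleton is taught by its one positive point, while the empty set needs all three negative points to rule out every singleton.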
Classifying the Arithmetical Complexity of Teaching Models
This paper classifies the complexity of various teaching models by their
position in the arithmetical hierarchy. In particular, we determine the
arithmetical complexity of the index sets of the following classes: (1) the
class of uniformly r.e. families with finite teaching dimension, and (2) the
class of uniformly r.e. families with finite positive recursive teaching
dimension witnessed by a uniformly r.e. teaching sequence. We also derive the
arithmetical complexity of several other decision problems in teaching, such as
the problem of deciding, given an effective coding of all uniformly r.e.
families, any family in that coding, any language in that family, and any bound,
whether or not the teaching dimension of the language with respect to the family
is upper bounded by that bound. Comment: 15 pages in International Conference on
Algorithmic Learning Theory, 201
Identification of biRFSA languages
The task of identifying a language from a set of its words is not an easy one. For instance, it is not feasible to identify regular languages in the general case. Therefore, looking for subclasses of regular languages that can be identified in this framework is an interesting problem. One of the most classical identifiable classes is the class of reversible languages, introduced by D. Angluin, also called bideterministic languages as they can be represented by deterministic automata (DFA) whose reverse is also deterministic. Residual Finite State Automata (RFSA), on the other hand, are a class of nondeterministic automata that shares some properties with DFA. In particular, DFA are RFSA and RFSA can be much smaller. We study here the learnability of the class of languages that can be represented by biRFSA: RFSA whose reverse are RFSA. We prove that this class is not identifiable in general, but we present two subclasses that are learnable, the second one being identifiable in polynomial time.
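The bideterminism condition described in this abstract (a deterministic automaton whose reverse is also deterministic) can be checked directly. The following is a minimal sketch under stated assumptions: an automaton is given as a set of `(source, symbol, target)` triples plus sets of initial and final states, partial transition functions are allowed, and the function names are hypothetical.

```python
def is_deterministic(transitions, initial):
    """True if there is exactly one initial state and no (state, symbol)
    pair has two different targets. `transitions` must be a set of triples."""
    seen = set()
    for src, sym, _ in transitions:
        if (src, sym) in seen:
            return False
        seen.add((src, sym))
    return len(initial) == 1

def is_bideterministic(transitions, initial, final):
    """Check determinism of the automaton and of its reverse.

    The reverse swaps the direction of every transition and exchanges
    the roles of initial and final states.
    """
    reversed_transitions = {(dst, sym, src) for src, sym, dst in transitions}
    return (is_deterministic(transitions, initial)
            and is_deterministic(reversed_transitions, final))

# Automaton for a*b: its reverse is deterministic as well.
star_b = {(0, "a", 0), (0, "b", 1)}
# Automaton for {a, ba}: state 1 has two incoming a-transitions.
a_or_ba = {(0, "a", 1), (0, "b", 2), (2, "a", 1)}

ok = is_bideterministic(star_b, {0}, {1})
not_ok = is_bideterministic(a_or_ba, {0}, {1})
```

The second automaton is deterministic when read forwards, but reversing it yields two a-transitions out of state 1, so it fails the check.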
Learning Residual Finite-State Automata Using Observation Tables
We define a two-step learner for RFSAs based on an observation table by using
an algorithm for minimal DFAs to build a table for the reversal of the language
in question and showing that we can derive the minimal RFSA from it after some
simple modifications. We compare the algorithm to two other table-based ones of
which one (by Bollig et al. 2009) infers an RFSA directly, and the other is
another two-step learner proposed by the author. We focus on the criterion of
query complexity. Comment: In Proceedings DCFS 2010, arXiv:1008.127
Are There Good Mistakes? A Theoretical Analysis of CEGIS
Counterexample-guided inductive synthesis (CEGIS) is used to synthesize
programs from a candidate space of programs. The technique is guaranteed to
terminate and synthesize the correct program if the space of candidate programs
is finite. But the technique may or may not terminate with the correct program
if the candidate space of programs is infinite. In this paper, we perform a
theoretical analysis of the counterexample-guided inductive synthesis technique. We
investigate whether the set of candidate spaces for which the correct program
can be synthesized using CEGIS depends on the counterexamples used in inductive
synthesis, that is, whether there are good mistakes which would increase the
synthesis power. We investigate whether the use of minimal counterexamples
instead of arbitrary counterexamples expands the set of candidate spaces of
programs for which inductive synthesis can successfully synthesize a correct
program. We consider two kinds of counterexamples: minimal counterexamples and
history bounded counterexamples. The history bounded counterexample used in any
iteration of CEGIS is bounded by the examples used in previous iterations of
inductive synthesis. We examine the relative change in power of inductive
synthesis in both cases. We show that the synthesis technique using minimal
counterexamples (MinCEGIS) has the same synthesis power as CEGIS, whereas the
synthesis technique using history bounded counterexamples (HCEGIS) has different
power from that of CEGIS, with neither dominating the other. Comment: In Proceedings SYNT 2014, arXiv:1407.493
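The CEGIS loop described in this abstract can be sketched for the finite case, where termination is guaranteed. This is an illustrative sketch only, not the paper's formalization: it assumes candidates are Python functions, the specification is a predicate `spec(x, y)`, and the verifier checks a finite list of inputs. All names are hypothetical.

```python
def cegis(candidates, spec, inputs):
    """Return (program, counterexamples) or (None, counterexamples).

    Each counterexample rules out at least the current candidate, so the
    loop runs at most len(candidates) + 1 times on a finite candidate list.
    """
    inputs = list(inputs)
    examples = []
    while True:
        # Learner: first candidate consistent with all counterexamples so far.
        cand = next(
            (f for f in candidates if all(spec(x, f(x)) for x in examples)),
            None,
        )
        if cand is None:
            return None, examples  # no candidate satisfies the examples
        # Verifier (oracle): first input on which the candidate violates spec.
        cex = next((x for x in inputs if not spec(x, cand(x))), None)
        if cex is None:
            return cand, examples  # candidate is correct on every input
        examples.append(cex)

# Toy instance: synthesize a doubling function from four candidates.
candidates = [lambda x: x, lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]
program, used = cegis(candidates, lambda x, y: y == 2 * x, range(5))
```

Because this verifier scans the inputs in order, it returns the smallest failing input under that order, which loosely mirrors the minimal-counterexample variant; a verifier returning any failing input would correspond to arbitrary counterexamples.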
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Inferring Symbolic Automata
We study the learnability of symbolic finite state automata (SFAs), a model shown to be useful in many applications in software verification. The state-of-the-art literature on this topic follows the query learning paradigm, and so far all obtained results are positive. We provide a necessary condition for efficient learnability of SFAs in this paradigm, from which we obtain the first negative result. The main focus of our work lies in the learnability of SFAs under the paradigm of identification in the limit using polynomial time and data. We provide a necessary condition and a sufficient condition for efficient learnability of SFAs in this paradigm, from which we derive a positive and a negative result.
Polynomial Identification of omega-Automata
We study identification in the limit using polynomial time and data for
models of omega-automata. On the negative side we show that non-deterministic
omega-automata (of types Buchi, coBuchi, Parity, Rabin, Streett, or Muller)
cannot be polynomially learned in the limit. On the positive side we show that
the omega-language classes IB, IC, IP, IR, IS, and IM, which are defined by
deterministic Buchi, coBuchi, Parity, Rabin, Streett, and Muller acceptors that
are isomorphic to their right-congruence automata, are identifiable in the
limit using polynomial time and data.
We give polynomial time inclusion and equivalence algorithms for
deterministic Buchi, coBuchi, Parity, Rabin, Streett, and Muller acceptors,
which are used to show that the characteristic samples for IB, IC, IP, IR, IS,
and IM can be constructed in polynomial time.
We also provide polynomial time algorithms to test whether a given
deterministic automaton of type X (for X in {B, C, P, R, S, M}) is in the class
IX (i.e. recognizes a language that has a deterministic automaton that is
isomorphic to its right congruence automaton). Comment: This is an extended version of a paper with the same name that
appeared in TACAS2