3,901 research outputs found
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Some Combinatorial Operators in Language Theory
Multitildes are regular operators that were introduced by Caron et al. in
order to increase the number of Glushkov automata. In this paper, we study the
family of the multitilde operators from an algebraic point of view using the
notion of operad. This leads to a combinatorial description of already known
results as well as new results on compositions, actions and enumerations.Comment: 21 page
Symbolic Algorithms for Language Equivalence and Kleene Algebra with Tests
We first propose algorithms for checking language equivalence of finite
automata over a large alphabet. We use symbolic automata, where the transition
function is compactly represented using a (multi-terminal) binary decision
diagrams (BDD). The key idea consists in computing a bisimulation by exploring
reachable pairs symbolically, so as to avoid redundancies. This idea can be
combined with already existing optimisations, and we show in particular a nice
integration with the disjoint sets forest data-structure from Hopcroft and
Karp's standard algorithm. Then we consider Kleene algebra with tests (KAT), an
algebraic theory that can be used for verification in various domains ranging
from compiler optimisation to network programming analysis. This theory is
decidable by reduction to language equivalence of automata on guarded strings,
a particular kind of automata that have exponentially large alphabets. We
propose several methods allowing to construct symbolic automata out of KAT
expressions, based either on Brzozowski's derivatives or standard automata
constructions. All in all, this results in efficient algorithms for deciding
equivalence of KAT expressions
Deterministic Automata for Unordered Trees
Automata for unordered unranked trees are relevant for defining schemas and
queries for data trees in Json or Xml format. While the existing notions are
well-investigated concerning expressiveness, they all lack a proper notion of
determinism, which makes it difficult to distinguish subclasses of automata for
which problems such as inclusion, equivalence, and minimization can be solved
efficiently. In this paper, we propose and investigate different notions of
"horizontal determinism", starting from automata for unranked trees in which
the horizontal evaluation is performed by finite state automata. We show that a
restriction to confluent horizontal evaluation leads to polynomial-time
emptiness and universality, but still suffers from coNP-completeness of the
emptiness of binary intersections. Finally, efficient algorithms can be
obtained by imposing an order of horizontal evaluation globally for all
automata in the class. Depending on the choice of the order, we obtain
different classes of automata, each of which has the same expressiveness as
CMso.Comment: In Proceedings GandALF 2014, arXiv:1408.556
Translation from Classical Two-Way Automata to Pebble Two-Way Automata
We study the relation between the standard two-way automata and more powerful
devices, namely, two-way finite automata with an additional "pebble" movable
along the input tape. Similarly as in the case of the classical two-way
machines, it is not known whether there exists a polynomial trade-off, in the
number of states, between the nondeterministic and deterministic pebble two-way
automata. However, we show that these two machine models are not independent:
if there exists a polynomial trade-off for the classical two-way automata, then
there must also exist a polynomial trade-off for the pebble two-way automata.
Thus, we have an upward collapse (or a downward separation) from the classical
two-way automata to more powerful pebble automata, still staying within the
class of regular languages. The same upward collapse holds for complementation
of nondeterministic two-way machines.
These results are obtained by showing that each pebble machine can be, by
using suitable inputs, simulated by a classical two-way automaton with a linear
number of states (and vice versa), despite the existing exponential blow-up
between the classical and pebble two-way machines
- …