3,901 research outputs found

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    Full text link
    False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

    Some Combinatorial Operators in Language Theory

    Full text link
    Multitildes are regular operators that were introduced by Caron et al. in order to increase the number of Glushkov automata. In this paper, we study the family of the multitilde operators from an algebraic point of view using the notion of operad. This leads to a combinatorial description of already known results as well as new results on compositions, actions and enumerations.Comment: 21 page

    Symbolic Algorithms for Language Equivalence and Kleene Algebra with Tests

    Get PDF
    We first propose algorithms for checking language equivalence of finite automata over a large alphabet. We use symbolic automata, where the transition function is compactly represented using a (multi-terminal) binary decision diagrams (BDD). The key idea consists in computing a bisimulation by exploring reachable pairs symbolically, so as to avoid redundancies. This idea can be combined with already existing optimisations, and we show in particular a nice integration with the disjoint sets forest data-structure from Hopcroft and Karp's standard algorithm. Then we consider Kleene algebra with tests (KAT), an algebraic theory that can be used for verification in various domains ranging from compiler optimisation to network programming analysis. This theory is decidable by reduction to language equivalence of automata on guarded strings, a particular kind of automata that have exponentially large alphabets. We propose several methods allowing to construct symbolic automata out of KAT expressions, based either on Brzozowski's derivatives or standard automata constructions. All in all, this results in efficient algorithms for deciding equivalence of KAT expressions

    Deterministic Automata for Unordered Trees

    Get PDF
    Automata for unordered unranked trees are relevant for defining schemas and queries for data trees in Json or Xml format. While the existing notions are well-investigated concerning expressiveness, they all lack a proper notion of determinism, which makes it difficult to distinguish subclasses of automata for which problems such as inclusion, equivalence, and minimization can be solved efficiently. In this paper, we propose and investigate different notions of "horizontal determinism", starting from automata for unranked trees in which the horizontal evaluation is performed by finite state automata. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as CMso.Comment: In Proceedings GandALF 2014, arXiv:1408.556

    Translation from Classical Two-Way Automata to Pebble Two-Way Automata

    Get PDF
    We study the relation between the standard two-way automata and more powerful devices, namely, two-way finite automata with an additional "pebble" movable along the input tape. Similarly as in the case of the classical two-way machines, it is not known whether there exists a polynomial trade-off, in the number of states, between the nondeterministic and deterministic pebble two-way automata. However, we show that these two machine models are not independent: if there exists a polynomial trade-off for the classical two-way automata, then there must also exist a polynomial trade-off for the pebble two-way automata. Thus, we have an upward collapse (or a downward separation) from the classical two-way automata to more powerful pebble automata, still staying within the class of regular languages. The same upward collapse holds for complementation of nondeterministic two-way machines. These results are obtained by showing that each pebble machine can be, by using suitable inputs, simulated by a classical two-way automaton with a linear number of states (and vice versa), despite the existing exponential blow-up between the classical and pebble two-way machines
    • …
    corecore