
    Efficient Inclusion Checking for Deterministic Tree Automata and XML Schemas

    Special issue of LATA'08. We present algorithms for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| |B|), where B is deterministic (bottom-up or top-down). We extend our algorithms to test inclusion of automata for unranked trees A in deterministic DTDs or deterministic EDTDs with restrained competition D in time O(|A| |Σ| |D|). Previous algorithms were less efficient or less general.
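As a rough illustration of the inclusion problem described in this abstract, the following is a minimal sketch of the textbook product construction for ranked bottom-up tree automata, not the paper's algorithm. It uses a naive fixpoint and does not achieve the O(|A| |B|) bound. The dictionary encoding of rules, the `SINK` state, and the function name `is_included` are assumptions made for this example only.

```python
from itertools import product

SINK = "_sink"  # hypothetical sink state that completes the deterministic automaton B


def is_included(a_rules, a_final, b_rules, b_final):
    """Return True if L(A) is a subset of L(B).

    a_rules: {(symbol, child_states): set_of_states}  (A may be nondeterministic)
    b_rules: {(symbol, child_states): state}          (B is bottom-up deterministic)
    """
    reach = set()  # pairs (p, q): some tree reaches p in A and q in B
    changed = True
    while changed:
        changed = False
        # Group the B-states currently known to co-occur with each A-state.
        by_a = {}
        for p, q in reach:
            by_a.setdefault(p, set()).add(q)
        for (sym, kids), targets in a_rules.items():
            choices = [by_a.get(p, set()) for p in kids]
            for qs in product(*choices):
                # A missing B-rule sends the tree to the non-final sink.
                q = b_rules.get((sym, qs), SINK)
                for p in targets:
                    if (p, q) not in reach:
                        reach.add((p, q))
                        changed = True
    # Inclusion fails iff A can accept a tree on which B ends in a non-final state.
    return all(q in b_final for p, q in reach if p in a_final)


# Example over leaves a, b and a binary symbol f.
# "only_a" accepts trees whose leaves are all a; "any" accepts every tree.
only_a_nd = {("a", ()): {"pa"}, ("f", ("pa", "pa")): {"pa"}}
only_a_det = {("a", ()): "pa", ("f", ("pa", "pa")): "pa"}
any_nd = {("a", ()): {"q"}, ("b", ()): {"q"}, ("f", ("q", "q")): {"q"}}
any_det = {("a", ()): "q", ("b", ()): "q", ("f", ("q", "q")): "q"}

small_in_big = is_included(only_a_nd, {"pa"}, any_det, {"q"})
big_in_small = is_included(any_nd, {"q"}, only_a_det, {"pa"})
```

Determinism of B is what makes the pair set sufficient: every tree reaches exactly one B-state, so a single reachable pair with an accepting A-state and a rejecting B-state is a counterexample.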

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand the limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201
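The stream validation step mentioned in this abstract can be pictured with a small sketch of running a deterministic VPA over a stream of events. This shows only the validation side, not the paper's learning algorithm. The event encoding (`"call"`, `"return"`, `"internal"`), the dictionary layout, and the toy automaton are assumptions made for illustration.

```python
def accepts(vpa, events):
    """Run a deterministic visibly pushdown automaton over a stream of events.

    Each event is (kind, symbol): "call" for an opening tag, "return" for a
    closing tag, and "internal" for abstracted content such as a datatype name.
    """
    state, stack = vpa["start"], []
    for kind, sym in events:
        if kind == "call":
            step = vpa["call"].get((state, sym))
            if step is None:
                return False  # unexpected opening tag
            state, pushed = step
            stack.append(pushed)
        elif kind == "return":
            if not stack:
                return False  # closing tag without a matching opening tag
            step = vpa["return"].get((state, sym, stack.pop()))
            if step is None:
                return False  # closing tag does not match the stack
            state = step
        else:
            step = vpa["internal"].get((state, sym))
            if step is None:
                return False  # unexpected content or datatype
            state = step
    return state in vpa["final"] and not stack


# Toy automaton: a root element <a> holding any number of <b> elements,
# each of which contains exactly one value of the (hypothetical) datatype "int".
toy_vpa = {
    "start": "q0",
    "final": {"q4"},
    "call": {("q0", "a"): ("q1", "A"), ("q1", "b"): ("q2", "B")},
    "internal": {("q2", "int"): "q3"},
    "return": {("q3", "b", "B"): "q1", ("q1", "a", "A"): "q4"},
}

good = [("call", "a"), ("call", "b"), ("internal", "int"),
        ("return", "b"), ("return", "a")]
wrong_type = [("call", "a"), ("call", "b"), ("internal", "string"),
              ("return", "b"), ("return", "a")]
unclosed = [("call", "a"), ("call", "b"), ("internal", "int"), ("return", "b")]

results = [accepts(toy_vpa, doc) for doc in (good, wrong_type, unclosed)]
```

Because the event kind alone decides whether the stack is pushed or popped, the automaton needs only the current state and the stack of open elements, which is why no tree representation of the document is required.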

    Efficient Inclusion Checking for Deterministic Tree Automata and DTDs

    We present a new algorithm for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| |B|), where B is deterministic. We extend this algorithm to test inclusion between automata for unranked trees A and deterministic DTDs D in time O(|A| |Σ| |D|). No previous algorithms with these complexities exist. A journal extension is available at http://hal.inria.fr/inria-00366082

    XML Schema subtyping.


    Query Induction with Schema-Guided Pruning Strategies

    Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice for XML information extraction.

    Earliest Query Answering for Deterministic Nested Word Automata

    Earliest query answering (EQA) is an objective of many recent streaming algorithms for XML query answering, which aim for close to optimal memory management. In this paper, we show that EQA is infeasible even for a small fragment of Forward XPath unless P=NP. We then present an EQA algorithm for queries and schemas defined by deterministic nested word automata (dNWAs) and distinguish a large class of dNWAs for which streaming query answering is feasible in polynomial space and time.

    Transformations Between Different Types of Unranked Bottom-Up Tree Automata

    We consider the representational state complexity of unranked tree automata. The bottom-up computation of an unranked tree automaton may be either deterministic or nondeterministic, and further variants arise depending on whether the horizontal string languages defining the transitions are represented by a DFA or an NFA. Also, we consider for unranked tree automata the alternative syntactic definition of determinism introduced by Cristau et al. (FCT'05, Lect. Notes Comput. Sci. 3623, pp. 68-79). We establish upper and lower bounds for the state complexity of conversions between different types of unranked tree automata. Comment: In Proceedings DCFS 2010, arXiv:1008.127

    Deterministic Automata for Unordered Trees

    Automata for unordered unranked trees are relevant for defining schemas and queries for data trees in JSON or XML format. While the existing notions are well-investigated concerning expressiveness, they all lack a proper notion of determinism, which makes it difficult to distinguish subclasses of automata for which problems such as inclusion, equivalence, and minimization can be solved efficiently. In this paper, we propose and investigate different notions of "horizontal determinism", starting from automata for unranked trees in which the horizontal evaluation is performed by finite state automata. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as CMSO. Comment: In Proceedings GandALF 2014, arXiv:1408.556

    Schema-Guided Induction of Monadic Queries

    The induction of monadic node selecting queries from partially annotated XML-trees is a key task in Web information extraction. We show how to integrate schema guidance into an RPNI-based learning algorithm, in which monadic queries are represented by pruning node selecting tree transducers. We present experimental results on schema guidance by the DTD of HTML.