370 research outputs found

Efficient Inclusion Checking for Deterministic Tree Automata and XML Schemas

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: 'Elsevier BV'
Publication date: 01/11/2009

Special issue of LATA'08. We present algorithms for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| |B|) where B is deterministic (bottom-up or top-down). We extend our algorithms for testing inclusion of automata for unranked trees A in deterministic DTDs or deterministic EDTDs with restrained competition D in time O(|A| |Σ| |D|). Previous algorithms were less efficient or less general.

HAL - Lille 3

Elsevier - Publisher Connector

INRIA a CCSD electronic archive server

A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

Author: Lampesberger Harald
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Efficient Inclusion Checking for Deterministic Tree Automata and DTDs

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/03/2008

We present a new algorithm for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A| |B|) where B is deterministic. We extend this algorithm for testing inclusion between automata for unranked trees A and deterministic DTDs D in time O(|A| |Σ| |D|). No previous algorithms with these complexities exist. A journal extension is available at http://hal.inria.fr/inria-00366082

HAL - Lille 3

INRIA a CCSD electronic archive server

XML Schema subtyping.

Author: Li Yun
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2006

Scholarship at UWindsor

Query Induction with Schema-Guided Pruning Strategies

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: 'Microtome Publishing'
Publication date: 01/01/2013

Inference algorithms for tree automata that define node selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice for XML information extraction.

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

Earliest Query Answering for Deterministic Nested Word Automata

Author: A. Berlea
A. Neumann
D. Olteanu
G. Miklau
H. Seidl
L. Segoufin
M. Benedikt
M. Grohe
O. Gauwin
R. Alur
W. Martens
Publication venue: 'Nordic Pulp and Paper Research Journal'
Publication date: 01/01/2009

Earliest query answering (EQA) is an objective of many recent streaming algorithms for XML query answering that aim for close to optimal memory management. In this paper, we show that EQA is infeasible even for a small fragment of Forward XPath unless P=NP. We then present an EQA algorithm for queries and schemas defined by deterministic nested word automata (dNWAs) and distinguish a large class of dNWAs for which streaming query answering is feasible in polynomial space and time.

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Transformations Between Different Types of Unranked Bottom-Up Tree Automata

Author: Giovanni Pighizzini
Ian McQuillan
Kai Salomaa
Xiaoxue Piao
Publication venue: 'Open Publishing Association'
Publication date: 10/08/2010

We consider the representational state complexity of unranked tree automata. The bottom-up computation of an unranked tree automaton may be either deterministic or nondeterministic, and further variants arise depending on whether the horizontal string languages defining the transitions are represented by a DFA or an NFA. Also, we consider for unranked tree automata the alternative syntactic definition of determinism introduced by Cristau et al. (FCT'05, Lect. Notes Comput. Sci. 3623, pp. 68-79). We establish upper and lower bounds for the state complexity of conversions between different types of unranked tree automata. Comment: In Proceedings DCFS 2010, arXiv:1008.127

arXiv.org e-Print Archive

Crossref

Deterministic Automata for Unordered Trees

Author: Boiret Adrien
Hugot Vincent
Niehren Joachim
Treinen Ralf
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2014

Automata for unordered unranked trees are relevant for defining schemas and queries for data trees in JSON or XML format. While the existing notions are well-investigated concerning expressiveness, they all lack a proper notion of determinism, which makes it difficult to distinguish subclasses of automata for which problems such as inclusion, equivalence, and minimization can be solved efficiently. In this paper, we propose and investigate different notions of "horizontal determinism", starting from automata for unranked trees in which the horizontal evaluation is performed by finite state automata. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as CMSO. Comment: In Proceedings GandALF 2014, arXiv:1408.556

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Directory of Open Access Journals

Schema-Guided Induction of Monadic Queries

Author: Champavère Jérôme
Gilleron Rémi
Lemay Aurélien
Niehren Joachim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2008

The induction of monadic node selecting queries from partially annotated XML-trees is a key task in Web information extraction. We show how to integrate schema guidance into an RPNI-based learning algorithm, in which monadic queries are represented by pruning node selecting tree transducers. We present experimental results on schema guidance by the DTD of HTML.

HAL - Lille 3

INRIA a CCSD electronic archive server