A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) from a language-theoretic point of view. We argue that many XML-based
attacks target the syntactic level, i.e., the tree structure or element content,
and that syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored, or too general. In this work-in-progress paper, we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss the properties and expressiveness of XML to understand the limits of
learnability. Our contributions are an XML Schema-compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at the First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
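To make stream validation with a visibly pushdown automaton concrete, here is a minimal Python sketch under simplifying assumptions; it is not the authors' learning algorithm. Opening tags act as call symbols (push), closing tags as return symbols (pop), and the text of leaf elements is an internal symbol abstracted into a coarse lexical type. The type names and regular expressions are our own simplification loosely modeled on XML Schema lexical spaces, not the paper's datatype system. The "learner" only records which sibling moves and text types occur in the examples, and the control state is folded into the top stack entry. The names `LearnedVPA`, `lexical_type` and `events` are hypothetical. The standard-library `xml.etree.ElementTree.XMLPullParser` can be fed in chunks; here the whole string is fed at once for brevity.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified stand-in for XML Schema lexical spaces, most specific first (assumption).
LEXICAL_TYPES = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?(\d+\.\d*|\.\d+)")),
    ("boolean", re.compile(r"true|false")),
]
START, END = "^", "$"  # hypothetical markers: "no child yet" / "element closes here"


def lexical_type(text):
    """Abstract element text into a coarse lexical datatype name."""
    text = (text or "").strip()
    if not text:
        return "empty"
    for name, pattern in LEXICAL_TYPES:
        if pattern.fullmatch(text):
            return name
    return "string"


def events(xml_text):
    """Turn a document into open (call), text (internal) and close (return) events."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)  # a real stream would feed chunk by chunk
    parser.close()
    for event, elem in parser.read_events():
        if event == "start":
            yield "open", elem.tag
        else:
            if len(elem) == 0:  # leaf element: abstract its text content
                yield "text", lexical_type(elem.text)
            yield "close", elem.tag


class LearnedVPA:
    """Records sibling moves and text types seen in examples; rejects anything else."""

    def __init__(self):
        self.moves = defaultdict(set)       # (element, last child) -> allowed next symbols
        self.text_types = defaultdict(set)  # element -> lexical types seen in its text

    def _run(self, xml_text, learn):
        stack = [(None, START)]  # each entry: (open element, its last child so far)
        for kind, value in events(xml_text):
            elem, last = stack[-1]
            if kind == "text":
                if learn:
                    self.text_types[elem].add(value)
                elif value not in self.text_types.get(elem, ()):
                    return False
                continue
            nxt = value if kind == "open" else END
            if learn:
                self.moves[(elem, last)].add(nxt)
            elif nxt not in self.moves.get((elem, last), ()):
                return False
            if kind == "open":  # call: remember the sibling, push the child
                stack[-1] = (elem, value)
                stack.append((value, START))
            else:               # return: pop the finished element
                stack.pop()
        return True

    def learn(self, xml_text):
        self._run(xml_text, learn=True)

    def accepts(self, xml_text):
        try:
            return self._run(xml_text, learn=False)
        except ET.ParseError:  # malformed input is never accepted
            return False


if __name__ == "__main__":
    vpa = LearnedVPA()
    vpa.learn("<order><id>17</id><paid>true</paid></order>")
    vpa.learn("<order><id>3</id><note>rush</note><paid>false</paid></order>")
    print(vpa.accepts("<order><id>8</id><paid>false</paid></order>"))  # True
    print(vpa.accepts("<order><id>x</id><paid>true</paid></order>"))   # False: string id
    print(vpa.accepts("<order><paid>true</paid><id>8</id></order>"))   # False: swapped order
```

Because each decision depends only on the top stack entry and the current event, validation runs in one pass with memory proportional to nesting depth, which is the property that makes VPA-based validation attractive for large documents and streams.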
A Direct Translation from XPath to Nondeterministic Automata
Since the navigational aspects of XPath correspond to first-order definability, it has been proposed to use the analogy with the very successful technique of translating LTL into automata, and to produce efficient translations of XPath queries into automata on unranked trees. These translations can then be used for a variety of reasoning tasks, such as XPath consistency or optimization, under XML schema constraints. In verification scenarios, translations into both nondeterministic and alternating automata are used. But while a direct translation from XPath into alternating automata is known, only an indirect translation into nondeterministic automata, going via intermediate logics, exists. A direct translation is desirable, as most XML specifications have particularly nice translations into nondeterministic automata, and it is natural to use such automata to reason about XPath and schemas. The goal of the paper is to produce such a direct translation of XPath into nondeterministic automata.
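The paper targets navigational XPath over unranked trees. As a much smaller illustration of "query as nondeterministic automaton", the sketch below handles only linear queries built from child (`/`) and descendant (`//`) steps, with `*` as a wildcard and no predicates. For this fragment a node is selected exactly when the label sequence on its root-to-node path is accepted by a word NFA whose state i means "the first i steps have matched"; a descendant step adds a self-loop. The NFA is simulated on the fly (as a set of states) during a depth-first walk. This is a standard construction for the linear fragment, not the paper's translation, and `compile_path`, `step` and `select` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET


def compile_path(query):
    """Split '/a//b/*' into (axis, label) steps; state i means 'first i steps matched'."""
    steps = re.findall(r"(//|/)([^/]+)", query)
    if "".join(axis + label for axis, label in steps) != query:
        raise ValueError(f"unsupported query: {query!r}")
    return steps


def step(steps, states, label):
    """One NFA transition on an element label, applied to a set of current states."""
    nxt = set()
    for i in states:
        if i == len(steps):       # accepting state has no outgoing transitions
            continue
        axis, name = steps[i]
        if axis == "//":          # self-loop: skip an intermediate ancestor
            nxt.add(i)
        if name in ("*", label):  # consume this label with step i
            nxt.add(i + 1)
    return nxt


def select(query, xml_text):
    """Return root-to-node label paths of all selected elements, in document order."""
    steps = compile_path(query)
    selected = []

    def walk(elem, states, path):
        states = step(steps, states, elem.tag)
        if not states:            # no run can continue below this node
            return
        path = path + "/" + elem.tag
        if len(steps) in states:
            selected.append(path)
        for child in elem:
            walk(child, states, path)

    walk(ET.fromstring(xml_text), {0}, "")
    return selected


if __name__ == "__main__":
    xml = "<lib><book><title>A</title></book><shelf><book><title>B</title></book></shelf></lib>"
    print(select("//book/title", xml))  # ['/lib/book/title', '/lib/shelf/book/title']
    print(select("/lib/*/book", xml))   # ['/lib/shelf/book']
```

Pruning when the state set becomes empty is what keeps the walk cheap: a subtree is skipped as soon as no run of the automaton can still reach the accepting state inside it.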
Reasoning about XML with temporal logics and automata
We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple single-exponential translation into unranked tree automata, in the spirit of the classical LTL-to-Büchi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary node-selecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views.
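To illustrate what a unary node-selecting query is, the sketch below evaluates formulas of a toy CTL-style tree logic on a single document, computing for each subformula the set of nodes where it holds. The operators (`EX`: some child satisfies the subformula; `EF`: the node or some descendant does; `parent`: the parent does) and the tuple encoding are our own choices, not the logic used in the paper. Evaluating a formula on one given tree is also a different task from the automata translation the paper uses for static analysis; `select_nodes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def select_nodes(formula, xml_text):
    """Return preorder positions of the nodes where `formula` holds."""
    root = ET.fromstring(xml_text)
    nodes = list(root.iter())                       # preorder: parents before children
    index = {id(n): i for i, n in enumerate(nodes)}
    children = [[index[id(c)] for c in n] for n in nodes]
    parent = [None] * len(nodes)
    for i, kids in enumerate(children):
        for k in kids:
            parent[k] = i
    everything = set(range(len(nodes)))

    def sat(f):
        op = f[0]
        if op == "label":
            return {i for i in everything if nodes[i].tag == f[1]}
        if op == "not":
            return everything - sat(f[1])
        if op == "and":
            return sat(f[1]) & sat(f[2])
        if op == "or":
            return sat(f[1]) | sat(f[2])
        if op == "EX":                              # some child satisfies f[1]
            s = sat(f[1])
            return {i for i in everything if any(k in s for k in children[i])}
        if op == "EF":                              # self or some descendant satisfies f[1]
            s = set(sat(f[1]))
            for i in reversed(range(len(nodes))):   # descendants are handled first
                if any(k in s for k in children[i]):
                    s.add(i)
            return s
        if op == "parent":                          # the parent satisfies f[1]
            s = sat(f[1])
            return {i for i in everything if parent[i] in s}
        raise ValueError(f"unknown operator: {op!r}")

    return sorted(sat(formula))


if __name__ == "__main__":
    doc = ("<doc><section><title/><para/></section>"
           "<section><title/></section><section><para/></section></doc>")
    # sections with a title child and no para anywhere below them
    query = ("and", ("label", "section"),
             ("and", ("EX", ("label", "title")), ("not", ("EF", ("label", "para")))))
    print(select_nodes(query, doc))  # [4]: the second section
```

Each subformula is evaluated once per call into a node set, so the cost grows with formula size times tree size, and the result of the outermost formula is exactly the set of selected nodes.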
Global Numerical Constraints on Trees
We introduce a logical foundation to reason on tree structures with
constraints on the number of node occurrences. Related formalisms are limited
to expressing occurrence constraints on particular tree regions, for instance
the children of a given node. By contrast, the logic introduced in the present
work can concisely express numerical bounds on any region, descendants or
ancestors for instance. We prove that the logic is decidable in single
exponential time even if the numerical constraints are in binary form. We also
illustrate the usage of the logic in the description of numerical constraints
on multi-directional path queries on XML documents. Furthermore, numerical
restrictions on regular languages (XML schemas) can also be concisely described
by the logic. This implies a characterization of decidable counting extensions
of XPath queries and XML schemas. Moreover, since the logic is closed under
negation, it can be used as an optimal reasoning framework for testing
emptiness, containment, and equivalence.
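To show what a numerical bound on a region such as descendants or ancestors means on one concrete document, the sketch below counts, for every element, the nodes with a given label in its descendants and in its ancestors, and then checks a bound such as "every `sec` has at most 2 `fig` descendants". It only checks a given tree: it is neither the paper's logic nor its single-exponential decision procedure for satisfiability. The functions `region_counts` and `check_bound` and the labels used are hypothetical.

```python
import xml.etree.ElementTree as ET


def region_counts(xml_text, label):
    """For each element (preorder), count `label` nodes among descendants and ancestors."""
    results = []

    def walk(elem, ancestors):
        slot = len(results)
        results.append(None)              # reserve this node's preorder slot
        own = 1 if elem.tag == label else 0
        descendants = 0
        for child in elem:
            descendants += walk(child, ancestors + own)
        results[slot] = (elem.tag, descendants, ancestors)
        return descendants + own          # `label` nodes in this whole subtree

    walk(ET.fromstring(xml_text), 0)
    return results


def check_bound(xml_text, target, label, region, bound):
    """True iff every `target` node has at most `bound` `label` nodes in `region`."""
    column = {"descendants": 1, "ancestors": 2}[region]
    return all(row[column] <= bound
               for row in region_counts(xml_text, label) if row[0] == target)


if __name__ == "__main__":
    doc = "<doc><sec><fig/><fig/><sec><fig/></sec></sec></doc>"
    print(check_bound(doc, "sec", "fig", "descendants", 2))  # False: outer sec has 3
    print(check_bound(doc, "fig", "sec", "ancestors", 2))    # True
```

One top-down, bottom-up traversal suffices because descendant counts are sums over subtrees and ancestor counts are accumulated along the path from the root.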
Rewrite based Verification of XML Updates
We consider problems of access control for update of XML documents. In the
context of XML programming, types can be viewed as hedge automata, and static
type checking amounts to verifying that a program always converts valid source
documents into valid output documents. Given a set of update operations, we
are particularly interested in checking safety properties such as preservation
of document types along any sequence of updates. We are also interested in the
related policy consistency problem, that is, detecting whether a sequence of
authorized operations can simulate a forbidden one. We reduce these questions
to type checking problems, solved by computing variants of hedge automata
characterizing the set of ancestors and descendants of the initial document
type for the closure of parameterized rewrite rules.
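As a rough, testing-style illustration of type preservation under updates, the sketch below encodes a schema as DTD-like content models (regular expressions over child labels), represents documents as nested tuples, and explores every sequence of at most `depth` authorized updates (append a leaf under a parent with a given label, or delete a subtree with a given label). The paper instead decides these questions symbolically with hedge automata over the closure of rewrite rules; bounded search like this can find violations but cannot prove their absence for arbitrarily long update sequences. The schema, the operation encoding and all function names are hypothetical.

```python
import re

# Hypothetical schema: content model of each label over "child," sequences.
SCHEMA = {
    "order": re.compile(r"(item,)+(note,)?"),
    "item": re.compile(r""),
    "note": re.compile(r""),
}


def valid(tree, schema):
    """A tree (label, children) is valid if every node's child sequence fits its model."""
    label, kids = tree
    model = schema.get(label)
    return (model is not None
            and model.fullmatch("".join(k[0] + "," for k in kids)) is not None
            and all(valid(k, schema) for k in kids))


def positions(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    for i, kid in enumerate(tree[1]):
        yield from positions(kid, path + (i,))


def rewrite(tree, path, fn):
    """Replace the subtree at `path` by the list fn(subtree); an empty list deletes it."""
    if not path:
        return fn(tree)
    label, kids = tree
    i = path[0]
    new = tuple(rewrite(kids[i], path[1:], fn))
    return [(label, kids[:i] + new + kids[i + 1:])]


def updates(tree, ops):
    """Every tree reachable by one authorized update."""
    for path, node in positions(tree):
        for op in ops:
            if op[0] == "delete" and path and node[0] == op[1]:
                yield rewrite(tree, path, lambda n: [])[0]
            elif op[0] == "insert" and node[0] == op[1]:
                leaf = (op[2], ())
                yield rewrite(tree, path, lambda n, leaf=leaf: [(n[0], n[1] + (leaf,))])[0]


def preserves_type(doc, ops, schema, depth):
    """Bounded check: no sequence of at most `depth` updates leaves the schema."""
    frontier, seen = {doc}, {doc}
    for _ in range(depth):
        nxt = set()
        for tree in frontier:
            for new in updates(tree, ops):
                if not valid(new, schema):
                    return False
                if new not in seen:
                    seen.add(new)
                    nxt.add(new)
        frontier = nxt
    return True


if __name__ == "__main__":
    doc = ("order", (("item", ()),))
    print(preserves_type(doc, {("insert", "order", "item")}, SCHEMA, 3))  # True
    print(preserves_type(doc, {("insert", "order", "note")}, SCHEMA, 2))  # False: two notes
```

The second example shows why "any sequence of updates" matters: a single `note` insertion is harmless, but a second one violates the `(note,)?` content model.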
Transformations Between Different Types of Unranked Bottom-Up Tree Automata
We consider the representational state complexity of unranked tree automata.
The bottom-up computation of an unranked tree automaton may be either
deterministic or nondeterministic, and further variants arise depending on
whether the horizontal string languages defining the transitions are
represented by a DFA or an NFA. Also, we consider for unranked tree automata
the alternative syntactic definition of determinism introduced by Cristau et
al. (FCT'05, Lect. Notes Comput. Sci. 3623, pp. 68-79).
We establish upper and lower bounds for the state complexity of conversions
between different types of unranked tree automata.
Comment: In Proceedings DCFS 2010, arXiv:1008.127
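In a bottom-up unranked tree automaton, each transition carries a horizontal language: a regular set of strings over states that the children of a node may carry. The sketch below shows why representing these languages by an NFA rather than a DFA matters. It determinizes an NFA with the subset construction and counts the reachable subsets. For the classic language "the k-th symbol from the end is `p`" over two hypothetical child states `p` and `q`, the NFA has k + 1 states while the subset construction reaches 2^k (and no smaller DFA exists for this language). This only illustrates the familiar string-level blow-up; the paper's bounds concern conversions between whole unranked tree automata, which this sketch does not reproduce. All function names are hypothetical.

```python
from collections import deque


def determinize(nfa, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon moves; nfa maps (state, symbol) -> set."""
    start_set = frozenset([start])
    delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        subset = queue.popleft()
        for sym in alphabet:
            target = frozenset(r for q in subset for r in nfa.get((q, sym), ()))
            delta[(subset, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    final = {s for s in seen if s & accepting}
    return delta, start_set, final, seen


def kth_from_end(k, sym="p", alphabet=("p", "q")):
    """NFA with states 0..k accepting words whose k-th symbol from the end is `sym`."""
    nfa = {(0, x): {0} for x in alphabet}  # state 0 guesses when to start counting
    nfa[(0, sym)] = {0, 1}
    for i in range(1, k):
        for x in alphabet:
            nfa[(i, x)] = {i + 1}
    return nfa, 0, frozenset([k])


def run(delta, start, final, word):
    """Run the determinized automaton on a word (e.g. a string of child states)."""
    state = start
    for sym in word:
        state = delta[(state, sym)]
    return state in final


if __name__ == "__main__":
    for k in range(1, 6):
        nfa, q0, acc = kth_from_end(k)
        subsets = determinize(nfa, q0, acc, ("p", "q"))[3]
        print(k + 1, "NFA states ->", len(subsets), "DFA states")  # 2**k DFA states
```

In a tree automaton this blow-up can occur separately for the horizontal language of every transition.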