Search CORE

Operational State Complexity of Deterministic Unranked Tree Automata

Author: Giovanni Pighizzini
Ian McQuillan
Kai Salomaa
Xiaoxue Piao
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2010
Field of study

We consider the state complexity of basic operations on tree languages recognized by deterministic unranked tree automata. For the operations of union and intersection the upper and lower bounds of both weakly and strongly deterministic tree automata are obtained. For tree concatenation we establish a tight upper bound that is of a different order than the known state complexity of concatenation of regular string languages. We show that (n+1) ( (m+1)2^n-2^(n-1) )-1 vertical states are sufficient, and necessary in the worst case, to recognize the concatenation of tree languages recognized by (strongly or weakly) deterministic automata with, respectively, m and n vertical states.Comment: In Proceedings DCFS 2010, arXiv:1008.127

Pattern matching in compilers

Author: Bílka Ondřej
Publication venue
Publication date: 01/01/2012
Field of study

In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called amethyst. Amethyst is not only a generator of parsers of programming languages, but can also serve as an alternative to tools for matching regular expressions. Our framework also produces dynamic parsers. Its intended use is in the context of IDE (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching of general data structures. This makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers. Linear time complexity is obtained by introducing the novel notion of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimizations to produce effective parsers.Comment: master thesi

National Repository of Grey Literature

CU Digital Repository

Finite Countermodel Based Verification for Program Transformation (A Case Study)

Author: Lisitsa Alexei P.
Nemytykh Andrei P.
Publication venue: 'Open Publishing Association'
Publication date: 01/12/2015
Field of study

Both automatic program verification and program transformation are based on program analysis. In the past decade a number of approaches using various automatic general-purpose program transformation techniques (partial deduction, specialization, supercompilation) for verification of unreachability properties of computing systems were introduced and demonstrated. On the other hand, the semantics based unfold-fold program transformation methods pose themselves diverse kinds of reachability tasks and try to solve them, aiming at improving the semantics tree of the program being transformed. That means some general-purpose verification methods may be used for strengthening program transformation techniques. This paper considers the question how finite countermodels for safety verification method might be used in Turchin's supercompilation method. We extract a number of supercompilation sub-algorithms trying to solve reachability problems and demonstrate use of an external countermodel finder for solving some of the problems.Comment: In Proceedings VPT 2015, arXiv:1512.0221

Query Containment for Highly Expressive Datalog Fragments

Author: Bourhis Pierre
Krötzsch Markus
Rudolph Sebastian
Publication venue
Publication date: 30/06/2014
Field of study

The containment problem of Datalog queries is well known to be undecidable. There are, however, several Datalog fragments for which containment is known to be decidable, most notably monadic Datalog and several "regular" query languages on graphs. Monadically Defined Queries (MQs) have been introduced recently as a joint generalization of these query languages. In this paper, we study a wide range of Datalog fragments with decidable query containment and determine exact complexity results for this problem. We generalize MQs to (Frontier-)Guarded Queries (GQs), and show that the containment problem is 3ExpTime-complete in either case, even if we allow arbitrary Datalog in the sub-query. If we focus on graph query languages, i.e., fragments of linear Datalog, then this complexity is reduced to 2ExpSpace. We also consider nested queries, which gain further expressivity by using predicates that are defined by inner queries. We show that nesting leads to an exponentially increasing hierarchy for the complexity of query containment, both in the linear and in the general case. Our results settle open problems for (nested) MQs, and they paint a comprehensive picture of the state of the art in Datalog query containment.Comment: 20 page

INRIA a CCSD electronic archive server

HAL - Lille 3

A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

Author: Lampesberger Harald
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

False-positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) in a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

CiteSeerX

Dynamic Complexity of Formal Languages

Author: Gelade Wouter
Marquardt Marcel
Schwentick Thomas
Publication venue
Publication date: 10/12/2008
Field of study

The paper investigates the power of the dynamic complexity classes DynFO, DynQF and DynPROP over string languages. The latter two classes contain problems that can be maintained using quantifier-free first-order updates, with and without auxiliary functions, respectively. It is shown that the languages maintainable in DynPROP exactly are the regular languages, even when allowing arbitrary precomputation. This enables lower bounds for DynPROP and separates DynPROP from DynQF and DynFO. Further, it is shown that any context-free language can be maintained in DynFO and a number of specific context-free languages, for example all Dyck-languages, are maintainable in DynQF. Furthermore, the dynamic complexity of regular tree languages is investigated and some results concerning arbitrary structures are obtained: there exist first-order definable properties which are not maintainable in DynPROP. On the other hand any existential first-order property can be maintained in DynQF when allowing precomputation.Comment: Contains the material presenten at STACS 2009, extendes with proofs and examples which were omitted due lack of spac

Dagstuhl Research Online Publication Server

CiteSeerX

Deterministic Automata for Unordered Trees

Author: Boiret Adrien
Hugot Vincent
Niehren Joachim
Treinen Ralf
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2014
Field of study

Automata for unordered unranked trees are relevant for defining schemas and queries for data trees in Json or Xml format. While the existing notions are well-investigated concerning expressiveness, they all lack a proper notion of determinism, which makes it difficult to distinguish subclasses of automata for which problems such as inclusion, equivalence, and minimization can be solved efficiently. In this paper, we propose and investigate different notions of "horizontal determinism", starting from automata for unranked trees in which the horizontal evaluation is performed by finite state automata. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as CMso.Comment: In Proceedings GandALF 2014, arXiv:1408.556

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1

Generalizing input-driven languages: theoretical and practical benefits

Author: Mandrioli Dino
Pradella Matteo
Publication venue
Publication date: 02/05/2017
Field of study

Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

Archivio istituzionale della ricerca - Politecnico di Milano

Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

Author: Berglund Martin
Drewes Frank
van der Merwe Brink
Publication venue: 'Open Publishing Association'
Publication date: 01/05/2014
Field of study

We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model which captures all the aspects needed to study such matching engines in a formal way. Based on this, we propose two types of static analysis, which take a regular expression and tell whether there exists a family of strings which makes Java-style matching run in exponential time.Comment: In Proceedings AFL 2014, arXiv:1405.527