39,616 research outputs found
State Complexity of Regular Tree Languages for Tree Matching
We study the state complexity of regular tree languages for tree matching problem. Given a tree t and a set of pattern trees L, we can decide whether or not there exists a subtree occurrence of trees in L from the tree t by considering the new language L′ which accepts all trees containing trees in L as subtrees. We consider the case when we are given a set of pattern trees as a regular tree language and investigate the state complexity. Based on the sequential and parallel tree concatenation, we define three types of tree languages for deciding the existence of different types of subtree occurrences. We also study the deterministic top-down state complexity of path-closed languages for the same problem.</jats:p
Pattern matching in compilers
In this thesis we develop tools for effective and flexible pattern matching.
We introduce a new pattern matching system called amethyst. Amethyst is not
only a generator of parsers of programming languages, but can also serve as an
alternative to tools for matching regular expressions.
Our framework also produces dynamic parsers. Its intended use is in the
context of IDE (accurate syntax highlighting and error detection on the fly).
Amethyst offers pattern matching of general data structures. This makes it a
useful tool for implementing compiler optimizations such as constant folding,
instruction scheduling, and dataflow analysis in general.
The parsers produced are essentially top-down parsers. Linear time complexity
is obtained by introducing the novel notion of structured grammars and
regularized regular expressions. Amethyst uses techniques known from compiler
optimizations to produce effective parsers.Comment: master thesi
Finite Countermodel Based Verification for Program Transformation (A Case Study)
Both automatic program verification and program transformation are based on
program analysis. In the past decade a number of approaches using various
automatic general-purpose program transformation techniques (partial deduction,
specialization, supercompilation) for verification of unreachability properties
of computing systems were introduced and demonstrated. On the other hand, the
semantics based unfold-fold program transformation methods pose themselves
diverse kinds of reachability tasks and try to solve them, aiming at improving
the semantics tree of the program being transformed. That means some
general-purpose verification methods may be used for strengthening program
transformation techniques. This paper considers the question how finite
countermodels for safety verification method might be used in Turchin's
supercompilation method. We extract a number of supercompilation sub-algorithms
trying to solve reachability problems and demonstrate use of an external
countermodel finder for solving some of the problems.Comment: In Proceedings VPT 2015, arXiv:1512.0221
Deterministic Automata for Unordered Trees
Automata for unordered unranked trees are relevant for defining schemas and
queries for data trees in Json or Xml format. While the existing notions are
well-investigated concerning expressiveness, they all lack a proper notion of
determinism, which makes it difficult to distinguish subclasses of automata for
which problems such as inclusion, equivalence, and minimization can be solved
efficiently. In this paper, we propose and investigate different notions of
"horizontal determinism", starting from automata for unranked trees in which
the horizontal evaluation is performed by finite state automata. We show that a
restriction to confluent horizontal evaluation leads to polynomial-time
emptiness and universality, but still suffers from coNP-completeness of the
emptiness of binary intersections. Finally, efficient algorithms can be
obtained by imposing an order of horizontal evaluation globally for all
automata in the class. Depending on the choice of the order, we obtain
different classes of automata, each of which has the same expressiveness as
CMso.Comment: In Proceedings GandALF 2014, arXiv:1408.556
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”
Reasoning & Querying – State of the Art
Various query languages for Web and Semantic Web data, both for practical use and as an area of research in the scientific community, have emerged in recent years. At the same time, the broad adoption of the internet where keyword search is used in many applications, e.g. search engines, has familiarized casual users with using keyword queries to retrieve information on the internet. Unlike this easy-to-use querying, traditional query languages require knowledge of the language itself as well as of the data to be queried. Keyword-based query languages for XML and RDF bridge the gap between the two, aiming at enabling simple querying of semi-structured data, which is relevant e.g. in the context of the emerging Semantic Web. This article presents an overview of the field of keyword querying for XML and RDF
Regular Languages meet Prefix Sorting
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most
successful algorithmic techniques developed in the last decades. Can indexing
be extended to languages? The main contribution of this paper is to initiate
the study of the sub-class of regular languages accepted by an automaton whose
states can be prefix-sorted. Starting from the recent notion of Wheeler graph
[Gagie et al., TCS 2017]-which extends naturally the concept of prefix sorting
to labeled graphs-we investigate the properties of Wheeler languages, that is,
regular languages admitting an accepting Wheeler finite automaton.
Interestingly, we characterize this family as the natural extension of regular
languages endowed with the co-lexicographic ordering: when sorted, the strings
belonging to a Wheeler language are partitioned into a finite number of
co-lexicographic intervals, each formed by elements from a single Myhill-Nerode
equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with
states admits an equivalent Wheeler DFA (WDFA) with at most
states that can be computed in time. This is in sharp contrast with
general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper
superset of the WDFAs, a -time online algorithm to sort acyclic
WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By
contribution (i), our algorithms can also be used to index any WNFA at the
moderate price of doubling the automaton's size. (iii) We provide a
minimization theorem that characterizes the smallest WDFA recognizing the same
language of any input WDFA. The corresponding constructive algorithm runs in
optimal linear time in the acyclic case, and in time in the
general case. (iv) We show how to compute the smallest WDFA equivalent to any
acyclic DFA in nearly-optimal time.Comment: added minimization theorems; uploaded submitted version; New version
with new results (W-MH theorem, linear determinization), added author:
Giovanna D'Agostin
- …