347 research outputs found

    The complexity of acyclic conjunctive queries revisited

    Get PDF
    In this paper, we consider first-order logic over unary functions and study the complexity of the evaluation problem for conjunctive queries described by such kind of formulas. A natural notion of query acyclicity for this language is introduced and we study the complexity of a large number of variants or generalizations of acyclic query problems in that context (Boolean or not Boolean, with or without inequalities, comparisons, etc...). Our main results show that all those problems are \textit{fixed-parameter linear} i.e. they can be evaluated in time f(Q).db.Q(db)f(|Q|).|\textbf{db}|.|Q(\textbf{db})| where Q|Q| is the size of the query QQ, db|\textbf{db}| the database size, Q(db)|Q(\textbf{db})| is the size of the output and ff is some function whose value depends on the specific variant of the query problem (in some cases, ff is the identity function). Our results have two kinds of consequences. First, they can be easily translated in the relational (i.e., classical) setting. Previously known bounds for some query problems are improved and new tractable cases are then exhibited. Among others, as an immediate corollary, we improve a result of \~\cite{PapadimitriouY-99} by showing that any (relational) acyclic conjunctive query with inequalities can be evaluated in time f(Q).db.Q(db)f(|Q|).|\textbf{db}|.|Q(\textbf{db})|. A second consequence of our method is that it provides a very natural descriptive approach to the complexity of well-known algorithmic problems. A number of examples (such as acyclic subgraph problems, multidimensional matching, etc...) are considered for which new insights of their complexity are given.Comment: 30 page

    I/O efficient bisimulation partitioning on very large directed acyclic graphs

    Get PDF
    In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available

    Pattern matching in compilers

    Get PDF
    In this thesis we develop tools for effective and flexible pattern matching. We introduce a new pattern matching system called amethyst. Amethyst is not only a generator of parsers of programming languages, but can also serve as an alternative to tools for matching regular expressions. Our framework also produces dynamic parsers. Its intended use is in the context of IDE (accurate syntax highlighting and error detection on the fly). Amethyst offers pattern matching of general data structures. This makes it a useful tool for implementing compiler optimizations such as constant folding, instruction scheduling, and dataflow analysis in general. The parsers produced are essentially top-down parsers. Linear time complexity is obtained by introducing the novel notion of structured grammars and regularized regular expressions. Amethyst uses techniques known from compiler optimizations to produce effective parsers.Comment: master thesi

    Stream Processing using Grammars and Regular Expressions

    Full text link
    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot
    corecore