30 research outputs found

    Monadic Queries over Tree-Structured Data

    Get PDF
    Monadic query languages over trees currently receive considerable interest in the database community, as the problem of selecting nodes from a tree is the most basic and widespread database query problem in the context of XML. Partly a survey of recent work done by the authors and their group on logical query languages for this problem and their expressiveness, this paper provides a number of new results related to the complexity of such languages over so-called axis relations (such as "child" or "descendant") which are motivated by their presence in the XPath standard or by their utility for data extraction (wrapping)

    Compression vs Queryability - A Case Study

    Get PDF
    International audienceSome compromise on compression is known to be necessary, if the relative positions of the information stored by semi-structured documents are to remain accessible under queries. With this in view, we compare, on an example, the `query-friendliness' of XML documents, when compressed into straightline tree grammars which are either regular or context-free. The queries considered are in a limited fragment of XPath, corresponding to a type of patterns; each such query defines naturally a non-deterministic, bottom-up `query automaton' that runs just as well on a tree as on its compressed dag

    STRUCTURED DOCUMENT LOGIC

    Get PDF
    This paper describes some practical and theoretical foundations of Structured Document Logic (SDL), which is a logical methodology for analyzing properties of Web documents, like XML or HTML. SDL can make benefits in searching of HTML pages, or in defining filters for web documents. Both syntax and semantics of SDL are described, and an efficient evaluation algorithm is also introduced

    Learning n-ary Node Selecting Tree Transducers from Completely Annotated Examples

    Get PDF
    International audienceWe present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (NSTTs), that are known to capture the class of MSO-definable n-ary queries. Despite of this highly expressive, we show that n-aryy queries, selecting a polynomially bounded number of tuples per tree, represented by deterministic NSTTs can be learned from polynomial time and data while allowing for efficient enumeration of query answers. An application to wrapper induction in Web information extraction yields encouraging results

    Model Checking Parse Trees

    Full text link
    Parse trees are fundamental syntactic structures in both computational linguistics and compilers construction. We argue in this paper that, in both fields, there are good incentives for model-checking sets of parse trees for some word according to a context-free grammar. We put forward the adequacy of propositional dynamic logic (PDL) on trees in these applications, and study as a sanity check the complexity of the corresponding model-checking problem: although complete for exponential time in the general case, we find natural restrictions on grammars for our applications and establish complexities ranging from nondeterministic polynomial time to polynomial space in the relevant cases.Comment: 21 + x page

    Logics for Unranked Trees: An Overview

    Get PDF
    Labeled unranked trees are used as a model of XML documents, and logical languages for them have been studied actively over the past several years. Such logics have different purposes: some are better suited for extracting data, some for expressing navigational properties, and some make it easy to relate complex properties of trees to the existence of tree automata for those properties. Furthermore, logics differ significantly in their model-checking properties, their automata models, and their behavior on ordered and unordered trees. In this paper we present a survey of logics for unranked trees

    Logic-based Web Information Extraction

    Get PDF

    Structural characterizations of the navigational expressiveness of relation algebras on a tree

    Full text link
    Given a document D in the form of an unordered node-labeled tree, we study the expressiveness on D of various basic fragments of XPath, the core navigational language on XML documents. Working from the perspective of these languages as fragments of Tarski's relation algebra, we give characterizations, in terms of the structure of D, for when a binary relation on its nodes is definable by an expression in these algebras. Since each pair of nodes in such a relation represents a unique path in D, our results therefore capture the sets of paths in D definable in each of the fragments. We refer to this perspective on language semantics as the "global view." In contrast with this global view, there is also a "local view" where one is interested in the nodes to which one can navigate starting from a particular node in the document. In this view, we characterize when a set of nodes in D can be defined as the result of applying an expression to a given node of D. All these definability results, both in the global and the local view, are obtained by using a robust two-step methodology, which consists of first characterizing when two nodes cannot be distinguished by an expression in the respective fragments of XPath, and then bootstrapping these characterizations to the desired results.Comment: 58 Page

    Logic, Languages, and Rules for Web Data Extraction and Reasoning over Data

    Get PDF
    This paper gives a short overview of specific logical approaches to data extraction, data management, and reasoning about data. In particular, we survey theoretical results and formalisms that have been obtained and used in the context of the Lixto Project at TU Wien, the DIADEM project at the University of Oxford, and the VADA project, which is currently being carried out jointly by the universities of Edinburgh, Manchester, and Oxford. We start with a formal approach to web data extraction rooted in monadic second order logic and monadic Datalog, which gave rise to the Lixto data extraction system. We then present some complexity results for monadic Datalog over trees and for XPath query evaluation. We further argue that for value creation and for ontological reasoning over data, we need existential quantifiers (or Skolem terms) in rule heads, and introduce the Datalog± family. We give an overview of important members of this family and discuss related complexity issues
    corecore