
    Characterizing XML Twig Queries with Examples

    Typically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples) and, implicitly, the set of instances that do not satisfy it (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with? We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponentially large sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that they are not only characterizable but characterizable with polynomially sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable.
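    To make the notion of examples concrete, here is a minimal, hedged sketch (Python, not taken from the paper) of twig queries as tree patterns with child and descendant edges, matched by a naive embedding check; the names TwigNode, XmlNode, and matches are illustrative only. A document on which matches returns True is a positive example for the query, and one on which it returns False is a negative example.

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class XmlNode:
            label: str
            children: List["XmlNode"] = field(default_factory=list)

        @dataclass
        class TwigNode:
            label: str                       # '*' acts as a wildcard
            edge: str = "/"                  # edge from its parent: '/' child, '//' descendant
            children: List["TwigNode"] = field(default_factory=list)

        def descendants(t):
            for child in t.children:
                yield child
                yield from descendants(child)

        def embeds_at(q, t):
            # Can the twig rooted at q be embedded with q mapped to the tree node t?
            if q.label != "*" and q.label != t.label:
                return False
            for c in q.children:
                candidates = t.children if c.edge == "/" else list(descendants(t))
                if not any(embeds_at(c, cand) for cand in candidates):
                    return False
            return True

        def matches(query, doc):
            # Boolean semantics: the query root must embed at the document root.
            return embeds_at(query, doc)

        # Query: an 'a' root with a 'b' child and a 'c' descendant.
        q = TwigNode("a", children=[TwigNode("b", "/"), TwigNode("c", "//")])
        pos = XmlNode("a", [XmlNode("b"), XmlNode("d", [XmlNode("c")])])    # positive example
        neg = XmlNode("a", [XmlNode("b")])                                  # negative example
        print(matches(q, pos), matches(q, neg))                             # True False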

    Méthodes d'Inférence Symbolique pour les Bases de Données (Symbolic Inference Methods for Databases)

    This dissertation is a summary of a line of research, in which I was actively involved, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. In the case of schemas, our contributions include proposing original languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers. Investigating the learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. A good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space with the help of a diligently designed set of generalization operations in search of an appropriate solution.
    Learning (or inference) is a problem with two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples, i.e., elements that belong to the goal language, and negative examples, i.e., elements that do not belong to the goal language. In general, using both negative and positive examples allows learning richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because, together with positive examples, they may cause the search space to take a very complex shape, and its exploration may turn out to be computationally challenging.
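    The search strategy described above can be pictured with a schematic, hedged sketch (Python; the function names are placeholders, not a concrete algorithm from the dissertation): start from a most specific hypothesis built from the positive examples and repeatedly apply generalization operations, keeping a candidate only if it remains consistent with all examples.

        def learn(positives, negatives, most_specific, generalizations, consistent):
            # most_specific(positives): initial hypothesis accepting the positive examples;
            # generalizations(h): minimally more general hypotheses than h;
            # consistent(h, pos, neg): h accepts every positive and rejects every negative example.
            hypothesis = most_specific(positives)
            improved = True
            while improved:
                improved = False
                for candidate in generalizations(hypothesis):
                    if consistent(candidate, positives, negatives):
                        hypothesis = candidate       # take the first safe generalization
                        improved = True
                        break
            return hypothesis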

    Schemas for Unordered XML on a DIME

    We investigate schema languages for unordered XML, in which there is no relative order among siblings. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation, that define languages of unordered words used to model the allowed content of a node (i.e., the collection of labels of its children). However, unrestricted UREs are computationally too expensive: we show the intractability of two fundamental decision problems for UREs, membership of an unordered word in the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schemas (DIMS) and their restriction, disjunction-free interval multiplicity schemas (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree in the language of a schema, and schema containment, as well as twig query satisfiability, implication, and containment in the presence of a schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and with DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.
    Comment: Theory of Computing Systems
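    As a hedged illustration (Python, not the paper's formalism in full), a drastically simplified interval multiplicity expression can be represented as a map from labels to occurrence intervals, and membership of an unordered word then reduces to counting child labels; the disjunction that DIMEs allow is deliberately omitted here.

        from collections import Counter

        INF = float("inf")

        def accepts(expr, word):
            # expr: {label: (lo, hi)}, read as an unordered concatenation of label^[lo,hi];
            # word: an unordered word, i.e., a collection of child labels whose order is ignored.
            counts = Counter(word)
            if any(label not in expr for label in counts):        # unknown labels are rejected
                return False
            return all(lo <= counts[label] <= hi for label, (lo, hi) in expr.items())

        expr = {"title": (1, 1), "author": (1, INF), "year": (0, 1)}
        print(accepts(expr, ["author", "title", "author"]))        # True
        print(accepts(expr, ["title", "author", "year", "year"]))  # False: at most one 'year'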

    Homomorphic Pattern Mining from a Single Large Data Tree


    Accelerating data retrieval steps in XML documents


    Learning Schemas for Unordered XML

    We consider unordered XML, where the relative order among siblings is ignored, and we investigate the problem of learning schemas from examples given by the user. We focus on the schema formalisms proposed in [10]: disjunctive multiplicity schemas (DMS) and their restriction, disjunction-free multiplicity schemas (MS). A learning algorithm takes as input a set of XML documents which must satisfy the schema (i.e., positive examples) and a set of XML documents which must not satisfy the schema (i.e., negative examples), and returns a schema consistent with the examples. We investigate a learning framework inspired by Gold [18], where a learning algorithm should be sound, i.e., always return a schema consistent with the examples given by the user, and complete, i.e., able to produce every schema given a sufficiently rich set of examples. Additionally, the algorithm should be efficient, i.e., polynomial in the size of the input. We prove that DMS are learnable from positive examples only, but not learnable when we also allow negative examples. Moreover, we show that MS are learnable in the presence of positive examples only, as well as in the presence of both positive and negative examples. Furthermore, for the learnable cases, the proposed learning algorithms return minimal schemas consistent with the examples.
    Comment: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Italy
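    The flavour of learning from positive examples only can be conveyed by a deliberately naive sketch (Python; it is not the DMS/MS algorithm of the paper): for every label, record over all positive example trees how many children with each child label occur, and generalize the observed counts to intervals.

        from collections import Counter, defaultdict

        def infer_intervals(examples):
            # examples: list of trees, each encoded as a (label, children) pair.
            occurrences = defaultdict(list)          # parent label -> list of child-label Counters

            def visit(node):
                label, children = node
                occurrences[label].append(Counter(child[0] for child in children))
                for child in children:
                    visit(child)

            for example in examples:
                visit(example)

            schema = {}
            for parent, counters in occurrences.items():
                child_labels = set().union(*counters)
                schema[parent] = {cl: (min(c[cl] for c in counters),
                                       max(c[cl] for c in counters))
                                  for cl in child_labels}
            return schema

        docs = [
            ("book", [("title", []), ("author", []), ("author", [])]),
            ("book", [("title", []), ("author", []), ("year", [])]),
        ]
        print(infer_intervals(docs)["book"])
        # e.g. {'title': (1, 1), 'author': (1, 2), 'year': (0, 1)}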

    Characterising Modal Formulas with Examples

    We study the existence of finite characterisations for modal formulas. A finite characterisation of a modal formula φ is a finite collection of positive and negative examples that distinguishes φ from every other, non-equivalent modal formula, where an example is a finite pointed Kripke structure. This definition can be restricted to specific frame classes and to fragments of the modal language: a modal fragment L admits finite characterisations with respect to a frame class F if every formula φ ∈ L has a finite characterisation with respect to L consisting of examples that are based on frames in F. Finite characterisations are useful for illustration, interactive specification, and debugging of formal specifications, and their existence is a precondition for exact learnability with membership queries. We show that the full modal language admits finite characterisations with respect to a frame class F only when the modal logic of F is locally tabular. We then study which modal fragments, freely generated by some set of connectives, admit finite characterisations. Our main result is that the positive modal language without the truth constants ⊀ and ⊄ admits finite characterisations with respect to the class of all frames. This result is essentially optimal: finite characterisability fails when the language is extended with the truth constant ⊄ or with all but very limited forms of negation.
    Comment: Expanded version of material from Raoul Koudijs's MSc thesis (2022)
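    What an example is here can be made concrete with a small, hedged sketch (Python; the encoding of formulas and structures is an assumption, not the paper's): a finite pointed Kripke structure on which a modal formula is evaluated, counting as a positive example if the formula holds at the distinguished world and as a negative example otherwise.

        def holds(phi, R, val, w):
            # phi: ('p',) atom, ('not', f), ('and', f, g), ('or', f, g), ('box', f), ('dia', f);
            # R: accessibility relation {world: successors}; val: {world: set of true atoms}.
            op = phi[0]
            if op == "not":
                return not holds(phi[1], R, val, w)
            if op == "and":
                return holds(phi[1], R, val, w) and holds(phi[2], R, val, w)
            if op == "or":
                return holds(phi[1], R, val, w) or holds(phi[2], R, val, w)
            if op == "box":
                return all(holds(phi[1], R, val, v) for v in R.get(w, ()))
            if op == "dia":
                return any(holds(phi[1], R, val, v) for v in R.get(w, ()))
            return op in val.get(w, set())            # propositional atom

        # phi = <>p /\ []q, evaluated at world 0 of a three-world structure.
        phi = ("and", ("dia", ("p",)), ("box", ("q",)))
        R = {0: [1, 2]}
        val = {1: {"p", "q"}, 2: {"q"}}
        print(holds(phi, R, val, 0))                  # True: a positive example for phi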

    Inference of Shape Graphs for Graph Databases

    We investigate the problem of constructing a shape graph that describes the structure of a given graph database. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input graph, and complete, i.e., able to produce any schema within a given class of schemas, provided that a sufficiently informative input graph is presented. We identify a number of fundamental limitations that preclude feasible inference. We present inference algorithms based on natural approaches that allow us to infer schemas that we argue to be of practical importance.
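    A hedged toy (Python; the graph encoding and names are assumptions, and this is not one of the paper's algorithms) illustrates the soundness requirement: infer, for every node label, an interval of how often each outgoing edge label occurs, so that the resulting shapes validate the input graph by construction.

        from collections import Counter, defaultdict

        def infer_shapes(node_labels, edges):
            # node_labels: {node: label}; edges: list of (source, edge_label, target) triples.
            out = defaultdict(Counter)                      # node -> outgoing edge-label counts
            for s, p, _ in edges:
                out[s][p] += 1

            nodes_by_label = defaultdict(list)
            for n, lbl in node_labels.items():
                nodes_by_label[lbl].append(n)

            shapes = {}
            for lbl, nodes in nodes_by_label.items():
                shape = {}
                for p in set().union(*(out[n] for n in nodes)):
                    counts = [out[n][p] for n in nodes]
                    shape[p] = (min(counts), max(counts))   # edge-label multiplicity interval
                shapes[lbl] = shape
            return shapes

        nodes = {"u1": "User", "u2": "User", "o1": "Order"}
        edges = [("u1", "placed", "o1"), ("u1", "name", '"Alice"'), ("u2", "name", '"Bob"')]
        print(infer_shapes(nodes, edges))
        # e.g. {'User': {'placed': (0, 1), 'name': (1, 1)}, 'Order': {}}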