116 research outputs found

    Faster Query Answering in Probabilistic Databases using Read-Once Functions

    Full text link
    A boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking read-once property seem to require putting these expressions in disjunctive normal form. In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins and on tuple-independent probabilistic databases. We first show that given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can, without using the DNF of a result event expression, efficiently compute its co-occurrence graph. From this, the read-once form can already, if it exists, be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple to implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, that of co-table graph, which can be significantly smaller than the co-occurrence graph.Comment: Accepted in ICDT 201

    Provenance and Probabilities in Relational Databases: From Theory to Practice

    Get PDF
    International audienceWe review the basics of data provenance in relational databases. We describe different provenance formalisms, from Boolean provenance to provenance semirings and beyond, that can be used for a wide variety of purposes, to obtain additional information on the output of a query. We discuss representation systems for data provenance, circuits in particular, with a focus on practical implementation. Finally, we explain how provenance is practically used for probabilistic query evaluation in probabilistic databases

    Connecting Width and Structure in Knowledge Compilation

    Get PDF
    Several query evaluation tasks can be done via knowledge compilation: the query result is compiled as a lineage circuit from which the answer can be determined. For such tasks, it is important to leverage some width parameters of the circuit, such as bounded treewidth or pathwidth, to convert the circuit to structured classes, e.g., deterministic structured NNFs (d-SDNNFs) or OBDDs. In this work, we show how to connect the width of circuits to the size of their structured representation, through upper and lower bounds. For the upper bound, we show how bounded-treewidth circuits can be converted to a d-SDNNF, in time linear in the circuit size. Our bound, unlike existing results, is constructive and only singly exponential in the treewidth. We show a related lower bound on monotone DNF or CNF formulas, assuming a constant bound on the arity (size of clauses) and degree (number of occurrences of each variable). Specifically, any d-SDNNF (resp., SDNNF) for such a DNF (resp., CNF) must be of exponential size in its treewidth; and the same holds for pathwidth when compiling to OBDDs. Our lower bounds, in contrast with most previous work, apply to any formula of this class, not just a well-chosen family. Hence, for our language of DNF and CNF, pathwidth and treewidth respectively characterize the efficiency of compiling to OBDDs and (d-)SDNNFs, that is, compilation is singly exponential in the width parameter. We conclude by applying our lower bound results to the task of query evaluation

    Provenance à base de semi-anneaux pour les bases de données graphes

    Get PDF
    The growing amount of data collected by sensors or generated by human interaction has led to an increasing use of graph databases, an efficient model for representing intricate data.Techniques to keep track of the history of computations applied to the data inside classical relational database systems are also topical because of their application to enforce Data Protection Regulations (e.g., GDPR).Our research work mixes the two by considering a semiring-based provenance model for navigational queries over graph databases.We first present a comprehensive survey on semiring theory and their applications in different fields of computer sciences, geared towards their relevance for our context. From the richness of the literature, we notably obtain a lower bound for the complexity of the full provenance computation in our setting.In a second part, we focus on the model itself by introducing a toolkit of provenance-aware algorithms, each targeting specific properties of the semiring of use.We notably introduce a new method based on lattice theory permitting an efficient provenance computation for complex graph queries.We propose an open-source implementation of the above-mentioned algorithms, and we conduct an experimental study over real transportation networks of large size, witnessing the practical efficiency of our approach in practical scenarios.We finally consider how this framework is positioned compared to other provenance models such as the semiring-based Datalog provenance model.We make explicit how the methods we applied for graph databases can be extended to Datalog queries, and we show how they can be seen as an extension of the semi-naïve evaluation strategy.To leverage this fact, we extend the capabilities of Soufflé, a state-of-the-art Datalog solver, to design an efficient provenance-aware Datalog evaluator. Experimental results based on our open-source implementation entail the fact this approach stays competitive with dedicated graph solutions, despite being more general.In a final round, we discuss on some research ideas for improving the model, and state open questions raised by our work.L'augmentation du volume de données collectées par des capteurs et générées par des interactions humaines a mené à l'utilisation des bases de données orientées graphes en tant que modèle de représentation efficace pour les données complexes.Les techniques permettant de tracer les calculs qui ont été appliqués aux données au sein d'une base de données relationnelle classique sont sur le devant de la scène, notamment grâce à leur utilité pourfaire respecter les régulations sur les données privées telles que le RGPD en Union Européenne.Notre travail de recherche croise ces deux problématiques en s'intéressant à un modèle de provenance à base de semi-anneaux pour les requêtes navigationnelles.Nous commençons par présenter une étude approfondie de la théorie des semi-anneaux et de leurs applications au sein des sciences informatiques en se concentrant sur les résultats ayant un intérêt direct pour notre travail de recherche.La richesse de la littérature sur le domaine nous a notamment permis d'obtenir une borne inférieure sur la complexité de notre modèle.Dans une seconde partie, nous étudions le modèle en lui-même et introduisons un ensemble cohérent d'algorithmes permettant d'effectuer des calculs de provenance et adaptés aux propriétés des semi-anneaux utilisés.Nous introduisons notablement une nouvelle méthode basée sur la théorie des treillis permettant de calculer la provenance pour des requêtes complexes.Nous proposons une implémentation open-source de ces algorithmes et faisons une étude expérimentale sur de larges réseaux de transport issus de la vie réelle pour attester de l'efficacité pratique de notre approche.On s'intéresse finalement au positionnement de ce cadre de travail par rapport à d'autres modèles de provenance à base de semi-anneaux. Nous nous intéressons à Datalog en particulier.Nous démontrons que les méthodes que nous avons développées pour les bases de données orientées graphes peuvent se généraliser sur des requêtes Datalog. Nous montrons de plus qu'elles peuvent être vues comme des généralisations de la méthode semi-naïve.En se basant sur ce fait-là, nous étendons les capacités de Soufflé, un évaluateur Datalog appartenant à l'état de l'art, afin d'effectuer des calculs de provenance pour des requêtes Datalog.Les études expérimentales basées sur cette implémentation open-source confirment que cette approche reste compétitive avec les solutions spécifiques pour les graphes, mais tout en étant plus générale.Nous terminons par une discussion sur les améliorations possibles du modèle et énonçons les questions ouvertes qui ont été soulevées au cours de ce travail

    Building Rules on Top of Ontologies for the Semantic Web with Inductive Logic Programming

    Full text link
    Building rules on top of ontologies is the ultimate goal of the logical layer of the Semantic Web. To this aim an ad-hoc mark-up language for this layer is currently under discussion. It is intended to follow the tradition of hybrid knowledge representation and reasoning systems such as AL\mathcal{AL}-log that integrates the description logic ALC\mathcal{ALC} and the function-free Horn clausal language \textsc{Datalog}. In this paper we consider the problem of automating the acquisition of these rules for the Semantic Web. We propose a general framework for rule induction that adopts the methodological apparatus of Inductive Logic Programming and relies on the expressive and deductive power of AL\mathcal{AL}-log. The framework is valid whatever the scope of induction (description vs. prediction) is. Yet, for illustrative purposes, we also discuss an instantiation of the framework which aims at description and turns out to be useful in Ontology Refinement. Keywords: Inductive Logic Programming, Hybrid Knowledge Representation and Reasoning Systems, Ontologies, Semantic Web. Note: To appear in Theory and Practice of Logic Programming (TPLP)Comment: 30 pages, 6 figure

    A New Rational Algorithm for View Updating in Relational Databases

    Full text link
    The dynamics of belief and knowledge is one of the major components of any autonomous system that should be able to incorporate new pieces of information. In order to apply the rationality result of belief dynamics theory to various practical problems, it should be generalized in two respects: first it should allow a certain part of belief to be declared as immutable; and second, the belief state need not be deductively closed. Such a generalization of belief dynamics, referred to as base dynamics, is presented in this paper, along with the concept of a generalized revision algorithm for knowledge bases (Horn or Horn logic with stratified negation). We show that knowledge base dynamics has an interesting connection with kernel change via hitting set and abduction. In this paper, we show how techniques from disjunctive logic programming can be used for efficient (deductive) database updates. The key idea is to transform the given database together with the update request into a disjunctive (datalog) logic program and apply disjunctive techniques (such as minimal model reasoning) to solve the original update problem. The approach extends and integrates standard techniques for efficient query answering and integrity checking. The generation of a hitting set is carried out through a hyper tableaux calculus and magic set that is focused on the goal of minimality.Comment: arXiv admin note: substantial text overlap with arXiv:1301.515

    Approximate Lifted Inference with Probabilistic Databases

    Full text link
    This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known results of PTIME self-join-free conjunctive queries: A query is safe if and only if our algorithm returns one single plan. We also apply three relational query optimization techniques to evaluate all minimal safe plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers.Comment: 12 pages, 5 figures, pre-print for a paper appearing in VLDB 2015. arXiv admin note: text overlap with arXiv:1310.625

    Ontology population for open-source intelligence: A GATE-based solution

    Get PDF
    Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, forums, etc. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast amount of documents available, automatic mechanisms for their population are needed, starting from the crawled text. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering system, which is interesting for both researchers and practitioners in the field. Some experimental results that are encouraging in terms of extracted correct instances of the ontology are also reported. Furthermore, the paper also describes an alternative approach and provides additional experiments for one of the phases of our pipeline, which requires the use of predefined dictionaries for relevant entities. Through such a variant, the manual workload required in this phase was reduced, still obtaining promising results
    • …
    corecore