17 research outputs found

    EFQ: Why-Not Answer Polynomials in Action

    International audience. One important issue in modern database applications is supporting users with efficient tools to debug and fix queries, because such tasks demand both time and skill. One particular problem, known as the Why-Not question, focuses on the reasons why tuples are missing from query results. The EFQ platform demonstrated here has been designed in this context to efficiently leverage Why-Not Answer polynomials, a novel approach that provides the user with complete explanations to Why-Not questions and allows for automatic, relevant query refinements.

    Explainable and Resource-Efficient Stream Processing Through Provenance and Scheduling

    In our era of big data, information is captured at unprecedented volumes and velocities, with technologies such as Cyber-Physical Systems making quick decisions based on the processing of streaming, unbounded datasets. In such scenarios, it can be beneficial to process the data in an online manner, using the stream processing paradigm implemented by Stream Processing Engines (SPEs). While SPEs enable high-throughput, low-latency analysis, they face challenges connected to evolving deployment scenarios, such as the increasing use of heterogeneous, resource-constrained edge devices together with cloud resources, and the increasing user expectations for usability, control, and resource efficiency, on par with features provided by traditional databases. This thesis tackles open challenges in making stream processing more user-friendly, customizable, and resource-efficient. The first part outlines our work, providing high-level background information, descriptions of the research problems, and our contributions. The second part presents our three state-of-the-art frameworks for explainable data streaming using data provenance, which can help users of streaming queries to identify important data points, explain unexpected behaviors, and aid query understanding and debugging. (A) GeneaLog provides backward provenance, allowing users to identify the inputs that contributed to the generation of each output of a streaming query. (B) Ananke is the first framework to provide a duplicate-free graph of live forward provenance, enabling easy bidirectional tracing of input-output relationships in streaming queries and identifying data points that have finished contributing to results. (C) Erebus is the first framework that allows users to define expectations about the results of a streaming query, validating whether these expectations are met or otherwise providing explanations in the form of why-not provenance.
The third part presents techniques for execution efficiency through custom scheduling, introducing our state-of-the-art scheduling frameworks that control resource allocation and achieve user-defined performance goals. (D) Haren is an SPE-agnostic user-level scheduler that can efficiently enforce user-defined scheduling policies. (E) Lachesis is a standalone scheduling middleware that requires no changes to SPEs but, instead, directly guides the scheduling decisions of the underlying Operating System. Our extensive evaluations using real-world SPEs and workloads show that our work significantly improves over the state of the art while introducing only small performance overheads.

    Erklärung fehlender Ergebnisse bei der Verarbeitung hierarchischer Daten in Spark (Explaining Missing Results When Processing Hierarchical Data in Spark)

    Several algorithms exist that help developers debug a database query. These works answer why certain data are absent from the result set of a query, or why certain unexpected data appear in the result set (Why-not question). For query languages that support hierarchical data, however, only a few such works exist so far. This thesis investigates the particularities of Why-not questions for hierarchical data. To this end, it considers which specific questions are possible and how they can be answered appropriately. A concrete algorithm for Python is also designed and implemented. Using this algorithm and an example, the thesis examines whether the algorithm is efficient and effective enough to answer Why-not questions.

    Immutably Answering Why-Not Questions for Equivalent Conjunctive Queries

    International audience. Answering Why-Not questions consists in explaining to developers of complex data transformations or manipulations why their data transformation did not produce some specific results, although they expected it to do so. Different types of explanations that serve as Why-Not answers have been proposed in the past and are based on the available data, the query tree, or both. Solutions (partially) based on the query tree are generally more efficient and easier for developers to interpret than solutions based solely on data. However, algorithms producing such query-based explanations may so far return different results for reordered conjunctive query trees and, even worse, these results may be incomplete. Clearly, this represents a significant usability problem, as the explanations developers get may be partial, and developers have to worry about the query tree representation of their query, losing the advantage of using a declarative query language. As a remedy to this problem, we propose the Ted algorithm, which produces the same complete query-based explanations for reordered conjunctive query trees.

    Immutably Answering Why-Not Questions for Equivalent Conjunctive Queries

    National audience. In the context of developing complex transformations, answers to a Why-Not question aim to explain to the developer why certain answers are missing from the result of a transformation. Several types of explanations have been proposed and studied: data-based explanations, explanations based on the query tree, and hybrid explanations. Explanations that rely on the query tree, called query-based explanations, can be computed more efficiently and are also easier for the developer to interpret. However, known algorithms that produce query-based explanations give results that (1) depend on the query trees considered and (2) are not always complete. Clearly, this poses a significant usability problem: the developer must interpret the explanations with respect to a particular query tree, thereby losing the benefit of a declarative query language, and must also be aware that these explanations may be insufficient to explain the missing answers. This article proposes to remedy this problem with an algorithm called Ted, which produces complete and equivalent query-based explanations for reordered conjunctive query trees.

    Why-Query Support in Graph Databases

    In the last few decades, database management systems have become powerful tools for storing large amounts of data and executing complex queries over them. In addition to extended functionality, novel types of databases have appeared, such as triple stores and distributed databases. Graph databases implementing the property-graph model belong to this development branch and provide a new way of storing and processing data in the form of a graph, with nodes representing entities and edges describing connections between them. This makes them suitable for keeping data without a rigid schema, for use cases like social-network processing or data integration. In addition to flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc. However, the schema flexibility and graph queries come with additional costs. With limited knowledge about the data and little experience in constructing complex queries, users can create queries that deliver unexpected results. Forced to debug queries manually and overwhelmed by the number of query constraints, users can become frustrated with graph databases. What is really needed is to improve the usability of graph databases by providing debugging and explanation functionality for such situations. We have to assist users in discovering the reasons for unexpected results and what can be done to fix them. The unexpectedness of result sets can be expressed in terms of their size or content. In the first case, users have to solve the empty-answer, too-many-answers, or too-few-answers problems. In the second case, users care about the result content and miss some expected answers or wonder about the presence of some unexpected ones.
Considering the typical problems of receiving no or too many results when querying graph databases, in this thesis we focus on investigating the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results, using the example of pattern matching queries, which are one of the general graph-query types. We present a comprehensive analysis of existing debugging tools in state-of-the-art research and identify their common properties. From them, we formulate the following features of why-queries, which we discuss in this thesis: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph-based and modification-based explanations. The first type identifies the reasons for unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second type reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations, starting from the most general one, which considers only the query topology, through coarse-grained rewriting, up to fine-grained modification, which allows fine changes to predicates and topology. To provide a comprehensive analysis of explanations, we propose to compare them on three levels: the syntactic description, the content, and the size of the result set. In order to deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process.
With the techniques proposed in this thesis, we are able to provide fundamentals for debugging pattern-matching queries that deliver no, too few, or too many results in graph databases implementing the property-graph model.