
EFQ: Why-Not Answer Polynomials in Action

Authors: Bidoit Nicole; Herschel Melanie; Tzompanaki Katerina
Publication venue: HAL CCSD
Publication date: 31/08/2015

One important issue in modern database applications is supporting the user with efficient tools to debug and fix queries, because such tasks demand both time and skill. One particular problem is known as the Why-Not question and focuses on the reasons for missing tuples from query results. The EFQ platform demonstrated here has been designed in this context to efficiently leverage Why-Not Answer polynomials, a novel approach that provides the user with complete explanations to Why-Not questions and allows for automatic, relevant query refinements.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Explainable and Resource-Efficient Stream Processing Through Provenance and Scheduling

Author: Palyvos-Giannas Dimitrios
Publication date: 01/01/2022

In our era of big data, information is captured at unprecedented volumes and velocities, with technologies such as Cyber-Physical Systems making quick decisions based on the processing of streaming, unbounded datasets. In such scenarios, it can be beneficial to process the data in an online manner, using the stream processing paradigm implemented by Stream Processing Engines (SPEs). While SPEs enable high-throughput, low-latency analysis, they are faced with challenges connected to evolving deployment scenarios, like the increasing use of heterogeneous, resource-constrained edge devices together with cloud resources and the increasing user expectations for usability, control, and resource-efficiency, on par with features provided by traditional databases. This thesis tackles open challenges regarding making stream processing more user-friendly, customizable, and resource-efficient. The first part outlines our work, providing high-level background information, descriptions of the research problems, and our contributions. The second part presents our three state-of-the-art frameworks for explainable data streaming using data provenance, which can help users of streaming queries to identify important data points, explain unexpected behaviors, and aid query understanding and debugging. (A) GeneaLog provides backward provenance allowing users to identify the inputs that contributed to the generation of each output of a streaming query. (B) Ananke is the first framework to provide a duplicate-free graph of live forward provenance, enabling easy bidirectional tracing of input-output relationships in streaming queries and identifying data points that have finished contributing to results. (C) Erebus is the first framework that allows users to define expectations about the results of a streaming query, validating whether these expectations are met or providing explanations in the form of why-not provenance otherwise.
The third part presents techniques for execution efficiency through custom scheduling, introducing our state-of-the-art scheduling frameworks that control resource allocation and achieve user-defined performance goals. (D) Haren is an SPE-agnostic user-level scheduler that can efficiently enforce user-defined scheduling policies. (E) Lachesis is a standalone scheduling middleware that requires no changes to SPEs but, instead, directly guides the scheduling decisions of the underlying Operating System. Our extensive evaluations using real-world SPEs and workloads show that our work significantly improves over the state-of-the-art while introducing only small performance overheads.

Chalmers Research

Erklärung fehlender Ergebnisse bei der Verarbeitung hierarchischer Daten in Spark [Explaining Missing Results When Processing Hierarchical Data in Spark]

Author: Mayer Karsten
Publication date: 01/01/2016

Several algorithms exist that help developers debug a database query. These works answer why certain data are missing from the result set of a query, or why certain unexpected data appear in it (the Why-Not question). For query languages that support hierarchical data, however, only a few works exist so far. This thesis investigates the particularities of Why-Not questions on hierarchical data. To this end, it considers which specific questions can be posed and how they can be answered appropriately. A concrete algorithm for Python is also designed and implemented. Using an example, it can then be examined whether the algorithm is efficient and effective enough to answer Why-Not questions.

Immutably Answering Why-Not Questions for Equivalent Conjunctive Queries

Authors: Bidoit Nicole; Herschel Melanie; Tzompanaki Katerina
Publication venue: HAL CCSD
Publication date: 12/06/2014

Answering Why-Not questions consists in explaining to developers of complex data transformations or manipulations why their data transformation did not produce some specific results, although they expected them to do so. Different types of explanations that serve as Why-Not answers have been proposed in the past and are either based on the available data, the query tree, or both. Solutions (partially) based on the query tree are generally more efficient and easier to interpret by developers than solutions solely based on data. However, algorithms producing such query-based explanations so far may return different results for reordered conjunctive query trees, and even worse, these results may be incomplete. Clearly, this represents a significant usability problem, as the explanations developers get may be partial and developers have to worry about the query tree representation of their query, losing the advantage of using a declarative query language. As a remedy to this problem, we propose the Ted algorithm that produces the same complete query-based explanations for reordered conjunctive query trees.

Hal-Diderot

Immutably Answering Why-Not Questions for Equivalent Conjunctive Queries

Authors: Bidoit Nicole; Herschel Melanie; Tzompanaki Katerina
Publication venue: HAL CCSD
Publication date: 12/06/2014

In the context of developing complex transformations, answers to a Why-Not question aim to explain to the developer why certain answers are absent from the result of a transformation. Several types of explanations have been proposed and studied: data-based explanations, explanations based on the query tree, and hybrid explanations. Explanations that rely on the query tree, called query-based explanations, can be computed more efficiently and are also easier for the developer to interpret. However, known algorithms producing query-based explanations yield results that (1) depend on the query trees considered and (2) are not always complete. Clearly, this poses a significant usability problem, because the developer must interpret the explanations with respect to a particular query tree, thereby losing the benefit of using a declarative query language, and must be aware that these explanations may be insufficient to explain the missing answers. This article proposes to remedy this problem with an algorithm called Ted, which produces complete and equivalent query-based explanations for reordered conjunctive query trees.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Why-Query Support in Graph Databases

Author: Vasilyeva Elena
Publication date: 08/11/2016

In the last few decades, database management systems have become powerful tools for storing large amounts of data and executing complex queries over them. In addition to extended functionality, novel types of databases have appeared, such as triple stores and distributed databases. Graph databases implementing the property-graph model belong to this development branch and provide a new way of storing and processing data in the form of a graph, with nodes representing entities and edges describing the connections between them. This makes them suitable for keeping data without a rigid schema for use cases like social-network processing or data integration. In addition to flexible storage, graph databases provide new querying possibilities in the form of path queries, detection of connected components, pattern matching, etc. However, the schema flexibility and graph queries come with additional costs. With limited knowledge about the data and little experience in constructing complex queries, users can create queries that deliver unexpected results. Forced to debug queries manually and overwhelmed by the number of query constraints, users can become frustrated with graph databases. What is really needed is to improve the usability of graph databases by providing debugging and explanation functionality for such situations. We have to assist users in discovering the reasons for unexpected results and what can be done to fix them. The unexpectedness of result sets can be expressed in terms of their size or content. In the first case, users have to solve the empty-answer, too-many-answers, or too-few-answers problems. In the second case, users care about the result content and miss some expected answers or wonder about the presence of some unexpected ones.
Considering the typical problems of receiving no or too many results when querying graph databases, in this thesis we focus on investigating the problems of the first group, whose solutions are usually represented by why-empty, why-so-few, and why-so-many queries. Our objective is to extend graph databases with debugging functionality in the form of why-queries for unexpected query results, using the example of pattern matching queries, which are one of the general graph-query types. We present a comprehensive analysis of existing debugging tools in the state-of-the-art research and identify their common properties. From them, we formulate the following features of why-queries, which we discuss in this thesis: holistic support of different cardinality-based problems, explanation of unexpected results and query reformulation, comprehensive analysis of explanations, and non-intrusive user integration. To support different cardinality-based problems, we develop methods for explaining no, too few, and too many results. To cover different kinds of explanations, we present two types: subgraph- and modification-based explanations. The first type identifies the reasons for unexpectedness in terms of query subgraphs and delivers differential graphs as answers. The second one reformulates queries in such a way that they produce better results. Considering graph queries to be complex structures with multiple constraints, we investigate different ways of generating explanations, starting from the most general one, which considers only the query topology, through coarse-grained rewriting, up to fine-grained modification that allows fine changes of predicates and topology. To provide a comprehensive analysis of explanations, we propose to compare them on three levels: the syntactic description, the content, and the size of a result set. In order to deliver user-aware explanations, we discuss two models for non-intrusive user integration in the generation process.
With the techniques proposed in this thesis, we are able to provide fundamentals for debugging pattern-matching queries that deliver no, too few, or too many results in graph databases implementing the property-graph model.

Technische Universität Dresden: Qucosa