
    Causality and explanations in databases

    ABSTRACT With the surge in the availability of information, there is a great demand for tools that assist users in understanding their data. While today's exploration tools rely mostly on data visualization, users often want to go deeper and understand the underlying causes of a particular observation. This tutorial surveys research on causality and explanation for data-oriented applications. We will review and summarize the research thus far into causality and explanation in the database and AI communities, giving researchers a snapshot of the current state of the art on this topic, and propose a unified framework as well as directions for future research. We will cover both the theory of causality/explanation and some applications; we also discuss the connections with other topics in database research like provenance, deletion propagation, why-not queries, and OLAP techniques.

    MOTIVATION With the surge in the availability of information, there is great need for tools that help users understand data. There are several examples of systems that offer some kind of assistance for users to understand and explore datasets. Humans typically observe the data at a high level of abstraction, by aggregating or by visualizing it in a graph, but often they want to go deeper and understand the ultimate causes of their observations. Over the last few years there have been several efforts in the Database and AI communities to develop general techniques to model causes, or explanations for observations on the data, some of them enabled by Judea Pearl's seminal book on Causality.¹ Causality has been formalized both for AI applications and for database queries, and formal definitions of explanations have also been proposed both in the AI and the Database literature. Given the importance of developing general purpose tools to assist users in understanding data, it is likely that research in this space will continue, perhaps even intensify.

    * Partially supported by NSF Awards IIS-0911036 and CCF-1349784.
    ¹ All references are omitted and will appear in the tutorial due to space limitations.

    Depth and Coverage. This 1.5-hour tutorial aims at establishing a research checkpoint: its goal is to review, summarize, and systematize the research so far into causality and explanation in databases, giving researchers a snapshot of the current state of the art on this topic, and at the same time propose a unified framework for future research. We will cover a wide range of work on causality and explanation from the database and AI communities, and we will discuss the connections with other topics in database research.

    Intended audience. The tutorial is aimed both at active researchers in databases, and at graduate students and young researchers seeking a new research topic. Practitioners from industry might find the tutorial useful as a preview of plausible future trends in data analysis tools.

    Assumed Background. Basic knowledge in databases will be sufficient to follow the tutorial. Some background in Datalog, provenance, and/or OLAP would be useful, but is not necessary.

    COVERED TOPICS Our tutorial is divided into three thematic sections. First, we discuss the notion of causality, its foundations in AI and philosophy, and its applications in the database field. Second, we discuss how the intuition of causality can be used to explain query results. Third, we relate these notions to several other topics of database research, including provenance, missing results, and view updates.

    Causality. Understanding causality in a broad sense is of vital importance in many practical settings, e.g., in determining legal responsibility in multi-car accidents, in diagnosing malfunction of complex systems, or in scientific inquiry. The notion of causality and causation is a topic in philosophy, studied and argued over by philosophers over the centuries. At a high level, causality characterizes the relationship between an event and an outcome: the event is a cause if the outcome is a consequence of the event. The notion of counterfactual causes, which can be traced back to Hume (1748) and is analyzed later by Lewis (1973), explains causality in an intuitive way: if the first event (cause) had not occurred, then the second event (effect) would not have occurred. Several philosophers explored an alternative approach to counterfactuals that employs structural equations. Judea Pearl's landmark book on causality defined the state-of-the-art formulation of this framework. Pearl's and Halper
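
The counterfactual reading quoted above (no cause, no effect) can be illustrated on a toy database. The following is only a minimal sketch under stated assumptions, not the tutorial's own formalization: the relations `R` and `S`, the query `query_holds`, and the helper `counterfactual_causes` are hypothetical names introduced for illustration. A tuple is treated as a counterfactual cause of a true Boolean query answer if deleting that single tuple makes the answer false.

```python
from typing import Callable, FrozenSet, Set, Tuple

# A fact is a relation name followed by its attribute values,
# e.g. ("R", "a", "b") or ("S", "b"). These relations are hypothetical.
Fact = Tuple[str, ...]
Database = FrozenSet[Fact]


def query_holds(db: Database) -> bool:
    """Hypothetical Boolean query: do x, y exist with R(x, y) and S(y)?"""
    return any(f[0] == "R" and ("S", f[2]) in db for f in db)


def counterfactual_causes(
    db: Database, query: Callable[[Database], bool]
) -> Set[Fact]:
    """Return the tuples whose individual removal flips the answer to false."""
    if not query(db):
        return set()  # no true answer, so nothing to explain
    return {t for t in db if not query(db - {t})}


# Both tuples are needed for the answer, so both are counterfactual causes.
db1: Database = frozenset({("R", "a", "b"), ("S", "b")})
print(sorted(counterfactual_causes(db1, query_holds)))
# → [('R', 'a', 'b'), ('S', 'b')]

# With two R-tuples, removing either one alone leaves the answer true,
# so only S(b) remains a counterfactual cause.
db2: Database = frozenset({("R", "a", "b"), ("R", "c", "b"), ("S", "b")})
print(sorted(counterfactual_causes(db2, query_holds)))
# → [('S', 'b')]
```

The second database shows why the purely counterfactual test is restrictive: neither `R` tuple passes it on its own, although each contributes to the answer.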

    Query-Answer Causality in Databases: Abductive Diagnosis and View-Updates

    Causality has recently been introduced in databases to model, characterize, and possibly compute causes for query results (answers). Connections between query causality and consistency-based diagnosis and database repairs (with respect to integrity constraint violations) have been established in the literature. In this work we establish connections between query causality and abductive diagnosis and the view-update problem. The unveiled relationships allow us to obtain new complexity results for query causality (the main focus of our work) and also for the two other areas. Comment: To appear in Proc. UAI Causal Inference Workshop, 2015. One example was fixe

    From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back

    In this work we establish and investigate connections between causes for query answers in databases, database repairs with respect to denial constraints, and consistency-based diagnosis. The first two are relatively new research areas in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes, and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can be applied to the newer problems of computing actual causes for query answers and their responsibilities. These connections, which are interesting per se, allow us, after a transition (inspired by consistency-based diagnosis) to computational problems on hitting sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality. Comment: To appear in Theory of Computing Systems. By invitation to special issue with extended papers from ICDT 2015 (paper arXiv:1412.4311

    Causality and the semantics of provenance

    Provenance, or information about the sources, derivation, custody or history of data, has been studied recently in a number of contexts, including databases, scientific workflows and the Semantic Web. Many provenance mechanisms have been developed, motivated by informal notions such as influence, dependence, explanation and causality. However, there has been little study of whether these mechanisms formally satisfy appropriate policies or even how to formalize relevant motivating concepts such as causality. We contend that mathematical models of these concepts are needed to justify and compare provenance techniques. In this paper we review a theory of causality based on structural models that has been developed in artificial intelligence, and describe work in progress on a causal semantics for provenance graphs. Comment: Workshop submissio