Causality-Guided Adaptive Interventional Debugging
Runtime nondeterminism is a fact of life in modern database applications.
Previous research has shown that nondeterminism can cause applications to
intermittently crash, become unresponsive, or experience data corruption. We
propose Adaptive Interventional Debugging (AID) for debugging such intermittent
failures. AID combines existing statistical debugging, causal analysis, fault
injection, and group testing techniques in a novel way to (1) pinpoint the root
cause of an application's intermittent failure and (2) generate an explanation
of how the root cause triggers the failure. AID works by first identifying a
set of runtime behaviors (called predicates) that are strongly correlated to
the failure. It then utilizes temporal properties of the predicates to
(over)-approximate their causal relationships. Finally, it uses fault injection
to execute a sequence of interventions on the predicates and discover their
true causal relationships. This enables AID to identify the true root cause and
its causal relationship to the failure. We theoretically analyze how fast AID
can converge to the identification. We evaluate AID with six real-world
applications that intermittently fail under specific inputs. In each case, AID
was able to identify the root cause and explain how the root cause triggered
the failure, much faster than group testing and more precisely than statistical
debugging. We also evaluate AID with many synthetically generated applications
with known root causes and confirm that the benefits also hold for them.
Comment: Technical report of AID (SIGMOD 2020).
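The three-step loop the abstract describes (find failure-correlated predicates, order them temporally, then intervene via fault injection) can be sketched in simplified form. This is a minimal illustration, not AID's actual implementation; the predicate names and the `run_with_suppressed` fault-injection hook are hypothetical:

```python
def aid_root_cause(runs, run_with_suppressed):
    """runs: list of (predicates_observed_in_order, failed) traces.
    run_with_suppressed(p): re-executes the app with predicate `p`
    blocked via fault injection; returns True if it still fails."""
    # 1. Statistical debugging: predicates correlated with failure.
    fail_counts, pass_counts = {}, {}
    for preds, failed in runs:
        counts = fail_counts if failed else pass_counts
        for p in preds:
            counts[p] = counts.get(p, 0) + 1
    suspects = {p for p in fail_counts if p not in pass_counts}

    # 2. Temporal (over-)approximation of causality: a predicate can
    # only cause predicates that appear after it in a failing run.
    first_failing = next(preds for preds, failed in runs if failed)
    ordered = [p for p in first_failing if p in suspects]

    # 3. Interventions: the earliest predicate whose suppression makes
    # the failure disappear is reported as the root cause.
    for p in ordered:
        if not run_with_suppressed(p):
            return p
    return None

# Toy traces: suppressing "lock_skipped" fixes the failure; the
# downstream predicates are symptoms, not causes.
runs = [(["lock_skipped", "stale_read", "crash_flag"], True),
        (["lock_taken", "fresh_read"], False)]
fixes = {"lock_skipped"}
root = aid_root_cause(runs, lambda p: p not in fixes)
```

In the toy run above the intervention on the earliest suspect already makes the failure vanish, so the two later suspects are never intervened on; this is the pruning that makes AID faster than plain group testing.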
Enhancing Usability and Explainability of Data Systems
The recent growth of data science has expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to the democratization of data systems. Furthermore, a proper understanding of data and data-driven systems is necessary for users to trust the function of systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, its users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including experts, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing the usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks, such as achieving trust in data-driven machine learning and data cleaning, and (3) explaining the causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions. We also develop an explanation framework to explain the causes of such untrustworthy predictions. Additionally, this new data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks tailored to provide explanations in debugging data system components, including the data itself. These frameworks focus on explaining the root cause of a concurrent application's intermittent failure and exposing issues in the data that cause a data-driven system to malfunction.
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach
While code generation has been widely used in various software development
scenarios, the quality of the generated code is not guaranteed. This has been a
particular concern in the era of large language model (LLM)-based code
generation, where an LLM, a complex and powerful black-box model, is
instructed by a high-level natural language specification, namely a prompt, to
generate code. Nevertheless, effectively evaluating and explaining the code
generation capability of LLMs is inherently challenging, given the complexity
of LLMs and the lack of transparency.
Inspired by the recent progress in causality analysis and its application in
software engineering, this paper launches a causality analysis-based approach
to systematically analyze the causal relations between the LLM input prompts
and the generated code. To handle various technical challenges in this study,
we first propose a novel causal graph-based representation of the prompt and
the generated code, which is established over the fine-grained,
human-understandable concepts in the input prompts. The formed causal graph is
then used to identify the causal relations between the prompt and the derived
code. We illustrate the insights that our framework can provide by studying
over 3 popular LLMs with over 12 prompt adjustment strategies. The results of
these studies illustrate the potential of our technique to provide insights
into LLM effectiveness, and aid end-users in understanding predictions.
Additionally, we demonstrate that our approach provides actionable insights to
improve the quality of the LLM-generated code by properly calibrating the
prompt.
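The core idea of the abstract, intervening on a fine-grained prompt concept while holding the rest fixed and observing the change in the generated code, can be sketched as follows. This is a toy illustration, not the paper's framework; `fake_generate`, the concept names, and the quality metric are hypothetical stand-ins for an LLM call:

```python
def concept_effect(concepts, toggle, generate, score):
    """Estimate the causal effect of one prompt concept on code
    quality by intervening on that concept alone."""
    base = {c: True for c in concepts}
    with_c = dict(base, **{toggle: True})      # do(concept = present)
    without_c = dict(base, **{toggle: False})  # do(concept = absent)
    return score(generate(with_c)) - score(generate(without_c))

def fake_generate(c):
    # Stand-in for an LLM: emits type annotations only when the
    # prompt asks for them.
    return "def f(x: int) -> int: ..." if c["type_hints"] else "def f(x): ..."

def quality(code):
    # Toy quality metric: presence of a return annotation.
    return 1.0 if "->" in code else 0.0

eff = concept_effect(["type_hints", "docstring"], "type_hints",
                     fake_generate, quality)
```

In practice the effect would be averaged over many sampled generations per intervention, since LLM output is stochastic; the single-call version above keeps the sketch deterministic.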
DataPrism: Exposing Disconnect between Data and Systems
Peer reviewed
As data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, in particular properties of the data. For example, a health-monitoring system designed under the assumption that weight is reported in lbs will malfunction when encountering weight reported in kilograms. Like software debugging, which aims to find bugs in the source code or runtime conditions, our goal is to debug data to identify potential sources of disconnect between the assumptions about some data and the systems that operate on that data. We propose DataPrism, a framework to identify data properties (profiles) that are the root causes of performance degradation or failure of a data-driven system. Such identification is necessary to repair data and resolve the disconnect between data and systems. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataPrism alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataPrism reports causally verified root causes, in terms of data profiles, of the system malfunction. We empirically evaluate DataPrism on seven real-world and several synthetic data-driven systems that fail on certain datasets for a diverse set of reasons. In all cases, DataPrism identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques.
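The intervention loop described above can be sketched with the abstract's own lbs-vs-kilograms example. This is a minimal sketch, not DataPrism's actual algorithm; the system, the profiles, and the intervention functions are illustrative:

```python
def causal_profiles(data, system_fails, interventions):
    """interventions: {profile_name: function returning an altered
    copy of `data` with that profile changed}. Returns profiles whose
    intervention makes the malfunction disappear, i.e. causally
    verified root causes."""
    causes = []
    for profile, alter in interventions.items():
        if system_fails(data) and not system_fails(alter(data)):
            causes.append(profile)
    return causes

weights_kg = [70.0, 80.0]                           # data reported in kg
system_fails = lambda ws: any(w < 100 for w in ws)  # system expects lbs
interventions = {
    "weight_unit": lambda ws: [w * 2.2046 for w in ws],  # kg -> lbs
    "row_order":   lambda ws: list(reversed(ws)),        # a non-cause
}
causes = causal_profiles(weights_kg, system_fails, interventions)
```

Only the unit intervention removes the malfunction, so only the unit profile is reported; reordering the rows changes the data but not the failure, mirroring how DataPrism separates causal profiles from incidental ones.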
Causal models for decision making via integrative inference
Understanding causes and effects is important in many parts of life, especially when decisions have to be made. The systematic inference of causal models, however, remains a challenge. In this thesis, we study (1) "approximative" and "integrative" inference of causal models and (2) causal models as a basis for decision making in complex systems. By "integrative" here we mean including and combining settings and knowledge beyond the outcome of perfect randomization or pure observation for causal inference, while "approximative" means that the causal model is only constrained but not uniquely identified. As a basis for the study of topics (1) and (2), which are closely related, we first introduce causal models, discuss the meaning of causation and embed the notion of causation into a broader context of other fundamental concepts.
Then we begin our main investigation with a focus on topic (1): we consider the problem of causal inference from a non-experimental multivariate time series X, that is, we integrate temporal knowledge. We take the following approach: We assume that X together with some potential hidden common cause - "confounder" - Z forms a first order vector autoregressive (VAR) process with structural transition matrix A. Then we examine under which conditions the most important parts of A are identifiable or approximately identifiable from only X, in spite of the effects of Z. Essentially, sufficient conditions are (a) non-Gaussian, independent noise or (b) no influence from X to Z. We present two estimation algorithms that are tailored towards conditions (a) and (b), respectively, and evaluate them on synthetic and real-world data. We discuss how to check the model using X.
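In the confounder-free scalar special case of the VAR model above, x_t = a * x_{t-1} + noise, the transition coefficient is identified from the observed series alone by least squares. The sketch below shows only this baseline, which the thesis's algorithms extend to recover parts of A despite a hidden Z:

```python
import random

def ar1_lstsq(x):
    """Least-squares estimate of `a` in x_t = a * x_{t-1} + noise."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

# Simulate an AR(1) series with a known coefficient and recover it.
random.seed(0)
a_true, x = 0.6, [1.0]
for _ in range(5000):
    x.append(a_true * x[-1] + random.gauss(0.0, 1.0))
a_hat = ar1_lstsq(x)
```

With a hidden confounder Z driving x, this plain least-squares estimate becomes biased, which is exactly the situation conditions (a) and (b) above are designed to handle.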
Still focusing on topic (1) but already including elements of topic (2), we consider the problem of approximate inference of the causal effect of a variable X on a variable Y in i.i.d. settings "between" randomized experiments and observational studies. Our approach is to first derive approximations (upper/lower bounds) on the causal effect, in dependence on bounds on (hidden) confounding. Then we discuss several scenarios where knowledge or beliefs can be integrated that in fact imply bounds on confounding. One example is about decision making in advertisement, where knowledge on partial compliance with guidelines can be integrated.
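The bounding idea in the previous paragraph can be illustrated with a deliberately simple additive-bias toy model, which is an assumption of this sketch and not the thesis's actual derivation: if domain knowledge caps the confounding bias at +/- b, the observed association pins the causal effect to an interval.

```python
def effect_bounds(observed_association, max_confounding_bias):
    """Interval for the causal effect of X on Y when the confounding
    bias is known to lie in [-b, +b] (additive-bias toy model)."""
    b = max_confounding_bias
    return observed_association - b, observed_association + b

# E.g., an observed association of 0.3 with confounding bounded by 0.1
# constrains the causal effect to roughly [0.2, 0.4].
lo, hi = effect_bounds(0.3, 0.1)
```

Tighter knowledge about confounding (a smaller b), such as the partial-compliance information in the advertisement example, directly tightens the interval.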
Then, concentrating on topic (2), we study decision making problems that arise in cloud computing, a computing paradigm and business model that involves complex technical and economical systems and interactions. More specifically, we consider the following two problems: debugging and control of computing systems with the help of sandbox experiments, and prediction of the cost of "spot" resources for decision making of cloud clients. We first establish two theoretical results on approximate counterfactuals and approximate integration of causal knowledge, which we then apply to the two problems in toy scenarios
Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)
Causal discovery is a powerful technique for identifying causal relationships
among variables in data. It has been widely used in various applications in
software engineering. Causal discovery extensively involves conditional
independence (CI) tests. Hence, its output quality highly depends on the
performance of CI tests, which can often be unreliable in practice. Moreover,
privacy concerns arise when excessive CI tests are performed.
Although unreliable and excessive CI tests are distinct problems, this paper
identifies a unified and principled approach to addressing both of them.
Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms,
which are a set of well-established integrity constraints on conditional
independence. Hence, we can either detect erroneous CI statements if they
violate Pearl's axioms or prune excessive CI statements if they are logically
entailed by Pearl's axioms. Holistically, both problems boil down to reasoning
about the consistency of CI statements under Pearl's axioms (referred to as CIR
problem).
We propose a runtime verification tool called CICheck, designed to harden
causal discovery algorithms from reliability and privacy perspectives. CICheck
employs a sound and decidable encoding scheme that translates CIR into SMT
problems. To solve the CIR problem efficiently, CICheck introduces a four-stage
decision procedure with three lightweight optimizations that actively prove or
refute consistency, and only resort to costly SMT-based reasoning when
necessary. Based on the decision procedure for CIR, CICheck includes two
variants: one that detects erroneous CI tests (to enhance reliability) and one
that prunes excessive CI tests (to enhance privacy). [abridged due to length
limit]
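The consistency reasoning that CICheck encodes into SMT can be illustrated, in a much weaker form, by directly applying two of Pearl's semi-graphoid axioms to a set of CI statements. The sketch below is not CICheck's encoding; it only derives entailed statements, which could then be used to prune a redundant CI test or to flag a test result that contradicts an entailment:

```python
def entailed_statements(stmts):
    """stmts: set of (X, Y, Z) triples of frozensets, each asserting
    the CI statement I(X; Y | Z). Returns statements entailed by two
    of Pearl's semi-graphoid axioms but absent from the input set."""
    entailed = set()
    for x, y, z in stmts:
        # Symmetry: I(X; Y | Z) entails I(Y; X | Z).
        entailed.add((y, x, z))
        # Decomposition: I(X; Y | Z) entails I(X; Y' | Z) for Y' subset of Y.
        for w in y:
            sub = y - {w}
            if sub:
                entailed.add((x, sub, z))
    return entailed - set(stmts)

S = frozenset
# One asserted statement: I(a; {b, c} | {}).
stmts = {(S({"a"}), S({"b", "c"}), S())}
extra = entailed_statements(stmts)
```

Here I(a; b | {}) is entailed by decomposition, so a planned CI test of that statement could be skipped (fewer tests, less privacy exposure), while an observed dependence between a and b would contradict the entailment and flag an erroneous test.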