Debugging Machine Learning Pipelines
Machine learning tasks entail the use of complex computational pipelines to
reach quantitative and qualitative conclusions. If some of the activities in a
pipeline produce erroneous or uninformative outputs, the pipeline may fail or
produce incorrect results. Inferring the root cause of failures and unexpected
behavior is challenging: it usually requires substantial human thought and is
both time-consuming and error-prone. We propose a new approach that makes use of
iteration and provenance to automatically infer the root causes and derive
succinct explanations of failures. Through a detailed experimental evaluation,
we assess the cost, precision, and recall of our approach compared to the state
of the art. Our source code and experimental data will be available for
reproducibility and enhancement.

Comment: 10 pages
BugDoc: Algorithms to Debug Computational Processes
Data analysis for scientific experiments and enterprises, large-scale
simulations, and machine learning tasks all entail the use of complex
computational pipelines to reach quantitative and qualitative conclusions. If
some of the activities in a pipeline produce erroneous outputs, the pipeline
may fail to execute or produce incorrect results. Inferring the root cause(s)
of such failures is challenging: it usually requires substantial time and
human thought and remains error-prone. We propose a new approach that makes use of
iteration and provenance to automatically infer the root causes and derive
succinct explanations of failures. Through a detailed experimental evaluation,
we assess the cost, precision, and recall of our approach compared to the state
of the art. Our experimental data and processing software are available for use,
reproducibility, and enhancement.

Comment: To appear in SIGMOD 2020. arXiv admin note: text overlap with
arXiv:2002.0464
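The iterative, configuration-driven debugging the abstract describes can be sketched roughly as follows. This is a minimal illustration only, not BugDoc's actual algorithm: the pipeline, parameter names, and the simple "value appears only in failing runs" rule are invented for the example.

```python
from itertools import product

def find_root_causes(pipeline, param_space):
    """Re-run `pipeline` over a grid of parameter configurations,
    record which configurations fail, and report parameters whose
    value alone separates failing runs from succeeding ones."""
    results = []  # list of (config, failed) pairs
    for values in product(*param_space.values()):
        config = dict(zip(param_space.keys(), values))
        results.append((config, not pipeline(config)))
    causes = {}
    for name in param_space:
        failing = {c[name] for c, failed in results if failed}
        passing = {c[name] for c, failed in results if not failed}
        bad = failing - passing  # values seen only in failing runs
        if bad:
            causes[name] = bad
    return causes

# Toy pipeline (invented) that fails whenever the solver is "lbfgs".
def pipeline(config):
    return config["solver"] != "lbfgs"

space = {"solver": ["lbfgs", "sgd"], "batch_size": [32, 64]}
print(find_root_causes(pipeline, space))  # {'solver': {'lbfgs'}}
```

A real system would prune this exhaustive grid and consult provenance rather than re-execute every configuration; the sketch only conveys the iterate-observe-discriminate loop.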
Causality-Guided Adaptive Interventional Debugging
Runtime nondeterminism is a fact of life in modern database applications.
Previous research has shown that nondeterminism can cause applications to
intermittently crash, become unresponsive, or experience data corruption. We
propose Adaptive Interventional Debugging (AID) for debugging such intermittent
failures. AID combines existing statistical debugging, causal analysis, fault
injection, and group testing techniques in a novel way to (1) pinpoint the root
cause of an application's intermittent failure and (2) generate an explanation
of how the root cause triggers the failure. AID works by first identifying a
set of runtime behaviors (called predicates) that are strongly correlated to
the failure. It then utilizes temporal properties of the predicates to
(over)-approximate their causal relationships. Finally, it uses fault injection
to execute a sequence of interventions on the predicates and discover their
true causal relationships. This enables AID to identify the true root cause and
its causal relationship to the failure. We theoretically analyze how quickly
AID converges to the true root cause. We evaluate AID with six real-world
applications that intermittently fail under specific inputs. In each case, AID
was able to identify the root cause and explain how the root cause triggered
the failure, much faster than group testing and more precisely than statistical
debugging. We also evaluate AID with many synthetically generated applications
with known root causes and confirm that the benefits also hold for them.

Comment: Technical report of AID (SIGMOD 2020)
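AID's correlate-then-intervene loop can be sketched in miniature. This is an invented toy model, not AID's implementation: the program, its two predicates ("race", "slow_io"), and the suppress-and-re-run intervention are all assumptions made for the example.

```python
import random

# Toy program model: predicate "race" causally triggers the failure,
# while "slow_io" merely co-occurs with it (correlated, not causal).
def run(intervene=None, seed=0):
    rng = random.Random(seed)
    race = rng.random() < 0.5
    slow_io = race or rng.random() < 0.1
    if intervene == "race":      # fault injection: suppress the predicate
        race = False
    if intervene == "slow_io":
        slow_io = False
    return {"race": race, "slow_io": slow_io, "failed": race}

def aid_sketch(trials=200):
    # Step 1 (statistical debugging): predicates that are true more
    # often in failing runs than in succeeding ones are candidates.
    runs = [run(seed=i) for i in range(trials)]
    candidates = [
        p for p in ("race", "slow_io")
        if sum(r[p] and r["failed"] for r in runs)
           > sum(r[p] and not r["failed"] for r in runs)
    ]
    # Step 2 (causal intervention): suppress each candidate and re-run;
    # only a true root cause makes the failure vanish entirely.
    return [p for p in candidates
            if not any(run(intervene=p, seed=i)["failed"]
                       for i in range(trials))]

print(aid_sketch())  # ['race']
```

Both predicates survive the correlation step, but intervention exonerates "slow_io": suppressing it leaves the failure intact, while suppressing "race" eliminates it.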
Enhancing Usability and Explainability of Data Systems
The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to the democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for users to trust the function of systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing the usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks such as achieving trust in data-driven machine learning and data cleaning, and (3) explaining the causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions, along with an explanation framework that explains the causes of such untrustworthy predictions. Additionally, this data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks tailored to debugging data-system components, including the data itself: one explains the root cause of a concurrent application's intermittent failure, and the other exposes issues in the data that cause a data-driven system to malfunction.
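The abstract does not specify how the data-profiling primitive identifies untrustworthy predictions, but one common proxy is distance from the training data: a model is less likely to be reliable on tuples unlike anything it was trained on. The sketch below uses that generic idea; the function name, threshold rule, and data are invented for illustration and are not the thesis's actual primitive.

```python
import math

def trust_profile(train, test, threshold):
    """Flag test tuples whose nearest training tuple lies farther
    than `threshold` away, i.e. tuples for which the model saw
    nothing similar at training time (a crude untrustworthiness proxy)."""
    flags = []
    for t in test:
        nearest = min(math.dist(t, x) for x in train)
        flags.append(nearest > threshold)
    return flags

train = [(0.0, 0.0), (1.0, 1.0)]
test = [(0.1, 0.1), (5.0, 5.0)]
print(trust_profile(train, test, threshold=1.0))  # [False, True]
```

The first test tuple sits near training data and is not flagged; the second is far from every training tuple and is flagged as likely untrustworthy.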
Technologies for a FAIRer use of Ocean Best Practices
The publication and dissemination of best practices in ocean observing is pivotal for multiple aspects
of modern marine science, including cross-disciplinary interoperability, improved reproducibility of
observations and analyses, and training of new practitioners. Often, best practices are not published
in a scientific journal and may not even be formally documented, residing solely within the minds of
individuals who pass the information along through direct instruction. Naturally, documenting best
practices is essential to accelerate high-quality marine science; however, documentation in a drawer
has little impact. To enhance the application and development of best practices, we must leverage
contemporary document handling technologies to make best practices discoverable, accessible, and
interlinked, echoing the logic of the FAIR data principles [1].
Staring down the lion: Uncertainty avoidance and operational risk culture in a tourism organisation
The academic literature is not clear about how uncertainty influences operational risk decision-making. This study, therefore, investigated operational risk-based decision-making in the face of uncertainty in a large African safari tourism organisation by exploring individual and perceived team member approaches to uncertainty. Convenience sampling was used to identify 15 managers across three African countries in three domains of work: safari camp, regional office, and head office. Semi-structured interviews were conducted in which vignettes were incorporated, to which participants responded with their own reactions and decisions to the situations described, as well as with ways they thought other managers would react to these specific operational contexts. The data were transcribed and qualitatively analysed through thematic coding processes. The findings indicated that approaches to uncertainty were influenced by factors including situational context, the availability and communication of information, the level of operational experience, and participants’ roles. Contextual factors alongside diverse individual emotional and cognitive influences were shown to require prudent consideration by safari tourism operators in understanding employee behavioural reactions to uncertain situations. A preliminary model drawn from the findings suggests that, in practice, decision-making in the face of uncertainty is more complex than existing theoretical studies propose. Specifically, the diverse responses anticipated by staff in response to the vignettes could guide safari tourism management towards better handling of risk under uncertainty in remote locations.