Debugging Machine Learning Pipelines
Machine learning tasks entail the use of complex computational pipelines to
reach quantitative and qualitative conclusions. If some of the activities in a
pipeline produce erroneous or uninformative outputs, the pipeline may fail or
produce incorrect results. Inferring the root cause of failures and unexpected
behavior is challenging: it usually requires substantial human thought and is
both time-consuming and error-prone. We propose a new approach that makes use of
iteration and provenance to automatically infer the root causes and derive
succinct explanations of failures. Through a detailed experimental evaluation,
we assess the cost, precision, and recall of our approach compared to the state
of the art. Our source code and experimental data will be available for
reproducibility and enhancement.

Comment: 10 pages
BugDoc: Algorithms to Debug Computational Processes
Data analysis for scientific experiments and enterprises, large-scale
simulations, and machine learning tasks all entail the use of complex
computational pipelines to reach quantitative and qualitative conclusions. If
some of the activities in a pipeline produce erroneous outputs, the pipeline
may fail to execute or produce incorrect results. Inferring the root cause(s)
of such failures is challenging: it usually requires substantial time and
human thought and remains error-prone. We propose a new approach that makes use of
iteration and provenance to automatically infer the root causes and derive
succinct explanations of failures. Through a detailed experimental evaluation,
we assess the cost, precision, and recall of our approach compared to the state
of the art. Our experimental data and processing software are available for use,
reproducibility, and enhancement.

Comment: To appear in SIGMOD 2020. arXiv admin note: text overlap with
arXiv:2002.0464
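The iterative, configuration-driven debugging the abstract describes can be sketched roughly as follows. This is a minimal illustration only, not BugDoc's actual algorithm: the pipeline, parameter names, and the simple "value appears only in failing runs" rule are invented for the example.

```python
from itertools import product

def find_root_causes(pipeline, param_space):
    """Re-run `pipeline` over a grid of parameter configurations,
    record which configurations fail, and report parameters whose
    value alone separates failing runs from succeeding ones."""
    results = []  # list of (config, failed) pairs
    for values in product(*param_space.values()):
        config = dict(zip(param_space.keys(), values))
        results.append((config, not pipeline(config)))
    causes = {}
    for name in param_space:
        failing = {c[name] for c, failed in results if failed}
        passing = {c[name] for c, failed in results if not failed}
        bad = failing - passing  # values seen only in failing runs
        if bad:
            causes[name] = bad
    return causes

# Toy pipeline (invented) that fails whenever the solver is "lbfgs".
def pipeline(config):
    return config["solver"] != "lbfgs"

space = {"solver": ["lbfgs", "sgd"], "batch_size": [32, 64]}
print(find_root_causes(pipeline, space))  # {'solver': {'lbfgs'}}
```

A real system would prune this exhaustive grid and consult provenance rather than re-execute every configuration; the sketch only conveys the iterate-observe-discriminate loop.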
Causality-Guided Adaptive Interventional Debugging
Runtime nondeterminism is a fact of life in modern database applications.
Previous research has shown that nondeterminism can cause applications to
intermittently crash, become unresponsive, or experience data corruption. We
propose Adaptive Interventional Debugging (AID) for debugging such intermittent
failures. AID combines existing statistical debugging, causal analysis, fault
injection, and group testing techniques in a novel way to (1) pinpoint the root
cause of an application's intermittent failure and (2) generate an explanation
of how the root cause triggers the failure. AID works by first identifying a
set of runtime behaviors (called predicates) that are strongly correlated to
the failure. It then utilizes temporal properties of the predicates to
(over)-approximate their causal relationships. Finally, it uses fault injection
to execute a sequence of interventions on the predicates and discover their
true causal relationships. This enables AID to identify the true root cause and
its causal relationship to the failure. We theoretically analyze how quickly
AID converges to the true root cause. We evaluate AID with six real-world
applications that intermittently fail under specific inputs. In each case, AID
was able to identify the root cause and explain how the root cause triggered
the failure, much faster than group testing and more precisely than statistical
debugging. We also evaluate AID with many synthetically generated applications
with known root causes and confirm that the benefits also hold for them.

Comment: Technical report of AID (SIGMOD 2020)
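AID's correlate-then-intervene loop can be sketched in miniature. This is an invented toy model, not AID's implementation: the program, its two predicates ("race", "slow_io"), and the suppress-and-re-run intervention are all assumptions made for the example.

```python
import random

# Toy program model: predicate "race" causally triggers the failure,
# while "slow_io" merely co-occurs with it (correlated, not causal).
def run(intervene=None, seed=0):
    rng = random.Random(seed)
    race = rng.random() < 0.5
    slow_io = race or rng.random() < 0.1
    if intervene == "race":      # fault injection: suppress the predicate
        race = False
    if intervene == "slow_io":
        slow_io = False
    return {"race": race, "slow_io": slow_io, "failed": race}

def aid_sketch(trials=200):
    # Step 1 (statistical debugging): predicates that are true more
    # often in failing runs than in succeeding ones are candidates.
    runs = [run(seed=i) for i in range(trials)]
    candidates = [
        p for p in ("race", "slow_io")
        if sum(r[p] and r["failed"] for r in runs)
           > sum(r[p] and not r["failed"] for r in runs)
    ]
    # Step 2 (causal intervention): suppress each candidate and re-run;
    # only a true root cause makes the failure vanish entirely.
    return [p for p in candidates
            if not any(run(intervene=p, seed=i)["failed"]
                       for i in range(trials))]

print(aid_sketch())  # ['race']
```

Both predicates survive the correlation step, but intervention exonerates "slow_io": suppressing it leaves the failure intact, while suppressing "race" eliminates it.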
Enhancing Usability and Explainability of Data Systems
The recent growth of data science expanded its reach to an ever-growing user base of nonexperts, increasing the need for usability, understandability, and explainability in these systems. Enhancing usability makes data systems accessible to people with different skills and backgrounds alike, leading to the democratization of data systems. Furthermore, proper understanding of data and data-driven systems is necessary for users to trust the function of systems that learn from data. Finally, data systems should be transparent: when a data system behaves unexpectedly or malfunctions, the users deserve a proper explanation of what caused the observed incident. Unfortunately, most existing data systems offer limited usability and support for explanations: these systems are usable only by experts with sound technical skills, and even expert users are hindered by the lack of transparency into the systems' inner workings and functions. The aim of my thesis is to bridge the usability gap between nonexpert users and complex data systems, aid all sorts of users, including expert ones, in data and system understanding, and provide explanations that help reason about unexpected outcomes involving data systems. Specifically, my thesis has the following three goals: (1) enhancing the usability of data systems for nonexperts, (2) enabling data understanding that can assist users in a variety of tasks such as achieving trust in data-driven machine learning and data cleaning, and (3) explaining the causes of unexpected outcomes involving data and data systems.
For enhancing usability, we focus on example-driven user intent discovery. We develop systems based on example-driven interactions in two different settings: querying relational databases and personalized document summarization. Towards data understanding, we develop a new data-profiling primitive that can characterize tuples for which a machine-learned model is likely to produce untrustworthy predictions, along with an explanation framework that explains the causes of such untrustworthy predictions. Additionally, this data-profiling primitive enables interactive data cleaning. Finally, we develop two explanation frameworks tailored to debugging data-system components, including the data itself: one explains the root cause of a concurrent application's intermittent failure, and the other exposes issues in the data that cause a data-driven system to malfunction.
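The abstract does not specify how the data-profiling primitive identifies untrustworthy predictions, but one common proxy is distance from the training data: a model is less likely to be reliable on tuples unlike anything it was trained on. The sketch below uses that generic idea; the function name, threshold rule, and data are invented for illustration and are not the thesis's actual primitive.

```python
import math

def trust_profile(train, test, threshold):
    """Flag test tuples whose nearest training tuple lies farther
    than `threshold` away, i.e. tuples for which the model saw
    nothing similar at training time (a crude untrustworthiness proxy)."""
    flags = []
    for t in test:
        nearest = min(math.dist(t, x) for x in train)
        flags.append(nearest > threshold)
    return flags

train = [(0.0, 0.0), (1.0, 1.0)]
test = [(0.1, 0.1), (5.0, 5.0)]
print(trust_profile(train, test, threshold=1.0))  # [False, True]
```

The first test tuple sits near training data and is not flagged; the second is far from every training tuple and is flagged as likely untrustworthy.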
Technologies for a FAIRer use of Ocean Best Practices
The publication and dissemination of best practices in ocean observing is pivotal for multiple aspects
of modern marine science, including cross-disciplinary interoperability, improved reproducibility of
observations and analyses, and training of new practitioners. Often, best practices are not published
in a scientific journal and may not even be formally documented, residing solely within the minds of
individuals who pass the information along through direct instruction. Naturally, documenting best
practices is essential to accelerate high-quality marine science; however, documentation in a drawer
has little impact. To enhance the application and development of best practices, we must leverage
contemporary document handling technologies to make best practices discoverable, accessible, and
interlinked, echoing the logic of the FAIR data principles [1].
Staring down the lion: Uncertainty avoidance and operational risk culture in a tourism organisation
The academic literature is not clear about how uncertainty influences operational risk decision-making. This study, therefore, investigated operational risk-based decision-making in the face of uncertainty in a large African safari tourism organisation by exploring individual and perceived team member approaches to uncertainty. Convenience sampling was used to identify 15 managers across three African countries in three domains of work: safari camp, regional office, and head office. Semi-structured interviews were conducted in which vignettes were incorporated, to which participants responded with their own reactions and decisions to the situations described, as well as with ways they thought other managers would react to these specific operational contexts. The data were transcribed and qualitatively analysed through thematic coding processes. The findings indicated that approaches to uncertainty were influenced by factors including situational context, the availability and communication of information, the level of operational experience, and participants’ roles. Contextual factors alongside diverse individual emotional and cognitive influences were shown to require prudent consideration by safari tourism operators in understanding employee behavioural reactions to uncertain situations. A preliminary model drawn from the findings suggests that, in practice, decision-making in the face of uncertainty is more complex than existing theoretical studies propose. Specifically, the diverse responses anticipated by staff in response to the vignettes could guide safari tourism management towards better handling of risk under uncertainty in remote locations.