
    Debugging Machine Learning Pipelines

    Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought, and is both time-consuming and error-prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement.
    Comment: 10 page

    BugDoc: Algorithms to Debug Computational Processes

    Data analysis for scientific experiments and enterprises, large-scale simulations, and machine learning tasks all entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous outputs, the pipeline may fail to execute or produce incorrect results. Inferring the root cause(s) of such failures is challenging, usually requiring time and much human thought, while still being error-prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our experimental data and processing software are available for use, reproducibility, and enhancement.
    Comment: To appear in SIGMOD 2020. arXiv admin note: text overlap with arXiv:2002.0464

    Causality-Guided Adaptive Interventional Debugging

    Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and group testing techniques in a novel way to (1) pinpoint the root cause of an application's intermittent failure and (2) generate an explanation of how the root cause triggers the failure. AID works by first identifying a set of runtime behaviors (called predicates) that are strongly correlated to the failure. It then utilizes temporal properties of the predicates to (over)-approximate their causal relationships. Finally, it uses fault injection to execute a sequence of interventions on the predicates and discover their true causal relationships. This enables AID to identify the true root cause and its causal relationship to the failure. We theoretically analyze how fast AID can converge to the identification. We evaluate AID with six real-world applications that intermittently fail under specific inputs. In each case, AID was able to identify the root cause and explain how the root cause triggered the failure, much faster than group testing and more precisely than statistical debugging. We also evaluate AID with many synthetically generated applications with known root causes and confirm that the benefits also hold for them.
    Comment: Technical report of AID (SIGMOD 2020

    Technologies for a FAIRer use of Ocean Best Practices

    The publication and dissemination of best practices in ocean observing is pivotal for multiple aspects of modern marine science, including cross-disciplinary interoperability, improved reproducibility of observations and analyses, and training of new practitioners. Often, best practices are not published in a scientific journal and may not even be formally documented, residing solely within the minds of individuals who pass the information along through direct instruction. Naturally, documenting best practices is essential to accelerate high-quality marine science; however, documentation in a drawer has little impact. To enhance the application and development of best practices, we must leverage contemporary document handling technologies to make best practices discoverable, accessible, and interlinked, echoing the logic of the FAIR data principles [1].

    Pertanika Journal of Social Sciences & Humanities

    Staring down the lion: Uncertainty avoidance and operational risk culture in a tourism organisation

    The academic literature is not clear about how uncertainty influences operational risk decision-making. This study, therefore, investigated operational risk-based decision-making in the face of uncertainty in a large African safari tourism organisation by exploring individual and perceived team member approaches to uncertainty. Convenience sampling was used to identify 15 managers across three African countries in three domains of work: safari camp; regional office; and head office. Semi-structured interviews were conducted in which vignettes were incorporated, to which participants responded with their own reactions and decisions to the situations described, as well as with ways they thought other managers would react to these specific operational contexts. The data were transcribed and qualitatively analysed through thematic coding processes. The findings indicated that approaches to uncertainty were influenced by factors including situational context, the availability and communication of information, the level of operational experience, and participants’ roles. Contextual factors alongside diverse individual emotional and cognitive influences were shown to require prudent consideration by safari tourism operators in understanding employee behavioural reactions to uncertain situations. A preliminary model drawn from the findings suggests that, in practice, decision-making in the face of uncertainty is more complex than existing theoretical studies propose. Specifically, the diverse responses anticipated by staff in response to the vignettes could guide safari tourism management towards better handling of risk under uncertainty in remote locations.