5,393 research outputs found

    Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure

    Full text link
    As machine learning systems move from computer-science laboratories into the open world, their accountability becomes a high-priority problem. Accountability requires a deep understanding of system behavior and its failures. Current evaluation methods, such as single-score error metrics and confusion matrices, provide aggregate views of system performance that hide important shortcomings. Understanding details about failures is important for identifying pathways for refinement, communicating the reliability of systems in different settings, and specifying appropriate human oversight and engagement. Characterizing failures and shortcomings is particularly complex for systems composed of multiple machine-learned components. For such systems, existing evaluation methods have limited expressiveness in describing and explaining the relationships among input content, the internal states of system components, and final output quality. We present Pandora, a set of hybrid human-machine methods and tools for describing and explaining system failures. Pandora leverages both human and system-generated observations to summarize conditions of system malfunction with respect to the input content and system architecture. We share results of a case study with a machine learning pipeline for image captioning that show how detailed performance views can be beneficial for analysis and debugging.
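    The kind of hybrid analysis described above can be approximated with an interpretable surrogate model. Below is a minimal sketch, assuming scikit-learn is available, that relates human-annotated input tags and system-generated component signals to observed caption failures and prints the learned failure conditions as rules; all feature names and data are illustrative placeholders, not taken from the Pandora case study.

```python
# Hypothetical sketch: summarizing failure conditions of an image-captioning
# pipeline with an interpretable surrogate model (not the Pandora tool itself).
# Requires scikit-learn; feature names and data are illustrative placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row mixes human observations about the input (e.g., "cluttered scene")
# with system-generated signals from internal components (e.g., detector score).
feature_names = ["cluttered_scene", "low_light", "detector_confidence", "num_objects"]
X = np.array([
    [1, 0, 0.45, 7],
    [0, 1, 0.30, 2],
    [0, 0, 0.90, 3],
    [1, 1, 0.25, 9],
])
y = np.array([1, 1, 0, 1])  # 1 = human judged the generated caption a failure

# A shallow tree yields readable rules relating input content and
# component state to the observed failures.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```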

    Exploration of the scalability of LocFaults approach for error localization with While-loops programs

    Get PDF
    For an erroneous program, a model checker can produce a counterexample trace that is often long and difficult to understand. In general, the instructions relating to loops form the largest part of this trace, which makes locating errors in loops critical for analyzing errors in the overall program. In this paper, we explore the scalability of LocFaults, our error localization approach, which exploits paths of the CFG (Control Flow Graph) derived from a counterexample to compute MCDs (Minimal Correction Deviations) and the MCSs (Minimal Correction Subsets) within each MCD found. We report the running times of our approach on programs whose While-loops are unfolded b times, with the number of deviated conditions ranging from 0 to n. Our preliminary results show that our constraint-based, flow-driven approach is faster than BugAssist, which is SAT-based and transforms the entire program into a Boolean formula; furthermore, the information provided by LocFaults is more expressive for the user.
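    A minimal sketch of the underlying idea of constraint-based localization follows; it is not the LocFaults implementation itself. Assuming the z3-solver package, one counterexample path of a tiny invented program is encoded as constraints, the failing input and expected post-condition are kept as hard constraints, and subsets of statement constraints are enumerated by increasing size until removing one restores satisfiability, yielding a Minimal Correction Subset.

```python
# Hypothetical sketch of constraint-based error localization: find a Minimal
# Correction Subset (MCS) of path statements whose removal restores the
# post-condition. Requires z3-solver; the program and inputs are illustrative.
from itertools import combinations
from z3 import Ints, Solver, sat

x, y, t, r = Ints("x y t r")

# Hard constraints: the failing input and the violated post-condition.
hard = [x == 3, y == 2, r == 1]
# Soft constraints: one per statement on the counterexample path.
statements = {
    "s1": t == x + y,   # suspected bug: should have been x - y
    "s2": r == t,
}

def satisfiable(removed):
    s = Solver()
    s.add(*hard)
    s.add(*(c for name, c in statements.items() if name not in removed))
    return s.check() == sat

# Enumerate candidate corrections by increasing size; the first hits are MCSs.
for k in range(1, len(statements) + 1):
    found = [set(c) for c in combinations(statements, k) if satisfiable(set(c))]
    if found:
        print("Minimal Correction Subsets of size", k, ":", found)
        break
```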

    Enhanced Formal Verification Flow for Circuits Integrating Debugging and Coverage Analysis

    Get PDF
    In this paper we briefly review techniques used in formal hardware verification. An advanced flow emerges from integrating two major methodological improvements: debugging support and coverage analysis. With automatic debugging support, the verification engineer can locate the source of a failure: components are identified that explain the discrepancy between the property and the circuit behavior. This method is complemented by an approach to analyze the functional coverage of the proven Bounded Model Checking (BMC) properties. The approach automatically determines whether the property set is complete; if it is not, coverage gaps are returned. Both techniques are integrated into an enhanced verification flow, and a running example demonstrates the resulting advantages.
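    A minimal sketch of the Bounded Model Checking step referred to above, assuming the z3-solver package; the 2-bit counter circuit, the property, and the bound are illustrative and not taken from the paper.

```python
# Hypothetical BMC sketch: a 2-bit counter that should never reach the value 3.
# The transition relation is unrolled k steps and a solver searches for a
# violating trace. Requires z3-solver; circuit and bound are illustrative.
from z3 import BitVec, BitVecVal, Solver, If, Or, sat

def check_counter(bound):
    s = Solver()
    state = [BitVec(f"cnt_{i}", 2) for i in range(bound + 1)]
    s.add(state[0] == BitVecVal(0, 2))                        # initial state
    for i in range(bound):
        # next = (cnt == 2) ? 0 : cnt + 1  -- the counter wraps before 3
        s.add(state[i + 1] == If(state[i] == 2, BitVecVal(0, 2), state[i] + 1))
    # Negated property: the counter equals 3 at some step within the bound.
    s.add(Or(*[st == 3 for st in state]))
    if s.check() == sat:
        print(f"Counterexample within {bound} steps:", s.model())
    else:
        print(f"Property holds up to bound {bound}")

check_counter(8)
```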

    Diagnose network failures via data-plane analysis

    Get PDF
    Diagnosing problems in networks is a time-consuming and error-prone process. Previous tools to assist operators primarily focus on analyzing control-plane configuration. Configuration analysis is limited in that it cannot find bugs in router software, and it is harder to generalize across protocols since it must model complex configuration languages and dynamic protocol behavior. This paper studies an alternate approach: diagnosing problems through static analysis of the data plane. This approach can catch bugs that are invisible at the level of configuration files, and it simplifies unified analysis of a network across many protocols and implementations. We present Anteater, a tool for checking invariants in the data plane. Anteater translates high-level network invariants into Boolean satisfiability problems, checks them against network state using a SAT solver, and reports counterexamples when violations are found. Applied to a large campus network, Anteater revealed 23 bugs, including forwarding loops and stale ACL rules, with only five false positives. Nine of these faults are being fixed by campus network operators.
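    A minimal sketch in the spirit of this approach, assuming the z3-solver package: the no-forwarding-loop invariant is expressed as a satisfiability query over a three-router data plane with a symbolic destination address. The topology, FIB entries, and the stale rule are invented for the example and are not taken from the campus network study.

```python
# Hypothetical sketch: check the "no forwarding loops" invariant as a
# satisfiability query over per-router FIBs. Requires z3-solver.
from z3 import BitVec, Int, Solver, If, Or, And, ULE, sat

HOPS = 4
dst = BitVec("dst", 8)                         # symbolic destination address
hop = [Int(f"hop_{i}") for i in range(HOPS)]   # router visited at each step
DROP = -1

def fib(router, dst):
    # Forwarding behavior of routers 0..2 as nested If-expressions.
    # A stale rule on router 2 bounces low addresses back to router 0.
    on0 = If(ULE(dst, 127), 1, 2)
    on1 = If(ULE(dst, 127), 2, DROP)
    on2 = If(ULE(dst, 127), 0, DROP)           # stale: should deliver locally
    return If(router == 0, on0, If(router == 1, on1, on2))

s = Solver()
s.add(hop[0] == 0)                             # packets enter at router 0
for i in range(HOPS - 1):
    s.add(hop[i + 1] == fib(hop[i], dst))
# Invariant violation: some router is visited twice before the packet drops.
s.add(Or(*[And(hop[i] == hop[j], hop[i] != DROP)
           for i in range(HOPS) for j in range(i + 1, HOPS)]))

if s.check() == sat:
    m = s.model()
    print("Forwarding loop for destination", m[dst], ":", [m[h] for h in hop])
else:
    print("No loop within", HOPS, "hops")
```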

    Automated Fixing of Programs with Contracts

    Full text link
    This paper describes AutoFix, an automatic debugging technique that can fix faults in general-purpose software. To provide high-quality fix suggestions and to enable automation of the whole debugging process, AutoFix relies on the presence of simple specification elements in the form of contracts (such as pre- and postconditions). Using contracts enhances the precision of dynamic analysis techniques for fault detection and localization, and for validating fixes. The only required user input to the AutoFix supporting tool is a faulty program annotated with contracts; the tool produces a collection of validated fixes for the fault, ranked according to an estimate of their suitability. In an extensive experimental evaluation, we applied AutoFix to over 200 faults in four code bases of different maturity and quality (of implementation and of contracts). AutoFix successfully fixed 42% of the faults, producing, in the majority of cases, corrections of quality comparable to those competent programmers would write; the computational resources used were modest, with an average time per fix below 20 minutes on commodity hardware. These figures compare favorably to the state of the art in automated program fixing, and demonstrate that the AutoFix approach is applicable to reducing the debugging burden in real-world scenarios.
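    A minimal sketch of contract-driven fix validation, not the AutoFix tool itself: candidate patches for an invented faulty routine are accepted only if they satisfy the routine's pre- and postcondition on a pool of inputs; all names and candidates are hypothetical.

```python
# Hypothetical sketch of contract-driven fix validation (not AutoFix itself):
# keep only the candidate patches that satisfy the pre-/postcondition on a
# pool of inputs. All routines and candidates below are illustrative.

def precondition(xs):
    return isinstance(xs, list) and len(xs) > 0

def postcondition(xs, result):
    return result in xs and all(result <= v for v in xs)

def faulty_minimum(xs):
    m = xs[0]
    for v in xs[1:]:
        if v > m:          # fault: the comparison is inverted
            m = v
    return m

# Candidate fixes, e.g. produced by mutating the suspect comparison.
def fix_a(xs):
    m = xs[0]
    for v in xs[1:]:
        if v < m:
            m = v
    return m

def fix_b(xs):
    return xs[-1]          # a plausible-looking but wrong candidate

inputs = [[3, 1, 2], [5], [2, 2, 0], [-1, 4]]

def validated(candidate):
    return all(postcondition(xs, candidate(xs)) for xs in inputs if precondition(xs))

for name, cand in [("fix_a", fix_a), ("fix_b", fix_b)]:
    print(name, "passes contracts" if validated(cand) else "rejected")
```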