
    Systematic literature review of validation methods for AI systems

    Context: Artificial intelligence (AI) has made its way into everyday activities, particularly through new techniques such as machine learning (ML). These techniques are implementable with little domain knowledge. This, combined with the difficulty of testing AI systems with traditional methods, has made system trustworthiness a pressing issue. Objective: This paper studies the methods used to validate practical AI systems reported in the literature. Our goal is to classify and describe the methods that are used in realistic settings to ensure the dependability of AI systems. Method: A systematic literature review resulted in 90 papers. Systems presented in the papers were analysed based on their domain, task, complexity, and applied validation methods. Results: The validation methods were synthesized into a taxonomy consisting of trial, simulation, model-centred validation, and expert opinion. Failure monitors, safety channels, redundancy, voting, and input and output restrictions are methods used to continuously validate the systems after deployment. Conclusions: Our results clarify the existing strategies applied to validation. They form a basis for the synthesis, assessment, and refinement of AI system validation in research, and for guidelines for validating individual systems in practice. While the various validation strategies have all been applied relatively widely, only a few studies report on continuous validation.
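    As an illustration of the continuous-validation methods listed under Results (input and output restrictions in particular), a minimal Python sketch follows; the model, bounds, and fallback value are illustrative assumptions, not taken from the reviewed studies.

        # Minimal sketch of a runtime input/output restriction monitor, one of
        # the continuous validation mechanisms named in the abstract.
        # The toy model, bounds and fallback below are illustrative assumptions.

        def restricted_predict(model, x, in_lo, in_hi, out_lo, out_hi, fallback):
            """Run `model` only on in-range inputs; reject out-of-range outputs."""
            if not all(in_lo[i] <= v <= in_hi[i] for i, v in enumerate(x)):
                return fallback, "input rejected: outside validated envelope"
            y = model(x)
            if not (out_lo <= y <= out_hi):
                return fallback, "output rejected: outside safe range"
            return y, "ok"

        # A toy linear model stands in for the AI component under validation.
        toy_model = lambda x: 0.5 * x[0] + 0.1 * x[1]
        print(restricted_predict(toy_model, [2.0, 3.0],
                                 in_lo=[0, 0], in_hi=[10, 10],
                                 out_lo=0.0, out_hi=5.0, fallback=None))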

    Acceptance in Incomplete Argumentation Frameworks

    Abstract argumentation frameworks (AFs), originally proposed by Dung, constitute a central formal model for the study of computational aspects of argumentation in AI. Credulous and skeptical acceptance of arguments in a given AF are well-studied problems, both in terms of theoretical analysis (especially computational complexity) and the development of practical decision procedures. However, AFs make the assumption that all attacks between arguments are certain (i.e., present attacks are known to exist, and missing attacks are known not to exist), which can be a restrictive assumption in various settings. A generalization of AFs to incomplete AFs was recently proposed as a formalism that allows the representation of both uncertain attacks and uncertain arguments in AFs. In this article, we explore the impact of allowing for modeling such uncertainties in AFs on the computational complexity of natural generalizations of acceptance problems to incomplete AFs under various central AF semantics. Complementing the complexity-theoretic analysis, we also develop the first practical decision procedures for all of the NP-hard variants of acceptance in incomplete AFs. In terms of complexity analysis, we establish a full complexity landscape, showing that, depending on the variant of acceptance and the property/semantics, the complexity of acceptance in incomplete AFs ranges from polynomial-time decidable to completeness for Σ^p_3. In terms of algorithms, we show through an extensive empirical evaluation that an implementation of the proposed decision procedures, based on Boolean satisfiability (SAT) solving, is effective in deciding variants of acceptance under uncertainties. We also establish conditions on which types of atomic changes are guaranteed to be redundant from the perspective of preserving extensions of completions of incomplete AFs, and show that these results allow for considerably improving the empirical efficiency of the proposed SAT-based counterexample-guided abstraction refinement algorithms for acceptance in incomplete AFs for problem variants with complexity beyond NP.
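    To make the acceptance problem concrete, a brute-force Python sketch of possible credulous acceptance under admissible semantics in an incomplete AF with uncertain attacks follows; it enumerates completions and candidate sets explicitly, whereas the article's procedures are SAT-based, so the toy framework and function names here are illustrative assumptions only.

        # Brute-force sketch of "possible credulous acceptance" under admissible
        # semantics: the target argument is accepted if some completion (choice
        # of uncertain attacks to include) admits an admissible set containing it.
        # Exponential and illustrative only; the article uses SAT-based CEGAR.
        from itertools import chain, combinations

        def powerset(xs):
            xs = list(xs)
            return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

        def admissible(args, attacks, S):
            S = set(S)
            if any((a, b) in attacks for a in S for b in S):   # conflict-free
                return False
            for s in S:                                        # every member defended
                for b in args:
                    if (b, s) in attacks and not any((c, b) in attacks for c in S):
                        return False
            return True

        def possibly_credulous(args, certain_att, uncertain_att, target):
            for extra in powerset(uncertain_att):              # choose a completion
                attacks = set(certain_att) | set(extra)
                if any(target in S and admissible(args, attacks, S)
                       for S in powerset(args)):
                    return True
            return False

        # Toy incomplete AF: b attacks a for certain; the attack (c, b) is uncertain.
        print(possibly_credulous({"a", "b", "c"}, {("b", "a")}, {("c", "b")}, "a"))  # True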

    Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary

    The FTMP architecture is a high-reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP operate in tight synchronism with one another, and hardware fault detection and fault masking are provided transparently to the software. Operating system design and user software design are thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of the fault-handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine, and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
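    A short Python sketch of the fault-masking idea follows: replicated elements run in lockstep and a majority vote over their outputs masks a single faulty element; the triplex values below are illustrative, not FTMP data.

        # Sketch of hardware fault masking by majority voting over the outputs
        # of redundant, synchronized processing elements.
        from collections import Counter

        def vote(replica_outputs):
            """Return the majority value, or None if no majority exists."""
            value, count = Counter(replica_outputs).most_common(1)[0]
            return value if count > len(replica_outputs) // 2 else None

        # One replica produces a faulty result; the vote masks it transparently.
        print(vote([42, 42, 7]))   # -> 42
        print(vote([1, 2, 3]))     # -> None (no majority: uncorrectable)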

    Advanced flight control system study

    A fly-by-wire flight control system architecture designed for high reliability includes spare sensor and computer elements to permit safe dispatch with failed elements, thereby reducing unscheduled maintenance. A methodology capable of demonstrating that the architecture achieves the predicted performance characteristics consists of a hierarchy of activities, ranging from analytical calculations of system reliability and formal methods of software verification to iron-bird testing followed by flight evaluation. Interfacing this architecture to the Lockheed S-3A aircraft for flight test is discussed. This testbed vehicle can be expanded to support flight experiments in advanced aerodynamics, electromechanical actuators, secondary power systems, flight management, new displays, and air traffic control concepts.
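    To illustrate the analytical reliability calculations mentioned above, a back-of-the-envelope Python sketch follows; the k-of-n binomial model and the per-flight reliability figures are assumptions for illustration, not values from the study.

        # Probability that at least k of n identical elements, each with
        # per-flight reliability r, remain operative (binomial tail sum).
        from math import comb

        def k_of_n_reliability(k, n, r):
            return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))

        # Illustrative comparison: a quadruplex channel needing any 2 of 4
        # elements versus a simplex channel with a single element.
        print(k_of_n_reliability(2, 4, 0.999))   # ~0.999999996
        print(k_of_n_reliability(1, 1, 0.999))   # 0.999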

    Software defect prediction: do different classifiers find the same defects?

    During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.
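    A Python sketch of the kind of comparison described above follows, using scikit-learn with a synthetic dataset standing in for the NASA, open-source, and commercial data; DecisionTreeClassifier substitutes for RPart (an R package), so the setup is illustrative rather than a reproduction of the study.

        # Train several classifiers on the same (synthetic) defect data, capture
        # per-classifier confusion matrices, and count defects found by one
        # classifier but missed by another (the "unique subset" effect).
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.svm import SVC
        from sklearn.metrics import confusion_matrix

        # Imbalanced synthetic data as a placeholder for real defect datasets.
        X, y = make_classification(n_samples=500, n_features=20,
                                   weights=[0.8], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        classifiers = {
            "RandomForest": RandomForestClassifier(random_state=0),
            "NaiveBayes": GaussianNB(),
            "DecisionTree": DecisionTreeClassifier(random_state=0),  # RPart analogue
            "SVM": SVC(),
        }
        predictions = {}
        for name, clf in classifiers.items():
            predictions[name] = clf.fit(X_tr, y_tr).predict(X_te)
            print(name, confusion_matrix(y_te, predictions[name]), sep="\n")

        # True defects found by Random Forest but missed by SVM.
        rf, svm = predictions["RandomForest"], predictions["SVM"]
        rf_only = sum(1 for t, p, s in zip(y_te, rf, svm) if t == 1 and p == 1 and s == 0)
        print("defects found by RF but missed by SVM:", rf_only)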