
    Robust Computer Algebra, Theorem Proving, and Oracle AI

    In the context of superintelligent AI systems, the term "oracle" has two meanings. One refers to modular systems queried for domain-specific tasks. Another usage, referring to a class of systems which may be useful for addressing the value alignment and AI control problems, is a superintelligent AI system that only answers questions. The aim of this manuscript is to survey contemporary research problems related to oracles which align with long-term research goals of AI safety. We examine existing question answering systems and argue that their high degree of architectural heterogeneity makes them poor candidates for rigorous analysis as oracles. On the other hand, we identify computer algebra systems (CASs) as primitive examples of domain-specific oracles for mathematics, and argue that efforts to integrate computer algebra systems with theorem provers, systems which have largely been developed independently of one another, provide a concrete set of problems related to the notion of provable safety that has emerged in the AI safety community. We review approaches to interfacing CASs with theorem provers, describe well-defined architectural deficiencies that have been identified with CASs, and suggest possible lines of research and practical software projects for scientists interested in AI safety. Comment: 15 pages, 3 figures
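
    As a minimal illustration of the "don't blindly trust the CAS" pattern that CAS/theorem-prover interfaces aim to make rigorous (this sketch is not from the paper; SymPy merely stands in for a domain-specific mathematical oracle), a CAS simplification can be treated as an untrusted claim and cross-checked independently before it is accepted:

        import random
        import sympy as sp

        x = sp.symbols('x')
        expr = sp.sin(x)**2 + sp.cos(x)**2
        simplified = sp.simplify(expr)          # the CAS's (untrusted) answer: 1

        # Independently spot-check the CAS's claim at random points before trusting it.
        f = sp.lambdify(x, expr, 'math')
        g = sp.lambdify(x, simplified, 'math')
        for _ in range(100):
            t = random.uniform(-10.0, 10.0)
            assert abs(f(t) - g(t)) < 1e-9

        print('CAS simplification cross-checked:', expr, '->', simplified)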

    Testing scientific software: techniques for automatic detection of metamorphic relations

    Scientific software plays an important role in critical decision making in fields such as the nuclear industry, medicine, and the military. Systematic testing of such software can help to ensure that it works as expected. Comprehensive, automated software testing requires an oracle to check whether the output produced by a test case matches the expected behavior of the program. But the challenges in creating suitable oracles limit the ability to perform automated testing of scientific software. For some programs, creating an oracle may not be possible, since the correct output is not known a priori. Further, it may be impractical to implement an oracle for an arbitrary input due to the complexity of the program. The software testing community refers to such programs as non-testable. Many scientific programs fall into this category, since they are either written to find answers that are previously unknown or they perform complex calculations.

    In this work, we developed techniques to automatically predict metamorphic relations by analyzing the program structure. These metamorphic relations can serve as automated partial test oracles in scientific software. Metamorphic testing is a method for automating the testing process for programs without test oracles. The technique operates by checking whether a program behaves according to a certain set of properties called metamorphic relations. A metamorphic relation is a relationship between multiple input and output pairs of the program: it specifies how the output should change following a specific change made to the input. A change in the output that differs from what the metamorphic relation specifies indicates a fault in the program. Metamorphic testing can be effective in testing machine learning applications, bioinformatics programs, health-care simulations, partial differential equation solvers, and other programs. Unfortunately, finding appropriate metamorphic relations for use in metamorphic testing remains a labor-intensive task that is generally performed by a domain expert or a programmer.

    In this work, we applied novel machine-learning-based approaches to automatically derive metamorphic relations. We first evaluated the effectiveness of modeling the metamorphic relation prediction problem as a binary classification problem and found that support vector machines are the most effective binary classifiers for predicting metamorphic relations. We also found that using walk-based graph kernels for feature extraction from graph-based program representations further improves prediction accuracy, as does incorporating mathematical properties of operations in the graph kernel computation. Further, we found that the control flow information of a function is more effective than data dependency information for predicting metamorphic relations. Finally, we investigated the possibility of creating multi-label classifiers that can predict multiple metamorphic relations using a single classifier; our empirical studies show that multi-label classifiers are not as effective as binary classifiers for predicting metamorphic relations. Automated testing will make the testing process faster, cheaper, and more reliable, and it requires automated test oracles. Automatically discovering metamorphic relations is an important step towards automating oracle creation. The work presented here is the first attempt towards developing automated techniques for deriving metamorphic relations, and it contributes toward automating the testing process of programs that face the oracle problem.
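
    As a concrete illustration (a minimal sketch, not taken from the thesis; the routine and the relation are invented for exposition), a metamorphic relation can act as a partial oracle even when the correct output of the program is unknown:

        import random

        def program_under_test(xs):
            # Toy numerical routine standing in for complex scientific code.
            return sum(xs) / len(xs)

        # Metamorphic relation: scaling every input by a constant c should scale the
        # output by c. No expected output value is needed, only the relation itself.
        def check_scaling_relation(xs, c=3.0, tol=1e-9):
            source_output = program_under_test(xs)
            follow_up_output = program_under_test([c * v for v in xs])
            return abs(follow_up_output - c * source_output) < tol

        xs = [random.uniform(-100.0, 100.0) for _ in range(1000)]
        print('metamorphic relation holds:', check_scaling_relation(xs))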

    Identifying Implementation Bugs in Machine Learning based Image Classifiers using Metamorphic Testing

    We have recently witnessed tremendous success of Machine Learning (ML) in practical applications. Computer vision, speech recognition, and language translation have all seen near-human-level performance. We expect that, in the near future, most business applications will include some form of ML. However, testing such applications is extremely challenging and would be very expensive if we followed today's methodologies. In this work, we present an articulation of the challenges in testing ML-based applications. We then present our solution approach, based on the concept of Metamorphic Testing, which aims to identify implementation bugs in ML-based image classifiers. We have developed metamorphic relations for an application based on Support Vector Machines and for a Deep Learning based application. Empirical validation showed that our approach was able to catch 71% of the implementation bugs in the ML applications. Comment: Published at the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018).
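
    One such relation can be sketched as follows (an assumed setup using scikit-learn with synthetic data, not the paper's code): permuting feature columns consistently in both training and test data should leave an SVM's predictions unchanged, so any disagreement flags a likely implementation bug rather than a modelling issue:

        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(200, 64))        # stand-in for flattened image features
        y_train = (X_train.sum(axis=1) > 0).astype(int)
        X_test = rng.normal(size=(50, 64))

        perm = rng.permutation(X_train.shape[1])    # one fixed column permutation

        base = SVC(kernel='rbf').fit(X_train, y_train).predict(X_test)
        permuted = SVC(kernel='rbf').fit(X_train[:, perm], y_train).predict(X_test[:, perm])

        # The RBF kernel depends only on pairwise distances, which are unchanged by a
        # consistent permutation of features, so the predictions should match exactly.
        assert np.array_equal(base, permuted), 'metamorphic relation violated'
        print('permutation MR holds on', len(X_test), 'test points')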

    Using Machine Learning to Generate Test Oracles: A Systematic Literature Review

    Machine learning may enable the automated generation of test oracles. We have characterized emerging research in this area through a systematic literature review examining oracle types, researcher goals, the ML techniques applied, how the generation process was assessed, and the open research challenges in this emerging field. Based on a sample of 22 relevant studies, we observed that ML algorithms generated test verdict, metamorphic relation, and - most commonly - expected output oracles. Almost all studies employ a supervised or semi-supervised approach - including neural networks, support vector machines, adaptive boosting, and decision trees - trained on labeled system executions or code metadata. Oracles are evaluated using the mutation score, correct classifications, accuracy, and ROC. Work to date shows great promise, but there are significant open challenges regarding the requirements imposed on training data, the complexity of the modeled functions, the ML algorithms employed - and how they are applied - the benchmarks used by researchers, and the replicability of the studies. We hope that our findings will serve as a roadmap and inspiration for researchers in this field. Comment: Pre-print. Article accepted to the 1st International Workshop on Test Oracles at ESEC/FSE 2021.
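
    The most common pattern can be sketched as follows (assumed data, features, and fault model, not drawn from any of the surveyed studies): a supervised classifier is trained on labelled executions and then predicts a pass/fail verdict for new executions:

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(1)

        # Labelled executions of a toy square-root routine: the first 200 outputs are
        # perturbed to simulate failing executions (label 0); the rest pass (label 1).
        inputs = rng.uniform(0.0, 100.0, size=2000)
        outputs = np.sqrt(inputs)
        outputs[:200] += rng.normal(0.0, 5.0, size=200)
        labels = np.concatenate([np.zeros(200), np.ones(1800)]).astype(int)

        # Simple execution features, including one hand-crafted residual feature.
        X = np.column_stack([inputs, outputs, np.abs(outputs**2 - inputs)])

        X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
        verdict_oracle = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
        print('held-out verdict accuracy:', verdict_oracle.score(X_te, y_te))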
