6 research outputs found

    Using machine learning to classify test outcomes

    When testing software, it has been shown that there are substantial benefits to be gained from approaches that exercise unusual or unexplored interactions with a system, such as random testing, fuzzing, and exploratory testing. However, such approaches have a drawback: the outputs of the tests need to be manually checked for correctness, which represents a significant burden for the software engineer. This paper presents a strategy to support the process of identifying which tests have passed or failed by combining clustering and semi-supervised learning. We have shown that machine learning makes it possible to cluster test cases in such a way that those corresponding to failures concentrate into smaller clusters. Examining the test outcomes in cluster-size order has the effect of prioritising the results: those that are checked early on have a much higher probability of being failing tests. As the software engineer examines the results (and confirms or refutes the initial classification), this information is employed to bootstrap a secondary learner that further improves the accuracy of the classification of the (as yet) unchecked tests. Results from experiments with a range of systems demonstrate the substantial benefits of this strategy, and show that remarkably accurate test output classifications can be derived from examining a relatively small proportion of results.
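
    The abstract describes a concrete mechanism: cluster test executions, then examine clusters smallest-first, because failures tend to concentrate in small clusters. Below is a minimal sketch of that prioritisation step, assuming each test execution has already been encoded as a numeric feature vector; the k-means choice, the cluster count, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of cluster-size prioritisation (assumed encoding: one
# numeric feature vector per test execution; names are illustrative).
import numpy as np
from sklearn.cluster import KMeans

def prioritise_by_cluster_size(test_features, n_clusters=20):
    """Cluster test executions and return test indices ordered so that
    tests from the smallest clusters (most likely failures) come first."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(test_features)
    sizes = np.bincount(labels, minlength=n_clusters)
    # Sort tests by the size of the cluster they fall in, smallest first.
    return np.argsort(sizes[labels], kind="stable")

# Usage (stand-in data):
# features = np.random.rand(500, 32)
# order = prioritise_by_cluster_size(features)
# Check outputs in `order`; early entries are more likely failing tests.
```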

    Machine learning techniques for automated software fault detection via dynamic execution data : empirical evaluation study

    The biggest obstacle to automated software testing is the construction of test oracles. Today it is possible to generate an enormous number of test cases for an arbitrary system and reach a remarkably high level of coverage, but the effectiveness of those test cases is limited by the availability of test oracles that can distinguish failing executions. Previous work by the authors has explored the use of unsupervised and semi-supervised learning techniques to develop test oracles, so that the correctness of software outputs and behaviours on new test cases can be predicted [1], [2], [10], and experimental results demonstrate the promise of this approach. In this paper, we present an evaluation study of machine-learning-based test oracles built from dynamic execution data (first, input/output pairs and, second, combinations of input/output pairs and execution traces), comparing their effectiveness with an existing technique from the specification mining domain (the data invariant detector Daikon [5]). The two approaches are evaluated on a range of mid-sized systems and compared in terms of their fault detection ability and false positive rate. The empirical study also discusses the major limitations and the most important properties of applying machine learning techniques as test oracles in practice, and gives a road map for further research directions to tackle some of the discussed limitations, such as accuracy and scalability. The results show that in most cases semi-supervised learning techniques performed far better as automated test classifiers than Daikon, especially when input/output pairs were augmented with their execution traces. However, there is one system for which our strategy struggles and Daikon performed far better. Furthermore, unsupervised learning techniques performed on a par with Daikon in several cases, particularly when input/output pairs were used together with execution traces.
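
    As a concrete illustration of the semi-supervised oracle idea described above, the sketch below labels a handful of checked executions as pass/fail, leaves the rest unlabelled, and lets a label-spreading model classify the unchecked tests. The feature construction, kernel choice, and names are assumptions for illustration, not the authors' exact pipeline.

```python
# Hedged sketch of a semi-supervised test oracle over input/output pairs
# (optionally concatenated with execution-trace features). The feature
# layout and parameters below are assumptions, not the paper's setup.
from sklearn.semi_supervised import LabelSpreading

def train_oracle(features, labels):
    """features: one row per test execution (inputs + outputs [+ traces]).
    labels: 1 = fail, 0 = pass, -1 = not yet checked by the engineer."""
    model = LabelSpreading(kernel="knn", n_neighbors=7)
    model.fit(features, labels)
    # model.transduction_ now holds predicted pass/fail for every test,
    # including the unchecked (-1) ones.
    return model

# As the engineer confirms or refutes predictions, move those tests from
# label -1 to 0/1 and refit, so checked results bootstrap the classifier.
```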

    Research Methods in Machine Learning: A Content Analysis

    Research methods in machine learning play a pivotal role, since the accuracy and reliability of results are influenced by the research methods used. The main aims of this paper were to explore current research methods in machine learning, emerging themes, and the implications of those themes for machine learning research. To achieve this, the researchers analyzed a total of 100 articles published since 2019 in IEEE journals. This study revealed that machine learning research uses quantitative methods, with experimental research design being the de facto approach. The study also revealed that researchers nowadays use more than one algorithm to address a problem. Optimal feature selection has also emerged as a key technique researchers use to optimize the performance of machine learning algorithms. The confusion matrix and its derivatives are still the main means of evaluating the performance of algorithms, although researchers now also consider the processing time an algorithm takes to execute. The Python programming language, together with its libraries, is the most used tool for creating, training, and testing models. The most used algorithms for addressing both classification and prediction problems are Naïve Bayes, Support Vector Machine, Random Forest, Artificial Neural Networks, and Decision Tree. The recurring themes identified in this study are likely to open new frontiers in machine learning research.
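
    Since the survey identifies the confusion matrix and its derivatives as the dominant evaluation approach, a small illustration of those derived metrics may help; the labels below are made-up data purely for demonstration.

```python
# Confusion matrix and its common derivatives (illustrative data only).
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground-truth classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # a classifier's predictions

# For binary labels, ravel() yields the four cells in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"accuracy  = {(tp + tn) / (tp + tn + fp + fn):.2f}")
print(f"precision = {precision_score(y_true, y_pred):.2f}")
print(f"recall    = {recall_score(y_true, y_pred):.2f}")
print(f"f1        = {f1_score(y_true, y_pred):.2f}")
```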

    Machine Learning-based Test Selection for Simulation-based Testing of Self-driving Cars Software

    Simulation platforms facilitate the development of emerging Cyber-Physical Systems (CPS) like self-driving cars (SDC) because they are more efficient and less dangerous than field operational tests. Despite this, thoroughly testing SDCs in simulated environments remains challenging because SDCs must be tested across a sheer number of long-running test cases. Past results on software testing optimization have shown that not all test cases contribute equally to establishing confidence in a test subject's quality and reliability, and that the execution of "safe and uninformative" test cases can be skipped to reduce testing effort. However, this problem has only been partially addressed in the context of SDC simulation platforms. In this paper, we investigate test selection strategies to increase the cost-effectiveness of simulation-based testing in the context of SDCs. We propose an approach called SDC-Scissor (SDC coSt-effeCtIve teSt SelectOR) that leverages Machine Learning (ML) strategies to identify, before execution, test cases that are unlikely to detect faults in SDCs, and to skip them. Our evaluation shows that SDC-Scissor outperforms the baselines. With the logistic model, we achieve an accuracy of 70%, a precision of 65%, and a recall of 80% in selecting tests that lead to a fault, with improved testing cost-effectiveness. Specifically, SDC-Scissor avoided executing 50% of the unnecessary tests and outperformed two baseline strategies. Complementary to existing work, we also integrated SDC-Scissor into an industrial organization in the automotive domain to demonstrate how it can be used in industrial settings.
    Comment: arXiv admin note: substantial text overlap with arXiv:2111.0466
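
    The selection step the abstract describes (train a classifier on features of past test cases, then skip candidates the model deems unlikely to expose a fault) could look roughly like the sketch below. The logistic model matches the abstract; the feature inputs, the threshold, and all names are assumptions rather than SDC-Scissor's actual interface.

```python
# Hedged sketch of ML-based test selection for SDC simulations. Assumed:
# each test case is described by a numeric feature vector (e.g. road
# geometry statistics); names and threshold are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_tests(train_features, train_outcomes, candidate_features,
                 threshold=0.5):
    """train_outcomes: 1 = test exposed a fault, 0 = safe/uninformative.
    Returns indices of candidate tests that are worth executing."""
    clf = LogisticRegression(max_iter=1000).fit(train_features, train_outcomes)
    fault_prob = clf.predict_proba(candidate_features)[:, 1]
    # Execute only tests the model considers likely to expose a fault;
    # the rest are skipped to save simulation time.
    return np.where(fault_prob >= threshold)[0]
```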

    Machine learning-based test selection for simulation-based testing of self-driving cars software

    Simulation platforms facilitate the development of emerging Cyber-Physical Systems (CPS) like self-driving cars (SDC) because they are more efficient and less dangerous than field operational tests. Despite this, thoroughly testing SDCs in simulated environments remains challenging because SDCs must be tested across a sheer number of long-running test cases. Past results on software testing optimization have shown that not all test cases contribute equally to establishing confidence in a test subject's quality and reliability, and that the execution of "safe and uninformative" test cases can be skipped to reduce testing effort. However, this problem has only been partially addressed in the context of SDC simulation platforms. In this paper, we investigate test selection strategies to increase the cost-effectiveness of simulation-based testing in the context of SDCs. We propose an approach called SDC-Scissor (SDC coSt-effeCtIve teSt SelectOR) that leverages Machine Learning (ML) strategies to identify and skip test cases that are unlikely to detect faults in SDCs before executing them.

    Proceedings of the VI National Conference (JNIC2021 LIVE)

    This conference has become a meeting forum for the most relevant actors in the field of cybersecurity in Spain. It not only presents some of the leading scientific work in the various areas of cybersecurity, but also pays special attention to training and educational innovation in cybersecurity, as well as to the connection with industry through technology transfer proposals. To that end, this year the Transfer Programme introduces some changes to its operation and organisation, designed to improve it and make it more valuable for the entire cybersecurity research community.