    Multi-granular Software Annotation using File-level Weak Labelling

    One of the most time-consuming tasks for developers is comprehending new code bases. An effective way to aid this process is to label source code files with meaningful annotations, which can help developers understand the content and functionality of a code base more quickly. However, most existing solutions for code annotation focus on project-level classification, since manually labelling individual files is time-consuming, error-prone, and hard to scale. The work presented in this paper aims to automate the annotation of files by leveraging project-level labels, and to use the file-level annotations to annotate items at coarser levels of granularity, for example, packages and the whole project. We propose a novel approach that annotates source code files through weak labelling followed by hierarchical aggregation. We investigate whether this approach is effective in producing multi-granular annotations of software projects, which can help developers understand the content and functionality of a code base more quickly. Our evaluation combines human assessment and automated metrics to evaluate the quality of the annotations. Our approach correctly annotated 50% of files and more than 50% of packages. Moreover, the information captured at the file level allowed us to identify, on average, three new relevant labels for any given project. We conclude that the proposed approach is a convenient and promising way to generate noisy (not precise) annotations for files. Furthermore, hierarchical aggregation effectively preserves the information captured at the file level, which can then be propagated to packages and to the project as a whole.
    Comment: Accepted at the Journal of Empirical Software Engineering
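    The abstract does not state the aggregation rule, so the sketch below is only one plausible reading of "hierarchical aggregation". It assumes each file's weak labels are a label-to-score map, treats a file's parent directory as its package, and averages scores upward (file to package, package to project). The names `aggregate_labels` and `mean_scores` and the example paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

Scores = dict[str, float]  # label -> confidence score (assumed format)


def mean_scores(items: list[Scores]) -> Scores:
    """Average label scores over items; a label missing from an item counts as 0."""
    totals: defaultdict[str, float] = defaultdict(float)
    for scores in items:
        for label, score in scores.items():
            totals[label] += score
    return {label: total / len(items) for label, total in totals.items()}


def aggregate_labels(file_labels: dict[str, Scores]) -> tuple[dict[str, Scores], Scores]:
    """Propagate file-level labels to packages (parent directories) and the project."""
    packages: defaultdict[str, list[Scores]] = defaultdict(list)
    for path, scores in file_labels.items():
        packages[str(PurePosixPath(path).parent)].append(scores)
    package_labels = {pkg: mean_scores(files) for pkg, files in packages.items()}
    # The project score averages packages, so each package weighs equally.
    project_labels = mean_scores(list(package_labels.values()))
    return package_labels, project_labels


files = {
    "app/db/conn.py": {"database": 0.9},
    "app/db/query.py": {"database": 0.7, "sql": 0.4},
    "app/ui/view.py": {"gui": 0.8},
}
packages, project = aggregate_labels(files)
print(packages)
print(project)
```

    Averaging per package before averaging across packages keeps a large package from dominating the project-level labels; weighting by file count would be an equally reasonable alternative.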

    Some Security Aspects of Artificial Intelligence

    Deep artificial neural networks spread into industrial applications years before the fields concerned with their reliability, interpretability, and security had matured. In image recognition, for example, a field of considerable practical importance, deployed solutions already achieve nearly human-level performance, yet these systems can be misled and confused by targeted noise. In this manuscript, we present some typical security problems and point out that solutions akin to the quality assurance methods used in traditional software development are needed when developing AI-based systems, whether the focus is on the security of artificial neural networks or on the development of the traditional components of AI systems.
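    The manuscript does not name a specific attack. To make "targeted noise" concrete, the sketch below swaps in a fast-gradient-sign-style perturbation against a toy linear classifier; the weights, input, and `epsilon` are arbitrary example values, and `score` and `targeted_noise` are hypothetical helpers. For a linear model the gradient of the score with respect to the input is the weight vector, so shifting every feature by `epsilon` against the sign of its weight lowers the score by `epsilon * sum(|w|)`.

```python
def score(weights: list[float], bias: float, x: list[float]) -> float:
    """Linear decision score; the predicted class is 1 when the score is positive."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def targeted_noise(weights: list[float], x: list[float], epsilon: float) -> list[float]:
    """Move every feature by epsilon against the sign of its weight.

    This is the fast-gradient-sign step for a linear model: it lowers the
    score by epsilon * sum(|w|) while changing no feature by more than epsilon.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


weights, bias = [1.0, -2.0, 0.5], 0.0
x = [0.2, -0.1, 0.4]
x_adv = targeted_noise(weights, x, epsilon=0.2)
print(score(weights, bias, x) > 0, score(weights, bias, x_adv) > 0)  # prints: True False
```

    A change of at most 0.2 per feature flips the predicted class here, which is the kind of fragility the manuscript argues calls for dedicated quality assurance.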

    Research Methods in Machine Learning: A Content Analysis

    Research methods in machine learning play a pivotal role, since the accuracy and reliability of results depend on the research methods used. The main aims of this paper were to explore current research methods in machine learning, the emerging themes, and the implications of those themes for machine learning research. To achieve this, the researchers analyzed a total of 100 articles published in IEEE journals since 2019. The study revealed that machine learning research predominantly uses quantitative methods, with experimental research design as the de facto approach. It also revealed that researchers now commonly use more than one algorithm to address a problem. Optimal feature selection has also emerged as a key technique that researchers use to optimize the performance of machine learning algorithms. The confusion matrix and its derived metrics remain the main means of evaluating algorithm performance, although researchers now also consider the processing time an algorithm takes to execute. The Python programming language and its libraries are the most used tools for creating, training, and testing models. The algorithms most used for both classification and prediction problems are Naïve Bayes, Support Vector Machine, Random Forest, Artificial Neural Networks, and Decision Tree. The recurring themes identified in this study are likely to open new frontiers in machine learning research.
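    The evaluation criteria named in the abstract (the confusion matrix, its derived metrics, and processing time) can be sketched in plain Python. The `predict` function is a hypothetical stand-in model that thresholds precomputed scores, and the labels and scores are example values only.

```python
import time


def predict(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Hypothetical stand-in model: thresholds precomputed scores."""
    return [int(s >= threshold) for s in scores]


def confusion_counts(y_true: list[int], y_pred: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    return pairs.count((1, 1)), pairs.count((0, 1)), pairs.count((1, 0)), pairs.count((0, 0))


def derived_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 0, 0, 1, 0]
start = time.perf_counter()
y_pred = predict([0.9, 0.4, 0.2, 0.7, 0.6, 0.1])
elapsed = time.perf_counter() - start  # processing time of the prediction step
counts = confusion_counts(y_true, y_pred)
print(counts, derived_metrics(*counts), f"{elapsed:.6f}s")
```

    The zero-denominator guards matter in practice: a model that never predicts the positive class would otherwise raise `ZeroDivisionError` when computing precision.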

    Smoke Test Planning using Answer Set Programming

    Smoke testing is an important method for increasing the stability and reliability of hardware-dependent systems. Because of concurrent access to the same physical resources and the impracticality of virtualization, smoke testing requires some form of planning. In this paper, we propose to decompose test cases into atomic actions consisting of preconditions and effects. We present a solution based on answer set programming with multi-shot solving that automatically generates short parallel test plans. Experiments suggest that the approach is feasible for test cases that are not inherently sequential and that it scales up to thousands of test cases.
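    The paper solves this planning problem with answer set programming; the sketch below is not ASP but a swapped-in greedy heuristic in Python that shows the same ingredients. Each atomic action is modeled, as a simplifying assumption, by a precondition (the state its physical resource must be in) and an effect (the state it leaves the resource in), and actions from different test cases run in the same step only if they use different resources. `Action`, `plan`, and the resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Atomic action: needs `resource` in state `pre` and leaves it in state `post`."""
    name: str
    resource: str
    pre: str
    post: str


def plan(test_cases: dict[str, list[Action]], initial: dict[str, str]) -> list[list[str]]:
    """Greedily pack actions into parallel steps; each resource serves one action per step."""
    state = dict(initial)
    progress = {tc: 0 for tc in test_cases}  # index of each test case's next action
    steps: list[list[str]] = []
    while any(progress[tc] < len(actions) for tc, actions in test_cases.items()):
        busy: set[str] = set()
        step: list[str] = []
        for tc, actions in test_cases.items():
            if progress[tc] == len(actions):
                continue  # test case already finished
            action = actions[progress[tc]]
            if action.resource in busy or state[action.resource] != action.pre:
                continue  # resource taken this step, or precondition not met yet
            busy.add(action.resource)
            state[action.resource] = action.post  # apply the effect
            progress[tc] += 1
            step.append(f"{tc}:{action.name}")
        if not step:
            raise ValueError("no applicable action; remaining test cases are blocked")
        steps.append(step)
    return steps


test_cases = {
    "tc1": [Action("power_on", "board", "off", "on"), Action("flash", "board", "on", "on")],
    "tc2": [Action("send", "bus", "idle", "idle")],
    "tc3": [Action("read_sensor", "board", "on", "on")],
}
for i, step in enumerate(plan(test_cases, {"board": "off", "bus": "idle"}), start=1):
    print(i, step)
```

    Here `tc2` runs alongside `tc1` because it uses a different resource, while `tc3` waits for the board. The greedy pass depends on the order of the test cases and gives no guarantee of a shortest plan, which is the kind of optimality a solver-based search such as the paper's ASP encoding can provide.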

    Machine Learning-based Test Selection for Simulation-based Testing of Self-driving Cars Software

    Simulation platforms facilitate the development of emerging Cyber-Physical Systems (CPS) like self-driving cars (SDC) because they are more efficient and less dangerous than field operational test cases. Despite this, thoroughly testing SDCs in simulated environments remains challenging because SDCs must be tested on a very large number of long-running test cases. Past results on software testing optimization have shown that not all test cases contribute equally to establishing confidence in the quality and reliability of the system under test, and the execution of "safe and uninformative" test cases can be skipped to reduce testing effort. However, this problem is only partially addressed in the context of SDC simulation platforms. In this paper, we investigate test selection strategies to increase the cost-effectiveness of simulation-based testing in the context of SDCs. We propose an approach called SDC-Scissor (SDC coSt-effeCtIve teSt SelectOR) that leverages Machine Learning (ML) strategies to identify and skip test cases that are unlikely to detect faults in SDCs before executing them. Our evaluation shows that SDC-Scissor outperforms the baselines. With the Logistic model, we achieve an accuracy of 70%, a precision of 65%, and a recall of 80% in selecting tests that lead to a fault, thereby improving testing cost-effectiveness. Specifically, SDC-Scissor avoided executing 50% of unnecessary tests and outperformed two baseline strategies. Complementary to existing work, we also integrated SDC-Scissor into an industrial organization in the automotive domain to demonstrate how it can be used in industrial settings.
    Comment: arXiv admin note: substantial text overlap with arXiv:2111.0466
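    Selection before execution with a logistic model can be sketched as follows. SDC-Scissor's actual features and training procedure are not described in this abstract, so the feature names (number of turns, maximum curvature), the weights, and the bias are hypothetical and assumed to have been fitted elsewhere; `fault_probability` and `select_tests` are illustrative helpers.

```python
import math


def fault_probability(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic model: estimated probability that a simulated test exposes a fault."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def select_tests(tests: dict[str, list[float]], weights: list[float], bias: float,
                 threshold: float = 0.5) -> tuple[list[str], list[str]]:
    """Split tests into (run, skip) before executing any simulation."""
    run: list[str] = []
    skip: list[str] = []
    for name, features in tests.items():
        target = run if fault_probability(features, weights, bias) >= threshold else skip
        target.append(name)
    return run, skip


# Hypothetical road features per test: [number of turns, maximum curvature].
tests = {"straight": [0, 0.0], "winding": [5, 0.5], "mild": [2, 0.3]}
run, skip = select_tests(tests, weights=[0.8, 2.0], bias=-4.0)
print(run, skip)  # prints: ['winding'] ['straight', 'mild']
```

    Lowering `threshold` runs more tests, trading cost savings for higher recall of fault-revealing tests; the abstract's figures (70% accuracy, 65% precision, 80% recall) reflect one such operating point of the authors' model, not of this sketch.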

    Machine learning techniques for automated software fault detection via dynamic execution data: empirical evaluation study

    The biggest obstacle to automated software testing is the construction of test oracles. Today, it is possible to generate an enormous number of test cases for an arbitrary system that reach a remarkably high level of coverage, but the effectiveness of these test cases is limited by the availability of test oracles that can distinguish failing executions. Previous work by the authors has explored the use of unsupervised and semi-supervised learning techniques to develop test oracles, so that the correctness of software outputs and behaviours on new test cases can be predicted [1], [2], [10], and experimental results demonstrate the promise of this approach. In this paper, we present an evaluation study of test oracles based on machine learning approaches that use dynamic execution data (first, input/output pairs, and second, combinations of input/output pairs and execution traces), comparing their effectiveness with existing techniques from the specification mining domain (the data invariant detector Daikon [5]). The two approaches are evaluated on a range of mid-sized systems and compared in terms of their fault detection ability and false positive rate. The empirical study also discusses the major limitations and the most important properties related to applying machine learning techniques as test oracles in practice, and it gives a road map for further research to tackle some of the discussed limitations, such as accuracy and scalability. The results show that in most cases semi-supervised learning techniques performed far better as automated test classifiers than Daikon (especially when input/output pairs were augmented with their execution traces). However, there is one system for which our strategy struggles and on which Daikon performed far better. Furthermore, unsupervised learning techniques performed on a par with Daikon in several cases, particularly when input/output pairs were used together with execution traces.
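    To show how an invariant detector can act as a test oracle, the sketch below learns per-variable range invariants from passing executions and flags new executions that violate them. Daikon itself infers many more invariant kinds (for example, linear relationships and ordering between variables) at specific program points; restricting to value ranges is a deliberate simplification, and `infer_ranges` and `violated` are hypothetical names.

```python
def infer_ranges(traces: list[dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Learn a [min, max] range invariant per variable from passing executions."""
    ranges: dict[str, tuple[float, float]] = {}
    for trace in traces:
        for var, value in trace.items():
            lo, hi = ranges.get(var, (value, value))
            ranges[var] = (min(lo, value), max(hi, value))
    return ranges


def violated(ranges: dict[str, tuple[float, float]], trace: dict[str, float]) -> list[str]:
    """Variables whose observed value falls outside the learned range (a suspected failure)."""
    return [var for var, value in trace.items()
            if var in ranges and not ranges[var][0] <= value <= ranges[var][1]]


passing = [{"x": 1, "ret": 2}, {"x": 3, "ret": 6}]
ranges = infer_ranges(passing)
print(violated(ranges, {"x": 2, "ret": 4}))  # prints: []
print(violated(ranges, {"x": 2, "ret": 7}))  # prints: ['ret']
```

    The same mechanism explains the false positive rate used in the comparison: a correct execution whose values were simply never seen during inference (say, `x = 4, ret = 8`) is also flagged.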

    BehAVExplor: Behavior Diversity Guided Testing for Autonomous Driving Systems

    Testing Autonomous Driving Systems (ADSs) is a critical task for ensuring the reliability and safety of autonomous vehicles. Existing methods mainly focus on searching for safety violations while ignoring the diversity of the generated test cases, which may produce many redundant test cases and failures. Such redundant failures can reduce testing performance and increase failure analysis costs. In this paper, we present a novel behavior-guided fuzzing technique (BehAVExplor) to explore the different behaviors of the ego vehicle (i.e., the vehicle controlled by the ADS under test) and detect diverse violations. Specifically, we design an efficient unsupervised model, called BehaviorMiner, to characterize the behavior of the ego vehicle. BehaviorMiner extracts temporal features from the given scenarios and performs a clustering-based abstraction to group behaviors with similar features into abstract states. A new test case is added to the seed corpus if it triggers new behaviors (e.g., covers new abstract states). Because of the potential conflict between behavior diversity and general violation feedback, we further propose an energy mechanism to guide seed selection and mutation; the energy of a seed quantifies its quality. We evaluated BehAVExplor on Apollo, an industrial-level ADS, in the LGSVL simulation environment. Empirical evaluation results show that BehAVExplor can effectively find more diverse violations than state-of-the-art approaches.
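    The corpus-update rule ("add a test case if it covers new abstract states") can be sketched as below. BehaviorMiner's actual temporal features and clustering are not given here, so this assumes each scenario yields one feature vector per timestep (for example, speed and heading, hypothetical) and that cluster centroids were already computed by some clustering such as k-means; an abstract state is the index of the nearest centroid. The energy mechanism is not modeled, and `abstract_state` and `BehaviorCorpus` are hypothetical names.

```python
import math


def abstract_state(features: tuple[float, ...], centroids: list[tuple[float, ...]]) -> int:
    """Map one timestep's feature vector to the index of its nearest centroid."""
    return min(range(len(centroids)), key=lambda i: math.dist(features, centroids[i]))


class BehaviorCorpus:
    """Seed corpus that keeps a test only if it reaches an abstract state not yet covered."""

    def __init__(self, centroids: list[tuple[float, ...]]) -> None:
        self.centroids = centroids
        self.covered: set[int] = set()
        self.seeds: list[str] = []

    def consider(self, name: str, trace: list[tuple[float, ...]]) -> bool:
        states = {abstract_state(step, self.centroids) for step in trace}
        new_states = states - self.covered
        if new_states:
            self.covered |= new_states
            self.seeds.append(name)
        return bool(new_states)


corpus = BehaviorCorpus(centroids=[(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
print(corpus.consider("s1", [(1.0, 1.0), (9.0, 1.0)]))  # prints: True
print(corpus.consider("s2", [(2.0, 0.0)]))              # prints: False
print(corpus.consider("s3", [(1.0, 9.0)]))              # prints: True
print(corpus.seeds)                                      # prints: ['s1', 's3']
```

    Scenario `s2` only revisits a behavior already seen, so it is not kept as a seed; this is how behavior-level coverage avoids filling the corpus with redundant scenarios.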

    Quality Assessment Methods for Textual Conversational Interfaces: A Multivocal Literature Review

    The evaluation and assessment of conversational interfaces is a complex task, since such software products are challenging to validate through traditional testing approaches. We conducted a systematic Multivocal Literature Review (MLR) on five different literature sources to provide an overview of the quality attributes, evaluation frameworks, and evaluation datasets proposed to aid researchers and practitioners in the field. We arrived at a final pool of 118 contributions, including grey (35) and white (83) literature. We categorized 123 different quality attributes and metrics under ten categories and four macro-categories: Relational, Conversational, User-Centered, and Quantitative attributes. While Relational and Conversational attributes are most commonly explored in the scientific literature, we observed a predominance of User-Centered attributes in the industrial literature. We also identified five academic frameworks/tools that automatically compute sets of metrics, and 28 datasets (subdivided into seven categories based on the type of data they contain) that can produce conversations for evaluating conversational interfaces. Our analysis highlights that a large number of qualitative and quantitative attributes are available in the literature for evaluating the performance of conversational interfaces. Our categorization can serve as a valid entry point for researchers and practitioners to select the proper functional and non-functional aspects to evaluate for their products.