
    Learning to Accelerate Symbolic Execution via Code Transformation

    Symbolic execution is an effective but expensive technique for automated test generation. Over the years, many refined symbolic execution techniques have been proposed to improve its efficiency. However, the efficiency problem remains and largely limits the application of symbolic execution in practice. Orthogonal to refined symbolic execution, in this paper we propose to accelerate symbolic execution through semantics-preserving code transformation of the target programs. As a first step in this direction, we adopt a particular code transformation, compiler optimization, which was originally proposed to accelerate concrete program execution by transforming the source program into a semantically equivalent target program with increased efficiency (e.g., faster or smaller). However, compiler optimizations are mostly designed to accelerate concrete execution rather than symbolic execution. Recent work also reported that no unified setting of compiler optimizations accelerates symbolic execution for every program. Therefore, in this work we propose a machine-learning-based approach to tuning compiler optimizations to accelerate symbolic execution, whose results may also aid the further design of code transformations specific to symbolic execution. In particular, the proposed approach, LEO, separates source-code functions and libraries through our program-splitter and predicts each compiler optimization separately (i.e., whether a type of code transformation is chosen) by analyzing the performance of existing symbolic execution. Finally, LEO applies symbolic execution to the code transformed by compiler optimization (through our local-optimizer). We conduct an empirical study on GNU Coreutils programs using the KLEE symbolic execution engine. The results show that LEO significantly accelerates symbolic execution, outperforming the default KLEE configurations (i.e., turning all compiler optimizations on or off) in various settings. For example, with the default training/testing time, LEO achieves the highest line coverage in 50 of 68 programs, and its average improvement in line coverage over all programs is 46.48% and 88.92% compared with turning all compiler optimizations on and off, respectively.

    Hierarchical Model of Automated Unit Test Generation Systems

    The study aims to further enhance automated unit test generation methods by creating a new generalized model of unit test generation systems that supports metaheuristics and parallel computation. The study uses analysis to identify the current issues of the test generation task, and modeling, generalization, and synthesis to propose an enhanced model of test generation systems. The main current method for unit test generation is symbolic execution. Its capabilities are limited by high computational complexity and a lack of universality, although many methods exist for boosting performance in particular cases. The paper classifies current methods by the input data they use and by how they process that data. The most advanced methods use several sources of input data or several data analysis techniques. As a result of the classification, the paper proposes a hierarchical model that takes into account the variations of symbolic-execution-based unit test generation. The proposed model has four layers. The first layer contains the main algorithm and is therefore responsible for overall process control and test generation. The second layer is a "bridge" between the additional heuristics or artificial intelligence and the main algorithm; it combines the output of the third layer (solved logical conditions) with its own higher-level rules and heuristics. The third layer consists of SMT (satisfiability modulo theories) solvers aimed at solving sets of logical conditions. The fourth layer comprises the individual algorithms employed by the SMT solvers. The benefits of the proposed model are universality, modularity, and the potential to use metaheuristics in every layer. Furthermore, it allows the computation to be split by layer and load balancing to be performed on each layer individually. 
Future work includes building new systems and creating new metaheuristics based on the proposed model. The paper describes the specifics of the software testing problem when automated unit test generation systems are used. Automated unit testing methods used for software testing are analyzed. Unit test generation methods are classified by their input data and by the means used to generate tests. Compiled bytecode and the control flow graph are shown to be the main kinds of input data, and symbolic execution the main method for generating unit tests. The latest automated unit testing methods are systematized: symbolic execution combined with artificial neural networks, additional logic, and optimization algorithms. The possibilities of applying meta- and hyper-heuristics in automated unit test generation systems are analyzed. A hierarchical model of these systems is built: the fourth level comprises search algorithms for analyzing conditions in code; the third, SMT libraries, which contain a set of these algorithms and strategies for using them; the second, the combination of the SMT library's results with the results of additional logic; and the first, the control algorithms that drive the test generation process. The possibilities for parallel computation at every level of the hierarchy are described. Bottlenecks in implementations of unit test generation systems are demonstrated. A division of the unit test generation task according to the model's hierarchy levels is proposed, which makes it possible to bypass the bottlenecks of current systems and improve scalability. A UML class diagram based on the proposed model is developed. Simultaneous use of metaheuristics at all hierarchical levels of the model is proposed to improve the quality of generated tests, which will improve the universality and modularity of the system. The need for further development of new methods to increase the efficiency of test generation algorithms and the quality of testing is substantiated.
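
    The four-layer structure described in this abstract can be illustrated with a minimal Python sketch. Everything named here (`exhaustive_search`, `Solver`, `Bridge`, `generate_tests`, and the use of brute-force search in place of a real SMT solver) is a hypothetical stand-in chosen for illustration; the abstract does not give the authors' class design.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Assignment = Dict[str, int]
Condition = Callable[[Assignment], bool]


# Layer 4: one individual search algorithm (toy stand-in: exhaustive search).
def exhaustive_search(variables: Sequence[str], domain: Iterable[int],
                      conditions: Sequence[Condition]) -> Optional[Assignment]:
    for values in product(domain, repeat=len(variables)):
        candidate = dict(zip(variables, values))
        if all(cond(candidate) for cond in conditions):
            return candidate
    return None


# Layer 3: solver facade that owns layer-4 algorithms and tries them in order.
class Solver:
    def __init__(self, algorithms):
        self.algorithms = list(algorithms)

    def solve(self, variables, domain, conditions) -> Optional[Assignment]:
        for algorithm in self.algorithms:
            model = algorithm(variables, domain, conditions)
            if model is not None:
                return model
        return None


# Layer 2: bridge that adds its own higher-level rules to the path conditions
# before delegating to the solver layer.
class Bridge:
    def __init__(self, solver: Solver, extra_rules: Sequence[Condition]):
        self.solver = solver
        self.extra_rules = list(extra_rules)

    def solve_path(self, variables, domain, conditions) -> Optional[Assignment]:
        return self.solver.solve(variables, domain,
                                 list(conditions) + self.extra_rules)


# Layer 1: control algorithm that turns each feasible path into one test input.
def generate_tests(paths, variables, domain, bridge: Bridge) -> List[Assignment]:
    tests = []
    for conditions in paths:
        inputs = bridge.solve_path(variables, domain, conditions)
        if inputs is not None:  # infeasible paths yield no test
            tests.append(inputs)
    return tests
```

    Because each layer only talks to the one below it, a layer could in principle be swapped out or run in parallel without touching the others, which is the modularity the abstract emphasizes.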

    Security of Cyber-Physical Systems

    Cyber-physical system (CPS) innovations, together with related computational and technological advances, have positively impacted our society, opening new horizons of service excellence in a variety of application fields. With the rapid increase in the use of CPSs in safety-critical infrastructure, their safety and security are the top priorities of next-generation designs. The potential consequences of CPS insecurity are severe enough to make CPS security one of the core elements of the CPS research agenda. Faults, failures, and cyber-physical attacks change the dynamics of CPSs and cause instability and malfunction of normal operations. This reprint discusses existing vulnerabilities and focuses on detection, prevention, and compensation techniques to improve the security of safety-critical systems.

    Leveraging Machine Learning to Improve Software Reliability

    Finding software faults is a critical task during the lifecycle of a software system. Traditional software quality control practices such as statistical defect prediction, static bug detection, regression testing, and code review are often inefficient and time-consuming, and cannot keep up with the increasing complexity of modern software systems. We argue that machine learning, with its capabilities in knowledge representation, learning, natural language processing, classification, etc., can extract valuable information from software artifacts that may be difficult to obtain with other research methodologies, and can thereby improve these existing software reliability practices. This thesis presents a suite of novel machine-learning-based techniques that improve existing software reliability practices to help developers find software bugs more effectively and efficiently. First, it introduces a deep-learning-based defect prediction technique to improve existing statistical defect prediction models. To build accurate prediction models, previous studies focused on manually designing features that encode the statistical characteristics of programs. However, these features often fail to capture the semantic differences between programs, a capability that is needed for building accurate prediction models. To bridge the gap between program semantics and defect prediction features, this thesis leverages deep learning to learn a semantic representation of programs automatically from source code, and then builds and trains defect prediction models using these semantic features. We examine the effectiveness of the deep-learning-based prediction models on both open-source and commercial projects. Results show that models built on the learned semantic features significantly outperform existing defect prediction models. Second, it introduces an n-gram language model based static bug detection technique, Bugram, to detect new types of bugs with fewer false positives. Most existing static bug detection techniques are based on programming rules inferred from source code. It is known that if a pattern does not appear frequently enough, rules are not learned, so many bugs are missed. To solve this issue, this thesis proposes Bugram, which leverages n-gram language models instead of rules to detect bugs. Specifically, Bugram models program tokens sequentially using an n-gram language model. Token sequences from the program are then assessed according to their probability in the learned model, and low-probability sequences are marked as potential bugs. The assumption is that low-probability token sequences in a program are unusual, which may indicate bugs, bad practices, or unusual or special uses of code of which developers may want to be aware. We examine the effectiveness of our approach on the latest versions of 16 open-source projects. Results show that Bugram detected 25 new bugs, 23 of which cannot be detected by existing rule-based bug detection approaches, which suggests that Bugram complements existing approaches by detecting more bugs while generating fewer false positives. Third, it introduces a machine-learning-based regression test prioritization technique, QTEP, to find and run earlier the test cases that could reveal bugs. Existing test case prioritization techniques mainly focus on maximizing coverage information between source code and test cases in order to schedule test cases that find bugs earlier, but they often do not consider the likely distribution of faults in the source code. Software faults are often not equally distributed in source code; e.g., around 80% of faults are located in about 20% of the source code. Intuitively, test cases that cover the faulty source code should have higher priority, since they are more likely to find faults. To solve this issue, this thesis proposes QTEP, which leverages machine learning models to evaluate source code quality and then adapts existing test case prioritization algorithms by weighting them with the estimated source code quality. Evaluation on seven open-source projects shows that QTEP significantly outperforms existing test case prioritization techniques at finding failing test cases early. Finally, it introduces a machine-learning-based approach to identifying risky code review requests. Code review has been widely adopted in the development of both proprietary and open-source software, and helps improve the maintainability and quality of software before code changes are merged into the source code repository. Our observation of code review requests from four large-scale projects reveals that around 20% of changes cannot pass the first round of code review and require non-trivial revision effort (i.e., risky changes). In addition, resolving these risky changes requires on average 3X more time and 1.6X more reviewers than regular changes (i.e., changes that pass the first code review). This thesis presents the first study to characterize these risky changes and to identify them automatically with machine learning classifiers. Evaluation on one proprietary project and three large-scale open-source projects (i.e., Qt, Android, and OpenStack) shows that our approach is effective in identifying risky code review requests. Taken together, the results of the four studies provide evidence that machine learning can help improve traditional software reliability practices such as statistical defect prediction, static bug detection, regression testing, and code review.
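
    The n-gram idea behind Bugram, as the abstract describes it, can be sketched in a few lines of Python: learn n-gram counts over program tokens, score each candidate token sequence by its probability, and report the least probable ones. This is only an illustration under stated assumptions; the class `NGramModel`, the helper `rank_suspicious`, the add-one smoothing, and the toy tokens are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous n-token window, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NGramModel:
    """Count-based n-gram model; add-one smoothing is an assumption here."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.ngram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, sequences: Iterable[Sequence[str]]) -> None:
        for tokens in sequences:
            self.vocab.update(tokens)
            for gram in ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def prob(self, gram: Tuple[str, ...]) -> float:
        """Smoothed P(last token | preceding n-1 tokens)."""
        numerator = self.ngram_counts[gram] + 1
        denominator = self.context_counts[gram[:-1]] + max(len(self.vocab), 1)
        return numerator / denominator

    def sequence_prob(self, tokens: Sequence[str]) -> float:
        """Product of the conditional probabilities of all n-grams in tokens."""
        p = 1.0
        for gram in ngrams(tokens, self.n):
            p *= self.prob(gram)
        return p


def rank_suspicious(model: NGramModel, candidates: List[Sequence[str]],
                    k: int) -> List[Sequence[str]]:
    """Return the k candidate sequences with the lowest model probability."""
    return sorted(candidates, key=model.sequence_prob)[:k]


# Toy usage: a frequent call pattern and one rare deviation from it.
common = ["lock", "use", "unlock"]
rare = ["lock", "use", "return"]
model = NGramModel(n=3)
model.train([common] * 5 + [rare])
print(rank_suspicious(model, [common, rare], k=1))  # the rare sequence ranks first
```

    Ranking by probability rather than matching against mined rules is what lets this style of detector flag patterns that occur too rarely for a rule to be learned, which is the gap the abstract says Bugram targets.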
