88 research outputs found

    IFIX: Fixing concurrency bugs while they are introduced


    Understanding Concurrency Vulnerabilities in Linux Kernel

    While there is a large body of work on analyzing concurrency-related software bugs and developing techniques for detecting and patching them, little attention has been given to concurrency-related security vulnerabilities. The two are different in that not all bugs are vulnerabilities: for a bug to be exploitable, there needs to be a way for attackers to trigger its execution and cause damage, e.g., by revealing sensitive data or running malicious code. To fill the gap, we conduct the first empirical study of concurrency vulnerabilities reported in the Linux operating system in the past ten years. We focus on the confirmed vulnerabilities archived in the Common Vulnerabilities and Exposures (CVE) database, which we categorize into groups based on bug types, exploit patterns, and patch strategies adopted by developers. We use code snippets to illustrate individual vulnerability types and patch strategies. We also use statistics to illustrate the entire landscape, including the percentage of each vulnerability type. We hope to shed some light on the problem, e.g., concurrency vulnerabilities continue to pose a serious threat to system security, and it is difficult even for kernel developers to analyze and patch them. Therefore, more efforts are needed to develop tools and techniques for analyzing and patching these vulnerabilities. Comment: It was finished in Oct 201


    Modern computer software systems are complicated. Developers can change the behavior of a software system through software configurations. The large number of configuration options and their interactions make software tuning, testing, and debugging very challenging. Performance is one of the key non-functional qualities: performance bugs can cause significant performance degradation and lead to poor user experience. However, performance bugs are difficult to expose, primarily because detecting them requires specific inputs as well as specific configurations. While researchers have developed techniques to analyze, quantify, detect, and fix performance bugs, many of these techniques are not effective in highly configurable systems. To improve the non-functional qualities of configurable software systems, testing engineers need to be able to understand the performance influence of configuration options, adjust the performance of a system under different configurations, and detect configuration-related performance bugs. This research will provide an automated framework that allows engineers to effectively analyze performance-influencing configuration options, detect performance bugs in highly configurable software systems, and adjust configuration options to achieve higher long-term performance gains. To understand real-world performance bugs in highly configurable software systems, we first perform a performance bug characteristics study on three large-scale open-source projects. Many researchers have studied the characteristics of performance bugs from bug reports, but few have reported on the experience of trying to replicate confirmed performance bugs from the perspective of non-domain experts such as researchers. This study reports the challenges of replicating confirmed performance bugs and potential workarounds. We also share a performance benchmark that provides real-world performance bugs for evaluating future performance testing techniques. Inspired by our performance bug study, we propose a performance profiling approach that helps developers understand how configuration options and their interactions influence the performance of a system. The approach combines dynamic analysis and machine learning with configuration sampling techniques to profile the program execution and analyze the configuration options relevant to performance. Next, the framework leverages natural language processing and information retrieval techniques to automatically generate test inputs and configurations that expose performance bugs. Finally, the framework combines reinforcement learning and dynamic state reduction techniques to guide the subject application towards achieving higher long-term performance gains.

    Demystifying Dependency Bugs in Deep Learning Stack

    Deep learning (DL) applications, built upon a heterogeneous and complex DL stack (e.g., Nvidia GPU, Linux, CUDA driver, Python runtime, and TensorFlow), are subject to software and hardware dependencies across the DL stack. One challenge in dependency management across the entire engineering lifecycle is posed by the asynchronous and radical evolution of dependencies and the complex version constraints among them. Developers may introduce dependency bugs (DBs) when selecting, using, and maintaining dependencies. However, the characteristics of DBs in the DL stack are still under-investigated, hindering practical solutions to dependency management in the DL stack. To bridge this gap, this paper presents the first comprehensive study to characterize the symptoms, root causes, and fix patterns of DBs across the whole DL stack, with 446 DBs collected from StackOverflow posts and GitHub issues. For each DB, we first investigate the symptom as well as the lifecycle stage and dependency where the symptom is exposed. Then, we analyze the root cause as well as the lifecycle stage and dependency where the root cause is introduced. Finally, we explore the fix pattern and the knowledge sources that are used to fix it. Our findings from this study shed light on practical implications for dependency management.

    Automated testing for GPU kernels

    Graphics Processing Units (GPUs) are massively parallel processors offering performance acceleration and energy efficiency unmatched by current processors (CPUs) in computers. These advantages, along with recent advances in the programmability of GPUs, have made them widely used in various general-purpose computing domains. However, this has also made testing GPU kernels critical to ensure that their behaviour meets the requirements of the design and specification. Despite the advances in programmability, GPU kernels are hard to code and analyse due to the high complexity of memory sharing patterns, striding patterns for memory accesses, implicit synchronisation, and the combinatorial explosion of thread interleavings. The few existing techniques for testing GPU kernels use symbolic execution for test generation, which incurs a high overhead, has limited scalability, and does not handle all data types. In this thesis, we present novel approaches to measure test effectiveness and generate tests automatically for GPU kernels. To achieve this, we address significant challenges related to the GPU execution and memory model, and the lack of customised thread scheduling and global synchronisation. We make the following contributions: First, we present a framework, CLTestCheck, for assessing the quality of test suites developed for GPU kernels. The framework can measure code coverage using three different coverage metrics that are inspired by faults found in real kernel code. The framework also measures the fault-finding capability of the test suite by seeding different types of faults in the kernel, and reports it as a mutation score: the ratio of the number of detected faults to the total number of seeded faults. Second, with the goal of being fast, effective and scalable, we propose a test generation technique, CLFuzz, for GPU kernels that combines mutation-based fuzzing for fast test generation with selective SMT solving to help cover branches that fuzzing cannot reach. Fuzz testing for GPU kernels has not been explored previously. Our approach to fuzz testing randomly mutates input kernel argument values with the goal of increasing branch coverage, and supports GPU-specific data types such as images. When fuzz testing is unable to increase branch coverage with random mutations, we gather path constraints for uncovered branch conditions, build additional constraints to represent the context of GPU execution, such as the number of threads and the work-group size, and invoke the Z3 constraint solver to generate tests for them. Finally, to help uncover inter-work-group data races and replay these bugs with fixed work-group schedules, we present a schedule amplifier, CLSchedule, that simulates multiple work-group schedules with which to execute each of the generated tests. By reimplementing the OpenCL API, CLSchedule executes the kernel with a fixed work-group schedule rather than the default arbitrary schedule. It also executes the kernel directly, without requiring the developer to manually provide boilerplate host code. The outcome of our research can be summarised as follows: 1. CLTestCheck is applied to 82 publicly available GPU kernels from industry-standard benchmark suites along with their test suites. The experiment reveals that CLTestCheck is capable of automatically measuring the effectiveness of test suites in terms of code coverage, fault-finding capability, and revealing data races in real OpenCL kernels. 2. CLFuzz can automatically generate tests and achieve close to 100% coverage and mutation score for the majority of the data set of 217 GPU kernels collected from open-source projects and industry-standard benchmarks. 3. CLSchedule is capable of exploring the effect of work-group schedules on the 217 GPU kernels and uncovers data races in 21 of them. The techniques developed in this thesis demonstrate that we can measure the effectiveness of tests developed for GPU kernels with our coverage criteria and fault seeding methods. The result is useful in highlighting code portions that may need developers' further attention. Our automated test generation and work-group scheduling approaches are also fast, effective and scalable, with a small overhead (0.8 seconds on average) and scalability to large kernels with complex data structures.
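
    The mutation score mentioned in this abstract can be illustrated with a minimal sketch. This is not CLTestCheck's implementation: the MutantResult type, the mutation_score function, and the mutant names below are hypothetical, and the sketch assumes that a seeded fault counts as detected when at least one test in the suite fails on it.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MutantResult:
    """Outcome of running a test suite against one seeded fault (hypothetical type)."""
    mutant_id: str
    detected: bool  # assumption: True if at least one test failed on this mutant


def mutation_score(results: Sequence[MutantResult]) -> float:
    """Return detected seeded faults divided by all seeded faults."""
    if not results:
        raise ValueError("no seeded faults: mutation score is undefined")
    detected = sum(1 for r in results if r.detected)
    return detected / len(results)


# Hypothetical mutants, named only for illustration.
results = [
    MutantResult("flip-relational-operator", True),
    MutantResult("remove-barrier", True),
    MutantResult("change-array-index", False),
    MutantResult("swap-arithmetic-operator", True),
]

# 3 of the 4 seeded faults are detected, so the score is 0.75.
print(f"mutation score: {mutation_score(results):.2f}")
```

    A score close to 1.0 means the test suite detects nearly every seeded fault; the undetected mutants point at kernel code the tests do not exercise strongly enough.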

    Automated Regression Testing and Verification of Complex Code Changes


    Model-Based Code Generation Framework for Parallel and Distributed Embedded Systems

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, 2020. 2. 하순회 (Soonhoi Ha). While various software development methodologies have been proposed to increase the design productivity and maintainability of software, they usually focus on the development of application software running on a single processing element, without concern for the non-functional requirements of an embedded system such as latency and resource requirements. In this thesis, we present a model-based software development method for parallel and distributed embedded systems. An application is specified as a set of tasks that follow a set of given rules for communication and synchronization in a hierarchical fashion, independently of the hardware platform. Having such rules enables us to perform static analysis that catches some software errors before the program runs, which reduces the verification difficulty. A platform-specific program is synthesized automatically after the mapping of tasks onto processing elements is determined. We also propose the program synthesizer, which generates code that satisfies the platform requirements of parallel and distributed embedded systems. Because multiple models that express dynamic behaviors can be composed hierarchically, the synthesizer manages multiple task graphs at different levels of the hierarchy to run tasks in parallel. The synthesizer also shows how to manage code for heterogeneous platforms and how to generate various communication methods. The viability of the proposed software development method is verified with a real-life surveillance application that runs on six processing elements with three remote communication methods, and with a remote deep learning example that uses heterogeneous multiprocessing components on distributed systems. By measuring and estimating development costs, we also confirm that supporting a new platform or network requires only a small effort. Since tolerance to unexpected errors is a required feature of many embedded systems, we also support automatic fault-tolerant code generation. 
Fault tolerance is applied by modifying the task graph according to the selected fault tolerance configuration, so an application developer can easily adopt the non-functional requirement of fault tolerance. To assess the effort of supporting fault tolerance, we compare against a manual implementation of fault tolerance. The fault tolerance method is also tested with a fault injection tool, both to emulate fault scenarios and to inject faults randomly. The fault injection tool used for testing our fault tolerance method is another contribution of this thesis. Emulating fault scenarios by intentionally injecting faults is commonly used to test and verify the robustness of a system. To emulate faults on an embedded system, we present a run-time fault injection framework that can inject reproducible faults into both the kernel layer and the application layer of Linux-based systems. For the kernel layer, two complementary fault injection techniques are used: one is based on the Kernel GNU Debugger, and the other uses a hardware breakpoint supported by the ARM architecture. For application-level fault injection, a GDB-based method is used to inject a fault into an application on the same system or on a remote system. The viability of the proposed fault injection tool is demonstrated by real-life experiments on an ODROID-XU4 system.
    Contents: Chapter 1 Introduction (1.1 Motivation; 1.2 Contribution; 1.3 Dissertation Organization). Chapter 2 Background (2.1 HOPES: Hope of Parallel Embedded Software: 2.1.1 Software Development Procedure, 2.1.2 Components of HOPES; 2.2 Universal Execution Model: 2.2.1 Task Graph Specification, 2.2.2 Dataflow Specification of an Application, 2.2.3 Task Code Specification and Generic APIs, 2.2.4 Meta-data Specification). Chapter 3 Program Synthesis for Parallel and Distributed Embedded Systems (3.1 Motivational Example; 3.2 Program Synthesis Overview; 3.3 Program Synthesis from Hierarchically-mixed Models; 3.4 Platform Code Synthesis; 3.5 Communication Code Synthesis; 3.6 Experiments: 3.6.1 Development Cost of Supporting New Platforms and Networks, 3.6.2 Program Synthesis for the Surveillance System Example, 3.6.3 Remote GPU-accelerated Deep Learning Example; 3.7 Document Generation; 3.8 Related Works). Chapter 4 Model Transformation for Fault-tolerant Code Synthesis (4.1 Fault-tolerant Code Synthesis Techniques; 4.2 Applying Fault Tolerance Techniques in HOPES; 4.3 Experiments: 4.3.1 Development Cost of Applying Fault Tolerance, 4.3.2 Fault Tolerance Experiments; 4.4 Random Fault Injection Experiments; 4.5 Related Works). Chapter 5 Fault Injection Framework for Linux-based Embedded Systems (5.1 Background: 5.1.1 Fault Injection Techniques, 5.1.2 Kernel GNU Debugger, 5.1.3 ARM Hardware Breakpoint; 5.2 Fault Injection Framework: 5.2.1 Overview, 5.2.2 Architecture, 5.2.3 Fault Injection Techniques, 5.2.4 Implementation; 5.3 Experiments: 5.3.1 Experiment Setup, 5.3.2 Performance Comparison of Two Fault Injection Methods, 5.3.3 Bit-flip Fault Experiments, 5.3.4 eMMC Controller Fault Experiments). Chapter 6 Conclusion. Bibliography. Abstract (in Korean).

    Witness-based validation of verification results with applications to software-model checking

    In the scientific world, formal verification is an established engineering technique to ensure the correctness of hardware and software systems. Because formal verification is an arduous and error-prone endeavor, automated solutions are desirable, and researchers continue to develop new algorithms and optimize existing ones to push the boundaries of what can be verified automatically. These efforts do not go unnoticed by the industry. Hardware-circuit designs, flight-control systems, and operating-system drivers are just a few examples of systems where formal verification is already part of the quality-assurance repertoire. Nevertheless, formal verification is applied mainly where errors carry a high risk of significant financial or physical damage, because its costs are considered too high for most other projects, even though the research community has made vast advances in the effectiveness and efficiency of formal verification techniques over the last decades. We present and address two potential reasons for this discrepancy that we identified in the field of automated formal software verification. (1) Even for experts in the field, it is often difficult to decide which of the multitude of available techniques is the most suitable solution they should recommend to solve a given verification problem. Moreover, even if a suitable solution is found for a given system, there is no guarantee that the solution is sustainable as the system evolves. Consequently, the cost of finding and maintaining a suitable approach for applying formal software verification to real-world systems is high. (2) Even assuming that a suitable and maintainable solution for applying formal software verification to a given system is found and verification results can be obtained, developers of the system still require further guidance towards making practical use of these results, which often differ significantly from the results they obtain from the classical quality-assurance techniques they are familiar with, such as testing. To mitigate the first issue, using the open-source software-verification framework CPAchecker, we investigate several popular formal software-verification techniques such as predicate abstraction, Impact, bounded model checking, k-induction, and PDR, and perform an extensive and rigorous experimental study to identify their strengths and weaknesses regarding their comparative effectiveness and efficiency when applied to a large and established benchmark set, to provide a basis for choosing the best technique for a given problem. To mitigate the second issue, we propose a concrete standard format for the representation and communication of verification results that raises the bar from plain "yes" or "no" answers to verification witnesses, which are valuable artifacts of the verification process that contain detailed information discovered during the analysis. We then use these verification witnesses for several applications: To increase the trust in verification results, we first develop several independent validators based on violation witnesses, i.e., verification witnesses that represent bugs detected by a verifier. We then extend our validators to also verify the verification results obtained from a successful verification, which are represented by correctness witnesses. 
Lastly, we also develop an interactive web service to store and retrieve these verification witnesses, to provide online validation to quickly de-prioritize likely wrong results, and to graphically visualize the witnesses, as an example of how verification can be integrated into a development process. Since the introduction of our proposed standard format for verification witnesses, it has been adopted by over thirty different software verifiers, and our witness-based result-validation tools have become a core component in the scoring process of the International Competition on Software Verification.