    Localizing Defects in Multithreaded Programs by Mining Dynamic Call Graphs

    Writing multithreaded software for multicore computers confronts many developers with the difficulty of finding parallel programming errors. In the past, most parallel debugging techniques have concentrated on finding race conditions caused by incorrect usage of synchronization constructs. A widely unexplored issue, however, is that incorrect usage of non-parallel programming constructs may also cause wrong parallel application behavior. This paper presents a novel defect-localization technique for multithreaded shared-memory programs that is based on analyzing execution anomalies. Compared to race detectors, which report only incorrect synchronization, this method can detect a wider range of defects affecting parallel execution. It works on a condensed representation of the call graphs of multithreaded applications and employs data-mining techniques to locate a method containing a defect. Our results from controlled application experiments show that we found not only race conditions but also other programming errors leading to incorrect parallel program behavior. On average, our approach reduced the amount of code to be inspected in our benchmark to just 7.1% of all methods.
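
    The abstract does not spell out the paper's condensed call-graph representation or the mining algorithm itself, so the Python sketch below is only a rough illustration of the general idea: compare per-method call-graph behaviour between passing and failing runs and rank methods by how anomalous they look. The helper names, the caller-to-callee frequency features, and the deviation heuristic are assumptions made for illustration, not the authors' technique.

```python
# Illustrative sketch (not the paper's algorithm): rank methods by how
# anomalous their dynamic call-graph behaviour is in failing runs
# compared to passing runs, using caller->callee call frequencies.
from collections import defaultdict


def edge_frequencies(call_edges):
    """call_edges: iterable of (caller, callee) pairs from one program run."""
    freq = defaultdict(int)
    for caller, callee in call_edges:
        freq[(caller, callee)] += 1
    return freq


def rank_suspicious_methods(passing_runs, failing_runs):
    """Score each method by the mean frequency deviation of its outgoing
    call edges between failing and passing runs (a simple stand-in for
    the data-mining step)."""
    pass_freqs = [edge_frequencies(run) for run in passing_runs]
    fail_freqs = [edge_frequencies(run) for run in failing_runs]
    all_edges = {e for f in pass_freqs + fail_freqs for e in f}

    scores = defaultdict(float)
    for edge in all_edges:
        mean_pass = sum(f[edge] for f in pass_freqs) / max(len(pass_freqs), 1)
        mean_fail = sum(f[edge] for f in fail_freqs) / max(len(fail_freqs), 1)
        scores[edge[0]] += abs(mean_fail - mean_pass)  # attribute to the caller
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy usage: in failing runs, 'update' calls 'write' without the usual 'lock'.
passing = [[("update", "lock"), ("update", "write")]]
failing = [[("update", "write"), ("update", "write"), ("update", "write")]]
print(rank_suspicious_methods(passing, failing))
```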

    Characterizing and Optimizing the Performance of the MAESTRO 49-core Processor

    As space-based imagery-intelligence systems become increasingly complex, processing units are needed that can process the extra data these systems seek to collect. However, the space environment presents a number of threats, such as ambient or malicious radiation, that can damage and otherwise interfere with electronic systems. There is a need, then, for processors that can tolerate radiation-induced faults and that also have sufficient computational power to handle the large flow of data they encounter. This research investigates one potential solution: a multi-core processor that is radiation-hardened and designed to provide highly parallelized MIMD execution of applicable workloads. A variety of benchmarking programs are used to explore the capabilities of this processor. Additionally, the source code is modified in an attempt to enhance processor speed and efficiency; the consequent improvements in performance are documented.

    Detailed Low-cost Energy and Power Monitoring of Computing Systems

    Power and energy are increasingly important metrics in modern computing systems. Large supercomputers utilize millions of cores and can consume as much power as a small town; monitoring and reducing power consumption is an important task. At the other extreme, power usage of embedded and mobile devices is also critically important. Battery life is a key concern in such devices; detailed power measurements allow optimizing these devices for power as well. Current systems are not set up to allow easy power measurement. There has been much work in this area, but obtaining power readings is often expensive, intrusive, and not well validated. In this work we propose a low-cost, easy-to-use power measurement methodology that can be used in both high-end servers and low-end embedded systems. We then validate the results gathered against existing power measurement systems. We extend the existing Linux perf utility so that it can provide real-world fine-grained power measurements, allowing users easy access to these values and enabling new power optimization opportunities.
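
    The abstract describes extending the Linux perf utility but gives no interface details, so the Python sketch below illustrates low-cost software power measurement through a different, widely available path: the RAPL package-energy counter exposed by the Linux powercap sysfs interface. The sysfs path, the helper names, and the assumption that the counter is readable without special privileges are assumptions; this is not the perf extension the paper proposes.

```python
# Minimal sketch: estimate average package power over a workload by reading
# the RAPL energy counter exposed via the Linux powercap sysfs interface.
# Assumes an Intel machine exposing intel-rapl:0 and read permission on it.
import time

RAPL_DIR = "/sys/class/powercap/intel-rapl:0"  # package-0 energy domain


def read_uj(attr):
    with open(f"{RAPL_DIR}/{attr}") as f:
        return int(f.read())


def average_power(workload, *args):
    """Run workload(*args) and return (result, average power in watts)."""
    max_uj = read_uj("max_energy_range_uj")      # counter wraps at this value
    e0, t0 = read_uj("energy_uj"), time.monotonic()
    result = workload(*args)
    e1, t1 = read_uj("energy_uj"), time.monotonic()
    delta_uj = (e1 - e0) % max_uj                # tolerate a single wrap-around
    return result, (delta_uj / 1e6) / (t1 - t0)  # microjoules -> joules -> W


if __name__ == "__main__":
    _, watts = average_power(sum, range(50_000_000))
    print(f"average package power: {watts:.1f} W")
```

    On systems where perf already exposes the RAPL PMU, a command such as perf stat -e power/energy-pkg/ -- <workload> should report a comparable package-energy total.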

    Benchmarking implementations of functional languages with ‘Pseudoknot', a float-intensive benchmark

    Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation. With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time. There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations. The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of 'typical' applications. Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varied.

    Test Data Generation for Exposing Interference Bugs in Multi-Threaded Systems

    Testing multi-threaded systems is difficult due to the non-deterministic behaviour of the environment (schedulers, caches, interrupts) in which a multi-threaded system runs. As testers cannot control the environment, they must resort to indirect means to increase the number of schedules tested. Another source of non-determinism in multi-threaded systems is shared-memory access: when executed, multi-threaded systems can experience one of many possible interleavings of memory accesses to shared data, which can raise data races or interference conditions. Test data generation using search-based techniques has provided solutions to the problem of testing single- and multi-threaded systems over the years, but to the best of our knowledge, no work has addressed interference bugs in multi-threaded systems using search-based approaches. In this thesis, we perform a feasibility study of using search-based approaches to maximize the possibility of exposing interference bug patterns in multi-threaded systems. We frame our thesis hypothesis as follows: search-based techniques can be used effectively to generate test data that exposes interference conditions in multi-threaded systems. From the related work we identified three major challenges in using search-based approaches: C1: formulating the original problem as a search problem; C2: developing the right fitness function for the problem formulation; and C3: finding the right (scalable) search-based approach through scalability analysis. Before studying multi-threaded systems, we perform a preliminary study on how these challenges can be addressed in single-threaded systems, as we expect our findings to carry over to the more complex multi-threaded case. In this first study, we address the problem of generating test data that raises divide-by-zero exceptions in single-threaded systems using search-based approaches, while addressing C1, C2, and C3. We find that the three challenges are important and can be addressed, and that search-based approaches scale significantly better than random testing as the input search space grows from very small to huge. Building on this study, we address our main problem of generating test data to expose interference bugs in multi-threaded systems using search-based approaches, again addressing C1, C2, and C3. We found that, even in multi-threaded systems, it is important to address the three challenges, and that search-based approaches outperform random testing when the input search space becomes large. We thus confirm our thesis, although further studies are necessary to generalize our findings.
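
    The thesis abstract names the challenges (problem formulation, fitness function, scalable search) but not the concrete algorithms, so the Python sketch below is a hedged illustration of the preliminary divide-by-zero study: the divisor's distance to zero serves as a fitness to be minimized by a simple hill climber. The unit under test, the neighbourhood definition, and all names are illustrative assumptions rather than the thesis's actual formulation.

```python
# Illustrative sketch of search-based test data generation: minimize the
# divisor's distance to zero so the unit under test raises ZeroDivisionError.
import random


def unit_under_test(a, b):
    # Hypothetical unit: the interesting failure is a division by zero.
    return a / (b - 42)


def fitness(a, b):
    """Branch-distance-style cost: 0 means the exception is triggered."""
    return abs(b - 42)


def hill_climb(steps=10_000, lo=-10_000, hi=10_000):
    a, b = random.randint(lo, hi), random.randint(lo, hi)
    best = fitness(a, b)
    for _ in range(steps):
        na = a + random.randint(-100, 100)
        nb = b + random.randint(-100, 100)
        if fitness(na, nb) <= best:            # accept non-worsening moves
            a, b, best = na, nb, fitness(na, nb)
        if best == 0:
            break
    return a, b, best


a, b, best = hill_climb()
print("inputs:", a, b, "fitness:", best)
if best == 0:
    try:
        unit_under_test(a, b)
    except ZeroDivisionError:
        print("divide-by-zero exposed")
```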

    Finding and Tolerating Concurrency Bugs.

    Shared-memory multi-threaded programming is inherently more difficult than single-threaded programming. The main source of complexity is that the threads of an application can interleave in a vast number of ways. To ensure correctness, a programmer would have to test all possible thread interleavings, which is impractical. Many rare thread interleavings remain untested in production systems, and they are the cause of a majority of concurrency bugs. Given that, this dissertation explores two possible ways to tackle concurrency bugs. One is to expose untested interleavings during testing to find concurrency bugs. The other is to avoid untested interleavings during production runs to tolerate concurrency bugs. The key is an efficient and effective way to encode and remember tested interleavings. This dissertation first discusses two hypotheses about concurrency bugs: the small-scope hypothesis and the value-independent hypothesis. Based on these two hypotheses, it defines a set of interleaving patterns, called interleaving idioms, which are used to encode tested interleavings. The empirical analysis shows that the idiom-based interleaving encoding scheme is able to represent most of the concurrency bugs used in the study. The dissertation then discusses an open-source testing tool called Maple, which memoizes tested interleavings and actively seeks to expose untested ones. The results show that Maple is able to expose concurrency bugs and expose interleavings faster than conventional testing techniques. Finally, the dissertation discusses two parallel runtime system designs that seek to avoid untested interleavings during production runs to tolerate concurrency bugs. Avoiding untested interleavings significantly improves correctness because most concurrency bugs are caused by untested interleavings, and the performance overhead of disallowing untested interleavings is low because commonly occurring interleavings should already have been tested in a well-tested program.
    PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/99765/1/jieyu_1.pd
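
    The dissertation's idiom definitions and Maple's instrumentation are not given in the abstract, so the Python sketch below only illustrates, with assumed names and a deliberately crude encoding, the underlying idea of memoizing tested interleavings: record the cross-thread ordered pairs of static access sites observed in each run, and use the set of untested pairs to decide what to expose next.

```python
# Conceptual sketch (assumed names, not Maple's actual encoding): remember
# tested interleavings as ordered pairs of static access sites from
# different threads, so untested orderings can be targeted later.
from itertools import combinations

tested_pairs = set()          # (site_a, site_b): site_a observed before site_b


def record_run(trace):
    """trace: ordered list of (thread_id, static_site) shared-memory accesses.
    Record every cross-thread ordered pair of sites seen in this run and
    return the pairs that were exercised for the first time."""
    newly_covered = set()
    for (t1, s1), (t2, s2) in combinations(trace, 2):
        if t1 != t2:                      # only cross-thread orderings matter
            pair = (s1, s2)
            if pair not in tested_pairs:
                newly_covered.add(pair)
                tested_pairs.add(pair)
    return newly_covered


def untested(candidate_pairs):
    """Which candidate cross-thread orderings have never been exercised?"""
    return [p for p in candidate_pairs if p not in tested_pairs]


# Toy usage: one observed run, then ask about the reversed ordering.
run = [(1, "A: write x"), (2, "B: read x"), (1, "A: unlock m")]
print("newly covered:", record_run(run))
print("still untested:", untested([("B: read x", "A: write x")]))
```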
