Search CORE

1,170 research outputs found

A Fast Causal Profiler for Task Parallel Programs

Author: Nagarakatte Santosh
Yoga Adarsh
Publication venue
Publication date: 02/07/2017
Field of study

This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program. TASKPROF's use of hardware performance counters to perform fine-grained measurements minimizes perturbation. TASKPROF's profile execution runs in parallel using multi-cores. TASKPROF's causal profile enables users to estimate improvements in parallelism when a region of code is optimized even when concrete optimizations are not yet known. We have used TASKPROF to isolate parallelism bottlenecks in twenty three applications that use the Intel Threading Building Blocks library. We have designed parallelization techniques in five applications to in- crease parallelism by an order of magnitude using TASKPROF. Our user study indicates that developers are able to isolate performance bottlenecks with ease using TASKPROF.Comment: 11 page

arXiv.org e-Print Archive

Crossref

Non-intrusive on-the-fly data race detection using execution replay

Author: De Bosschere Koen
Ronsse Michiel
Publication venue
Publication date: 01/01/2000
Field of study

This paper presents a practical solution for detecting data races in parallel programs. The solution consists of a combination of execution replay (RecPlay) with automatic on-the-fly data race detection. This combination enables us to perform the data race detection on an unaltered execution (almost no probe effect). Furthermore, the usage of multilevel bitmaps and snooped matrix clocks limits the amount of memory used. As the record phase of RecPlay is highly efficient, there is no need to switch it off, hereby eliminating the possibility of Heisenbugs because tracing can be left on all the time.Comment: In M. Ducasse (ed), proceedings of the Fourth International Workshop on Automated Debugging (AAdebug 2000), August 2000, Munich. cs.SE/001003

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Fast and Precise Symbolic Analysis of Concurrency Bugs in Device Drivers

Author: Deligiannis P
Donaldson AF
Rakamaric Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/07/2015
Field of study

© 2015 IEEE.Concurrency errors, such as data races, make device drivers notoriously hard to develop and debug without automated tool support. We present Whoop, a new automated approach that statically analyzes drivers for data races. Whoop is empowered by symbolic pairwise lockset analysis, a novel analysis that can soundly detect all potential races in a driver. Our analysis avoids reasoning about thread interleavings and thus scales well. Exploiting the race-freedom guarantees provided by Whoop, we achieve a sound partial-order reduction that significantly accelerates Corral, an industrial-strength bug-finder for concurrent programs. Using the combination of Whoop and Corral, we analyzed 16 drivers from the Linux 4.0 kernel, achieving 1.5 - 20× speedups over standalone Corral

Spiral - Imperial College Digital Repository

Coz: Finding Code that Counts with Causal Profiling

Author: Ammons G.
Hollingsworth J. K.
Kambadur M.
Kambadur M.
Miller B. P.
Thompson K.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/08/2016
Field of study

Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities. This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus "virtually" speeding it up. We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.Comment: Published at SOSP 2015 (Best Paper Award

arXiv.org e-Print Archive

Crossref

A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior

Author: Adegbija Tosiron
Criswell Keeley
Publication venue
Publication date: 16/07/2019
Field of study

Adaptable computing is an increasingly important paradigm that specializes system resources to variable application requirements, environmental conditions, or user requirements. Adapting computing resources to variable application requirements (or application phases) is otherwise known as phase-based optimization. Phase-based optimization takes advantage of application phases, or execution intervals of an application, that behave similarly, to enable effective and beneficial adaptability. In order for phase-based optimization to be effective, the phases must first be classified to determine when application phases begin and end, and ensure that system resources are accurately specialized. In this paper, we present a survey of phase classification techniques that have been proposed to exploit the advantages of adaptable computing through phase-based optimization. We focus on recent techniques and classify these techniques with respect to several factors in order to highlight their similarities and differences. We divide the techniques by their major defining characteristics---online/offline and serial/parallel. In addition, we discuss other characteristics such as prediction and detection techniques, the characteristics used for prediction, interval type, etc. We also identify gaps in the state-of-the-art and discuss future research directions to enable and fully exploit the benefits of adaptable computing.Comment: To appear in IEEE Transactions on Parallel and Distributed Systems (TPDS

arXiv.org e-Print Archive

Hybrid Data Race Detection for Multicore Software

Author: Kalaci Onder
Sen Alper
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 03/05/2018
Field of study

Multithreaded programs are prone to concurrency errors such as deadlocks, race conditions and atomicity violations. These errors are notoriously difficult to detect due to the non-deterministic nature of concurrent software running on multicore hardware. Data races result from the concurrent access of shared data by multiple threads and can result in unexpected program behaviors. Main dynamic data race detection techniques in the literature are happens-before and lockset algorithms which suffer from high execution time and memory overhead, miss many data races or produce a high number of false alarms. Our goal is to improve the performance of dynamic data race detection, while at the same time improving its accuracy by generating fewer false alarms. We develop a hybrid data race detection algorithm that is a combination of the happens-before and lockset algorithms in a tool. Rather than focusing on individual memory accesses by each thread, we focus on sequence of memory accesses by each thread, called a segment. This allows us to improve the performance of data race detection. We implement several optimizations on our hybrid data race detector and compare our technique with traditional happens-before and lockset detectors. The experiments are performed with C/C++ multithreaded benchmarks using Pthreads library from PARSEC suite and large applications such as Apache web server. Our experiments showed that our hybrid detector is 15 % faster than the happens-before detector and produces 50 % less potential data races than the lockset detector. Ultimately, a hybrid data race detector can improve the performance and accuracy of data race detection, enhancing its usability in practice

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)