14 research outputs found
SmartTrack: Efficient Predictive Race Detection
Widely used data race detectors, including the state-of-the-art FastTrack
algorithm, incur performance costs that are acceptable for regular in-house
testing, but miss races detectable from the analyzed execution. Predictive
analyses detect more data races in an analyzed execution than FastTrack
detects, but at significantly higher performance cost.
This paper presents SmartTrack, an algorithm that optimizes predictive race
detection analyses, including two analyses from prior work and a new analysis
introduced in this paper. SmartTrack's algorithm incorporates two main
optimizations: (1) epoch and ownership optimizations from prior work, applied
to predictive analysis for the first time; and (2) novel conflicting critical
section optimizations introduced by this paper. Our evaluation shows that
SmartTrack achieves performance competitive with FastTrack-a qualitative
improvement in the state of the art for data race detection.Comment: Extended arXiv version of PLDI 2020 paper (adds Appendices A-E) #228
SmartTrack: Efficient Predictive Race Detectio
Uniparallel Execution and its Uses.
We introduce uniparallelism: a new style of execution that allows
multithreaded applications to benefit from the simplicity of
uniprocessor execution while scaling performance with increasing
processors.
A uniparallel execution consists of a thread-parallel execution, where
each thread runs on its own processor, and an epoch-parallel
execution, where multiple time intervals (epochs) of the program run
concurrently. The epoch-parallel execution runs all threads of a
given epoch on a single processor; this enables the use of techniques
that are effective on a uniprocessor. To scale performance with
increasing cores, a thread-parallel execution runs ahead of the
epoch-parallel execution and generates speculative checkpoints from
which to start future epochs. If these checkpoints match the program
state produced by the epoch-parallel execution at the end of each
epoch, the speculation is committed and output externalized; if they
mismatch, recovery can be safely initiated as no speculative state has
been externalized.
We use uniparallelism to build two novel systems: DoublePlay and
Frost. DoublePlay benefits from the efficiency of logging the
epoch-parallel execution (as threads in an epoch are constrained to a
single processor, only infrequent thread context-switches need to be
logged to recreate the order of shared-memory accesses), allowing it
to outperform all prior systems that guarantee deterministic replay on
commodity multiprocessors.
While traditional methods detect data races by analyzing the events
executed by a program, Frost introduces a new, substantially faster
method called outcome-based race detection to detect the effects of a
data race by comparing the program state of replicas for divergences.
Unlike DoublePlay, which runs a single epoch-parallel execution of the
program, Frost runs multiple epoch-parallel replicas with
complementary schedules, which are a set of thread schedules crafted
to ensure that replicas diverge only if a data race occurs and to make
it very likely that harmful data races cause divergences. Frost
detects divergences by comparing the outputs and memory states of
replicas at the end of each epoch. Upon detecting a divergence, Frost
analyzes the replica outcomes to diagnose the data race bug and
selects an appropriate recovery strategy that masks the failure.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89677/1/kaushikv_1.pd
Holistic System Design for Deterministic Replay.
Deterministic replay systems record and reproduce the execution of a hardware or software system. While it is well known how to replay uniprocessor systems, it is much harder to provide deterministic replay of shared memory multithreaded programs on multiprocessors because shared memory accesses add a high-frequency source of non-determinism. This thesis proposes efficient multiprocessor replay systems: Respec, Chimera, and Rosa.
Respec is an operating-system-based replay system. Respec is based on the observation that most program executions are data-race-free and for programs with no data races it is sufficient to record program input and the happens-before order of synchronization operations for replay. Respec speculates that a program is data-race-free and supports rollback and recovery from misspeculation. For racy programs, Respec employs a cheap runtime check that compares system call outputs and memory/register states of recorded and replayed processes at a semi-regular interval.
Chimera uses a sound static data race detector to find all potential data races and instrument pairs of potentially racing instructions to transform an arbitrary program to make it data-race-free. Then, Chimera records only the non-deterministic inputs and the order of synchronization operations for replay. However, existing static data race detectors generate excessive false warnings, leading to high recording overhead. Chimera resolves this problem by employing a combination of profiling, symbolic analysis, and dynamic checks that target the sources of imprecision in the static data race detector.
Rosa is a processor-based ultra-low overhead (less than one percent) replay solution that requires very little hardware support as it essentially only needs a log of cache misses to reproduce a multiprocessor execution. Unlike previous hardware-assisted systems, Rosa does not record shared memory dependencies at all. Instead, it infers them offline using a Satisfiability Modulo Theories (SMT) solver. Our offline analysis is capable of inferring interleavings that are legal under the Sequentially Consistency (SC) and Total Store Order (TSO) memory models.PhDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/102374/1/dongyoon_1.pd
Dynamic Analysis of Embedded Software
abstract: Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes the dynamic analysis difficult. In addition, instrumentation overhead for gathering execution information may change the execution of a program, and lead to distorted analysis results, i.e., probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis largely consists of three parts. First of all, we discusses a deterministic replay framework to provide reproducible execution. Once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, a discussion of probe effect is presented and a simulation-based analysis is proposed to detect execution changes of a program caused by instrumentation overhead. The simulation-based analysis examines if the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help to remove data races for correctness of the replay and the simulation-based analysis. The focus is to make the detection efficient for C/C++ programs, and to increase scalability of the detection on multi-core machines.Dissertation/ThesisDoctoral Dissertation Computer Science 201
Detecting and Surviving Data Races using Complementary Schedules
Data races are a common source of errors in multithreaded programs. In this paper, we show how to protect a program from data race errors at runtime by executing multiple replicas of the program with complementary thread schedules. Complementary schedules are a set of replica thread schedules crafted to ensure that replicas diverge only if a data race occurs and to make it very likely that harmful data races cause divergences. Our system, called Frost 1, uses complementary schedules to cause at least one replica to avoid the order of racing instructions that leads to incorrect program execution for most harmful data races. Frost introduces outcome-based race detection, which detects data races by comparing the state of replicas executing complementary schedules. We show that this method is substantiall
Exposing concurrency failures: a comprehensive survey of the state of the art and a novel approach to reproduce field failures
With the rapid advance of multi-core and distributed architectures, concurrent systems are becoming more and more popular. Concurrent systems are extremely hard to develop and validate, as their overall behavior depends on the non-deterministic interleaving of the execution flows that comprise the system. Wrong and unexpected interleavings may lead to concurrency faults that are extremely hard to avoid, detect, and fix due to their non-deterministic nature. This thesis addresses the problem of exposing concurrency failures. Exposing concurrency failures is a crucial activity to locate and fix the related fault and amounts to determine both a test case and an interleaving that trigger the failure. Given the high cost of manually identifying a failure-inducing test case and interleaving among the infinite number of inputs and interleavings of the system, the problem of automatically exposing concurrency failures has been studied by researchers since the late seventies and is still a hot research topic. This thesis advances the research in exposing concurrency failures by proposing two main contributions. The first contribution is a comprehensive survey and taxonomy of the state-of-the-art techniques for exposing concurrency failures. The taxonomy and survey provide a framework that captures the key features of the existing techniques, identify a set of classification criteria to review and compare them, and highlight their strengths and weaknesses, leading to a thorough assessment of the field and paving the road for future progresses. The second contribution of this thesis is a technique to automatically expose and reproduce concurrency field failure. One of the main findings of our survey is that automatically reproducing concurrency field failures is still an open problem, as the few techniques that have been proposed rely on information that may be hard to collect, and identify failure-inducing interleavings but do not synthesize failure-inducing test cases. We propose a technique that advances over state- of-the-art approaches by relying on information that is easily obtainable and by automatically identifying both a failure- inducing test case and interleaving. We empirically demonstrate the effectiveness of our approach on a benchmark of real concurrency failures taken from different popular code bases
Recommended from our members
Sound and Precise Analysis of Multithreaded Programs through Schedule Specialization and Execution Filters
Multithreaded programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis with respect to all schedules requires over-approximation, resulting in poor precision; dynamic analysis rarely covers more than a tiny fraction of all schedules, so its result may not hold for schedules not covered.
To address this challenge, we propose a novel approach called schedule specialization that restricts the schedules of a program to make it easier to analyze. Schedule specialization combines static and dynamic analysis. It first statically analyzes a multithreaded program with respect to a small set of schedules for precision, and then enforces these schedules at runtime for soundness of the static analysis results.
To demonstrate that this approach works, we build three systems. The first system is a specialization framework that specializes a program into a simpler program based on a schedule for precision. It allows stock analyses to automatically gain precision with only little modification.
The second system is Peregrine, a deterministic multithreading system that collects and enforces schedules on future inputs. Peregrine reuses a small set of schedules on many inputs, ensuring our static analysis results to be sound for a wide range of inputs. It also enforces these schedules efficiently, making schedule specialization suitable for production usage.
Although schedule specialization can make static concurrency error detection more precise, some concurrency errors such as races may still slip detection and enter production systems. To mitigate this limitation, we build Loom, a live-workaround system that protects a live multithreaded program from races that slip detection. It allows developers to easily write execution filters to safely and efficiently work around deployed races in live multithreaded programs without restarting them