44 research outputs found

    On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs

    Get PDF
    A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T[subscript â]. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT[subscript â]) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T[subscript â]). In contrast, SP-hybrid obtains linear speed-up when P = O(√T₁/T[subscript â]), but the work is increased by a factor of O(lg n).Singapore-MIT Alliance (SMA

    Data-race detection in transactions-everywhere parallel programming

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.Includes bibliographical references (p. 69-72).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.This thesis studies how to perform dynamic data-race detection in programs using "transactions everywhere", a new methodology for shared-memory parallel programming. Since the conventional definition of a data race does not make sense in the transactions-everywhere methodology, this thesis develops a new definition based on a weak assumption about the correctness of the target program's parallel-control flow, which is made in the same spirit as the assumption underlying the conventional definition. This thesis proves, via a reduction from the problem of 3cnf-formula satisfiability, that data-race detection in the transactions-everywhere methodology is an NP-complete problem. In view of this result, it presents an algorithm that approximately detects data races. The algorithm never reports false negatives. When a possible data race is detected, the algorithm outputs simple information that allows the programmer to efficiently resolve the root of the problem. The algorithm requires running time that is worst-case quadratic in the size of a graph representing all the scheduling constraints in the target program.by Kai Huang.M.Eng

    Easier Parallel Programming with Provably-Efficient Runtime Schedulers

    Get PDF
    Over the past decade processor manufacturers have pivoted from increasing uniprocessor performance to multicore architectures. However, utilizing this computational power has proved challenging for software developers. Many concurrency platforms and languages have emerged to address parallel programming challenges, yet writing correct and performant parallel code retains a reputation of being one of the hardest tasks a programmer can undertake. This dissertation will study how runtime scheduling systems can be used to make parallel programming easier. We address the difficulty in writing parallel data structures, automatically finding shared memory bugs, and reproducing non-deterministic synchronization bugs. Each of the systems presented depends on a novel runtime system which provides strong theoretical performance guarantees and performs well in practice

    Provably good data-race detector that runs in parallel

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (leaves 47-49).This thesis describes the implementation of a provably good data-race detector, called the Nondeterminator-3, which runs efficiently in parallel. A data race occurs in a multithreaded program when two logically parallel threads access the same location while holding no common locks and at least one of the accesses is a write. The Nondeterminator-3 checks for data races in programs coded in Cilk [3, 10], a shared-memory multithreaded programming language. A key capability of data-race detectors is in determining the series-parallel (SP) relationship between two threads. The Nondeterminator-3 is based on a provably good parallel SP-maintenance algorithm known as SP-hybrid [2]. For a program with n threads, T1 work, and critical-path length To, the SP-hybrid algorithm runs in O((T1/P + PTO) lg n) expected time when executed on P processors. A data-race detector must also maintain an access-history, which consists of, for each shared memory location, a representative subset of memory accesses to that location. The Nondeterminator-3 uses an extension of the ALL-SETS [4] access-history algorithm used by its serially running predecessor, the Nondeterminator-2. First, the ALL-SETS algorithm was extended to correctly support the inlet feature of Cilk.(cont.) This extension increases the memory-access cost by only a constant factor. Then, this extended ALL-SETS algorithm was parallelized, so that it can be combined with the SP-hybrid algorithm to obtain a data-race detector. Assuming that the cost of locking the access-history can be ignored, this parallelization also inflates the memory-access cost by only a constant factor. I tested the Nondeterminator-3 on several programs to verify the accuracy of the implementation. I have also observed that the Nondeterminator-3 achieves good speed-up when run on a multiprocessor machine.by Tushara C. Karunaratna.M.Eng

    Provably good race detection that runs in parallel

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 93-98).A multithreaded parallel program that is intended to be deterministic may exhibit nondeterminism clue to bugs called determinacy races. A key capability of race detectors is to determine whether one thread executes logically in parallel with another thread or whether the threads must operate in series. This thesis presents two algorithms, one serial and one parallel, to maintain the series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. For a fork-join program with T1 work and a critical-path length of T[infinity], the serial SP-Maintenance algorithm runs in O(T1) time. The parallel algorithm executes in the nearly optimal O(T1/P + PT[infinity]) time, when run on P processors and using an efficient scheduler. These SP-maintenance algorithms can be incorporated into race detectors to get a provably good race detector that runs in parallel. This thesis describes an efficient parallel race detector I call Nondeterminator-3. For a fork-join program T1 work, critical-path length T[infinity], and v shared memory locations, the Nondeterminator-3 runs in O(T1/P + PT[infinity] lg P + min [(T1 lg P)/P, vT[infinity] Ig P]) expected time, when run on P processors and using an efficient scheduler.by Jeremy T. Fineman.S.M

    Aikido: Accelerating shared data dynamic analyses

    Get PDF
    Despite a burgeoning demand for parallel programs, the tools available to developers working on shared-memory multicore processors have lagged behind. One reason for this is the lack of hardware support for inspecting the complex behavior of these parallel programs. Inter-thread communication, which must be instrumented for many types of analyses, may occur with any memory operation. To detect such thread communication in software, many existing tools require the instrumentation of all memory operations, which leads to significant performance overheads. To reduce this overhead, some existing tools resort to random sampling of memory operations, which introduces false negatives. Unfortunately, neither of these approaches provide the speed and accuracy programmers have traditionally expected from their tools. In this work, we present Aikido, a new system and framework that enables the development of efficient and transparent analyses that operate on shared data. Aikido uses a hybrid of existing hardware features and dynamic binary rewriting to detect thread communication with low overhead. Aikido runs a custom hypervisor below the operating system, which exposes per-thread hardware protection mechanisms not available in any widely used operating system. This hybrid approach allows us to benefit from the low cost of detecting memory accesses with hardware, while maintaining the word-level accuracy of a software-only approach. To evaluate our framework, we have implemented an Aikido-enabled vector clock race detector. Our results show that the Aikido enabled race-detector outperforms existing techniques that provide similar accuracy by up to 6.0x, and 76% on average, on the PARSEC benchmark suite.National Science Foundation (U.S.) (NSF grant CCF-0832997)National Science Foundation (U.S.) (DOE SC0005288)United States. Defense Advanced Research Projects Agency (DARPA HR0011-10- 9-0009

    Cilk : efficient multithreaded computing

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 170-179).by Keith H. Randall.Ph.D

    Debugging multithreaded programs that incorporate user-level locking

    Get PDF
    Thesis (S.B. and M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 119-124).by Andrew F. Stark.S.B.and M.Eng

    Dynamic Data Race Detection for Structured Parallelism

    Get PDF
    With the advent of multicore processors and an increased emphasis on parallel computing, parallel programming has become a fundamental requirement for achieving available performance. Parallel programming is inherently hard because, to reason about the correctness of a parallel program, programmers have to consider large numbers of interleavings of statements in different threads in the program. Though structured parallelism imposes some restrictions on the programmer, it is an attractive approach because it provides useful guarantees such as deadlock-freedom. However, data races remain a challenging source of bugs in parallel programs. Data races may occur only in few of the possible schedules of a parallel program, thereby making them extremely hard to detect, reproduce, and correct. In the past, dynamic data race detection algorithms have suffered from at least one of the following limitations: some algorithms have a worst-case linear space and time overhead, some algorithms are dependent on a specific scheduling technique, some algorithms generate false positives and false negatives, some have no empirical evaluation as yet, and some require sequential execution of the parallel program. In this thesis, we introduce dynamic data race detection algorithms for structured parallel programs that overcome past limitations. We present a race detection algorithm called ESP-bags that requires the input program to be executed sequentially and another algorithm called SPD3 that can execute the program in parallel. While the ESP-bags algorithm addresses all the above mentioned limitations except sequential execution, the SPD3 algorithm addresses the issue of sequential execution by scaling well across highly parallel shared memory multiprocessors. Our algorithms incur constant space overhead per memory location and time overhead that is independent of the number of processors on which the programs execute. Our race detection algorithms support a rich set of parallel constructs (including async, finish, isolated, and future) that are found in languages such as HJ, X10, and Cilk. Our algorithms for async, finish, and future are precise and sound for a given input. In the presence of isolated, our algorithms are precise but not sound. Our experiments show that our algorithms (for async, finish, and isolated) perform well in practice, incurring an average slowdown of under 3x over the original execution time on a suite of 15 benchmarks. SPD3 is the first practical dynamic race detection algorithm for async-finish parallel programs that can execute the input program in parallel and use constant space per memory location. This takes us closer to our goal of building dynamic data race detectors that can be "always-on" when developing parallel applications
    corecore