32 research outputs found

Efficient Detection of Determinacy Races in Cilk Programs

Author: M. Feng
Publication venue: 'Springer Science and Business Media LLC'

On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs

Authors: Bender Michael A.; Fineman Jeremy T.; Gilbert Seth; Leiserson Charles E.
Publication date: 01/01/2004

A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-parallel (SP) relationships "on the fly" for fork-join multithreaded programs. The serial SP-order algorithm runs in O(1) amortized time per operation. In contrast, the previously best algorithm requires a time per operation that is proportional to Tarjan’s functional inverse of Ackermann’s function. SP-order employs an order-maintenance data structure that allows us to implement a more efficient "English-Hebrew" labeling scheme than was used in earlier race detectors, which immediately yields an improved determinacy-race detector. In particular, any fork-join program running in T₁ time on a single processor can be checked on the fly for determinacy races in O(T₁) time. Corresponding improved bounds can also be obtained for more sophisticated data-race detectors, for example, those that use locks. By combining SP-order with Feng and Leiserson’s serial SP-bags algorithm, we obtain a parallel SP-maintenance algorithm, called SP-hybrid. Suppose that a fork-join program has n threads, T₁ work, and a critical-path length of T_∞. When executed on P processors, we prove that SP-hybrid runs in O((T₁/P + PT_∞) lg n) expected time. To understand this bound, consider that the original program obtains linear speed-up over a 1-processor execution when P = O(T₁/T_∞). In contrast, SP-hybrid obtains linear speed-up when P = O(√(T₁/T_∞)), but the work is increased by a factor of O(lg n).

Singapore-MIT Alliance (SMA)
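The "English-Hebrew" idea named in the abstract above can be illustrated with a small static sketch. It assumes the SP parse tree is already fully built, which is a simplification: the paper maintains the two orders on the fly with an order-maintenance structure, and that is not shown here. The names `Node`, `S`, `P`, `leaf`, `order`, `precedes`, and `parallel` are hypothetical helpers for illustration. The English order visits children left to right everywhere; the Hebrew order visits them right to left at P-nodes only. Two threads run in series when both orders agree, and logically in parallel when the orders disagree.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    kind: str                       # "S" (series), "P" (parallel), or "leaf"
    name: Optional[str] = None      # thread name, set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(name: str) -> Node:
    return Node("leaf", name=name)


def S(left: Node, right: Node) -> Node:
    return Node("S", left=left, right=right)


def P(left: Node, right: Node) -> Node:
    return Node("P", left=left, right=right)


def order(root: Node, hebrew: bool) -> Dict[str, int]:
    """Rank every leaf in English order, or in Hebrew order if hebrew=True."""
    ranks: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.kind == "leaf":
            ranks[node.name] = len(ranks)
            continue
        first, second = node.left, node.right
        if hebrew and node.kind == "P":
            # Hebrew order swaps the children of P-nodes only.
            first, second = second, first
        stack.append(second)        # pushed first, so visited last
        stack.append(first)
    return ranks


def precedes(eng: Dict[str, int], heb: Dict[str, int], u: str, v: str) -> bool:
    """u runs in series before v iff u comes first in both orders."""
    return eng[u] < eng[v] and heb[u] < heb[v]


def parallel(eng: Dict[str, int], heb: Dict[str, int], u: str, v: str) -> bool:
    """u and v are logically parallel iff the two orders disagree."""
    return u != v and (eng[u] < eng[v]) != (heb[u] < heb[v])


# Thread a, then b and c in parallel, then d.
tree = S(leaf("a"), S(P(leaf("b"), leaf("c")), leaf("d")))
eng = order(tree, hebrew=False)
heb = order(tree, hebrew=True)
print(parallel(eng, heb, "b", "c"), precedes(eng, heb, "a", "d"))
```

Each query is two integer comparisons per order, which is the property that makes a labeling scheme attractive for race detection; the hard part that this sketch leaves out is keeping the ranks valid as the tree grows during execution.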

CiteSeerX

DSpace@MIT

Crossref

Easier Parallel Programming with Provably-Efficient Runtime Schedulers

Author: Utterback Robert
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Over the past decade, processor manufacturers have pivoted from increasing uniprocessor performance to multicore architectures. However, utilizing this computational power has proved challenging for software developers. Many concurrency platforms and languages have emerged to address parallel programming challenges, yet writing correct and performant parallel code retains a reputation of being one of the hardest tasks a programmer can undertake. This dissertation will study how runtime scheduling systems can be used to make parallel programming easier. We address the difficulty in writing parallel data structures, automatically finding shared memory bugs, and reproducing non-deterministic synchronization bugs. Each of the systems presented depends on a novel runtime system which provides strong theoretical performance guarantees and performs well in practice.

Washington University St. Louis: Open Scholarship

Debugging multithreaded programs that incorporate user-level locking

Author: Stark Andrew F. (Andrew Frederick), 1975-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998

Thesis (S.B. and M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 119-124). By Andrew F. Stark.

DSpace@MIT

Transactions Everywhere

Authors: Kuszmaul Bradley C.; Leiserson Charles E.
Publication date: 01/01/2003

Arguably, one of the biggest deterrents for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic problem inherent in the coordination of concurrent tasks is the enforcing of atomicity so that the partial results of one task do not inadvertently corrupt another task. Atomicity is typically enforced through locking protocols, but these protocols can introduce other complications, such as deadlock, unless restrictive methodologies in their use are adopted. We have recently begun a research project focusing on transactional memory [18] as an alternative mechanism for enforcing atomicity, since it allows the user to avoid many of the complications inherent in locking protocols. Rather than viewing transactions as infrequent occurrences in a program, as has generally been done in the past, we have adopted the point of view that all user code should execute in the context of some transaction. To make this viewpoint viable requires the development of two key technologies: effective hardware support for scalable transactional memory, and linguistic and compiler support. This paper describes our preliminary research results on making “transactions everywhere” a practical reality.

Singapore-MIT Alliance (SMA)

DSpace@MIT

Efficient Race Detection with Futures

Authors: Agrawal Kunal; Fineman Jeremy; Lee I-Ting Angelina; Utterback Robert
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/01/2019

This paper addresses the problem of provably efficient and practically good on-the-fly determinacy race detection in task-parallel programs that use futures. Prior work on determinacy race detection has mostly focused on either task-parallel programs that follow a series-parallel dependence structure or ones with unrestricted use of futures that generate arbitrary dependences. In this work, we consider a restricted use of futures and show that it can be race detected more efficiently than general use of futures. Specifically, we present two algorithms: MultiBags and MultiBags+. MultiBags targets programs that use futures in a restricted fashion and runs in time $O(T_1 \alpha(m,n))$, where $T_1$ is the sequential running time of the program, $\alpha$ is the inverse Ackermann function, $m$ is the total number of memory accesses, and $n$ is the dynamic count of places at which parallelism is created. Since $\alpha$ is a very slowly growing function (upper bounded by $4$ for all practical purposes), it can be treated as a close-to-constant overhead. MultiBags+ is an extension of MultiBags that targets programs with general use of futures. It runs in time $O((T_1+k^2)\alpha(m,n))$, where $T_1$, $\alpha$, $m$, and $n$ are defined as before, and $k$ is the number of future operations in the computation. We implemented both algorithms and empirically demonstrate their efficiency.
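An inverse-Ackermann factor like the one in these bounds is the classical cost of disjoint-set (union-find) operations with union by rank and path compression, the kind of structure that "bags"-style detectors use to group threads. The abstract does not describe MultiBags' internals, so the following is only a generic union-find sketch, not the paper's algorithm; the class name `DisjointSets` and its methods are hypothetical.

```python
class DisjointSets:
    """Union-find with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        # Create a singleton set for x if it is not already known.
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x):
        # First pass: locate the representative of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        # Merge the sets containing x and y; attach the shallower tree
        # under the deeper one to keep find paths short.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx


# Example: group four hypothetical thread IDs into two bags.
bags = DisjointSets()
for tid in ("t1", "t2", "t3", "t4"):
    bags.make_set(tid)
bags.union("t1", "t2")
bags.union("t3", "t4")
print(bags.find("t1") == bags.find("t2"), bags.find("t1") == bags.find("t3"))
```

With both heuristics in place, a sequence of m operations on n elements costs O(m α(m, n)) in total, which is where a near-constant α overhead per memory access would come from in a detector built on such a structure.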

arXiv.org e-Print Archive

Crossref

Algorithms for data-race detection in multithreaded programs

Author: Cheng Guang-Ien, 1975-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998

Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 69-71). By Guang-Ien Cheng.

CiteSeerX

DSpace@MIT

Cilk: efficient multithreaded computing

Author: Randall Keith H. (Keith Harold)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998

Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 170-179). By Keith H. Randall.

DSpace@MIT

Efficient Data Race Detection for Async-Finish Parallelism

Authors: C. Flanagan; C. Sadowski; D. Lea; D. Leijen; E.A. Lee; J. Mellor-Crummey; J.-D. Choi; J.K. Lee; M. Feng; R. Barik; R.D. Blumofe; S. Agarwal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

A major productivity hurdle for parallel programming is the presence of data races. Data races can lead to all kinds of harmful program behaviors, including determinism violations and corrupted memory. However, runtime overheads of current dynamic data race detectors are still prohibitively large (often incurring slowdowns of 10× or larger) for use in mainstream software development. In this paper, we present an efficient dynamic race detector algorithm targeting the async-finish task-parallel programming model. The async and finish constructs are at the core of languages such as X10 and Habanero Java (HJ). These constructs generalize the spawn-sync constructs used in Cilk, while still ensuring that all computation graphs are deadlock-free. We have implemented our algorithm in a tool called TASKCHECKER and evaluated it on a suite of 12 benchmarks. To reduce overhead of the dynamic analysis, we have also implemented various static optimizations in the tool. Our experimental results indicate that our approach performs well in practice, incurring an average slowdown of 3.05× compared to a serial execution in the optimized case.

CiteSeerX

Crossref

Provably and Practically Efficient Race Detection for Task-Parallel Code

Author: Xu Yifan
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2021

Parallel systems are pervasive nowadays. Specifically, modern computers have embraced multicore architectures due to the difficulties of exploiting higher clock speeds on single-core CPUs. However, parallel programming is challenging. Determinacy races, in particular, are a common pitfall when writing task-parallel code. A determinacy race can easily lead to non-deterministic behavior of the parallel program and is therefore often considered a bug. Unfortunately, such bugs are hard to debug because they do not necessarily produce obvious failures in every single execution. To ease the debugging of determinacy races in task-parallel code, this dissertation proposes several provably and practically efficient parallel race detection algorithms. Unlike prior work, which mostly targets fork-join parallelism, we focus on less structured but important programming paradigms: pipeline parallelism and futures. In addition, we build an efficient runtime system for scheduling futures, which is not only a facility to study the race detection problem for futures but also useful in practice. Finally, this dissertation investigates mechanisms that optimize the access history of race detectors, which provides a significant additional boost to performance.

Washington University St. Louis: Open Scholarship