4 research outputs found

    Programming with Exceptions in JCilk

    JCilk extends the Java language to provide call-return semantics for multithreading, much as Cilk does for C. Java's built-in thread model does not support the passing of exceptions or return values from one thread back to the "parent" thread that created it. JCilk imports Cilk's fork-join primitives spawn and sync into Java to provide procedure-call semantics for concurrent subcomputations. This paper shows how JCilk integrates exception handling with multithreading by defining semantics consistent with the existing semantics of Java's try and catch constructs, but which handle concurrency in spawned methods.

    JCilk's strategy of integrating multithreading with Java's exception semantics yields some surprising semantic synergies. In particular, JCilk extends Java's exception semantics to allow exceptions to be passed from a spawned method to its parent in a natural way that obviates the need for Cilk's inlet and abort constructs. This extension is "faithful" in that it obeys Java's ordinary serial semantics when executed on a single processor. When executed in parallel, however, an exception thrown by a JCilk computation signals its sibling computations to abort, which yields a clean semantics in which only a single exception from the enclosing try block is handled.

    The decision to implicitly abort side computations opens a Pandora's box of subsidiary linguistic problems to be resolved, however. For instance, aborting might cause a computation to be interrupted asynchronously, wreaking havoc on programmers' understanding of code behavior. To minimize the complexity of reasoning about aborts, JCilk signals them "semisynchronously" so that abort signals do not interrupt ordinary serial code. In addition, JCilk propagates an abort signal throughout a subcomputation naturally with a built-in CilkAbort exception, thereby allowing programmers to handle clean-up by simply catching the CilkAbort exception.

    The semantics of JCilk allow programs with speculative computations to be programmed easily. Speculation is essential for parallelizing programs such as branch-and-bound or heuristic search. We show how JCilk's linguistic mechanisms can be used to program a solution to the "queens" problem and an implementation of a parallel alpha-beta search.

    Singapore-MIT Alliance (SMA)
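    Since spawn, sync, and CilkAbort are JCilk language extensions rather than standard Java, the sketch below only approximates the abort-on-completion behavior the abstract describes, using plain java.util.concurrent: invokeAny cancels the remaining sibling tasks once one task finishes. The searchHalf helper and its numbers are hypothetical, not from the paper.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough analogue of JCilk-style speculation in plain Java: two sibling
// computations race, and when one completes, the other is cancelled,
// much as a JCilk exception signals sibling computations to abort.
public class SpeculativeSearch {
    // Hypothetical search over one half of a problem space.
    static Integer searchHalf(int half) throws InterruptedException {
        for (int i = 0; i < 1_000_000; i++) {
            // Polling the interrupt flag plays the role of JCilk's
            // "semisynchronous" abort points: cancellation is noticed
            // only at well-defined places, never mid-statement.
            if (Thread.interrupted()) throw new InterruptedException();
            if (half == 1 && i == 42_000) return i;  // pretend this half succeeds
        }
        throw new RuntimeException("no solution in half " + half);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> tasks = List.of(
                    () -> searchHalf(0),
                    () -> searchHalf(1));
            // invokeAny returns the first successful result and cancels
            // the remaining tasks, loosely mirroring sibling abort.
            Integer result = pool.invokeAny(tasks);
            System.out.println("found: " + result);
        } finally {
            pool.shutdownNow();
        }
    }
}
```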

    The Cilk system for parallel multithreaded computing

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (p. 187-199). By Christopher F. Joerg.

    Memory abstractions for parallel programming

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 156-163). By I-Ting Angelina Lee.

    A memory abstraction is an abstraction layer between the program execution and the memory that provides a different "view" of a memory location depending on the execution context in which the memory access is made. Properly designed memory abstractions help ease the task of parallel programming by mitigating the complexity of synchronization or admitting more efficient use of resources. This dissertation describes five memory abstractions for parallel programming: (i) cactus stacks that interoperate with linear stacks, (ii) efficient reducers, (iii) reducer arrays, (iv) ownership-aware transactions, and (v) location-based memory fences. To demonstrate the utility of memory abstractions, my collaborators and I developed Cilk-M, a dynamically multithreaded concurrency platform which embodies the first three memory abstractions.

    Many dynamic multithreaded concurrency platforms incorporate cactus stacks to support multiple stack views for all the active children simultaneously. The use of cactus stacks, albeit essential, forces concurrency platforms to trade off among performance, memory consumption, and interoperability with serial code, due to their incompatibility with linear stacks. This dissertation proposes a new strategy for building a cactus stack using thread-local memory mapping (or TLMM), which enables Cilk-M to satisfy all three criteria simultaneously.

    A reducer hyperobject allows different branches of a dynamic multithreaded program to maintain coordinated local views of the same nonlocal variable. With reducers, one can use nonlocal variables in a parallel computation without restructuring the code or introducing races. This dissertation introduces memory-mapped reducers, which admit much more efficient access than existing implementations. When used in large quantities, reducers incur unnecessarily high overhead in execution time and space consumption. This dissertation describes support for reducer arrays, which offer the same functionality as an array of reducers with significantly less overhead.

    Transactional memory is a high-level synchronization mechanism, designed to be easier to use and more composable than fine-grain locking. This dissertation presents ownership-aware transactions, the first transactional memory design that provides provable safety guarantees for "open-nested" transactions. On architectures that implement memory models weaker than sequential consistency, programs communicating via shared memory must employ memory fences to ensure correct execution. This dissertation examines the concept of location-based memory fences, which, unlike traditional memory fences, incur latency only when synchronization is necessary.
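    Cilk-M's memory-mapped reducers are runtime-level machinery not available in stock Java, but the coordination pattern they provide can be illustrated with a standard parallel stream reduction, as in the sketch below: each worker accumulates its own local view of a sum, and local views are combined when subcomputations join. The numbers are arbitrary.

```java
import java.util.stream.LongStream;

// The reducer idea in plain Java: there is no shared mutable
// accumulator, hence no race and no lock. Each worker thread reduces
// its own chunk into a local view, and local views are combined
// pairwise at joins, the same coordination pattern that reducer
// hyperobjects provide for nonlocal variables.
public class ReducerSketch {
    public static void main(String[] args) {
        long sum = LongStream.rangeClosed(1, 1_000_000)
                .parallel()
                .reduce(0L, Long::sum);  // identity + associative combiner
        System.out.println("sum = " + sum);  // prints 500000500000
    }
}
```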

    The StarTech massively parallel chess program

    The StarTech massively parallel chess program, running on a 512-processor Connection Machine CM-5 supercomputer, tied for third place at the 1993 ACM International Computer Chess Championship. StarTech employs the Jamboree search algorithm, a natural extension of J. Pearl's Scout search algorithm, to find parallelism in game-tree searches. StarTech's work-stealing scheduler distributes the work specified by the search algorithm across the processors of the CM-5. StarTech uses a global transposition table shared among the processors. StarTech has an informally estimated rating of over 2400 USCF.

    Two performance measures help in understanding the performance of the StarTech program: the work, W, and the critical path length, C. The Jamboree search algorithm used in StarTech seems to perform about 2 to 3 times more work than does our best serial implementation. The critical path length, under tournament conditions, is less than 0.1% of the total work, yielding an average parallelism of over 1000. The StarTech scheduler achieves actual performance of approximately T = 1.02W/P + 1.5C on P processors. The critical path and work can be used to tune performance by allowing development of the program on a small, readily accessible machine while predicting the performance on a big, tournament-sized machine.
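    The measured model lends itself to exactly the kind of prediction the abstract describes. The sketch below plugs hypothetical work and critical-path figures (chosen so that C/W = 0.1%, as reported) into the formula T = 1.02W/P + 1.5C; the figures are illustrative, not data from the paper.

```java
// Back-of-the-envelope use of StarTech's measured performance model,
// T = 1.02*W/P + 1.5*C, to predict runtime on a large machine from
// work (W) and critical-path length (C) measured on a small one.
// W and C below are hypothetical, chosen so C/W = 0.1% as reported.
public class StarTechModel {
    static double predictedTime(double w, double c, int p) {
        return 1.02 * w / p + 1.5 * c;
    }

    public static void main(String[] args) {
        double w = 1.0e6;  // total work, in arbitrary time units
        double c = 1.0e3;  // critical path: 0.1% of the work
        for (int p : new int[] {32, 128, 512}) {
            System.out.printf("P = %3d: T = %.0f%n", p, predictedTime(w, c, p));
        }
        // As P grows, the 1.5*C term dominates and speedup saturates
        // near the average parallelism W/C (about 1000 here).
    }
}
```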