
    Concurrent Collections (CnC): A new approach to parallel programming

    No full text
    A common approach in designing parallel languages is to provide high-level handles for manipulating the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must balance the desire to present a simple view to the domain expert against the need to provide sufficient power for tuning. That balance is hard to strike for any given architecture, and harder still if the language is to apply to a range of architectures; either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high-level handles, we view it as one of designing an interface. On one side of this interface is the programmer (domain expert), who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer or program), who demands maximal flexibility for optimizing the mapping to a wide range of target platforms (parallel/serial, shared/distributed, homogeneous/heterogeneous, etc.) but needs no knowledge of the domain. Concurrent Collections (CnC) is based on this separation of concerns. The talk will present CnC and its benefits.
    About the speaker: Kathleen Knobe has focused throughout her career on parallelism, especially compiler technology, runtime system design, and language design. She worked at Compass (aka Massachusetts Computer Associates) from 1980 to 1991, designing compilers for a wide range of parallel platforms for Thinking Machines, MasPar, Alliant, Numerix, and several government projects. In 1991 she decided to finish her education. After graduating from MIT in 1997, she joined Digital Equipment's Cambridge Research Lab (CRL). She stayed through the DEC/Compaq/HP mergers and when CRL was acquired and absorbed by Intel. She currently works in the Software and Services Group / Technology Pathfinding and Innovation.
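    To make the separation of concerns concrete, here is a minimal toy sketch of the domain expert's side of the interface. It is not the actual CnC API; the Graph class, collection names, and step signature are all hypothetical. The domain expert writes only steps that get and put items, plus tags that prescribe step instances; everything about how, where, and in what order steps execute belongs to the runtime on the other side of the interface.

```python
# Toy illustration of the CnC separation of concerns (hypothetical, not
# the real CnC API). The domain expert writes steps, items, and tags; a
# separate runtime (the "performance expert" side) picks execution order.

from collections import deque

class Graph:
    def __init__(self):
        self.items = {}          # item collection: tag -> value
        self.pending = deque()   # tag collection: prescribed step instances
        self.step = None         # step collection (one step here)

    def put_item(self, tag, value):
        self.items[tag] = value

    def put_tag(self, tag):
        self.pending.append(tag)

    def run(self):
        # The runtime may execute ready steps in any order (even in
        # parallel); this sketch simply processes them sequentially.
        while self.pending:
            tag = self.pending.popleft()
            self.step(self, tag)

# --- Domain expert's code: no platform knowledge at all ---
def square_step(graph, i):
    x = graph.items[("x", i)]          # get input item
    graph.put_item(("y", i), x * x)    # put output item

g = Graph()
g.step = square_step
for i in range(4):
    g.put_item(("x", i), i + 1)  # the environment provides inputs
    g.put_tag(i)                 # and prescribes step instances
g.run()
print([g.items[("y", i)] for i in range(4)])  # [1, 4, 9, 16]
```

    Because the step touches only items and tags, the same domain code could be paired with a serial, shared-memory, or distributed runtime without change, which is exactly the flexibility the performance expert demands.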

    Array SSA Form and Its Use in Parallelization

    No full text
    Static single assignment (SSA) form for scalars has been a significant advance. It has simplified the way we think about scalar variables, simplified the design of some optimizations, and made other optimizations more effective. Unfortunately, none of this can be said for SSA form for arrays. The current SSA processing of arrays views an array as a single object. But the kinds of analyses that sophisticated compilers need to perform on arrays, for example those that drive loop parallelization, are at the element level. Current SSA form for arrays is incapable of providing the element-level data flow information required for such analyses. In this paper, we introduce an Array SSA form that captures precise element-level data flow information for array variables in all cases. It is general and simple, and coincides with standard SSA form when applied to scalar variables. It can also be used for structures and other variable types that can be modeled as arrays. An important application of Array SSA form is in automatic parallelization. We show how Array SSA form can enable parallelization of any loop that is free of loop-carried true data dependences. This includes loops with loop-carried anti and output dependences, unanalyzable subscript expressions, and arbitrary control flow within an iteration. Array SSA form achieves this level of generality by making manifest its φ functions as runtime computations in cases that are not amenable to compile-time analysis
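    As a rough illustration of the runtime φ idea (a sketch under my own naming, not the paper's notation): give each renamed definition of an array a companion "@" array that records, per element, the iteration that last wrote it; a runtime φ then merges renamed arrays element-wise by keeping the value with the latest write time. This lets iterations of a loop with only loop-carried output or anti dependences execute in any order.

```python
# Sketch of Array SSA's runtime phi merge (illustrative naming, not the
# paper's exact formulation). Each renamed array carries an @-array of
# per-element last-write iteration numbers; phi keeps the value with the
# most recent timestamp, so merges are order-insensitive.

import numpy as np

def runtime_phi(values_a, times_a, values_b, times_b):
    """Element-wise merge of two renamed arrays by last-write time."""
    take_b = times_b > times_a
    merged_vals = np.where(take_b, values_b, values_a)
    merged_time = np.maximum(times_a, times_b)
    return merged_vals, merged_time

# A loop with a loop-carried output dependence: several iterations write
# the same element of A. Out-of-order execution is safe if each iteration
# writes a private copy and the copies are phi-merged afterwards.
n = 8
subscripts = [1, 3, 3, 5]                    # say, unanalyzable at compile time

vals = np.zeros(n); times = np.full(n, -1)
for i in reversed(range(len(subscripts))):   # deliberately "wrong" order
    priv_vals = np.zeros(n); priv_times = np.full(n, -1)
    priv_vals[subscripts[i]] = i             # iteration i writes A[s[i]] = i
    priv_times[subscripts[i]] = i            # stamp with iteration number
    vals, times = runtime_phi(vals, times, priv_vals, priv_times)

print(vals[3])   # 2.0: the write from iteration 2 wins, as in serial order
```

    Because the merge takes the maximum timestamp per element, it is associative and commutative, so the runtime is free to combine iterations in any grouping it likes.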

    A method for inferring context-free grammars

    No full text

    The subspace model: shape-based compilation for parallel systems

    No full text
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997. Includes bibliographical references (p. 207-211). By Kathleen B. Knobe.

    The Subspace Model: Shape-based Compilation for Parallel Systems

    No full text
    Subspace analysis is a target-independent parallel compilation technique. It applies to a wide range of parallel architectures, including vector, SIMD, distributed-memory MIMD, shared-memory MIMD, symmetric multiprocessor, and VLIW systems. The focus of subspace analysis is shape. The shape of an object is a subset of the iteration indices; each index represents an axis of the object. The concept of shape is a hidden but crucial theme underlying work in parallelism detection algorithms, many architecture-specific optimizations, and many strategies for compiling to parallel programs. Parallelization is shape-based, as are a variety of optimizations (e.g., privatization), strategies (e.g., the replication of scalars in SPMD systems), and language issues (e.g., conformance in Fortran 90). Shape is of critical importance in parallel systems, since a shape that is too small (has too few axes) can result in unnecessary serialization, whereas a shape that is too large (has too many axes) can result in unnecessary communication and computation.
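    As a hypothetical illustration of shape (my own example, not the thesis's): a scalar temporary written in every iteration has a declared shape with no axes, but its natural shape includes the loop index; privatization expands it along that axis, so iterations no longer serialize on it.

```python
# Illustration of shape and privatization (my own example, not from the
# thesis): a scalar temporary t whose natural shape includes the loop
# index i. With no axes, every iteration reuses one cell, forcing serial
# execution; expanding t along the i axis removes the false dependence.

import numpy as np

a = np.arange(8.0)
b = np.empty(8)

# Declared shape {}: one scalar t, serializing the loop on writes to t.
for i in range(8):
    t = 2.0 * a[i]        # every iteration writes the same cell
    b[i] = t + 1.0

# Expanded shape {i}: t gets an axis per iteration, so the loop body has
# no cross-iteration dependence and could run fully in parallel.
t_vec = 2.0 * a           # t privatized along the i axis
b_par = t_vec + 1.0

assert np.allclose(b, b_par)
```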

    Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems

    No full text
    This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. The CnC model is well suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, and (ii) a novel and non-trivial “higher-level” partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6×. Our evaluation compares with alternative models, including ScaLAPACK with a shared-memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and runtime scheduling and execution.

    The Subspace Model: A Theory of Shapes for Parallel Systems

    No full text
    This paper presents a shape-based abstraction for compiling to parallel systems. Data layout is often the subject of direct analysis, while shape is addressed in ad hoc ways at best. However, a suboptimal shape can be more costly than a suboptimal location: unnecessary serialization can result if the shape used is too small, and unnecessary communication and computation can result if the shape is too large. For each dimension in the space of an object, the object attains that dimension serially, in parallel, or via a parallel prefix operation. These expansion categories are O(N), O(1), and O(log N) respectively, where N is the extent of the dimension. Using an expansion category slower than the natural one sacrifices performance unnecessarily. The subspace model addresses these problems. There are three major aspects of this work. First, the subspace model is useful by itself. The subspace abstraction subsumes existing shape-related optimizations (such as privatization and invariant code motion) and sha…
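    To make the three expansion categories concrete (my own sketch, not code from the paper): a dimension of extent N can be attained serially in O(N) steps (a running accumulation), in O(1) with a fully parallel expansion (independent per-element work), or in O(log N) via a parallel prefix operation. The scan below performs log2(N) vectorized passes, each of which could execute as one parallel step.

```python
# Sketch of the three expansion categories (illustrative, not from the
# paper): serial O(N), parallel O(1), and parallel-prefix O(log N).

import numpy as np

x = np.arange(1.0, 9.0)          # extent N = 8 along this dimension

# O(1): each element computed independently; one parallel step.
doubled = 2.0 * x

# O(N): a serial accumulation; each step depends on the previous one.
serial_scan = np.empty_like(x)
acc = 0.0
for i in range(len(x)):
    acc += x[i]
    serial_scan[i] = acc

# O(log N): inclusive prefix sum via doubling; log2(N) vectorized passes.
# (The explicit `= a + b` form avoids aliasing on overlapping slices.)
prefix = x.copy()
shift = 1
while shift < len(prefix):
    prefix[shift:] = prefix[shift:] + prefix[:-shift]
    shift *= 2

assert np.allclose(serial_scan, prefix)
print(prefix)   # [ 1.  3.  6. 10. 15. 21. 28. 36.]
```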