    The STAPL pList

    We present the design and implementation of the Standard Template Adaptive Parallel Library (STAPL) pList, a parallel container that has the properties of a sequential list but allows scalable concurrent access when used in a parallel program. STAPL is a parallel programming library that extends C++ with support for parallelism. STAPL provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. STAPL pContainers are thread-safe, concurrent objects providing appropriate interfaces (pViews) that can be used by generic pAlgorithms. The pList provides Standard Template Library (STL) equivalent methods, such as insert, erase, and splice, additional methods such as split, and efficient asynchronous (non-blocking) variants of some methods for improved parallel performance. List-related algorithms such as list ranking and the Euler Tour (ET) technique, along with its applications to computing tree-based functions, can be implemented efficiently and expressed naturally using the pList. Lists are not usually considered useful in parallel algorithms because they do not allow random access to their elements; instead, elements are accessed through a serializing traversal of the list. Our design of the pList, which consists of a collection of distributed lists (base containers), provides nearly random access to its base containers, and the degree of parallelism supported can be tuned by setting the number of base containers. Thus, a key feature of the pList is that it offers the advantages of a classical list while enabling scalable parallelism. We evaluate the performance of the STAPL pList on an IBM Power5 cluster and on a Cray XT4 massively parallel processing system. Although lists are generally not considered good data structures for parallel processing, we show that pList methods, pAlgorithms, and list-related algorithms such as list ranking and the ET technique operating on pLists provide good scalability on more than 16,000 processors. We also show that the pList compares favorably with other dynamic data structures, such as the pVector, that explicitly support random access.
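
    The segmented design described above, a list assembled from a tunable number of base containers, can be pictured with a small sequential sketch. The class below is our illustration, not the STAPL pList interface; splice is the only method that mirrors its STL counterpart directly, and a real pList would distribute the base containers across locations and offer asynchronous variants.

        // Conceptual sketch (not the STAPL pList API): a list stored as a
        // fixed number of "base containers", giving near-random access to
        // segments while each segment keeps ordinary list behavior.
        #include <cstddef>
        #include <iostream>
        #include <list>
        #include <vector>

        template <typename T>
        class SegmentedList {
          std::vector<std::list<T>> segments_;  // base containers, one per "location"
         public:
          explicit SegmentedList(std::size_t num_segments) : segments_(num_segments) {}

          // Append to a chosen segment; in a distributed setting this would be
          // the kind of non-blocking operation the abstract calls asynchronous.
          void push_back(std::size_t segment, const T& value) {
            segments_[segment].push_back(value);
          }

          // Splice another list's segment into ours in O(1), mirroring
          // std::list::splice at the base-container level.
          void splice(std::size_t segment, SegmentedList& other,
                      std::size_t other_segment) {
            auto& dst = segments_[segment];
            dst.splice(dst.end(), other.segments_[other_segment]);
          }

          std::size_t size() const {
            std::size_t n = 0;
            for (const auto& s : segments_) n += s.size();
            return n;
          }
        };

        int main() {
          SegmentedList<int> l(4);  // parallelism is tuned by the segment count
          for (int i = 0; i < 8; ++i) l.push_back(i % 4, i);
          std::cout << l.size() << "\n";  // prints 8
        }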

    The STAPL Parallel Container Framework

    The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. STAPL provides a run-time system, a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms), and a generic methodology for extending them to provide customized functionality. Parallel containers are data structures addressing issues related to data partitioning, distribution, communication, synchronization, load balancing, and thread safety. This dissertation presents the STAPL Parallel Container Framework (PCF), which is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers without requiring the programmer to deal with concurrency or data distribution issues. The STAPL PCF provides a large number of basic parallel data structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The STAPL PCF is distinguished from existing work by offering a class hierarchy and a composition mechanism that allow users to extend and customize the current container base for improved application expressivity and performance. We evaluate the performance of the STAPL pContainers on various parallel machines, including a massively parallel Cray XT4 system and an IBM P5-575 cluster. We show that the pContainer methods, generic pAlgorithms, and different applications all provide good scalability on more than 10^4 processors.
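
    As a rough picture of the assembly idea, the sketch below pairs an existing sequential container with a partition object that maps a global index to a (block, offset) pair. All names here are ours for illustration; the actual PCF concepts cover distribution, communication, and thread safety as well.

        // Illustrative sketch only (not the actual PCF interfaces): building a
        // "parallel" container from a sequential one plus a partition.
        #include <cassert>
        #include <cstddef>
        #include <vector>

        struct BlockedPartition {
          std::size_t block_size;
          std::size_t block_of(std::size_t i) const { return i / block_size; }
          std::size_t offset_of(std::size_t i) const { return i % block_size; }
        };

        template <typename SeqContainer>
        class PContainerAdaptor {
          std::vector<SeqContainer> blocks_;  // in STAPL these live on different locations
          BlockedPartition part_;
         public:
          PContainerAdaptor(std::size_t n, std::size_t block_size)
              : blocks_((n + block_size - 1) / block_size), part_{block_size} {
            for (auto& b : blocks_) b.resize(block_size);
          }
          // Global element access routed through the partition.
          typename SeqContainer::value_type& operator[](std::size_t i) {
            return blocks_[part_.block_of(i)][part_.offset_of(i)];
          }
        };

        int main() {
          PContainerAdaptor<std::vector<double>> a(100, 25);
          a[42] = 3.14;
          assert(a[42] == 3.14);
        }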

    10191 Abstracts Collection -- Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond

    From May 9 to 12, 2010, the Dagstuhl Seminar 10191 "Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond" was held in Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    MULTIMAP AND MULTISET DATA STRUCTURES IN STAPL

    The Standard Template Adaptive Parallel Library (STAPL) is an efficient programming framework whose components make it easier to implement parallel applications that can utilize multiple processors to solve large problems concurrently [1]. STAPL is developed using the C++ programming language and provides parallel equivalents of many algorithms and data structures (containers) found in its Standard Template Library (STL). Although STAPL contains a large collection of parallel data structures and algorithms, there are still many algorithms and containers that are not yet implemented in STAPL. Multimap and multiset are two associative containers that are included in the STL but not yet implemented in STAPL. The goal of this work is to design and implement the parallel multimap and parallel multiset containers that provide the same functionality as their STL counterparts while enabling parallel computation on large-scale data.
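
    For reference, the STL semantics the parallel containers must preserve are easy to state in code: multimap and multiset allow duplicate keys, and lookups return ranges rather than single elements. The snippet below is plain STL usage, not the STAPL interface.

        // The sequential contract: duplicate keys, range-based lookup.
        #include <iostream>
        #include <map>
        #include <set>
        #include <string>

        int main() {
          std::multimap<std::string, int> grades;
          grades.insert({"alice", 90});
          grades.insert({"alice", 85});  // duplicate key is allowed

          // equal_range returns the half-open range of all matching entries.
          auto [first, last] = grades.equal_range("alice");
          for (auto it = first; it != last; ++it)
            std::cout << it->first << " -> " << it->second << "\n";

          std::multiset<int> s{1, 1, 2};
          std::cout << s.count(1) << "\n";  // prints 2
        }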

    PList-based Divide and Conquer Parallel Programming

    This paper details an extension of a Java parallel programming framework, JPLF. The JPLF framework helps programmers build parallel programs from existing building blocks. It is based on the PowerLists and PList theories and naturally supports multi-way divide and conquer. By using the framework, the programmer is exempted from dealing with the complexities of writing parallel programs from scratch. The extension adds PList support to JPLF, enlarging its applicability to a wider class of problems solvable in parallel and permitting more flexible data division strategies. In addition, the length of the input list no longer has to be a power of two, as required by the PowerLists theory. In this paper we present new applications that emphasize the new class of computations that can be executed within the JPLF framework. We also give a detailed description of the data structures and functions involved in the PList extension of JPLF, and extended performance experiments are described and analyzed.
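
    JPLF itself is written in Java; the C++ sketch below only illustrates the multi-way divide-and-conquer shape the PList extension enables: an input list of arbitrary length, not necessarily a power of two, is split k ways and the partial results are combined. The splitting arithmetic and the sum operation are our choices for the example.

        // Multi-way divide and conquer over a list of arbitrary length.
        #include <algorithm>
        #include <iostream>
        #include <numeric>
        #include <vector>

        long divide_and_conquer_sum(const std::vector<int>& v,
                                    std::size_t lo, std::size_t hi,
                                    std::size_t ways) {
          if (hi - lo <= 4)  // base case: small segment, solve directly
            return std::accumulate(v.begin() + lo, v.begin() + hi, 0L);
          long total = 0;
          std::size_t chunk = (hi - lo + ways - 1) / ways;  // k-way split
          for (std::size_t s = lo; s < hi; s += chunk) {
            std::size_t e = std::min(s + chunk, hi);
            total += divide_and_conquer_sum(v, s, e, ways);  // conquer, then combine
          }
          return total;
        }

        int main() {
          std::vector<int> v(10);            // length 10: not a power of two
          std::iota(v.begin(), v.end(), 1);  // 1..10
          std::cout << divide_and_conquer_sum(v, 0, v.size(), 3) << "\n";  // 55
        }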

    Parallel Seismic Ray Tracing

    Seismic ray tracing is a common method for understanding and modeling seismic wave propagation. The wavefront construction (WFC) method handles wavefronts instead of individual rays, thereby providing a mechanism to control ray density on the wavefront. In this thesis we present the design and implementation of a parallel wavefront construction algorithm (pWFC) for seismic ray tracing. The proposed parallel algorithm is developed using the STAPL library for parallel C++ code. We present the idea of modeling ray tubes with an additional ray in the center to facilitate parallelism. The parallel wavefront construction algorithm is applied to a wide range of models: simple synthetic models that enable us to study various aspects of the method, as well as models intended to be representative of basic geological features such as salt domes. We also present a theoretical model to understand the performance of the pWFC algorithm. We evaluate the performance of the proposed algorithm on an IBM Power5 cluster, studying the effect of using different mesh types and of varying the number and placement of sources. The method is shown to provide good scalable performance for different models. Load imbalance is also shown to be the major factor hindering performance, and we provide two load balancing algorithms to address it; these algorithms will be developed as an extension of the current work.
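
    A minimal sketch of the wavefront construction idea, under simplifying assumptions of our own (2D geometry, constant velocity, straight rays): the wavefront is a set of rays advanced in time steps, and a new ray is interpolated wherever neighbors drift too far apart, which is the density control the WFC method provides. The pWFC ray-tube bookkeeping with a center ray is omitted here.

        // Toy wavefront construction: advance rays, then refine for density.
        #include <cmath>
        #include <iostream>
        #include <vector>

        struct Ray { double x, y, dx, dy; };

        void advance(std::vector<Ray>& front, double dt) {
          for (auto& r : front) { r.x += r.dx * dt; r.y += r.dy * dt; }
          // Refine: insert a crude midpoint ray where neighbors are too far apart.
          std::vector<Ray> refined;
          for (std::size_t i = 0; i + 1 < front.size(); ++i) {
            refined.push_back(front[i]);
            const Ray &a = front[i], &b = front[i + 1];
            if (std::hypot(a.x - b.x, a.y - b.y) > 1.0)
              refined.push_back({(a.x + b.x) / 2, (a.y + b.y) / 2,
                                 (a.dx + b.dx) / 2, (a.dy + b.dy) / 2});
          }
          refined.push_back(front.back());
          front = refined;
        }

        int main() {
          const double kPi = 3.14159265358979;
          std::vector<Ray> front;  // a fan of rays leaving a point source
          for (int i = 0; i <= 8; ++i) {
            double ang = i * kPi / 8;
            front.push_back({0, 0, std::cos(ang), std::sin(ang)});
          }
          for (int step = 0; step < 5; ++step) advance(front, 1.0);
          std::cout << front.size() << " rays after refinement\n";
        }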

    The Paragraph: Design and Implementation of the STAPL Parallel Task Graph

    Parallel programming is becoming mainstream due to the increased availability of multiprocessor and multicore architectures and the need to solve larger and more complex problems. Languages and tools available for the development of parallel applications are often difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to help programmers address these difficulties. STAPL is a parallel C++ library with functionality similar to STL, the ISO adopted C++ Standard Template Library. STAPL provides a collection of parallel pContainers for data storage and pViews that provide uniform data access operations by abstracting away the details of the pContainer data distribution. Generic pAlgorithms are written in terms of PARAGRAPHs, high-level task graphs expressed as a composition of common parallel patterns. These task graphs define a set of operations on pViews as well as any ordering (i.e., dependences) on these operations that must be enforced by STAPL for a valid execution. The subject of this dissertation is the PARAGRAPH Executor, a framework that manages the runtime instantiation and execution of STAPL PARAGRAPHs. We address several challenges present when using a task graph program representation and discuss a novel approach to dependence specification which allows task graph creation and execution to proceed concurrently. This overlapping increases scalability and reduces the resources required by the PARAGRAPH Executor. We also describe the interface for task specification as well as optimizations that address issues such as data locality. We evaluate the performance of the PARAGRAPH Executor on several parallel machines, including massively parallel Cray XT4 and Cray XE6 systems and an IBM Power5 cluster. Using tests including generic parallel algorithms, kernels from the NAS NPB suite, and a nuclear particle transport application written in STAPL, we demonstrate that the PARAGRAPH Executor enables STAPL to exhibit good scalability on more than 10^4 processors.
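
    The executor's basic contract, that a task runs only once all of its predecessors have finished, can be shown with a small sequential stand-in. The names and structure below are ours, not the PARAGRAPH Executor interface; a real executor would run ready tasks in parallel and overlap graph creation with execution.

        // Toy task graph: run each task when its dependence count reaches zero.
        #include <functional>
        #include <iostream>
        #include <queue>
        #include <vector>

        struct TaskGraph {
          std::vector<std::function<void()>> work;
          std::vector<std::vector<int>> succs;  // edges: task -> successors
          std::vector<int> pending;             // unfinished predecessor counts

          int add(std::function<void()> f) {
            work.push_back(std::move(f));
            succs.emplace_back();
            pending.push_back(0);
            return static_cast<int>(work.size()) - 1;
          }
          void depend(int before, int after) {
            succs[before].push_back(after);
            ++pending[after];
          }
          void run() {  // sequential stand-in for the parallel executor
            std::queue<int> ready;
            for (int t = 0; t < static_cast<int>(work.size()); ++t)
              if (pending[t] == 0) ready.push(t);
            while (!ready.empty()) {
              int t = ready.front(); ready.pop();
              work[t]();  // all dependences satisfied, safe to execute
              for (int s : succs[t])
                if (--pending[s] == 0) ready.push(s);
            }
          }
        };

        int main() {
          TaskGraph g;
          int a = g.add([] { std::cout << "read input\n"; });
          int b = g.add([] { std::cout << "transform\n"; });
          int c = g.add([] { std::cout << "write output\n"; });
          g.depend(a, b);
          g.depend(b, c);
          g.run();
        }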

    Parallel Program Composition with Paragraphs in Stapl

    Languages and tools currently available for the development of parallel applications are difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to make it easier for programmers to implement parallel applications. STAPL is a parallel programming library for C++ that adopts the generic programming philosophy of the C++ Standard Template Library. STAPL provides collections of parallel algorithms (pAlgorithms) and containers (pContainers) that allow developers to write applications without reimplementing the algorithms and data structures commonly used in parallel computing. pViews in STAPL are abstract data types that provide generic data access operations independently of the type of pContainer used to store the data. Algorithms and applications have a formal, high-level representation in STAPL. A computation in STAPL is represented as a parallel task graph, which we call a PARAGRAPH. A PARAGRAPH contains a representation of the algorithm's input data, the operations that are used to transform individual data elements, and the ordering between the applications of operations that transform the same data element. Just as programs are the result of a composition of algorithms, STAPL programs are the result of a composition of PARAGRAPHs. This dissertation develops the PARAGRAPH program representation and its compositional methods. PARAGRAPHs ease the developer's task by simplifying what she must specify when writing a parallel algorithm. The performance of the PARAGRAPH is evaluated using parallel generic algorithms, benchmarks from the NAS suite, and a nuclear particle transport application that has been written using STAPL. Our experiments were performed on Cray XT4 and Cray XE6 massively parallel systems and an IBM Power5 cluster, and show that scalable performance beyond 16,000 processors is possible using the PARAGRAPH.
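
    As a loose analogy for composition, the sketch below chains two whole-container computations so that the result of the first feeds the second. Real PARAGRAPH composition operates on pViews and task graphs rather than plain vectors; the functions here are hypothetical stand-ins.

        // Composing two computations: the output of one is the input of the next.
        #include <algorithm>
        #include <iostream>
        #include <numeric>
        #include <vector>

        // Each function plays the role of one whole-container computation.
        std::vector<int> square_all(std::vector<int> v) {
          std::transform(v.begin(), v.end(), v.begin(),
                         [](int x) { return x * x; });
          return v;
        }
        int sum_all(const std::vector<int>& v) {
          return std::accumulate(v.begin(), v.end(), 0);
        }

        int main() {
          std::vector<int> data(4);
          std::iota(data.begin(), data.end(), 1);          // 1 2 3 4
          std::cout << sum_all(square_all(data)) << "\n";  // 30
        }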

    STAPL-RTS: A Runtime System for Massive Parallelism

    Modern High Performance Computing (HPC) systems are complex, with deep memory hierarchies and increasing use of computational heterogeneity via accelerators. When developing applications for these platforms, programmers are faced with two bad choices. On one hand, they can explicitly manage machine resources, writing programs using low level primitives from multiple APIs (e.g., MPI+OpenMP), creating efficient but rigid, difficult to extend, and non-portable implementations. Alternatively, users can adopt higher level programming environments, often at the cost of lost performance. Our approach is to maintain the high level nature of the application without sacrificing performance by relying on the transfer of high level, application semantic knowledge between layers of the software stack at an appropriate level of abstraction and performing optimizations on a per-layer basis. In this dissertation, we present the STAPL Runtime System (STAPL-RTS), a runtime system built for portable performance, suitable for massively parallel machines. While the STAPL-RTS abstracts and virtualizes the underlying platform for portability, it uses information from the upper layers to perform the appropriate low level optimizations that restore the performance characteristics. We outline the fundamental ideas behind the design of the STAPL-RTS, such as the always distributed communication model and its asynchronous operations. Through appropriate code examples and benchmarks, we show that high level information allows applications written on top of the STAPL-RTS to attain the performance of optimized, but ad hoc solutions. Using the STAPL library, we demonstrate how this information guides important decisions in the STAPL-RTS, such as multi-protocol communication coordination and request aggregation using established C++ programming idioms. Recognizing that nested parallelism is of increasing interest for both expressivity and performance, we present a parallel model that combines asynchronous, one-sided operations with isolated nested parallel sections. Previous approaches to nested parallelism targeted either static applications through the use of blocking, isolated sections, or dynamic applications by using asynchronous mechanisms (i.e., recursive task spawning) which come at the expense of isolation. We combine the flexibility of dynamic task creation with the isolation guarantees of the static models by allowing the creation of asynchronous, one-sided nested parallel sections that work in tandem with the more traditional, synchronous, collective nested parallelism. This allows selective, run-time customizable use of parallelism in an application, based on the input and the algorithm.
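
    One optimization named above, request aggregation, is easy to sketch: small asynchronous requests are buffered and shipped in batches to amortize per-message overhead. The class below is our illustration, not the STAPL-RTS interface; a real runtime would serialize each batch into a single network message to a remote location.

        // Toy request aggregation: buffer small requests, flush in batches.
        #include <functional>
        #include <iostream>
        #include <vector>

        class Aggregator {
          std::vector<std::function<void()>> buffer_;
          std::size_t batch_size_;
         public:
          explicit Aggregator(std::size_t batch_size) : batch_size_(batch_size) {}

          void async(std::function<void()> request) {
            buffer_.push_back(std::move(request));
            if (buffer_.size() >= batch_size_) flush();  // amortize send overhead
          }
          void flush() {
            // Stand-in for shipping the whole batch in one message.
            for (auto& r : buffer_) r();
            buffer_.clear();
          }
        };

        int main() {
          Aggregator agg(3);
          for (int i = 0; i < 7; ++i)
            agg.async([i] { std::cout << "request " << i << "\n"; });
          agg.flush();  // drain the remainder
        }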