
    The Paragraph: Design and Implementation of the STAPL Parallel Task Graph

    Parallel programming is becoming mainstream due to the increased availability of multiprocessor and multicore architectures and the need to solve larger and more complex problems. Languages and tools available for the development of parallel applications are often difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to help programmers address these difficulties. STAPL is a parallel C++ library with functionality similar to STL, the ISO adopted C++ Standard Template Library. STAPL provides a collection of parallel pContainers for data storage and pViews that provide uniform data access operations by abstracting away the details of the pContainer data distribution. Generic pAlgorithms are written in terms of PARAGRAPHs, high-level task graphs expressed as a composition of common parallel patterns. These task graphs define a set of operations on pViews as well as any ordering (i.e., dependences) on these operations that must be enforced by STAPL for a valid execution. The subject of this dissertation is the PARAGRAPH Executor, a framework that manages the runtime instantiation and execution of STAPL PARAGRAPHs. We address several challenges present when using a task graph program representation and discuss a novel approach to dependence specification which allows task graph creation and execution to proceed concurrently. This overlapping increases scalability and reduces the resources required by the PARAGRAPH Executor. We also describe the interface for task specification as well as optimizations that address issues such as data locality. We evaluate the performance of the PARAGRAPH Executor on several parallel machines including massively parallel Cray XT4 and Cray XE6 systems and an IBM Power5 cluster.
Using tests including generic parallel algorithms, kernels from the NAS NPB suite, and a nuclear particle transport application written in STAPL, we demonstrate that the PARAGRAPH Executor enables STAPL to exhibit good scalability on more than 10^4 processors.

    Parallel Program Composition with Paragraphs in Stapl

    Languages and tools currently available for the development of parallel applications are difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to make it easier for programmers to implement a parallel application. STAPL is a parallel programming library for C++ that adopts the generic programming philosophy of the C++ Standard Template Library. STAPL provides collections of parallel algorithms (pAlgorithms) and containers (pContainers) that allow a developer to write their application without reimplementing the algorithms and data structures commonly used in parallel computing. pViews in STAPL are abstract data types that provide generic data access operations independently of the type of pContainer used to store the data. Algorithms and applications have a formal, high-level representation in STAPL. A computation in STAPL is represented as a parallel task graph, which we call a PARAGRAPH. A PARAGRAPH contains a representation of the algorithm's input data, the operations that are used to transform individual data elements, and the ordering between the application of operations that transform the same data element. Just as programs are the result of a composition of algorithms, STAPL programs are the result of a composition of PARAGRAPHs. This dissertation develops the PARAGRAPH program representation and its compositional methods. PARAGRAPHs ease the developer's burden by simplifying what she must specify when writing a parallel algorithm. The performance of the PARAGRAPH is evaluated using parallel generic algorithms, benchmarks from the NAS suite, and a nuclear particle transport application that has been written using STAPL. Our experiments were performed on Cray XT4 and Cray XE6 massively parallel systems and an IBM Power5 cluster, and show that scalable performance beyond 16,000 processors is possible using the PARAGRAPH.

    Multimap and Multiset Data Structures in STAPL

    The Standard Template Adaptive Parallel Library (STAPL) is an efficient programming framework whose components make it easier to implement parallel applications that can utilize multiple processors to solve large problems concurrently [1]. STAPL is developed using the C++ programming language and provides parallel equivalents of many algorithms and data structures (containers) found in the C++ Standard Template Library (STL). Although STAPL contains a large collection of parallel data structures and algorithms, there are still many algorithms and containers that are not yet implemented in STAPL. Multimap and multiset are two associative containers that are included in STL but not yet implemented in STAPL. The goal of this work is to design and implement the parallel multimap and parallel multiset containers that provide the same functionality as their STL counterparts while enabling parallel computation on large scale data.

    The STAPL Parallel Container Framework

    The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. STAPL provides a run-time system, a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms), and a generic methodology for extending them to provide customized functionality. Parallel containers are data structures addressing issues related to data partitioning, distribution, communication, synchronization, load balancing, and thread safety. This dissertation presents the STAPL Parallel Container Framework (PCF), which is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers without requiring the programmer to deal with concurrency or data distribution issues. The STAPL PCF provides a large number of basic data parallel structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The STAPL PCF is distinguished from existing work by offering a class hierarchy and a composition mechanism which allows users to extend and customize the current container base for improved application expressivity and performance. We evaluate the performance of the STAPL pContainers on various parallel machines including a massively parallel CRAY XT4 system and an IBM P5-575 cluster. We show that the pContainer methods, generic pAlgorithms, and different applications all provide good scalability on more than 10^4 processors.

    The STAPL pList

    We present the design and implementation of the Standard Template Adaptive Parallel Library (stapl) pList, a parallel container that has the properties of a sequential list, but allows for scalable concurrent access when used in a parallel program. stapl is a parallel programming library that extends C++ with support for parallelism. stapl provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, providing appropriate interfaces (pViews) that can be used by generic pAlgorithms. The pList provides Standard Template Library (stl) equivalent methods, such as insert, erase, and splice, additional methods such as split, and efficient asynchronous (non-blocking) variants of some methods for improved parallel performance. List related algorithms such as list ranking, Euler Tour (ET), and its applications to compute tree based functions can be computed efficiently and expressed naturally using the pList. Lists are not usually considered useful in parallel algorithms because they do not allow random access to their elements. Instead, elements are accessed through a serializing traversal of the list. Our design of the pList, which consists of a collection of distributed lists (base containers), provides almost random access to its base containers. The degree of parallelism supported can be tuned by setting the number of base containers. Thus, a key feature of the pList is that it offers the advantages of a classical list while enabling scalable parallelism. We evaluate the performance of the stapl pList on an IBM Power 5 cluster and on a CRAY XT4 massively parallel processing system.
Although lists are generally not considered good data structures for parallel processing, we show that pList methods and pAlgorithms, and list related algorithms such as list ranking and the ET technique operating on pLists, provide good scalability on more than 16,000 processors. We also show that the pList compares favorably with other dynamic data structures such as the pVector that explicitly support random access.

    Unordered Associative Containers in STAPL

    The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming framework for C++ that provides parallel algorithms and containers similar to those found in the Standard Template Library (STL). Currently STAPL lacks implementations of three unordered associative containers: unordered set, unordered multiset, and unordered multimap. These are commonly used containers in the field of computer science; therefore, their implementations are a necessity for STAPL. The similarity of operations and structure among the containers allows a large portion of code to be reused. The goal of this work is to design and create a parallel implementation of these containers that provides the same user-level facilities as their STL equivalents and displays a high level of scalability when executed on a large number of processors.

    Dynamic Load Balancing in a Geophysics Application Using STAPL

    Seismic wavefront simulation is a common method to understand the composition of the earth below the surface, especially for hydrocarbon exploration. One such simulation method is the wavefront construction algorithm. In this thesis, we reduced the load imbalance in a parallel implementation of the wavefront construction algorithm. We added a generic redistribution framework for data structures in the C++ parallel library STAPL. We present a redistribution algorithm for the parallel wavefront construction application which uses the recursive coordinate bisection method to find a near-optimal distribution of the data. This algorithm leveraged the added redistribution features in STAPL to improve the running time of our application. We compared the run time of the application with and without redistribution on different geophysics models. We show that the proposed redistribution provides up to 9.45x speedup on a Cray XE6m cluster and 11.85x speedup on an IBM BlueGene/Q cluster.

    Parallel Seismic Ray Tracing

    Seismic ray tracing is a common method for understanding and modeling seismic wave propagation. The wavefront construction (WFC) method handles wavefronts instead of individual rays, thereby providing a mechanism to control ray density on the wavefront. In this thesis we present the design and implementation of a parallel wavefront construction algorithm (pWFC) for seismic ray tracing. The proposed parallel algorithm is developed using the stapl library for parallel C++ code. We present the idea of modeling ray tubes with an additional ray in the center to facilitate parallelism. The parallel wavefront construction algorithm is applied to a wide range of models, from simple synthetic models that enable us to study various aspects of the method to models intended to be representative of basic geological features such as salt domes. We also present a theoretical model to understand the performance of the pWFC algorithm. We evaluate the performance of the proposed parallel wavefront construction algorithm on an IBM Power 5 cluster. We study the effect of using different mesh types, varying the position and number of sources, and other factors. The method is shown to provide good scalable performance for different models. Load balancing is also shown to be the major factor hindering the performance of the algorithm. We provide two load balancing algorithms to address the load imbalance problem; these will be developed as an extension of the current work.

    Nested Parallelism with Algorithmic Skeletons

    A new trend in the design of computer architectures, from memory hierarchy design to the grouping of computing units in different hierarchical levels in CPUs, pushes developers toward algorithms that can exploit these hierarchical designs. This trend makes support for nested parallelism an important feature of parallel programming models, as it enables the implementation of parallel programs that can then be mapped onto the system hierarchy. However, supporting nested parallelism is not a trivial task, due to the complexity of spawning nested sections, destructing them, and, more importantly, communicating between these nested parallel sections. Structured parallel programming models have proven to be a good choice: while they hide the complexities of parallel programming from the programmer, they allow the algorithm's execution to be customized without radical changes to the other parts of the program. In this thesis, nested algorithm composition in the STAPL Skeleton Library (SSL) is presented, which uses a nested dataflow model as its internal representation. We show how a high-level program specification using SSL allows for asynchronous computation and improved locality. We study both the specification and performance of the STAPL implementation of Kripke, a mini-app developed by Lawrence Livermore National Laboratory. Kripke has multiple levels of parallelism and a number of data layouts, making it an excellent test bed to exercise the effectiveness of a nested parallel programming approach. Performance results are provided for six different nesting orders of the benchmark under different degrees of nested parallelism, demonstrating the flexibility and performance of nested algorithmic skeleton composition in STAPL.