The Paragraph: Design and Implementation of the STAPL Parallel Task Graph
Parallel programming is becoming mainstream due to the increased availability
of multiprocessor and multicore architectures and the need to solve larger and
more complex problems. Languages and tools available for the development of
parallel applications are often difficult to learn and use. The Standard Template
Adaptive Parallel Library (STAPL) is being developed to help programmers
address these difficulties.
STAPL is a parallel C++ library with functionality similar to STL, the
ISO-adopted C++ Standard Template Library. STAPL provides
a collection of parallel pContainers for data storage and pViews that
provide uniform data access operations by abstracting away the details of
the pContainer data distribution. Generic pAlgorithms are written in terms of PARAGRAPHs,
high level task graphs expressed as a composition of common parallel patterns.
These task graphs define a set of operations on pViews as well as any
ordering (i.e., dependences) on these operations that must be enforced by
STAPL for a valid execution. The subject of this dissertation is the PARAGRAPH Executor,
a framework that manages the runtime instantiation and execution of STAPL
PARAGRAPHs.
We address several challenges present when using a task graph program representation
and discuss a novel approach to dependence specification which allows task graph creation
and execution to proceed concurrently. This overlapping increases scalability and
reduces the resources required by the PARAGRAPH Executor. We also describe the interface for task
specification as well as optimizations that address issues such as data locality.
We evaluate the performance of the PARAGRAPH Executor on several parallel machines including
massively parallel Cray XT4 and Cray XE6 systems and an IBM Power5 cluster.
Using tests including generic parallel algorithms, kernels from the NAS NPB suite,
and a nuclear particle transport application written in STAPL, we demonstrate that the
PARAGRAPH Executor enables STAPL to exhibit good scalability on more than processors
Advances in the Design and Implementation of a Multi-Tier Architecture in the GIPSY Environment
We present advances in the software engineering design and implementation of
the multi-tier run-time system for the General Intensional Programming System
(GIPSY) by further unifying the distributed technologies used to implement the
Demand Migration Framework (DMF) in order to streamline distributed execution
of hybrid intensional-imperative programs using Java.
Comment: 11 pages, 3 figure
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94 % efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38000 x 38000
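For orientation, the serial baseline behind the efficiency figure above can be sketched as follows. This is a plain Python illustration of the textbook Floyd-Warshall algorithm and of the usual definition of parallel efficiency (serial time divided by cores times parallel time). It is not FooPar code, which is written in Scala, and the function names `floyd_warshall` and `parallel_efficiency` are hypothetical.

```python
import math

def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for a dense distance matrix.

    dist[i][j] is the edge weight from i to j, or math.inf if there is no
    edge. The input is not modified. Assumes no negative cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]          # work on a copy
    for k in range(n):                    # allow vertex k as an intermediate
        row_k = d[k]
        for i in range(n):
            d_ik = d[i][k]
            if d_ik == math.inf:
                continue                  # no path i -> k, nothing to relax
            row_i = d[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return d

def parallel_efficiency(t_serial, t_parallel, cores):
    """Efficiency = speedup / cores = t_serial / (cores * t_parallel)."""
    return t_serial / (cores * t_parallel)
```

Under this definition, 94 % efficiency on 100 cores corresponds to a speedup of about 94 over the serial run.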
PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies
The current landscape of scientific research is widely based on modeling and
simulation, typically with complexity in the simulation's flow of execution and
parameterization properties. Execution flows are not necessarily
straightforward since they may need multiple processing tasks and iterations.
Furthermore, parameter and performance studies are common approaches used to
characterize a simulation, often requiring traversal of a large parameter
space. High-performance computers offer practical resources at the expense of
users handling the setup, submission, and management of jobs. This work
presents the design of PaPaS, a portable, lightweight, and generic workflow
framework for conducting parallel parameter and performance studies. Workflows
are defined using parameter files based on keyword-value pairs syntax, thus
removing from the user the overhead of creating complex scripts to manage the
workflow. A parameter set consists of any combination of environment variables,
files, partial file contents, and command line arguments. PaPaS is being
developed in Python 3 with support for distributed parallelization using SSH,
batch systems, and C++ MPI. The PaPaS framework will run as user processes, and
can be used in single/multi-node and multi-tenant computing systems. An example
simulation using the BehaviorSpace tool from NetLogo and a matrix multiply
using OpenMP are presented as parameter and performance studies, respectively.
The results demonstrate that the PaPaS framework offers a simple method for
defining and managing parameter studies, while increasing resource utilization.
Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
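The idea of driving a parameter study from keyword-value pairs can be sketched in a few lines of Python. The file syntax used here (`key = v1, v2` with `#` comments) and the helper names `parse_parameters` and `expand` are assumptions made for illustration only; the abstract does not specify the actual PaPaS file format or API.

```python
from itertools import product

def parse_parameters(text):
    """Parse 'key = v1, v2, ...' lines into {key: [values]} (hypothetical syntax)."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return params

def expand(params):
    """Return one dict per point in the Cartesian product of all values."""
    keys = list(params)
    combos = product(*(params[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]
```

Each dictionary returned by `expand` describes one run of the study; a real workflow tool would then map it onto environment variables, files, or command line arguments and submit the job.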
The STAPL Parallel Container Framework
The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. STAPL provides a run-time system, a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms), and a generic methodology for extending them to provide customized functionality.
Parallel containers are data structures addressing issues related to data partitioning, distribution, communication, synchronization, load balancing, and thread safety. This dissertation presents the STAPL Parallel Container Framework (PCF), which is designed to facilitate the development of generic parallel containers. We introduce a set of concepts and a methodology for assembling a pContainer from existing sequential or parallel containers without requiring the programmer to deal with concurrency or data distribution issues. The STAPL PCF provides a large number of basic data parallel structures (e.g., pArray, pList, pVector, pMatrix, pGraph, pMap, pSet). The STAPL PCF is distinguished from existing work by offering a class hierarchy and a composition mechanism which allows users to extend and customize the current container base for improved application expressivity and performance.
We evaluate the performance of the STAPL pContainers on various parallel machines including a massively parallel CRAY XT4 system and an IBM P5-575 cluster. We show that the pContainer methods, generic pAlgorithms, and different applications, all provide good scalability on more than 10^4 processors