254 research outputs found
Parallel Program Composition with Paragraphs in Stapl
Languages and tools currently available for the development of parallel applications are difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to make it easier for programmers to implement a parallel
application.
STAPL is a parallel programming library for C++ that adopts the
generic programming philosophy of the C++ Standard Template Library. STAPL provides collections of parallel algorithms (pAlgorithms) and containers (pContainers) that allow a developer to write their application without reimplementing the algorithms and data structures commonly used in parallel computing. pViews in STAPL are abstract data types that provide generic data access operations independently of the type of pContainer used to store the data.
Algorithms and applications have a formal, high level representation
in STAPL. A computation in STAPL is represented as a parallel task graph, which we call a PARAGRAPH. A PARAGRAPH contains a representation of the algorithm's input data, the operations that are used to transform individual data elements, and the ordering between the application of operations that transform the same data element. Just as programs are the result of a composition of algorithms, STAPL programs are the result of a composition of PARAGRAPHs.
This dissertation develops the PARAGRAPH program representation and its compositional methods. PARAGRAPHs improve the developer's difficult situation by simplifying what she must specify when writing a parallel algorithm.
The performance of the PARAGRAPH is evaluated using parallel generic
algorithms, benchmarks from the NAS suite, and a nuclear particle transport application that has been written using STAPL. Our experiments were performed on Cray XT4 and Cray XE6 massively parallel
systems and an IBM Power5 cluster, and show that scalable performance
beyond 16,000 processors is possible using the PARAGRAPH
Goal-based h-adaptivity of the 1-D diamond difference discrete ordinate method.
The quantity of interest (QoI) associated with a solution of a partial differential equation (PDE) is not, in general, the solution itself, but a functional of the solution. Dual weighted residual (DWR) error estimators are one way of providing an estimate of the error in the QoI resulting from the discretisation of the PDE. This paper aims to provide an estimate of the error in the QoI due to the spatial discretisation, where the discretisation scheme being used is the diamond difference (DD) method in space and discrete ordinate (SNSN) method in angle. The QoI are reaction rates in detectors and the value of the eigenvalue (Keff)(Keff) for 1-D fixed source and eigenvalue (KeffKeff criticality) neutron transport problems respectively. Local values of the DWR over individual cells are used as error indicators for goal-based mesh refinement, which aims to give an optimal mesh for a given QoI
Exploiting pipelined executions in OpenMP
We propose a set of extensions to the OpenMP programming model to express point-to-point synchronisation schemes. This is accomplished by defining, in the form of directives, precedence relations among the tasks that are originated from OpenMP work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs. Then the programmer defines the precedence relations using this name space. This relieves the programmer from the burden of defining complex synchronization data structures and the insertion of explicit synchronization actions in the program that make the program difficult to understand and maintain. We briefly describe the main aspects of the runtime implementation required to support precedence relations in OpenMP. We focus on the evaluation of the proposal through its use two benchmarks: NAS LU and ASCI Seep3dThis research has been supported by the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIC2001-0995-C02-01.Peer ReviewedPostprint (author's final draft
Algorithm-Level Optimizations for Scalable Parallel Graph Processing
Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from
poor scalability and performance due to many factors, including heavy communication and load-imbalance.
Furthermore, it is difficult to express graph algorithms, as users need to understand
and effectively utilize the underlying execution of the algorithm on the distributed system. The
performance of graph algorithms depends not only on the characteristics of the system (such as
latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scalefree,
mesh, long-diameter, etc.), and characteristics of the algorithm (sparse computation vs. dense
communication). The best execution strategy, therefore, often heavily depends on the combination
of input graph, system and algorithm.
Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to
concentrate on a single vertex, making it easier to express parallel graph algorithms. However,
this often loses information about the machine, making it difficult to extract performance and
scalability from fine-grained algorithms.
To address these issues, we present a model for expressing parallel graph algorithms using a
fine-grained expression. Our model decouples the algorithm-writer from the underlying details
of the system, graph, and execution and tuning of the algorithm. We also present various graph
paradigms that optimize the execution of graph algorithms for various types of input graphs and
systems. We show our model is general enough to allow graph algorithms to use the various graph
paradigms for the best/fastest execution, and demonstrate good performance and scalability for
various different graphs, algorithms, and systems to 100,000+ cores
- …