254 research outputs found

    Parallel Program Composition with Paragraphs in Stapl

    Get PDF
    Languages and tools currently available for the development of parallel applications are difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to make it easier for programmers to implement a parallel application. STAPL is a parallel programming library for C++ that adopts the generic programming philosophy of the C++ Standard Template Library. STAPL provides collections of parallel algorithms (pAlgorithms) and containers (pContainers) that allow a developer to write their application without reimplementing the algorithms and data structures commonly used in parallel computing. pViews in STAPL are abstract data types that provide generic data access operations independently of the type of pContainer used to store the data. Algorithms and applications have a formal, high level representation in STAPL. A computation in STAPL is represented as a parallel task graph, which we call a PARAGRAPH. A PARAGRAPH contains a representation of the algorithm's input data, the operations that are used to transform individual data elements, and the ordering between the application of operations that transform the same data element. Just as programs are the result of a composition of algorithms, STAPL programs are the result of a composition of PARAGRAPHs. This dissertation develops the PARAGRAPH program representation and its compositional methods. PARAGRAPHs improve the developer's difficult situation by simplifying what she must specify when writing a parallel algorithm. The performance of the PARAGRAPH is evaluated using parallel generic algorithms, benchmarks from the NAS suite, and a nuclear particle transport application that has been written using STAPL. Our experiments were performed on Cray XT4 and Cray XE6 massively parallel systems and an IBM Power5 cluster, and show that scalable performance beyond 16,000 processors is possible using the PARAGRAPH

    Goal-based h-adaptivity of the 1-D diamond difference discrete ordinate method.

    Get PDF
    The quantity of interest (QoI) associated with a solution of a partial differential equation (PDE) is not, in general, the solution itself, but a functional of the solution. Dual weighted residual (DWR) error estimators are one way of providing an estimate of the error in the QoI resulting from the discretisation of the PDE. This paper aims to provide an estimate of the error in the QoI due to the spatial discretisation, where the discretisation scheme being used is the diamond difference (DD) method in space and discrete ordinate (SNSN) method in angle. The QoI are reaction rates in detectors and the value of the eigenvalue (Keff)(Keff) for 1-D fixed source and eigenvalue (KeffKeff criticality) neutron transport problems respectively. Local values of the DWR over individual cells are used as error indicators for goal-based mesh refinement, which aims to give an optimal mesh for a given QoI

    Exploiting pipelined executions in OpenMP

    Get PDF
    We propose a set of extensions to the OpenMP programming model to express point-to-point synchronisation schemes. This is accomplished by defining, in the form of directives, precedence relations among the tasks that are originated from OpenMP work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs. Then the programmer defines the precedence relations using this name space. This relieves the programmer from the burden of defining complex synchronization data structures and the insertion of explicit synchronization actions in the program that make the program difficult to understand and maintain. We briefly describe the main aspects of the runtime implementation required to support precedence relations in OpenMP. We focus on the evaluation of the proposal through its use two benchmarks: NAS LU and ASCI Seep3dThis research has been supported by the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIC2001-0995-C02-01.Peer ReviewedPostprint (author's final draft

    Algorithm-Level Optimizations for Scalable Parallel Graph Processing

    Get PDF
    Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from poor scalability and performance due to many factors, including heavy communication and load-imbalance. Furthermore, it is difficult to express graph algorithms, as users need to understand and effectively utilize the underlying execution of the algorithm on the distributed system. The performance of graph algorithms depends not only on the characteristics of the system (such as latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scalefree, mesh, long-diameter, etc.), and characteristics of the algorithm (sparse computation vs. dense communication). The best execution strategy, therefore, often heavily depends on the combination of input graph, system and algorithm. Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to concentrate on a single vertex, making it easier to express parallel graph algorithms. However, this often loses information about the machine, making it difficult to extract performance and scalability from fine-grained algorithms. To address these issues, we present a model for expressing parallel graph algorithms using a fine-grained expression. Our model decouples the algorithm-writer from the underlying details of the system, graph, and execution and tuning of the algorithm. We also present various graph paradigms that optimize the execution of graph algorithms for various types of input graphs and systems. We show our model is general enough to allow graph algorithms to use the various graph paradigms for the best/fastest execution, and demonstrate good performance and scalability for various different graphs, algorithms, and systems to 100,000+ cores
    corecore