    Extreme scale parallel NBody algorithm with event driven constraint based execution model

    Traditional scientific applications, such as Computational Fluid Dynamics and numerical methods based on Partial Differential Equations (e.g., Finite Difference and Finite Element Methods), achieve sufficient efficiency on state-of-the-art high performance computing systems and have been widely studied and implemented using conventional programming models. For emerging application domains such as graph applications, however, scalability and efficiency are significantly constrained by conventional systems and their supporting programming models. Furthermore, technology trends such as multicore, manycore, and heterogeneous system architectures are introducing new challenges and possibilities. These emerging technologies require rethinking how the underlying parallelism is exposed to applications and end users. This thesis explores the effective parallel execution of ephemeral graphs that are generated dynamically. The standard particle-based N-body simulation, solved using the Barnes-Hut algorithm, is chosen to exemplify such dynamic workloads. The workloads are expressed in three ways: with sequential execution semantics, with the shared memory semantics of a conventional parallel programming model, and with the semantics of ParalleX, an innovative execution model designed for efficient, scalable performance towards Exascale computing. The main outcomes of this research are parallel processing of dynamic ephemeral workloads, dynamic load balancing at runtime, and the use of advanced semantics for exposing parallelism in scaling-constrained applications.
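    To make the dynamically generated workload concrete, the following is a minimal sequential Barnes-Hut sketch in Python, not the thesis's ParalleX or shared-memory implementation. It assumes a 2D quadtree, G = 1, Plummer softening `eps`, distinct body positions, and the common opening criterion cell size / distance < `theta`; all names (`Node`, `build`, `accel`, `direct`) are hypothetical. Because the tree is rebuilt from the current positions, it is an ephemeral graph whose shape changes every time step.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Body = Tuple[float, float, float]  # (x, y, mass); positions assumed distinct

@dataclass
class Node:
    cx: float                                  # center of this square cell
    cy: float
    half: float                                # half of the cell's side length
    body: Optional[Body] = None                # set only for a leaf holding one body
    children: Optional[List["Node"]] = None    # four quadrants once split
    mass: float = 0.0                          # total mass in the cell
    comx: float = 0.0                          # center of mass
    comy: float = 0.0

def _child_for(node: Node, b: Body) -> Node:
    idx = (2 if b[0] >= node.cx else 0) + (1 if b[1] >= node.cy else 0)
    return node.children[idx]

def insert(node: Node, b: Body) -> None:
    if node.children is None:
        if node.body is None:                  # empty leaf: store the body here
            node.body = b
            return
        old, node.body = node.body, None       # occupied leaf: split into quadrants
        h = node.half / 2
        node.children = [Node(node.cx + sx * h, node.cy + sy * h, h)
                         for sx in (-1, 1) for sy in (-1, 1)]
        insert(_child_for(node, old), old)
    insert(_child_for(node, b), b)

def summarize(node: Node) -> None:
    """Bottom-up pass computing each cell's total mass and center of mass."""
    if node.children is None:
        if node.body is not None:
            node.comx, node.comy, node.mass = node.body
        return
    wx = wy = 0.0
    for c in node.children:
        summarize(c)
        node.mass += c.mass
        wx += c.mass * c.comx
        wy += c.mass * c.comy
    if node.mass > 0:
        node.comx, node.comy = wx / node.mass, wy / node.mass

def build(bodies: List[Body]) -> Node:
    xs = [b[0] for b in bodies]
    ys = [b[1] for b in bodies]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Node((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for b in bodies:
        insert(root, b)
    summarize(root)
    return root

def accel(node: Node, x: float, y: float,
          theta: float = 0.5, eps: float = 1e-3) -> Tuple[float, float]:
    """Acceleration at (x, y) with G = 1 and Plummer softening eps."""
    if node.mass == 0.0:
        return 0.0, 0.0
    dx, dy = node.comx - x, node.comy - y
    d = math.sqrt(dx * dx + dy * dy + eps * eps)
    # Leaf, or cell small/far enough (size / distance < theta): use its center of mass.
    # A body's own leaf contributes 0 because dx = dy = 0.
    if node.children is None or (2 * node.half) / d < theta:
        f = node.mass / d ** 3
        return f * dx, f * dy
    ax = ay = 0.0
    for c in node.children:
        cax, cay = accel(c, x, y, theta, eps)
        ax += cax
        ay += cay
    return ax, ay

def direct(bodies: List[Body], x: float, y: float, eps: float = 1e-3) -> Tuple[float, float]:
    """O(N) reference sum for checking the tree."""
    ax = ay = 0.0
    for bx, by, m in bodies:
        dx, dy = bx - x, by - y
        d = math.sqrt(dx * dx + dy * dy + eps * eps)
        ax += m * dx / d ** 3
        ay += m * dy / d ** 3
    return ax, ay
```

    With `theta = 0` every cell is opened down to its leaves, so `accel` reproduces the `direct` sum; larger values trade accuracy for fewer interactions. The per-body `accel` traversals are independent of one another, but their cost varies with local particle density, one source of the dynamic load imbalance described above.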

    Parallel Processes in HPX: Designing an Infrastructure for Adaptive Resource Management

    Advances in cutting-edge technologies have enabled better energy efficiency as well as increased computational power in the latest High Performance Computing (HPC) systems. However, the complexity introduced by hybrid architectures and by emerging classes of applications has resulted in poor computational scalability under conventional execution models. Alternative means of computation that address these bottlenecks are therefore warranted. More precisely, dynamic adaptive resource management, from both the system's and the application's perspective, is essential for better computational scalability and efficiency. This research presents and expands the notion of Parallel Processes as a placeholder for procedure definitions targeted at one or more synchronous domains, metadata for computation and resource management, and infrastructure for dynamic policy deployment. In addition, the research presents guidelines for a resource management framework in the HPX runtime system. It also lists design principles for the scalability of the Active Global Address Space (AGAS), a necessary feature for Parallel Processes. To verify the usefulness of Parallel Processes, a preliminary performance evaluation of different task scheduling policies is carried out using two applications: Unbalanced Tree Search, a reference dynamic graph application implemented in HPX as part of this research, and MiniGhost, a reference stencil-based application using the bulk synchronous parallel model.
The results show that different scheduling policies perform better for different classes of applications, and that even within the same application class one policy outperformed the others in some instances but not in others. This supports the hypothesis that a dynamic adaptive resource management infrastructure is needed for deploying different policies and task granularities in scalable distributed computing.
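    The sketch below is a hypothetical illustration of why the scheduling policy matters for a dynamic graph application such as Unbalanced Tree Search. It generates a binomial tree loosely modeled on UTS (a root with `B0` children, every other node with `M` children with probability `Q`, child identities derived with SHA-1 so the tree shape does not depend on traversal order) and expands it with two toy policies: LIFO (depth-first) and FIFO (breadth-first). The parameter values and hashing scheme are assumptions, and neither policy is one of HPX's actual schedulers.

```python
import hashlib
from collections import deque
from typing import Tuple

# Hypothetical binomial-tree parameters, loosely modeled on UTS:
# the root has B0 children; every other node has M children with probability Q.
B0, M, Q = 20, 4, 0.2   # M * Q = 0.8 < 1, so the tree is finite (with probability 1)

def num_children(node_id: bytes, is_root: bool) -> int:
    if is_root:
        return B0
    # Derive a reproducible uniform draw in [0, 1) from the node's SHA-1 digest.
    u = int.from_bytes(hashlib.sha1(node_id).digest()[:4], "big") / 2 ** 32
    return M if u < Q else 0

def child_id(parent: bytes, i: int) -> bytes:
    # A child's identity depends only on its parent and index, so the tree
    # shape is the same no matter in which order nodes are expanded.
    return hashlib.sha1(parent + i.to_bytes(4, "big")).digest()

def traverse(policy: str) -> Tuple[int, int]:
    """Expand the whole tree; return (nodes visited, peak frontier size)."""
    frontier = deque([(b"root", True)])
    visited, peak = 0, 1
    while frontier:
        # "lifo" behaves like a stack (depth-first); anything else like a queue.
        node, is_root = frontier.pop() if policy == "lifo" else frontier.popleft()
        visited += 1
        for i in range(num_children(node, is_root)):
            frontier.append((child_id(node, i), False))
        peak = max(peak, len(frontier))
    return visited, peak
```

    Both policies visit the same nodes, but the size of the frontier they keep, and hence memory use and the number of tasks available to idle workers, differs; that trade-off is one reason a single fixed policy need not suit every application.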

    Adaptive Data Migration in Load-Imbalanced HPC Applications

    Distributed parallel applications need to maximize and maintain computer resource utilization and to be portable across different machines. Balanced execution of some applications requires more effort than others because their data distribution changes over time. Redistributing data at runtime requires elaborate schemes that are expensive and may benefit only particular applications. This dissertation discusses a solution that lets HPX applications monitor their execution with APEX and use AGAS migration to adaptively redistribute data and balance load at runtime, improving application performance and scaling behavior. It provides evidence for the practicality of the Active Global Address Space as proposed by the ParalleX model and implemented in HPX, by using migration to move objects transparently at runtime together with the Autonomic Performance Environment for eXascale (APEX) library. Experiments were run on homogeneous and heterogeneous machines at Louisiana State University, the CSCS Swiss National Supercomputing Centre, and the National Energy Research Scientific Computing Center.
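    As a hypothetical illustration of the rebalancing decision (not the dissertation's algorithm), the sketch below greedily plans object moves from the most loaded to the least loaded locality until the busiest one is within a tolerance of the mean load. The per-object `cost` values stand in for measurements a monitoring library such as APEX might supply, and each returned move stands in for an AGAS migration; no HPX or APEX API is modeled, and `rebalance` and `tolerance` are invented names.

```python
from typing import Dict, List, Tuple

def rebalance(placement: Dict[str, int], cost: Dict[str, float],
              n_localities: int, tolerance: float = 1.1
              ) -> Tuple[List[Tuple[str, int, int]], Dict[str, int]]:
    """Greedily plan object moves until the busiest locality is within
    `tolerance` times the mean load. Returns (moves, new placement)."""
    placement = dict(placement)          # do not mutate the caller's map
    load = [0.0] * n_localities
    for obj, loc in placement.items():
        load[loc] += cost[obj]
    mean = sum(load) / n_localities
    moves: List[Tuple[str, int, int]] = []
    while True:
        src = max(range(n_localities), key=load.__getitem__)
        dst = min(range(n_localities), key=load.__getitem__)
        if load[src] <= tolerance * mean:
            break
        gap = load[src] - load[dst]
        # Moving an object of cost c lowers the pair's maximum only if 0 < c < gap.
        candidates = [o for o, loc in placement.items()
                      if loc == src and 0 < cost[o] < gap]
        if not candidates:
            break
        # Prefer the object that best evens out the pair (cost closest to gap / 2).
        obj = min(candidates, key=lambda o: abs(cost[o] - gap / 2))
        placement[obj] = dst
        load[src] -= cost[obj]
        load[dst] += cost[obj]
        moves.append((obj, src, dst))
    return moves, placement
```

    Only moves with 0 < cost < gap are taken, so each move strictly lowers the sum of squared loads and the loop always terminates.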

    Compiler and Runtime Optimization Techniques for Implementation Scalable Parallel Applications

    A compiler can detect the data dependencies in an application and analyze specific sections of code for parallelization potential. However, these compiler techniques are usually applied at compile time and therefore rely on static analysis, which is insufficient for achieving maximum parallelism and the desired application scalability. To generate a safe parallel application, compiler techniques should consider both the static information gathered at compile time and the dynamic information about the system captured at runtime. On the other hand, runtime information is often speculative, so relying on it alone does not guarantee maximal parallel performance; information collected at compile time could significantly improve the performance of runtime techniques. This research achieves that goal by introducing new techniques for both the compiler and the runtime system that let them cooperate and use both static and dynamic analysis information to maximize parallel performance. In the proposed framework, a compiler can incorporate dynamic runtime methods into its parallelization optimizations, and a runtime system can use static information in implementing its parallelization methods. The proposed techniques use high-level programming abstractions and machine learning to relieve the programmer of difficult and tedious decisions that can significantly affect program behavior and performance.
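    The following sketch shows, under loudly stated assumptions, how static and dynamic information can feed a single parallelization decision. The thesis uses machine learning for such decisions; here a simple hand-written cost model is swapped in instead. `StaticInfo` represents estimates a compiler might derive (iteration count, cost per iteration), `RuntimeInfo` represents values a runtime might observe (thread count, per-task overhead), and all names and the `amortize` threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class StaticInfo:      # hypothetical compile-time estimates for one loop
    iterations: int
    cost_per_iter_ns: float

@dataclass
class RuntimeInfo:     # hypothetical values observed by the runtime system
    threads: int
    task_overhead_ns: float

def choose_policy(s: StaticInfo, r: RuntimeInfo,
                  amortize: float = 10.0) -> Tuple[str, int]:
    """Return ("sequential" | "parallel", chunk size in iterations)."""
    total_ns = s.iterations * s.cost_per_iter_ns
    # Too little work to pay for even a handful of tasks: stay sequential.
    if (r.threads <= 1 or s.cost_per_iter_ns <= 0
            or total_ns < amortize * r.task_overhead_ns):
        return "sequential", s.iterations
    # Smallest chunk whose work is `amortize` times the per-task overhead ...
    min_chunk = math.ceil(amortize * r.task_overhead_ns / s.cost_per_iter_ns)
    # ... capped at about iterations / threads so work can spread over all threads.
    max_chunk = math.ceil(s.iterations / r.threads)
    return "parallel", max(1, min(min_chunk, max_chunk))
```

    Neither input alone suffices here: the compile-time cost estimate cannot know how many threads are available, and the runtime thread count cannot tell whether a loop is worth splitting.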

    Octo-Tiger: Binary star systems with HPX on Nvidia P100

    Stellar mergers are a significant field of study because they can lead to astrophysical phenomena such as type Ia supernovae. Octo-Tiger simulates merging stars by computing self-gravitating astrophysical fluids. By relying on the high-level library HPX for parallelization and Vc for vectorization, Octo-Tiger combines high performance with ease of development. Accurate simulations require massive computational resources. To improve hardware utilization, we introduce a stencil-based approach for computing the gravitational field using the fast multipole method. This approach was tailored for machines with wide vector units, such as Intel's Knights Landing or modern GPUs. Our implementation targets AVX512-enabled processors and is backward compatible with older vector extensions (AVX2, AVX, SSE). We further extended the approach to use available NVIDIA GPUs as coprocessors, with a tasking system that processes critical compute kernels on either the GPU or the processor, depending on their utilization. The stencil-based fast multipole method yields a consistent speedup over the classical interaction-list-based implementation on all platforms. On an Intel Xeon Phi 7210, we achieve a speedup of 1.9x. On a heterogeneous node with an Intel Xeon E5-2690 v3, adding an NVIDIA P100 GPU yields a speedup of 1.46x.
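    The difference between an interaction-list and a stencil-based formulation can be sketched in a deliberately simplified setting: a 2D grid of point masses, monopole interactions only, G = 1, and an assumed far-field criterion near < Chebyshev distance <= far. This is not Octo-Tiger's fast multipole method or its actual stencil; it only shows that one shared list of relative offsets can replace a stored partner list per cell while producing the same result. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]
Offset = Tuple[int, int]

def make_stencil(near: int, far: int) -> List[Offset]:
    """Offsets treated as far-field: near < Chebyshev distance <= far (assumed criterion)."""
    return [(dx, dy) for dx in range(-far, far + 1) for dy in range(-far, far + 1)
            if near < max(abs(dx), abs(dy)) <= far]

def potential_stencil(mass: Grid, stencil: List[Offset]) -> Grid:
    """Apply one shared stencil to every cell (G = 1, monopoles only)."""
    n = len(mass)
    phi = [[0.0] * n for _ in range(n)]
    for dx, dy in stencil:
        w = -1.0 / math.hypot(dx, dy)    # same weight for every cell: computed once
        for i in range(n):               # uniform inner loop over all cells
            for j in range(n):
                a, b = i + dx, j + dy
                if 0 <= a < n and 0 <= b < n:
                    phi[i][j] += w * mass[a][b]
    return phi

def interaction_lists(n: int, near: int, far: int) -> Dict[Offset, List[Offset]]:
    """Classical approach: an explicit partner list stored per cell."""
    return {(i, j): [(a, b) for a in range(n) for b in range(n)
                     if near < max(abs(a - i), abs(b - j)) <= far]
            for i in range(n) for j in range(n)}

def potential_lists(mass: Grid, lists: Dict[Offset, List[Offset]]) -> Grid:
    n = len(mass)
    phi = [[0.0] * n for _ in range(n)]
    for (i, j), partners in lists.items():
        for a, b in partners:
            phi[i][j] -= mass[a][b] / math.hypot(a - i, b - j)
    return phi
```

    Because every cell applies the same offsets with the same weights, the inner loop of `potential_stencil` is a regular shifted multiply-add that suits wide SIMD units, whereas the list version must gather partners indirectly and store a list for every cell.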