198 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationRecent trends in high performance computing present larger and more diverse computers using multicore nodes possibly with accelerators and/or coprocessors and reduced memory. These changes pose formidable challenges for applications code to attain scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities oer a portable solution for easy programming. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. However, the original Uintah code had limited scalability as tasks were run in a predefined order based solely on static analysis of the task graph and used only message passing interface (MPI) for parallelism. By using a new hybrid multithread and MPI runtime system, this research has made it possible for Uintah to scale to 700K central processing unit (CPU) cores when solving challenging fluid-structure interaction problems. Those problems often involve moving objects with adaptive mesh refinement and thus with highly variable and unpredictable work patterns. This research has also demonstrated an ability to run capability jobs on the heterogeneous systems with Nvidia graphics processing unit (GPU) accelerators or Intel Xeon Phi coprocessors. The new runtime system for Uintah executes directed acyclic graphs of computational tasks with a scalable asynchronous and dynamic runtime system for multicore CPUs and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases without significant changes to application code. This research concludes that the adaptive directed acyclic graph (DAG)-based approach provides a very powerful abstraction for solving challenging multiscale multiphysics engineering problems. Excellent scalability with regard to the different processors and communications performance are achieved on some of the largest and most powerful computers available today

    Master of Science

    Get PDF
    thesisRecent advancements in High Performance Computing (HPC) infrastructure with tradi- tional computing systems augmented with accelerators like graphic processing units (GPUs) and coprocessors like Intel Xeon Phi have successfully enabled predictive simulations specifi- cally Computational Fluid Dynamics (CFD) with more accuracy and speed. One of the most significant challenges in high-performance computing is to provide a software framework that can scale efficiently and minimize rewriting code to support diverse hardware configurations. Algorithms and framework support have been developed to deal with complexities and provide abstractions for a task to be compatible with various hardware targets. Software is written in C++ and represented as a Directed Acyclic Graph (DAG) with nodes that implement actual mathematical calculations. This thesis will present an improved approach for scheduling and execution of computational tasks within a heterogeneous CPU-GPU com- puting system insulting application developers with the inherent complexity in parallelism. The details will be presented within a context to facilitate the solution of partial differential equations on large clusters using graph theory

    DAG-based software frameworks for PDEs

    Get PDF
    pre-printThe task-based approach to software and parallelism is well-known and has been proposed as a potential candidate, named the silver model, for exas-cale software. This approach is not yet widely used in the large-scale multi-core parallel computing of complex systems of partial differential equations. After surveying task-based approaches we investigate how well the Uintah software and an extension named Wasatch fit in the task-based paradigm and how well they perform on large scale parallel computers. The conclusion is that these approaches show great promise for petascale but that considerable algorithmic challenges remain

    Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system

    Get PDF
    journal articleThe Uintah Computational Framework was developed to provide an environment for solving fluid-structure interaction problems on structured adaptive grids on large-scale, long-running, data-intensive problems. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids, together with a novel asynchronous task-based approach with fully automated load balancing. Uintah demonstrates excellent weak and strong scalability at full machine capacity on XSEDE resources such as Ranger and Kraken, and through the use of a hybrid memory approach based on a combination of MPI and Pthreads, Uintah now runs on up to 262k cores on the DOE Jaguar system. In order to extend Uintah to heterogeneous systems, with ever-increasing CPU core counts and additional on-node GPUs, a new dynamic CPU-GPU task scheduler is designed and evaluated in this study. This new scheduler enables Uintah to fully exploit these architectures with support for asynchronous, out-of-order scheduling of both CPU and GPU computational tasks. A new runtime system has also been implemented with an added multi-stage queuing architecture for efficient scheduling of CPU and GPU tasks. This new runtime system automatically handles the details of asynchronous memory copies to and from the GPU and introduces a novel method of pre-fetching and preparing GPU memory prior to GPU task execution. In this study this new design is examined in the context of a developing, hierarchical GPU-based ray tracing radiation transport model that provides Uintah with additional capabilities for heat transfer and electromagnetic wave propagation. The capabilities of this new scheduler design are tested by running at large scale on the modern heterogeneous systems, Keeneland and TitanDev, with up to 360 and 960 GPUs respectively. On these systems, we demonstrate significant speedups per GPU against a standard CPU core for our radiation problem

    Investigating applications portability with the Uintah DAG-based runtime system on PetaScale supercomputers

    Get PDF
    pre-printPresent trends in high performance computing present formidable challenges for applications code using multicore nodes possibly with accelerators and/or co-processors and reduced memory while still attaining scalability. Software frameworks that execute machine-independent applications code using a runtime system that shields users from architectural complexities offer a possible solution. The Uintah framework for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. Uintah executes directed acyclic graphs of computational tasks with a scalable asynchronous and dynamic runtime system for CPU cores and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases of 1000x without significant changes to application code. This methodology is tested on three leading Top500 machines; OLCF Titan, TACC Stampede and ALCF Mira using three diverse and challenging applications problems. This investigation of scalability with regard to the different processors and communications performance leads to the overall conclusion that the adaptive DAG-based approach provides a very powerful abstraction for solving challenging multi-scale multi-physics engineering problems on some of the largest and most powerful computers available today

    Performance analysis integration in the Uintah software development cycle

    Get PDF
    ManuscriptThe increasing complexity of high-performance computing environments and programming methodologies presents challenges for empirical performance evaluation. Evolving parallel and distributed systems require performance technology that can be flexibly configured to observe different events and associated performance data of interest. It must also be possible to integrate performance evaluation techniques with the programming paradigms and software engineering methods. This is particularly important for tracking performance on parallel software projects involving many code teams over many stages of development. This paper describes the integration of the TAU and XPARE tools in the Uintah Computational Framework (UCF). Discussed is the use of performance mapping techniques to associate low-level performance data to higher levels of abstraction in UCF and the use of performance regression testing to provides a historical portfolio of the evolution of application performance. A scalability study shows the benefits of integrating performance technology in building large-scale parallel applications

    A survey of high level frameworks in block-structured adaptive mesh refinement packages

    Get PDF
    pre-printOver the last decade block-structured adaptive mesh refinement (SAMR) has found increasing use in large, publicly available codes and frameworks. SAMR frameworks have evolved along different paths. Some have stayed focused on specific domain areas, others have pursued a more general functionality, providing the building blocks for a larger variety of applications. In this survey paper we examine a representative set of SAMR packages and SAMR-based codes that have been in existence for half a decade or more, have a reasonably sized and active user base outside of their home institutions, and are publicly available. The set consists of a mix of SAMR packages and application codes that cover a broad range of scientific domains. We look at their high-level frameworks, their design trade-offs and their approach to dealing with the advent of radical changes in hardware architecture. The codes included in this survey are BoxLib, Cactus, Chombo, Enzo, FLASH, and Uintah

    Systematic debugging methods for large-scale HPC computational frameworks

    Get PDF
    pre-printParallel computational frameworks for high-performance computing are central to the advancement of simulation-based studies in science and engineering. Finding and fixing bugs in these frameworks can be time consuming. If left unchecked, these bugs diminish the amount of new science performed. A systematic study of the Uintah Computational Framework investigates debugging approaches, leveraging the framework's modular structure

    Doctor of Philosophy

    Get PDF
    dissertationSolutions to Partial Di erential Equations (PDEs) are often computed by discretizing the domain into a collection of computational elements referred to as a mesh. This solution is an approximation with an error that decreases as the mesh spacing decreases. However, decreasing the mesh spacing also increases the computational requirements. Adaptive mesh re nement (AMR) attempts to reduce the error while limiting the increase in computational requirements by re ning the mesh locally in regions of the domain that have large error while maintaining a coarse mesh in other portions of the domain. This approach often provides a solution that is as accurate as that obtained from a much larger xed mesh simulation, thus saving on both computational time and memory. However, historically, these AMR operations often limit the overall scalability of the application. Adapting the mesh at runtime necessitates scalable regridding and load balancing algorithms. This dissertation analyzes the performance bottlenecks for a widely used regridding algorithm and presents two new algorithms which exhibit ideal scalability. In addition, a scalable space- lling curve generation algorithm for dynamic load balancing is also presented. The performance of these algorithms is analyzed by determining their theoretical complexity, deriving performance models, and comparing the observed performance to those performance models. The models are then used to predict performance on larger numbers of processors. This analysis demonstrates the necessity of these algorithms at larger numbers of processors. This dissertation also investigates methods to more accurately predict workloads based on measurements taken at runtime. While the methods used are not new, the application of these methods to the load balancing process is. These methods are shown to be highly accurate and able to predict the workload within 3% error. By improving the accuracy of these estimations, the load imbalance of the simulation can be reduced, thereby increasing the overall performance

    Master of Science

    Get PDF
    thesisAt the beginning of the 21st century, it became apparent that the performance gains associated with continual die shrinks and the resulting increases in core central processing unit (CPU) speeds were beginning to flatten. This realization has gradually shifted the focus of CPU design away from single core speed increases and toward the idea of obtaining performance through increased concurrency. The resulting design paradigm has given us multi- and many-core CPUs, vector processing units, and more recently, programmable, massively parallel hardware coprocessors, such as graphics processing units from nVidia and Advanced Micro Devices, along with more recent general purpose devices such as Intel's "Knights Corner." One of the most significant resulting challenges in high-performance computing is to provide a framework in which the software development process is platform agnostic to its end users, while at the same time being capable of scaling efficiently on diverse hardware configurations. This thesis will present an improved approach for the analysis and scheduling of computational tasks within a heterogeneous hardware environment, while removing implementation details from end users. This will be presented within the context of the "Expressions" framework, a component within a computational fluid dynamics solver, known as "Wasatch," developed at the University of Utah
    • …
    corecore