
    Radiation modeling using the Uintah heterogeneous CPU/GPU runtime system

    The Uintah Computational Framework was developed to provide an environment for solving fluid-structure interaction problems on structured adaptive grids for large-scale, long-running, data-intensive problems. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids, together with a novel asynchronous task-based approach with fully automated load balancing. Uintah demonstrates excellent weak and strong scalability at full machine capacity on XSEDE resources such as Ranger and Kraken, and through the use of a hybrid memory approach based on a combination of MPI and Pthreads, Uintah now runs on up to 262k cores on the DOE Jaguar system. In order to extend Uintah to heterogeneous systems with ever-increasing CPU core counts and additional on-node GPUs, a new dynamic CPU-GPU task scheduler is designed and evaluated in this study. This new scheduler enables Uintah to fully exploit these architectures with support for asynchronous, out-of-order scheduling of both CPU and GPU computational tasks. A new runtime system has also been implemented with a multi-stage queuing architecture for efficient scheduling of CPU and GPU tasks. This runtime system automatically handles the details of asynchronous memory copies to and from the GPU and introduces a novel method of pre-fetching and preparing GPU memory prior to GPU task execution. The new design is examined in the context of a developing, hierarchical GPU-based ray-tracing radiation transport model that provides Uintah with additional capabilities for heat transfer and electromagnetic wave propagation. The capabilities of the new scheduler design are tested by running at large scale on the modern heterogeneous systems Keeneland and TitanDev, with up to 360 and 960 GPUs respectively. On these systems, we demonstrate significant speedups per GPU over a standard CPU core for our radiation problem.
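    Since the abstract centres on a multi-stage queuing architecture, a minimal sketch of that idea may help. This is not the Uintah scheduler itself: the Task class, queue names, and the simulated copy step are all invented for illustration, with GPU tasks passing through a staging queue (standing in for asynchronous host-to-device pre-fetching) before becoming runnable.

        import queue
        import threading

        class Task:
            def __init__(self, name, device):
                self.name = name
                self.device = device  # "cpu" or "gpu"

        cpu_ready = queue.Queue()   # tasks ready to run on a CPU core
        gpu_stage = queue.Queue()   # GPU tasks waiting on host-to-device copies
        gpu_ready = queue.Queue()   # GPU tasks whose inputs are resident on the device

        def stage_gpu_tasks():
            """Simulate asynchronous pre-fetching of GPU task inputs."""
            while True:
                task = gpu_stage.get()
                # A real runtime would issue an async copy and poll an event;
                # here the "copy" completes immediately.
                gpu_ready.put(task)
                gpu_stage.task_done()

        def run(q, label):
            while True:
                task = q.get()
                print(f"{label} executes {task.name}")
                q.task_done()

        threading.Thread(target=stage_gpu_tasks, daemon=True).start()
        threading.Thread(target=run, args=(cpu_ready, "CPU"), daemon=True).start()
        threading.Thread(target=run, args=(gpu_ready, "GPU"), daemon=True).start()

        for t in [Task("advect", "cpu"), Task("ray_trace", "gpu"), Task("solve", "cpu")]:
            (gpu_stage if t.device == "gpu" else cpu_ready).put(t)

        for q in (cpu_ready, gpu_stage, gpu_ready):
            q.join()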

    GPU-based Monte Carlo simulation for the design Sea Ice Load

    Modern Graphics Processing Units (GPUs), with their massive numbers of threads and many-core architecture, support both graphics and general-purpose computing. NVIDIA's compute unified device architecture (CUDA) takes advantage of parallel computing and harnesses the tremendous power of GPUs. The present study demonstrates a high-performance computing (HPC) framework for a Monte Carlo simulation that determines design sea ice loads, implemented on both the GPU and the CPU (central processing unit). Results show a speedup of up to 130 times for four Tesla K80 GPUs over an optimized CPU OpenMP (Open Multi-Processing) implementation, and a speedup of up to 8 times for the four Tesla K80s over a single Tesla K80 GPU implementation. The elapsed time of the different implementations was reduced from about 2.5 hours to 0.7 seconds.
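    The abstract does not spell out the load model, so the following is only a generic sketch of the kind of vectorized Monte Carlo computation being accelerated: draw many random ice-load samples and take a high quantile as the design value. The lognormal load model, its parameters, and the sample count are illustrative assumptions, not the paper's method.

        import numpy as np

        rng = np.random.default_rng(42)
        n = 10_000_000  # number of Monte Carlo iterations (illustrative)

        # Hypothetical ice-load model: lognormal magnitudes, chosen only for illustration.
        loads = rng.lognormal(mean=1.0, sigma=0.5, size=n)

        # A design load is often taken as a high quantile of the simulated distribution.
        design_load = np.quantile(loads, 0.9999)
        print(f"design load estimate: {design_load:.2f} (units illustrative)")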

    Doctor of Philosophy

    Recent trends in high performance computing present larger and more diverse computers using multicore nodes, possibly with accelerators and/or coprocessors, and reduced memory. These changes pose formidable challenges for application code to attain scalability. Software frameworks that execute machine-independent application code using a runtime system that shields users from architectural complexities offer a portable solution for easy programming. The Uintah framework, for example, solves a broad class of large-scale problems on structured adaptive grids using fluid-flow solvers coupled with particle-based solids methods. However, the original Uintah code had limited scalability, as tasks were run in a predefined order based solely on static analysis of the task graph, and it used only the message passing interface (MPI) for parallelism. By using a new hybrid multithread and MPI runtime system, this research has made it possible for Uintah to scale to 700K central processing unit (CPU) cores when solving challenging fluid-structure interaction problems. Those problems often involve moving objects with adaptive mesh refinement and thus highly variable and unpredictable work patterns. This research has also demonstrated an ability to run capability jobs on heterogeneous systems with Nvidia graphics processing unit (GPU) accelerators or Intel Xeon Phi coprocessors. The new runtime system for Uintah executes directed acyclic graphs of computational tasks with a scalable, asynchronous, and dynamic runtime system for multicore CPUs and/or accelerators/coprocessors on a node. Uintah's clear separation between application and runtime code has led to scalability increases without significant changes to application code. This research concludes that the adaptive directed acyclic graph (DAG)-based approach provides a very powerful abstraction for solving challenging multiscale multiphysics engineering problems. Excellent scalability with regard to the different processors and communication performance is achieved on some of the largest and most powerful computers available today.
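    As a minimal sketch of DAG-based task execution (not Uintah's runtime), the following runs tasks as soon as their predecessors complete rather than in a fixed predefined order; the tiny example graph and task names are invented.

        from graphlib import TopologicalSorter

        # Each task maps to the set of tasks it depends on; this toy graph is invented.
        graph = {
            "init":        set(),
            "interpolate": {"init"},
            "advect":      {"init"},
            "solve":       {"interpolate", "advect"},
            "output":      {"solve"},
        }

        ts = TopologicalSorter(graph)
        ts.prepare()
        while ts.is_active():
            ready = ts.get_ready()    # all tasks whose dependencies are satisfied
            for task in ready:        # a real runtime would dispatch these in parallel,
                print("run", task)    # possibly out of order, on CPUs or GPUs
                ts.done(task)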

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for taking decisions. The definition of real-time depends on the application under study, ranging from response times of μs up to several hours in the case of very compute-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high energy physics experiments, and specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH detectors when it is not possible to obtain seeds for reconstruction from external trackers.
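    The contribution's own algorithm is not reproduced in the abstract; as a generic illustration of seedless ring fitting, the following uses the standard Kåsa algebraic least-squares circle fit, which recovers a ring's centre and radius from hit coordinates alone. The synthetic ring data is invented for the example.

        import numpy as np

        def kasa_circle_fit(x, y):
            """Algebraic (Kåsa) least-squares circle fit: solves the linearized
            system x^2 + y^2 = 2*a*x + 2*b*y + c for centre (a, b), then
            recovers the radius as r = sqrt(c + a^2 + b^2)."""
            A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
            rhs = x**2 + y**2
            (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
            return a, b, np.sqrt(c + a**2 + b**2)

        # Synthetic ring: noisy hits on a circle of radius 1.5 centred at (0.3, -0.2).
        rng = np.random.default_rng(0)
        phi = rng.uniform(0, 2 * np.pi, 64)
        x = 0.3 + 1.5 * np.cos(phi) + rng.normal(0, 0.01, phi.size)
        y = -0.2 + 1.5 * np.sin(phi) + rng.normal(0, 0.01, phi.size)
        print(kasa_circle_fit(x, y))    # ~ (0.3, -0.2, 1.5)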

    libcloudph++ 0.2: single-moment bulk, double-moment bulk, and particle-based warm-rain microphysics library in C++

    This paper introduces a library of algorithms for representing cloud microphysics in numerical models. The library is written in C++, hence the name libcloudph++. In the current release, the library covers three warm-rain schemes: the single- and double-moment bulk schemes, and the particle-based scheme with Monte-Carlo coalescence. The three schemes are intended for modelling frameworks of different dimensionality and complexity, ranging from parcel models to multi-dimensional cloud-resolving (e.g. large-eddy) simulations. A two-dimensional prescribed-flow framework is used in the example simulations presented in the paper with the aim of highlighting the library features. libcloudph++ and all its mandatory dependencies are free and open-source software. The Boost.units library is used for zero-overhead dimensional analysis of the code at compile time. The particle-based scheme is implemented using the Thrust library, which makes it possible to leverage the power of graphics processing units (GPUs) while retaining the possibility of compiling the unchanged code for execution on single or multiple standard processors (CPUs). The paper includes a complete description of the programming interface (API) of the library and a performance analysis including a comparison of GPU and CPU setups.
    Comment: The library description has been updated to the new library API (i.e. v0.1 -> v0.2 update). The key difference is that the model state variables are now mixing ratios as opposed to densities. The particle-based scheme was supplemented with the "particle recycling" process. Numerous editorial corrections were made.
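    The library's actual API is not reproduced here; as an illustration of what a single-moment bulk warm-rain scheme computes, the following is a generic Kessler-type autoconversion/accretion step. The rate constants and cloud-water threshold are textbook illustrative values, not libcloudph++'s.

        import numpy as np

        def kessler_warm_rain_step(qc, qr, dt):
            """One generic Kessler-type single-moment step: cloud water qc is
            converted to rain water qr by autoconversion and accretion.
            qc, qr are mixing ratios [kg/kg]; dt is the time step [s]."""
            k1, qc0 = 1.0e-3, 5.0e-4   # autoconversion rate [1/s] and threshold [kg/kg]
            k2 = 2.2                   # accretion rate coefficient [1/s]
            auto = k1 * np.maximum(qc - qc0, 0.0)
            accr = k2 * qc * qr**0.875
            dq = np.minimum((auto + accr) * dt, qc)  # cannot convert more than exists
            return qc - dq, qr + dq

        qc, qr = np.array([1.0e-3]), np.array([1.0e-4])
        print(kessler_warm_rain_step(qc, qr, dt=1.0))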

    GPU accelerated risk quantification

    Factor Analysis of Information Risk (FAIR) is a standard model for quantitatively estimating cybersecurity risks and has been implemented as a sequential Monte Carlo simulation in the RiskLens and FAIR-U applications. Monte Carlo simulations employ random sampling techniques to model certain systems through the course of many iterations. Due to their sequential nature, FAIR simulations in these applications are limited in the number of iterations they can perform in a reasonable amount of time. One method that has been extensively used to speed up Monte Carlo simulations is to implement them to take advantage of the massive parallelization available on modern Graphics Processing Units (GPUs). Such parallelized simulations have been shown to produce significant speedups, in some cases up to 3,000 times faster than the sequential versions. Because the FAIR simulation needs many samples from various beta distributions, three methods of generating these samples via inverse transform sampling on the GPU are investigated. One method calculates the inverse incomplete beta function directly, and the other two approximate this function, trading accuracy for improved parallelism. Inverse transform sampling is then utilized in a GPU-accelerated implementation of the FAIR simulation from RiskLens and FAIR-U using NVIDIA's CUDA technology.
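    As a CPU-side sketch of the sampling idea (not the paper's GPU kernels), inverse transform sampling draws uniform variates and maps them through the inverse regularized incomplete beta function, which is the beta distribution's quantile function; SciPy's betaincinv stands in for the direct GPU calculation investigated in the paper.

        import numpy as np
        from scipy.special import betaincinv

        def sample_beta_inverse_transform(a, b, n, rng):
            """Draw n Beta(a, b) samples by inverse transform sampling:
            u ~ Uniform(0, 1), then x = inverse of the regularized incomplete
            beta function (the Beta CDF) evaluated at u."""
            u = rng.uniform(0.0, 1.0, size=n)
            return betaincinv(a, b, u)

        rng = np.random.default_rng(7)
        samples = sample_beta_inverse_transform(2.0, 5.0, 100_000, rng)
        print(samples.mean())   # ~ a / (a + b) = 2/7 ≈ 0.286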

    A Framework for Management of Distributed Data Processing and Event Selection for the IceCube Neutrino Observatory

    IceCube is a one-gigaton neutrino detector designed to detect high-energy cosmic neutrinos. It is located at the geographic South Pole and was completed at the end of 2010. Simulation and data processing for IceCube require a significant amount of computational power. We describe the design and functionality of IceProd, a management system based on Python, XML-RPC, and GridFTP. It is driven by a central database in order to coordinate and administer production of simulations and processing of the data produced by the IceCube detector once it arrives in the Northern Hemisphere. IceProd runs as a separate layer on top of existing middleware and can take advantage of a variety of computing resources, including grids and batch systems such as gLite, Condor, NorduGrid, PBS, and SGE. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plug-ins that serve to abstract the details of job submission and job management. IceProd fills a gap between the user and existing middleware by making job scripting easier and by sharing productions collaboratively and more efficiently. We describe the implementation and performance of an extension to the IceProd framework that provides support for mapping workflow diagrams, or DAGs, consisting of interdependent tasks to an IceProd job that can span multiple grid or cluster sites. We look at some use cases where this new extension allows for optimal allocation of computing resources, and we address general aspects of this design, including security, data integrity, scalability, and throughput.
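    IceProd's plug-in interface is not shown in the abstract; as a generic sketch of the middleware-abstraction idea it describes, each batch system could be wrapped behind a common submission interface like the one below. The class and method names are hypothetical, not IceProd's.

        from abc import ABC, abstractmethod

        class BatchPlugin(ABC):
            """Hypothetical plug-in interface hiding batch-system details."""
            @abstractmethod
            def submit(self, script: str) -> str:
                """Submit a job script; return a site-specific job id."""

        class CondorPlugin(BatchPlugin):
            def submit(self, script: str) -> str:
                # A real plug-in would write a submit file and call condor_submit.
                print("condor_submit", script)
                return "condor-12345"

        class PBSPlugin(BatchPlugin):
            def submit(self, script: str) -> str:
                # A real plug-in would invoke qsub here.
                print("qsub", script)
                return "pbs-67890"

        def submit_everywhere(script, plugins):
            """The coordinating daemon only ever sees the common interface."""
            return [p.submit(script) for p in plugins]

        print(submit_everywhere("job.sh", [CondorPlugin(), PBSPlugin()]))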

    Efficient Algorithms for Artificial Neural Networks and Explainable AI

    Artificial neural networks have enabled remarkable progress in fields such as pattern recognition and computer vision. However, the increasing complexity of artificial neural networks presents a challenge for efficient computation. In this thesis, we first introduce a novel matrix multiplication method to reduce the complexity of artificial neural networks, and we demonstrate its suitability for compressing the fully connected layers of such networks. Our method outperforms other state-of-the-art methods when tested on standard publicly available datasets. The thesis then focuses on Explainable AI, which can be critical in fields like finance and medicine, as it can provide explanations for decisions taken by sub-symbolic AI models that behave like black boxes, such as artificial neural networks and transformation-based learning approaches. We have also developed a new framework, Exmed, that enables non-expert users to prepare data, train models, and apply Explainable AI techniques effectively. Additionally, we propose a new algorithm that identifies the overall influence of input features and minimises the perturbations that alter the decision taken by a given model. Overall, this thesis introduces innovative and comprehensive techniques to enhance the efficiency of fully connected layers in artificial neural networks and provides a new approach to explaining their decisions. These methods have significant practical applications in various fields, including portable medical devices.
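    The thesis's matrix multiplication method is not described in the abstract; as a generic illustration of compressing a fully connected layer, the following applies a standard truncated-SVD low-rank factorization, which replaces one dense weight matrix with two thinner ones. The layer sizes and rank are illustrative.

        import numpy as np

        def compress_dense_layer(W, rank):
            """Replace a dense weight matrix W (out x in) with a rank-`rank`
            factorization W ≈ U_r @ V_r, reducing multiply-adds whenever
            rank * (out + in) < out * in."""
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            U_r = U[:, :rank] * s[:rank]   # (out, rank)
            V_r = Vt[:rank, :]             # (rank, in)
            return U_r, V_r

        rng = np.random.default_rng(1)
        W = rng.normal(size=(512, 1024))
        U_r, V_r = compress_dense_layer(W, rank=64)
        x = rng.normal(size=1024)
        y_full = W @ x
        y_low = U_r @ (V_r @ x)            # two thin matmuls instead of one dense one
        print(np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full))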