730 research outputs found

    A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics

    Full text link
    Mesoscopic simulations of hydrocarbon flow in source shales are challenging, in part due to the heterogeneous shale pores with sizes ranging from a few nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid and fluid-solid interactions in nano- to micro-scale shale pores, which are physically and chemically sophisticated, must be captured. To address those challenges, we present a GPU-accelerated package for simulation of flow in nano- to micro-pore networks with a many-body dissipative particle dynamics (mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the code offloads all intensive workloads on GPUs. Other advancements, such as smart particle packing and no-slip boundary condition in complex pore geometries, are also implemented for the construction and the simulation of the realistic shale pores from 3D nanometer-resolution stack images. Our code is validated for accuracy and compared against the CPU counterpart for speedup. In our benchmark tests, the code delivers nearly perfect strong scaling and weak scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device NVLink can boost performance over PCIe by a remarkable 40\%. Lastly, we demonstrate, through a flow simulation in realistic shale pores, that the CPU counterpart requires 840 Power9 cores to rival the performance delivered by our package with four V100 GPUs on ORNL's Summit architecture. This simulation package enables quick-turnaround and high-throughput mesoscopic numerical simulations for investigating complex flow phenomena in nano- to micro-porous rocks with realistic pore geometries

    GPU Computing for Cognitive Robotics

    Get PDF
    This thesis presents the first investigation of the impact of GPU computing on cognitive robotics by providing a series of novel experiments in the area of action and language acquisition in humanoid robots and computer vision. Cognitive robotics is concerned with endowing robots with high-level cognitive capabilities to enable the achievement of complex goals in complex environments. Reaching the ultimate goal of developing cognitive robots will require tremendous amounts of computational power, which was until recently provided mostly by standard CPU processors. CPU cores are optimised for serial code execution at the expense of parallel execution, which renders them relatively inefficient when it comes to high-performance computing applications. The ever-increasing market demand for high-performance, real-time 3D graphics has evolved the GPU into a highly parallel, multithreaded, many-core processor extraordinary computational power and very high memory bandwidth. These vast computational resources of modern GPUs can now be used by the most of the cognitive robotics models as they tend to be inherently parallel. Various interesting and insightful cognitive models were developed and addressed important scientific questions concerning action-language acquisition and computer vision. While they have provided us with important scientific insights, their complexity and application has not improved much over the last years. The experimental tasks as well as the scale of these models are often minimised to avoid excessive training times that grow exponentially with the number of neurons and the training data. This impedes further progress and development of complex neurocontrollers that would be able to take the cognitive robotics research a step closer to reaching the ultimate goal of creating intelligent machines. This thesis presents several cases where the application of the GPU computing on cognitive robotics algorithms resulted in the development of large-scale neurocontrollers of previously unseen complexity enabling the conducting of the novel experiments described herein.European Commission Seventh Framework Programm

    The EU Center of Excellence for Exascale in Solid Earth (ChEESE): Implementation, results, and roadmap for the second phase

    Get PDF
    publishedVersio

    Astro - A Low-Cost, Low-Power Cluster for CPU-GPU Hybrid Computing using the Jetson TK1

    Get PDF
    With the rising costs of large scale distributed systems many researchers have began looking at utilizing low power architectures for clusters. In this paper, we describe our Astro cluster, which consists of 46 NVIDIA Jetson TK1 nodes each equipped with an ARM Cortex A15 CPU, 192 core Kepler GPU, 2 GB of RAM, and 16 GB of flash storage. The cluster has a number of advantages when compared to conventional clusters including lower power usage, ambient cooling, shared memory between the CPU and GPU, and affordability. The cluster is built using commodity hardware and can be setup for relatively low costs while providing up to 190 single precision GFLOPS of computing power per node due to its combined GPU/CPU architecture. The cluster currently uses one 48-port Gigabit Ethernet switch and runs Linux for Tegra, a modified version of Ubuntu provided by NVIDIA as its operating system. Common file systems such as PVFS, Ceph, and NFS are supported by the cluster and benchmarks such as HPL, LAPACK, and LAMMPS are used to evaluate the system. At peak performance, the cluster is able to produce 328 GFLOPS of double precision and a peak of 810W using the LINPACK benchmark placing the cluster at 324th place on the Green500. Single precision benchmarks result in a peak performance of 6800 GFLOPs. The Astro cluster aims to be a proof-of-concept for future low power clusters that utilize a similar architecture. The cluster is installed with many of the same applications used by top supercomputers and is validated using the several standard supercomputing benchmarks. We show that with the rise of low-power CPUs and GPUs, and the need for lower server costs, this cluster provides insight into how ARM and CPU-GPU hybrid chips will perform in high-performance computing
    corecore