A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics
Mesoscopic simulations of hydrocarbon flow in source shales are challenging,
in part due to the heterogeneous shale pores with sizes ranging from a few
nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid
and fluid-solid interactions in nano- to micro-scale shale pores, which are
physically and chemically sophisticated, must be captured. To address those
challenges, we present a GPU-accelerated package for simulation of flow in
nano- to micro-pore networks with a many-body dissipative particle dynamics
(mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the
code offloads all intensive workloads to GPUs. Other advancements, such as
smart particle packing and no-slip boundary conditions in complex pore
geometries, are also implemented for the construction and simulation of
realistic shale pores from 3D nanometer-resolution stack images. Our code is
validated for accuracy and compared against the CPU counterpart for speedup. In
our benchmark tests, the code delivers nearly perfect strong scaling and weak
scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge
National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU
benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device
NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we
demonstrate, through a flow simulation in realistic shale pores, that the CPU
counterpart requires 840 Power9 cores to rival the performance delivered by our
package with four V100 GPUs on ORNL's Summit architecture. This simulation
package enables quick-turnaround and high-throughput mesoscopic numerical
simulations for investigating complex flow phenomena in nano- to micro-porous
rocks with realistic pore geometries.
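The scaling and equivalence figures quoted in this abstract can be made concrete with a small sketch. The efficiency formulas below are the usual textbook definitions, not something taken from the package, and all function and variable names are hypothetical. The per-GPU particle count assumes the largest weak-scaling run spread all 512 million particles over all 512 GPUs, which the abstract implies but does not state outright.

```python
# Illustrative helpers; names and formulas are assumptions, not part of the package.


def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Fixed total problem size: 1.0 means runtime shrinks in proportion to device count."""
    return (t_ref * n_ref) / (t_n * n)


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Fixed work per device: 1.0 means runtime stays flat as devices are added."""
    return t_ref / t_n


# Figures quoted in the abstract.
POWER9_CORES = 840
V100_GPUS = 4
MAX_PARTICLES = 512_000_000
MAX_GPUS = 512

# CPU cores needed to match one GPU in the shale-pore flow simulation.
cores_per_gpu = POWER9_CORES / V100_GPUS

# Assumption: the largest weak-scaling run placed all particles on all 512 GPUs.
particles_per_gpu = MAX_PARTICLES / MAX_GPUS
```

Under these assumptions, one V100 stands in for 210 Power9 cores, and the largest weak-scaling run carries one million particles per GPU. "Nearly perfect" scaling corresponds to both efficiency functions returning values close to 1.0.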
GPU Computing for Cognitive Robotics
This thesis presents the first investigation of the impact of GPU
computing on cognitive robotics by providing a series of novel experiments in
the area of action and language acquisition in humanoid robots and computer
vision. Cognitive robotics is concerned with endowing robots with high-level
cognitive capabilities to enable the achievement of complex goals in complex
environments. Reaching the ultimate goal of developing cognitive robots will
require tremendous amounts of computational power, which until
recently was provided mostly by standard CPUs. CPU cores are
optimised for serial code execution at the expense of parallel execution, which
renders them relatively inefficient when it comes to high-performance
computing applications. The ever-increasing market demand for
high-performance, real-time 3D graphics has evolved the GPU into a highly
parallel, multithreaded, many-core processor with extraordinary computational
power and very high memory bandwidth. These vast computational resources
of modern GPUs can now be used by most cognitive robotics models,
as they tend to be inherently parallel. Various interesting and insightful
cognitive models have been developed to address important scientific questions
concerning action-language acquisition and computer vision. While they have
provided us with important scientific insights, their complexity and
application have not improved much in recent years. The experimental
tasks as well as the scale of these models are often minimised to avoid
excessive training times that grow exponentially with the number of neurons
and the training data. This impedes further progress and development of
complex neurocontrollers that would be able to take the cognitive robotics
research a step closer to reaching the ultimate goal of creating intelligent
machines. This thesis presents several cases where the application of GPU
computing to cognitive robotics algorithms resulted in the development of
large-scale neurocontrollers of previously unseen complexity, enabling the
novel experiments described herein.
European Commission Seventh Framework Programme
The EU Center of Excellence for Exascale in Solid Earth (ChEESE): Implementation, results, and roadmap for the second phase
Astro - A Low-Cost, Low-Power Cluster for CPU-GPU Hybrid Computing using the Jetson TK1
With the rising costs of large-scale distributed systems, many researchers have begun looking at utilizing low-power architectures for clusters. In this paper, we describe our Astro cluster, which consists of 46 NVIDIA Jetson TK1 nodes, each equipped with an ARM Cortex-A15 CPU, a 192-core Kepler GPU, 2 GB of RAM, and 16 GB of flash storage. The cluster has a number of advantages over conventional clusters, including lower power usage, ambient cooling, shared memory between the CPU and GPU, and affordability. The cluster is built using commodity hardware and can be set up at relatively low cost while providing up to 190 single-precision GFLOPS of computing power per node due to its combined GPU/CPU architecture. The cluster currently uses one 48-port Gigabit Ethernet switch and runs Linux for Tegra, a modified version of Ubuntu provided by NVIDIA, as its operating system. Common file systems such as PVFS, Ceph, and NFS are supported by the cluster, and benchmarks such as HPL, LAPACK, and LAMMPS are used to evaluate the system. At peak performance, the cluster delivers 328 GFLOPS in double precision at a peak power draw of 810 W on the LINPACK benchmark, placing the cluster at 324th place on the Green500. Single-precision benchmarks result in a peak performance of 6800 GFLOPS. The Astro cluster aims to be a proof of concept for future low-power clusters that utilize a similar architecture. The cluster is installed with many of the same applications used by top supercomputers and is validated using several standard supercomputing benchmarks. We show that, with the rise of low-power CPUs and GPUs and the need for lower server costs, this cluster provides insight into how ARM and CPU-GPU hybrid chips will perform in high-performance computing.
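As a rough check on the figures in this abstract, the arithmetic can be sketched as follows. All variable names are hypothetical. Dividing LINPACK throughput by the reported peak power draw is a simplifying assumption and need not match the Green500 measurement methodology. Treating 190 GFLOPS per node as a ceiling to compare against the measured single-precision result is likewise an assumption.

```python
# Figures quoted in the abstract; variable names are illustrative only.
NODES = 46
PEAK_SP_GFLOPS_PER_NODE = 190.0   # "up to 190 single precision GFLOPS" per node
LINPACK_DP_GFLOPS = 328.0         # double-precision LINPACK result
PEAK_POWER_W = 810.0              # peak power draw during LINPACK
MEASURED_SP_GFLOPS = 6800.0       # single-precision benchmark peak

# Energy efficiency in MFLOPS per watt (assumption: peak power as the denominator).
mflops_per_watt = LINPACK_DP_GFLOPS * 1000.0 / PEAK_POWER_W

# Aggregate single-precision ceiling if every node reached its quoted maximum.
aggregate_sp_gflops = NODES * PEAK_SP_GFLOPS_PER_NODE

# Fraction of that ceiling reached by the measured single-precision result.
sp_fraction = MEASURED_SP_GFLOPS / aggregate_sp_gflops
```

With these numbers, the cluster works out to roughly 405 MFLOPS/W in double precision, and the measured single-precision peak is about 78% of the 8740 GFLOPS aggregate implied by the per-node figure.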
- …