730 research outputs found

A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics

Author: Andrew Matthew
Blumers Ansel
Deo Milind
Goral Jan
Huang Hai
Kane Joshua
Li Zhen
Luo Lixiang
Tang Yu-Hang
Xia Yidong
Publication venue: 'Elsevier BV'
Publication date: 25/03/2019

Mesoscopic simulations of hydrocarbon flow in source shales are challenging, in part due to the heterogeneous shale pores with sizes ranging from a few nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid and fluid-solid interactions in nano- to micro-scale shale pores, which are physically and chemically sophisticated, must be captured. To address those challenges, we present a GPU-accelerated package for simulation of flow in nano- to micro-pore networks with a many-body dissipative particle dynamics (mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the code offloads all intensive workloads to GPUs. Other advancements, such as smart particle packing and a no-slip boundary condition in complex pore geometries, are also implemented for the construction and simulation of realistic shale pores from 3D nanometer-resolution stack images. Our code is validated for accuracy and compared against the CPU counterpart for speedup. In our benchmark tests, the code delivers nearly perfect strong scaling and weak scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we demonstrate, through a flow simulation in realistic shale pores, that the CPU counterpart requires 840 Power9 cores to rival the performance delivered by our package with four V100 GPUs on ORNL's Summit architecture. This simulation package enables quick-turnaround and high-throughput mesoscopic numerical simulations for investigating complex flow phenomena in nano- to micro-porous rocks with realistic pore geometries.
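
The scaling claims in this abstract can be made concrete with the usual efficiency definitions. The sketch below is illustrative only: the function names are hypothetical, the abstract reports no wall-clock timings, and the only figures taken from it are the 840 Power9 cores and four V100 GPUs.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Fixed total problem size: ideal time falls as 1/n, so 1.0 means perfect scaling."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Fixed work per device: ideal time stays constant, so 1.0 means perfect scaling."""
    return t_base / t_n


# Figures quoted in the abstract: CPU cores needed to match the four-GPU run.
POWER9_CORES = 840
V100_GPUS = 4
cores_per_gpu = POWER9_CORES / V100_GPUS  # 210 Power9 cores per V100

# Hypothetical timings (NOT from the paper), only to show how the functions are used.
example_strong = strong_scaling_efficiency(t_base=100.0, n_base=1, t_n=25.0, n=4)
example_weak = weak_scaling_efficiency(t_base=10.0, t_n=12.5)

print(f"Power9 cores per V100 GPU: {cores_per_gpu:.0f}")
print(f"Example strong-scaling efficiency: {example_strong:.2f}")
print(f"Example weak-scaling efficiency: {example_weak:.2f}")
```

"Nearly perfect" scaling in this sense corresponds to both efficiencies staying close to 1.0 as the GPU count grows.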

arXiv.org e-Print Archive

GPU Computing for Cognitive Robotics

Author: Peniak Martin
Publication venue: Plymouth University
Publication date: 01/01/2014

This thesis presents the first investigation of the impact of GPU computing on cognitive robotics by providing a series of novel experiments in the area of action and language acquisition in humanoid robots and computer vision. Cognitive robotics is concerned with endowing robots with high-level cognitive capabilities to enable the achievement of complex goals in complex environments. Reaching the ultimate goal of developing cognitive robots will require tremendous amounts of computational power, which was until recently provided mostly by standard CPUs. CPU cores are optimised for serial code execution at the expense of parallel execution, which renders them relatively inefficient when it comes to high-performance computing applications. The ever-increasing market demand for high-performance, real-time 3D graphics has evolved the GPU into a highly parallel, multithreaded, many-core processor with extraordinary computational power and very high memory bandwidth. These vast computational resources of modern GPUs can now be used by most cognitive robotics models, as they tend to be inherently parallel. Various interesting and insightful cognitive models have been developed and have addressed important scientific questions concerning action-language acquisition and computer vision. While they have provided us with important scientific insights, their complexity and application have not improved much over recent years. The experimental tasks as well as the scale of these models are often minimised to avoid excessive training times that grow exponentially with the number of neurons and the training data. This impedes further progress and development of complex neurocontrollers that would be able to take cognitive robotics research a step closer to reaching the ultimate goal of creating intelligent machines.
This thesis presents several cases where the application of GPU computing to cognitive robotics algorithms resulted in the development of large-scale neurocontrollers of previously unseen complexity, enabling the novel experiments described herein.

European Commission Seventh Framework Programme

Plymouth Electronic Archive and Research Library

The EU Center of Excellence for Exascale in Solid Earth (ChEESE): Implementation, results, and roadmap for the second phase

Author: Abril Claudia
Afanasiev Michael
Amati Giorgio
Aniko Wirp Sara
Bader Michael
Badia Rosa M.
Barsotti Sara
Basili Roberto
Bayraktar Hafize B.
Bernardi Fabrizio
Boehm Christian
Brizuela Beatriz
Brogi Federico
Cabrera Eduardo
Casarotti Emanuele
Castro Manuel J.
Cerminara Matteo
Cheptsov Alexey
Cirella Antonella
Conejero Javier
Costa Antonio
de la Asunción Marc
de la Puente Josep
Djuric Marco
Dorozhinskii Ravil
Espinosa Gabriela
Esposti-Ongaro Tomaso
Farnós Joan
Favretto-Cristini Nathalie
Fichtner Andreas
Folch Arnau
Fournier Alexandre
Gabriel Alice-Agnes
Gallard Jean-Matthieu
Gibbons Steven John
Glimsdal Sylfest
González-Vida José Manuel
Gracia Jose
Gregorio Rose
Gutierrez Natalia
Halldorsson Benedikt
Hamitou Okba
Houzeaux Guillaume
Jaure Stephan
Kessar Mouloud
Krenz Lukas
Krischer Lion
Laforet Soline
Lanucara Piero
Li Bo
Lorenzino Maria Concetta
Lorito Stefano
Løvholt Finn
Macedonio Giovanni
Macías Jorge
Martínez Montesinos Beatriz
Marín Guillermo
Mingari Leonardo
Moguilny Geneviève
Montellier Vadim
Monterrubio-Velasco Marisol
Moulard Georges Emmanuel
Nagaso Masaru
Nazaria Massimo
Niethammer Christoph
Pardini Federica
Pienkowska Marta
Pizzimenti Luca
Poiata Natalia
Rannabauer Leonhard
Rodriguez Juan Esteban
Rojas Otilio
Romano Fabrizio
Rudyy Oleksandr
Ruggiero Vittorio
Samfass Philipp
Sanchez Sabrina
Sandri Laura
Scala Antonio
Schaeffer Nathanael
Schuchart Joseph
Selva Jacopo
Sergeant Amadine
Stallone Angela
Sánchez-Linares Carlos
Taroni Matteo
Thrastarson Soelvi
Titos Manuel
Tonello Nadia
Tonini Roberto
Ulrich Thomas
Vilotte Jean-Pierre
Volpe Manuela
Vöge Malte
Wössner Uwe
Publication date: 01/01/2023

HAL AMU

Norwegian Geotechnical Institute (NGI) Digital Archive

Astro - A Low-Cost, Low-Power Cluster for CPU-GPU Hybrid Computing using the Jetson TK1

Author: Sheen Sean Kai
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2016

With the rising costs of large-scale distributed systems, many researchers have begun looking at utilizing low-power architectures for clusters. In this paper, we describe our Astro cluster, which consists of 46 NVIDIA Jetson TK1 nodes, each equipped with an ARM Cortex A15 CPU, a 192-core Kepler GPU, 2 GB of RAM, and 16 GB of flash storage. The cluster has a number of advantages when compared to conventional clusters, including lower power usage, ambient cooling, shared memory between the CPU and GPU, and affordability. The cluster is built using commodity hardware and can be set up at relatively low cost while providing up to 190 single-precision GFLOPS of computing power per node due to its combined GPU/CPU architecture. The cluster currently uses one 48-port Gigabit Ethernet switch and runs Linux for Tegra, a modified version of Ubuntu provided by NVIDIA, as its operating system. Common file systems such as PVFS, Ceph, and NFS are supported by the cluster, and benchmarks such as HPL, LAPACK, and LAMMPS are used to evaluate the system. At peak performance, the cluster delivers 328 double-precision GFLOPS at a peak power draw of 810 W on the LINPACK benchmark, placing the cluster 324th on the Green500. Single-precision benchmarks result in a peak performance of 6800 GFLOPS. The Astro cluster aims to be a proof of concept for future low-power clusters that utilize a similar architecture. The cluster is installed with many of the same applications used by top supercomputers and is validated using several standard supercomputing benchmarks. We show that with the rise of low-power CPUs and GPUs, and the need for lower server costs, this cluster provides insight into how ARM and CPU-GPU hybrid chips will perform in high-performance computing.
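
The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable and function names are illustrative. The performance-per-watt ratio divides by the quoted peak power, so it should be read as indicative rather than as an official Green500 figure.

```python
# Figures quoted in the abstract.
NODES = 46
PEAK_SP_GFLOPS_PER_NODE = 190.0   # "up to 190 single-precision GFLOPS ... per node"
MEASURED_SP_GFLOPS = 6800.0       # measured single-precision peak, whole cluster
MEASURED_DP_GFLOPS = 328.0        # measured double-precision LINPACK result
PEAK_POWER_W = 810.0              # peak power draw during LINPACK


def aggregate_peak(nodes, gflops_per_node):
    """Upper bound on cluster throughput if every node ran at its quoted peak."""
    return nodes * gflops_per_node


def fraction_of_peak(measured_gflops, peak_gflops):
    """Share of the aggregate peak that a benchmark actually reached."""
    return measured_gflops / peak_gflops


def gflops_per_watt(gflops, watts):
    """Energy efficiency, the kind of metric the Green500 ranks by."""
    return gflops / watts


peak_sp = aggregate_peak(NODES, PEAK_SP_GFLOPS_PER_NODE)
sp_fraction = fraction_of_peak(MEASURED_SP_GFLOPS, peak_sp)
dp_efficiency = gflops_per_watt(MEASURED_DP_GFLOPS, PEAK_POWER_W)

print(f"Aggregate single-precision peak: {peak_sp:.0f} GFLOPS")
print(f"Measured share of that peak:     {sp_fraction:.1%}")
print(f"Double-precision efficiency:     {dp_efficiency:.3f} GFLOPS/W")
```

With these inputs the aggregate single-precision peak is 8740 GFLOPS, so the measured 6800 GFLOPS is roughly 78% of it, and the LINPACK run works out to about 0.4 GFLOPS per watt.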