2,956 research outputs found

Anelastic sensitivity kernels with parsimonious storage for adjoint tomography and full waveform inversion

Author: Bozdag Ebru
de Andrade Elliott Sales
Komatitsch Dimitri
Liu Qinya
Peter Daniel
Tromp Jeroen
Xie Zhinan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016

We introduce a technique to compute exact anelastic sensitivity kernels in the time domain using parsimonious disk storage. The method is based on a reordering of the time loop of time-domain forward/adjoint wave propagation solvers combined with the use of a memory buffer. It avoids instabilities that occur when time-reversing dissipative wave propagation simulations. The total number of required time steps is unchanged compared to usual acoustic or elastic approaches. The cost is reduced by a factor of 4/3 compared to the case in which anelasticity is partially accounted for by accommodating the effects of physical dispersion. We validate our technique by performing a test in which we compare the $K_\alpha$ sensitivity kernel to the exact kernel obtained by saving the entire forward calculation. This benchmark confirms that our approach is also exact. We illustrate the importance of including full attenuation in the calculation of sensitivity kernels by showing significant differences with physical-dispersion-only kernels.

arXiv.org e-Print Archive

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

Author: Clemons Jason
Gimelshein Natalia
Keckler Stephen W.
Rhu Minsoo
Zulfiqar Arslan
Publication date: 28/07/2016

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a significant reduction in memory requirements of DNNs. Similar experiments on VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the memory efficiency of our proposal. vDNN enables VGG-16 with batch size 256 (requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card containing 12 GB of memory, with 18% performance loss compared to a hypothetical, oracular GPU with enough memory to hold the entire DNN. Comment: Published as a conference paper at the 49th IEEE/ACM International Symposium on Microarchitecture (MICRO-49), 201

arXiv.org e-Print Archive

Crossref

포항공과대학교

Experiences in porting mini-applications to OpenACC and OpenMP on heterogeneous systems

Author: Budiardja RD
Daley C
Gayatri R
Hernandez O
Joubert W
Vergara Larrea VG
Publication venue: eScholarship, University of California
Publication date: 25/10/2020

This article studies mini-applications (Minisweep, GenASiS, GPP, and FF) that use computational methods commonly encountered in HPC. We have ported these applications to develop OpenACC and OpenMP versions, and evaluated their performance on Titan (Cray XK7 with K20x GPUs), Cori (Cray XC40 with Intel KNL), Summit (IBM AC922 with Volta GPUs), and Cori-GPU (Cray CS-Storm 500NX with Intel Skylake and Volta GPUs). Our goals are for these new ports to be useful to both application and compiler developers, to document and describe the lessons learned and the methodology to create optimized OpenMP and OpenACC versions, and to provide a description of possible migration paths between the two specifications. Cases where specific directives or code patterns result in improved performance for a given architecture are highlighted. We also include discussions of the functionality and maturity of the latest compilers available on the above platforms with respect to OpenACC or OpenMP implementations.

eScholarship - University of California

Hera-JVM: abstracting processor heterogeneity behind a virtual machine

Author: McIlroy R.
Sventek J.
Publication date: 01/03/2009

Heterogeneous multi-core processors, such as the Cell processor, can deliver exceptional performance; however, they are notoriously difficult to program effectively. We present Hera-JVM, a runtime system that hides a processor’s heterogeneity behind a homogeneous virtual machine interface. Preliminary results of three benchmarks running under Hera-JVM are presented. These results suggest a set of application behaviour characteristics that the runtime system should take into account when placing threads on different core types.

CiteSeerX

Enlighten

Solving the global atmospheric equations through heterogeneous reconfigurable platforms

Author: Fu H
Gan L
Huang X
Luk W
Xue W
Yang C
Yang G
Zhang Y
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/03/2014

Spiral - Imperial College Digital Repository