Correcting soft errors online in fast Fourier transform
While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect soft errors offline in the fast Fourier transform (FFT) after computation finishes, none of the existing ABFT schemes detect soft errors online before the computation finishes. This paper presents an online ABFT scheme for FFT so that soft errors can be detected online and the corrupted computation can be terminated in a much more timely manner. We also extend our scheme to tolerate both arithmetic errors and memory errors, develop strategies to reduce its fault tolerance overhead and improve its numerical stability and fault coverage, and finally incorporate it into the widely used FFTW library - one of today's fastest FFT software implementations. Experimental results demonstrate that: (1) the proposed online ABFT scheme introduces much lower overhead than the existing offline ABFT schemes; (2) it detects errors in a much more timely manner; and (3) it also has higher numerical stability and better fault coverage.
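As a rough illustration of what an offline checksum test for an FFT can look like, the sketch below verifies two standard DFT identities after the transform finishes: the outputs sum to N times the first input, and the zero-frequency output equals the sum of the inputs. This is not the online scheme the abstract describes, and it is not FFTW code. The function names `fft` and `checksum_ok` and the tolerance `tol` are assumptions made for this example.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor for the k-th butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def checksum_ok(x, X, tol=1e-9):
    """Offline check (hypothetical helper) run after the FFT has finished.

    Uses two DFT identities:
      sum_k X[k] == N * x[0]   and   X[0] == sum_n x[n].
    The tolerance is scaled by the input magnitude to absorb round-off.
    """
    n = len(x)
    scale = max(1.0, sum(abs(v) for v in x))
    output_sum_ok = abs(sum(X) - n * x[0]) <= tol * scale * n
    dc_term_ok = abs(X[0] - sum(x)) <= tol * scale
    return output_sum_ok and dc_term_ok
```

Calling `checksum_ok(x, fft(x))` on clean data should pass, while perturbing a single output element changes the output sum and should make the check fail. Two scalar checks like these cannot catch every error pattern (errors that cancel in the sum go unnoticed), which is one reason fault coverage is a concern for checksum-based schemes.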
GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics
We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh
Refinement code), which has adopted a novel approach to improve the performance
of adaptive mesh refinement (AMR) astrophysical simulations by a large factor
with the use of the graphics processing unit (GPU). The AMR implementation is
based on a hierarchy of grid patches with an oct-tree data structure. We adopt
a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a
multi-level relaxation scheme for the Poisson solver. Both solvers have been
implemented on the GPU, so that hundreds of patches can be advanced in parallel.
The computational overhead associated with the data transfer between CPU and
GPU is carefully reduced by utilizing the capability of asynchronous memory
copies on the GPU, and the time spent computing the ghost-zone values for each
patch is hidden by overlapping it with the GPU computations. We demonstrate
the accuracy of the code by performing several standard test problems in
astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster
system. We measure the performance of the code by performing purely-baryonic
cosmological simulations in different hardware implementations, in which
detailed timing analyses provide comparison between the computations with and
without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are
demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with
8192^3 effective resolution, respectively.
Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included.
Accepted for publication in ApJ.
Synthetic aperture radar signal processing on the MPP
Satellite-borne Synthetic Aperture Radars (SAR) sense areas of several thousand square kilometers in seconds and transmit phase history signal data at several tens of megabits per second. The Shuttle Imaging Radar-B (SIR-B) has a variable swath of 20 to 50 km and acquired data over 100 km along track in about 13 seconds. Even with the simplification of separability of the reference function, the processing still requires considerable resources: high-speed I/O, large memory, and fast computation. Processing systems with regular hardware take hours to process one Seasat image and about one hour for a SIR-B image. Bringing this processing time closer to acquisition times requires an end-to-end system solution. For the purpose of demonstration, software was implemented on the present Massively Parallel Processor (MPP) configuration for processing Seasat and SIR-B data. The software takes advantage of the high processing speed offered by the MPP, the large Staging Buffer, and the high-speed I/O between the MPP array unit and the Staging Buffer. It was found that with unoptimized Parallel Pascal code, the processing time on the MPP for a 4096 x 4096 sample subset of signal data ranges between 18 and 30.2 seconds depending on options.
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Deep learning frameworks have been widely deployed on GPU servers for deep
learning applications in both academia and industry. In training deep neural
networks (DNNs), there are many standard processes or algorithms, such as
convolution and stochastic gradient descent (SGD), but the running performance
of different frameworks may differ even when running the same deep model on
the same GPU hardware. In this study, we evaluate the running performance of
four state-of-the-art distributed deep learning frameworks (i.e., Caffe-MPI,
CNTK, MXNet, and TensorFlow) over single-GPU, multi-GPU, and multi-node
environments. We first build performance models of standard processes in
training DNNs with SGD, and then we benchmark the running performance of these
frameworks with three popular convolutional neural networks (i.e., AlexNet,
GoogleNet and ResNet-50); finally, we analyze the factors that result in
the performance gap among these four frameworks. Through both analytical and
experimental analysis, we identify bottlenecks and overheads which could be
further optimized. The main contribution is that the proposed performance
models and the analysis provide further optimization directions in both
algorithmic design and system configuration.
Comment: Published at DataCom'201
A new data analysis framework for the search of continuous gravitational wave signals
Continuous gravitational wave signals, like those expected from asymmetric
spinning neutron stars, are among the most promising targets for LIGO and Virgo
detectors. The development of fast and robust data analysis methods is crucial
to increase the chances of a detection. We have developed a new and flexible
general data analysis framework for the search of this kind of signal, which
reduces the computational cost of the analysis by about two orders of
magnitude with respect to current procedures. This can correspond, at fixed
computing cost, to a sensitivity gain of up to 10%-20%, depending on the search
parameter space. Some possible applications are discussed, with a particular
focus on a directed search for sources in the Galactic center. Validation
through the injection of artificial signals in the data of the Advanced LIGO first
observational science run is also shown.
Comment: 21 pages, 8 figure