Developing a pattern discovery method in time series data and its GPU acceleration
The Dynamic Time Warping (DTW) algorithm is widely used to find the global alignment of time series, and many time series data mining and analysis problems can be solved with it. However, using DTW to find similar subsequences is either computationally expensive or unable to deliver accurate analysis. Hence, parallelisation is used in the literature to speed up DTW. Owing to the nature of the DTW algorithm, however, parallelising it remains an open challenge. In this paper, we first propose a novel method that finds similar local subsequences. Our algorithm first searches for the possible start positions of a subsequence and then finds the best-matching alignment from these positions. Moreover, we parallelise the proposed algorithm on GPUs using CUDA and further propose an optimisation technique to improve the performance of our parallel implementation on the GPU. We conducted extensive experiments to evaluate the proposed method. Experimental results demonstrate that the proposed algorithm is able to discover time series subsequences efficiently and that the proposed GPU-based parallelisation technique can further speed up the processing.
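For orientation, subsequence matching with DTW can be sketched in a few lines. The block below is the standard textbook subsequence DTW (free start and free end along the long series), not the method proposed in the paper, whose details the abstract does not give. The function name `subsequence_dtw`, the absolute-difference local cost, and the assumption of non-empty numeric inputs are all illustrative choices.

```python
def subsequence_dtw(query, series):
    """Best DTW match of `query` inside `series` (both non-empty).

    Returns (cost, start, end): the alignment cost and the inclusive
    index range series[start..end] that the query was warped onto.
    """
    m = len(series)
    # Row 0: the match may begin at any position j at no extra cost.
    prev_cost = [abs(query[0] - s) for s in series]
    prev_start = list(range(m))
    for i in range(1, len(query)):
        cur_cost = [0.0] * m
        cur_start = [0] * m
        # Column 0 can only be reached from the cell directly above.
        cur_cost[0] = prev_cost[0] + abs(query[i] - series[0])
        cur_start[0] = prev_start[0]
        for j in range(1, m):
            # Predecessors: diagonal, above, left. Each carries the
            # start index of the path that reaches it.
            best_cost, best_start = min(
                (prev_cost[j - 1], prev_start[j - 1]),
                (prev_cost[j], prev_start[j]),
                (cur_cost[j - 1], cur_start[j - 1]),
            )
            cur_cost[j] = best_cost + abs(query[i] - series[j])
            cur_start[j] = best_start
        prev_cost, prev_start = cur_cost, cur_start
    # Free end: pick the cheapest cell in the last row.
    end = min(range(m), key=lambda j: prev_cost[j])
    return prev_cost[end], prev_start[end], end
```

For example, `subsequence_dtw([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])` finds a zero-cost match on indices 2 to 4. Each cell depends only on its left, upper, and diagonal neighbours, so cells on the same anti-diagonal are mutually independent; this is a common starting point for parallelising such recurrences, though the abstract does not say whether the paper's CUDA implementation relies on it.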
Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond
In this and a set of companion whitepapers, the USQCD Collaboration lays out
a program of science and computing for lattice gauge theory. These whitepapers
describe how calculation using lattice QCD (and other gauge theories) can aid
the interpretation of ongoing and upcoming experiments in particle and nuclear
physics, as well as inspire new ones. Comment: 44 pages. 1 of USQCD whitepapers
A GPU implementation of the Correlation Technique for Real-time Fourier Domain Pulsar Acceleration Searches
The study of binary pulsars enables tests of general relativity. Orbital
motion in binary systems causes the apparent pulsar spin frequency to drift,
reducing the sensitivity of periodicity searches. Acceleration searches are
methods that account for the effect of orbital acceleration. Existing methods
are currently computationally expensive, and the vast amount of data that will
be produced by next generation instruments such as the Square Kilometre Array
(SKA) necessitates real-time acceleration searches, which in turn requires the
use of High Performance Computing (HPC) platforms. We present our
implementation of the Correlation Technique for the Fourier Domain Acceleration
Search (FDAS) algorithm on Graphics Processor Units (GPUs). The correlation
technique is applied as a convolution with multiple Finite Impulse Response
filters in the Fourier domain. Two approaches are compared: the first uses the
NVIDIA cuFFT library for applying Fast Fourier Transforms (FFTs) on the GPU,
and the second contains a custom FFT implementation in GPU shared memory. We
find that the FFT shared memory implementation performs between 1.5 and 3.2
times faster than our cuFFT-based application for smaller but sufficient filter
sizes. It is also 4 to 6 times faster than the existing GPU and OpenMP
implementations of FDAS. This work is part of the AstroAccelerate project, a
many-core accelerated time-domain signal processing library for radio
astronomy. Comment: 20 pages, 9 figures. Accepted for publication in ApJ
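The idea of applying several FIR filters as a convolution carried out with FFTs can be illustrated with a small pure-Python sketch. It is not the AstroAccelerate code, nor the cuFFT or shared-memory implementation described in the abstract: the function names (`fft`, `apply_filter_bank`, `direct_convolve`), the recursive radix-2 FFT, and the zero-padding scheme are assumptions made for illustration only.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def apply_filter_bank(signal, bank):
    """Linearly convolve `signal` with every FIR filter in `bank`.

    The signal is transformed once; each filter then costs one forward
    FFT, one pointwise product, and one inverse FFT.
    """
    longest = max(len(taps) for taps in bank)
    size = 1
    while size < len(signal) + longest - 1:  # pad to avoid wrap-around
        size *= 2
    sig_f = fft(list(signal) + [0j] * (size - len(signal)))
    results = []
    for taps in bank:
        taps_f = fft(list(taps) + [0j] * (size - len(taps)))
        y = fft([a * b for a, b in zip(sig_f, taps_f)], inverse=True)
        out_len = len(signal) + len(taps) - 1
        results.append([v / size for v in y[:out_len]])  # normalise inverse
    return results


def direct_convolve(signal, taps):
    """Reference O(N*M) convolution used to check the FFT version."""
    out = [0j] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, t in enumerate(taps):
            out[i + j] += s * t
    return out
```

Comparing `apply_filter_bank(signal, bank)[k]` with `direct_convolve(signal, bank[k])` gives the same values up to rounding error. A production implementation would typically process long inputs in overlapping segments rather than in one transform.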
Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers face the daunting task of
researching different design flows and learning the intricacies of
specific software from various manufacturers in
hardware/software co-design. Creating a
scalable hardware/software co-design platform has therefore become a key
strategic element in developing hardware/software integrated
systems. In this paper, we propose a new design flow for building
a scalable co-design platform on an FPGA-based system-on-chip.
We employ an integrated approach to implement a histogram of
oriented gradients (HOG) descriptor and a support vector machine (SVM)
classifier on a programmable device for pedestrian tracking.
We report a hardware resource analysis and also analyse the
precision and success rates of pedestrian tracking on nine open
access image data sets. Finally, our proposed
design flow can be used for any real-time image processing-related
products on programmable ZYNQ-based embedded
systems; it benefits from a reduced design time and provides a
scalable solution for embedded image processing products.
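As a rough illustration of the HOG step named in the abstract, the sketch below builds a gradient-orientation histogram for a single image cell. It is a simplified software version under stated assumptions, not the paper's hardware design: the function name `cell_histogram`, the 9-bin unsigned (0 to 180 degree) layout, central differences on interior pixels only, and hard bin assignment are all illustrative. Full HOG pipelines usually add interpolation between bins and block normalisation before the features reach the SVM.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell.

    `cell` is a list of equal-length rows of pixel intensities. Each
    interior pixel votes with its gradient magnitude into the bin that
    contains its gradient orientation, folded into [0, 180) degrees.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Clamp guards against rounding that lands exactly on 180.
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    return hist
```

A cell containing a vertical edge, such as four rows of `[0, 0, 10, 10]`, puts all of its weight into the first bin, because every interior gradient points horizontally.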
Molecular dynamics simulation: a tool for exploration and discovery using simple models
Emergent phenomena share the fascinating property of not being obvious
consequences of the design of the system in which they appear. This
characteristic is no less relevant when attempting to simulate such phenomena,
given that the outcome is not always a foregone conclusion. The present survey
focuses on several simple model systems that exhibit surprisingly rich emergent
behavior, all studied by MD simulation. The examples are taken from the
disparate fields of fluid dynamics, granular matter and supramolecular
self-assembly. In studies of fluids modeled at the detailed microscopic level
using discrete particles, the simulations demonstrate that complex hydrodynamic
phenomena in rotating and convecting fluids, the Taylor-Couette and
Rayleigh-Bénard instabilities, can not only be observed within the limited
length and time scales accessible to MD, but even quantitative agreement can be
achieved. Simulation of highly counterintuitive segregation phenomena in
granular mixtures, again using MD methods, but now augmented by forces
producing damping and friction, leads to results that resemble experimentally
observed axial and radial segregation in the case of a rotating cylinder, and
to a novel form of horizontal segregation in a vertically vibrated layer.
Finally, when modeling self-assembly processes analogous to the formation of
the polyhedral shells that package spherical viruses, simulation of suitably
shaped particles reveals the ability to produce complete, error-free assembly,
and leads to the important general observation that reversible growth steps
contribute to the high yield. While there are limitations to the MD approach,
both computational and conceptual, the results offer a tantalizing hint of the
kinds of phenomena that can be explored, and what might be discovered when
sufficient resources are brought to bear on a problem. Comment: 21 pages, 20 figures (v2 - minor text addition
- …