569 research outputs found
Genetic Algorithm Modeling with GPU Parallel Computing Technology
We present a multi-purpose genetic algorithm, designed and implemented with
GPGPU / CUDA parallel computing technology. The model was derived from a
multi-core CPU serial implementation, named GAME, already scientifically
successfully tested and validated on astrophysical massive data classification
problems, through a web application resource (DAMEWARE), specialized in data
mining based on Machine Learning paradigms. Since genetic algorithms are
inherently parallel, the GPGPU computing paradigm has provided an exploit of
the internal training features of the model, permitting a strong optimization
in terms of processing performances and scalability.Comment: 11 pages, 2 figures, refereed proceedings; Neural Nets and
Surroundings, Proceedings of 22nd Italian Workshop on Neural Nets, WIRN 2012;
Smart Innovation, Systems and Technologies, Vol. 19, Springe
A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionate floating point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.Comment: 15 pages, 10 figure
Counting Triangles in Large Graphs on GPU
The clustering coefficient and the transitivity ratio are concepts often used
in network analysis, which creates a need for fast practical algorithms for
counting triangles in large graphs. Previous research in this area focused on
sequential algorithms, MapReduce parallelization, and fast approximations.
In this paper we propose a parallel triangle counting algorithm for CUDA GPU.
We describe the implementation details necessary to achieve high performance
and present the experimental evaluation of our approach. Our algorithm achieves
8 to 15 times speedup over the CPU implementation and is capable of finding 3.8
billion triangles in an 89 million edges graph in less than 10 seconds on the
Nvidia Tesla C2050 GPU.Comment: 2016 IEEE International Parallel and Distributed Processing Symposium
Workshops (IPDPSW
Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library
Application programming for GPUs (Graphics Processing Units) is complex and error-prone, because the popular approaches — CUDA and OpenCL — are intrinsically low-level and offer no special support for systems consisting of multiple GPUs. The SkelCL library presented in this paper
is built on top of the OpenCL standard and offers preimplemented recurring computation and communication patterns (skeletons) which greatly simplify programming for multiGPU systems. The library also provides an abstract vector data type and a high-level data (re)distribution mechanism to shield the programmer from the low-level data transfers between the system’s main memory and multiple GPUs. In this
paper, we focus on the specific support in SkelCL for systems with multiple GPUs and use a real-world application study from the area of medical imaging to demonstrate the reduced programming effort and competitive performance of SkelCL as compared to OpenCL and CUDA. Besides, we illustrate how SkelCL adapts to large-scale, distributed heterogeneous
systems in order to simplify their programming
Enabling a High Throughput Real Time Data Pipeline for a Large Radio Telescope Array with GPUs
The Murchison Widefield Array (MWA) is a next-generation radio telescope
currently under construction in the remote Western Australia Outback. Raw data
will be generated continuously at 5GiB/s, grouped into 8s cadences. This high
throughput motivates the development of on-site, real time processing and
reduction in preference to archiving, transport and off-line processing. Each
batch of 8s data must be completely reduced before the next batch arrives.
Maintaining real time operation will require a sustained performance of around
2.5TFLOP/s (including convolutions, FFTs, interpolations and matrix
multiplications). We describe a scalable heterogeneous computing pipeline
implementation, exploiting both the high computing density and FLOP-per-Watt
ratio of modern GPUs. The architecture is highly parallel within and across
nodes, with all major processing elements performed by GPUs. Necessary
scatter-gather operations along the pipeline are loosely synchronized between
the nodes hosting the GPUs. The MWA will be a frontier scientific instrument
and a pathfinder for planned peta- and exascale facilities.Comment: Version accepted by Comp. Phys. Com
- …