A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionately high floating-point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.
Comment: 15 pages, 10 figures
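The two reported speed-ups can be put in perspective with standard scaling arithmetic. The sketch below computes parallel efficiency (speed-up divided by device count) and the Karp-Flatt metric, an experimentally determined serial fraction. It uses only the numbers quoted in the abstract; the interpretation is ours, not the paper's.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Fraction of ideal linear speed-up achieved on n_devices."""
    return speedup / n_devices


def karp_flatt(speedup: float, n_devices: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p), defined for p >= 2.
    """
    if n_devices < 2:
        raise ValueError("Karp-Flatt metric needs at least 2 devices")
    return (1.0 / speedup - 1.0 / n_devices) / (1.0 - 1.0 / n_devices)


# Speed-ups quoted in the abstract: about 1.7 on 2 GPUs, 2.1 on 4 GPUs.
for p, s in [(2, 1.7), (4, 2.1)]:
    print(f"{p} GPUs: efficiency {parallel_efficiency(s, p):.3f}, "
          f"serial fraction {karp_flatt(s, p):.3f}")
```

This gives efficiencies of 0.850 and 0.525 and serial fractions of about 0.176 and 0.302. A serial fraction that rises with the device count is commonly read as overhead (for example communication or synchronisation) that grows with the number of devices, rather than a fixed sequential portion of the algorithm.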
Cost-effective aperture arrays for SKA Phase 1: single or dual-band?
An important design decision for the first phase of the Square Kilometre
Array is whether the low frequency component (SKA1-low) should be implemented
as a single or dual-band aperture array; that is, using one or two antenna
element designs to observe the 70-450 MHz frequency band. This memo uses an
elementary parametric analysis to make a quantitative, first-order cost
comparison of representative implementations of a single and dual-band system,
chosen for comparable performance characteristics. A direct comparison of the
SKA1-low station costs reveals that those costs are similar, although the
uncertainties are high. The cost impact on the broader telescope system varies:
the deployment and site preparation costs are higher for the dual-band array,
but the digital signal processing costs are higher for the single-band array.
This parametric analysis also shows that a first stage of analogue tile
beamforming, as opposed to only station-level, all-digital beamforming, has the
potential to significantly reduce the cost of the SKA1-low stations. However,
tile beamforming can limit flexibility and performance, principally in terms of
reducing accessible field of view. We examine the cost impacts in the context
of scientific performance, for which the spacing and intra-station layout of
the antenna elements are important derived parameters. We discuss the
implications of the many possible intra-station signal transport and processing
architectures and consider areas where future work could improve the accuracy
of SKA1-low costing.
Comment: 64 pages, 23 figures, submitted to the SKA Memo series
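The claim that analogue tile beamforming can cut station cost follows from the fact that only one signal per tile needs to be digitised. A toy first-order cost model illustrates this. It is a minimal sketch, not the memo's parametric model: the StationParams fields, the cost formula and all numbers are hypothetical, and it ignores polarisation, transport distances, site preparation and the field-of-view penalty of tiling.

```python
from dataclasses import dataclass


@dataclass
class StationParams:
    """Hypothetical first-order parameters for one aperture-array station."""
    n_elements: int                   # antenna elements per station
    element_cost: float               # antenna + front end per element (arbitrary units)
    tile_size: int = 1                # elements per analogue tile beamformer; 1 = all-digital
    tile_cost: float = 0.0            # cost of one analogue tile beamformer
    digital_chain_cost: float = 0.0   # cost per digitised signal path (transport, ADC, DSP)


def station_cost(p: StationParams) -> float:
    """Elements + tile beamformers + digital signal chains."""
    if p.n_elements % p.tile_size != 0:
        raise ValueError("n_elements must be a multiple of tile_size")
    n_tiles = p.n_elements // p.tile_size
    # With analogue tiles, only one signal path per tile is digitised.
    n_digital_chains = n_tiles
    tile_total = n_tiles * p.tile_cost if p.tile_size > 1 else 0.0
    return (p.n_elements * p.element_cost
            + tile_total
            + n_digital_chains * p.digital_chain_cost)


if __name__ == "__main__":
    all_digital = StationParams(256, 1.0, tile_size=1, digital_chain_cost=2.0)
    tiled = StationParams(256, 1.0, tile_size=16, tile_cost=4.0, digital_chain_cost=2.0)
    print(station_cost(all_digital), station_cost(tiled))
```

With these made-up values the all-digital station costs 768 units and the 16-element tiled station 352, because the number of digitised chains drops from 256 to 16. A dual-band station could be approximated as the sum of two such stations, one per band.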
Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures
Many-core architectures of the future are likely to have distributed memory
organizations and need fine-grained concurrency management to be used
effectively. The Self-adaptive Virtual Processor (SVP) is an abstract
concurrent programming model which can provide this, but the model and its
current implementations assume a shared memory with a single address space. We
investigate and extend SVP to handle distributed environments, and discuss a
prototype SVP implementation which transparently supports execution on
heterogeneous distributed memory clusters over TCP/IP connections, while
retaining the original SVP programming model.
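Roughly, SVP expresses concurrency as families of threads created over an index range, plus a sync that waits for a family to terminate; the place where a family runs is kept separate from the program. The sketch below mimics that shape in Python. The create/sync naming follows our understanding of SVP, but the signatures are hypothetical, a local thread pool stands in for a place, and returning values through futures is a simplification. In the distributed extension described above, a place would instead be a remote node reached over TCP/IP, which is not shown.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from typing import Callable, List


def create(place: Executor, start: int, limit: int, step: int,
           body: Callable[[int], int]) -> List[Future]:
    """Create a family of threads, one per index in range(start, limit, step),
    on the given place (here: any concurrent.futures Executor)."""
    return [place.submit(body, i) for i in range(start, limit, step)]


def sync(family: List[Future]) -> List[int]:
    """Block until every thread in the family has finished; results in index order."""
    return [f.result() for f in family]


if __name__ == "__main__":
    # The program only names a place; swapping the executor changes where
    # the family runs without changing the create/sync code.
    with ThreadPoolExecutor(max_workers=4) as place:
        family = create(place, 0, 8, 1, lambda i: i * i)
        print(sum(sync(family)))
```

Running this prints 140, the sum of the squares of 0 to 7.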
A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)
Neuromorphic computing systems comprise networks of neurons that use
asynchronous events for both computation and communication. This type of
representation offers several advantages in terms of bandwidth and power
consumption in neuromorphic electronic systems. However, managing the traffic
of asynchronous events in large scale systems is a daunting task, both in terms
of circuit complexity and memory requirements. Here we present a novel routing
methodology that employs both hierarchical and mesh routing strategies and
combines heterogeneous memory structures for minimizing both memory
requirements and latency, while maximizing programming flexibility so that a
wide range of event-based neural network architectures can be supported through
parameter configuration. We validated the proposed scheme in a prototype multi-core
neuromorphic processor chip that employs hybrid analog/digital circuits for
emulating synapse and neuron dynamics together with asynchronous digital
circuits for managing the address-event traffic. We present a theoretical
analysis of the proposed connectivity scheme, describe the methods and circuits
used to implement such a scheme, and characterize the prototype chip. Finally, we
demonstrate the use of the neuromorphic processor with a convolutional neural
network for the real-time classification of visual symbols being flashed to a
dynamic vision sensor (DVS) at high speed.
Comment: 17 pages, 14 figures
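One way to combine mesh routing between chips with a hierarchical fan-out inside a chip is to store a compact routing entry at the source neuron and let the destination cores match a tag against their own memory. The sketch below illustrates that idea; it does not reproduce the DYNAP design. The RouteEntry format, XY dimension-order routing between chips, the core-mask broadcast (standing in for the on-chip hierarchy) and the dictionary-based tag lookup are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Chip = Tuple[int, int]  # (x, y) position of a chip in the mesh


@dataclass
class RouteEntry:
    """Hypothetical source-side routing entry stored per neuron."""
    dx: int         # chip offset in x
    dy: int         # chip offset in y
    core_mask: int  # bit i set -> deliver to core i on the target chip
    tag: int        # tag that target cores compare against their synapse memory


@dataclass
class Core:
    # tag -> local neurons listening for that tag (stands in for a tag-matching memory)
    subscriptions: Dict[int, List[int]] = field(default_factory=dict)

    def receive(self, tag: int) -> List[int]:
        return self.subscriptions.get(tag, [])


def xy_route(src: Chip, dx: int, dy: int) -> List[Chip]:
    """Dimension-order (XY) mesh routing: travel along x first, then along y."""
    x, y = src
    path = [(x, y)]
    for _ in range(abs(dx)):
        x += 1 if dx > 0 else -1
        path.append((x, y))
    for _ in range(abs(dy)):
        y += 1 if dy > 0 else -1
        path.append((x, y))
    return path


def deliver(src: Chip, entry: RouteEntry,
            chips: Dict[Chip, List[Core]]) -> Tuple[List[Chip], List[Tuple[Chip, int, int]]]:
    """Route one address-event: mesh hops between chips, then broadcast to the
    masked cores of the target chip, where each core looks up the tag."""
    path = xy_route(src, entry.dx, entry.dy)
    target = path[-1]
    hits = []
    for core_id, core in enumerate(chips[target]):
        if (entry.core_mask >> core_id) & 1:
            hits.extend((target, core_id, n) for n in core.receive(entry.tag))
    return path, hits


if __name__ == "__main__":
    chips = {(0, 0): [Core(), Core()],
             (1, 0): [Core(), Core()],
             (1, 1): [Core({7: [3, 5]}), Core({7: [1]})]}
    path, hits = deliver((0, 0), RouteEntry(dx=1, dy=1, core_mask=0b01, tag=7), chips)
    print(path)  # [(0, 0), (1, 0), (1, 1)]
    print(hits)  # [((1, 1), 0, 3), ((1, 1), 0, 5)]
```

The source only stores one short entry per destination group, while the per-synapse detail lives in the destination cores' tag memories; this split is the kind of trade-off between memory and routing flexibility that the abstract refers to.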