    A Multi-GPU Programming Library for Real-Time Applications

    We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general. Comment: 15 pages, 10 figures
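To put the reported speed-ups in context, the short sketch below computes parallel efficiency (speed-up divided by device count) for the two figures quoted in the abstract. The function name `parallel_efficiency` is hypothetical and chosen only for illustration; the abstract reports the speed-ups but not this derived metric, and "about 1.7" is treated here as exactly 1.7.

```python
def parallel_efficiency(speedup: float, num_devices: int) -> float:
    """Return speed-up per device, i.e. the fraction of ideal linear scaling achieved."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return speedup / num_devices


# Speed-ups quoted in the abstract: about 1.7 on 2 GPUs, 2.1 on 4 GPUs.
reported = {2: 1.7, 4: 2.1}

for gpus, speedup in reported.items():
    eff = parallel_efficiency(speedup, gpus)
    print(f"{gpus} GPUs: speed-up {speedup}, efficiency {eff:.1%}")
```

Under these assumptions the efficiency is roughly 85% on 2 GPUs and about 52.5% on 4 GPUs, which suggests that scaling flattens as devices are added.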

    Cost-effective aperture arrays for SKA Phase 1: single or dual-band?

    An important design decision for the first phase of the Square Kilometre Array is whether the low frequency component (SKA1-low) should be implemented as a single or dual-band aperture array; that is, using one or two antenna element designs to observe the 70-450 MHz frequency band. This memo uses an elementary parametric analysis to make a quantitative, first-order cost comparison of representative implementations of a single and dual-band system, chosen for comparable performance characteristics. A direct comparison of the SKA1-low station costs reveals that those costs are similar, although the uncertainties are high. The cost impact on the broader telescope system varies: the deployment and site preparation costs are higher for the dual-band array, but the digital signal processing costs are higher for the single-band array. This parametric analysis also shows that a first stage of analogue tile beamforming, as opposed to only station-level, all-digital beamforming, has the potential to significantly reduce the cost of the SKA1-low stations. However, tile beamforming can limit flexibility and performance, principally in terms of reducing accessible field of view. We examine the cost impacts in the context of scientific performance, for which the spacing and intra-station layout of the antenna elements are important derived parameters. We discuss the implications of the many possible intra-station signal transport and processing architectures and consider areas where future work could improve the accuracy of SKA1-low costing. Comment: 64 pages, 23 figures, submitted to the SKA Memo series

    Extending and Implementing the Self-adaptive Virtual Processor for Distributed Memory Architectures

    Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current implementations assume a single address space shared memory. We investigate and extend SVP to handle distributed environments, and discuss a prototype SVP implementation which transparently supports execution on heterogeneous distributed memory clusters over TCP/IP connections, while retaining the original SVP programming model.

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed. Comment: 17 pages, 14 figures