730 research outputs found

    Fast algorithm for real-time rings reconstruction

    The GAP project studies the application of GPUs in several contexts in which a real-time response is important for decision making. The definition of real-time depends on the application under study, with response times ranging from μs up to several hours for very computing-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high energy physics experiments, and specific applications to nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications to accelerate ring reconstruction in RICH detectors when seeds for reconstruction from external trackers are not available.

    MorphIC: A 65-nm 738k-Synapse/mm² Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning

    Recent trends in the field of neural network accelerators investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient spiking neural networks still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86 mm² in 65-nm CMOS, achieving a high density of 738k synapses/mm². MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while having no penalty in the energy-accuracy tradeoff. Comment: This document is the paper as accepted for publication in the IEEE Transactions on Biomedical Circuits and Systems journal (2019), the fully-edited paper is available at https://ieeexplore.ieee.org/document/876400
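The density and area figures quoted in this abstract can be cross-checked against the "more than two million plastic synapses" claim. The short sketch below does only that arithmetic; the constant names are illustrative, and the values are taken from the abstract as written (738k is assumed to mean 738,000).

```python
# Figures as quoted in the abstract; names are illustrative only.
DENSITY_SYNAPSES_PER_MM2 = 738_000   # "738k synapses/mm²", assuming k = 1000
ACTIVE_AREA_HUNDREDTHS_MM2 = 286     # 2.86 mm², kept as an integer to avoid float rounding

def total_synapses(density_per_mm2, area_hundredths_mm2):
    """Synapse count implied by density x active area."""
    return density_per_mm2 * area_hundredths_mm2 // 100

synapses = total_synapses(DENSITY_SYNAPSES_PER_MM2, ACTIVE_AREA_HUNDREDTHS_MM2)
print(synapses)  # 2110680, consistent with "more than two million"
```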

    Cloud RAN for Mobile Networks - a Technology Overview

    Cloud Radio Access Network (C-RAN) is a novel mobile network architecture which can address a number of challenges the operators face while trying to support growing end-user needs. The main idea behind C-RAN is to pool the Baseband Units (BBUs) from multiple base stations into a centralized BBU Pool for statistical multiplexing gain, while shifting the burden to the high-speed wireline transmission of In-phase and Quadrature (IQ) data. C-RAN enables energy-efficient network operation and possible cost savings on baseband resources. Furthermore, it improves network capacity by performing load balancing and cooperative processing of signals originating from several base stations. This article surveys the state-of-the-art literature on C-RAN. It can serve as a starting point for anyone willing to understand C-RAN architecture and advance the research on C-RAN.
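The statistical multiplexing gain mentioned in this abstract can be illustrated with a minimal sketch: when sites peak at different times, a shared pool needs less capacity than the sum of the per-site peaks. The load profiles and the function name `multiplexing_gain` below are hypothetical and are not taken from the survey.

```python
# Hypothetical load per time slot (in arbitrary baseband-processing units)
# for two base stations. Illustrative numbers only.
office_load = [1, 1, 2, 8, 9, 8, 3, 1]
residential_load = [6, 5, 2, 1, 1, 2, 7, 8]

def multiplexing_gain(loads):
    """Ratio of per-site peak provisioning to pooled peak provisioning."""
    sum_of_peaks = sum(max(site) for site in loads)      # each site sized for its own peak
    aggregate = [sum(slot) for slot in zip(*loads)]      # pooled load per time slot
    return sum_of_peaks / max(aggregate)                 # pool sized for the aggregate peak

gain = multiplexing_gain([office_load, residential_load])
print(f"statistical multiplexing gain: {gain:.2f}")  # 1.70
```

With these made-up profiles the per-site peaks sum to 17 units while the pooled peak is 10, so the pool needs roughly 40% less capacity. Real gains depend on how correlated the traffic is across sites.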

    Computational Modeling of Biological Neural Networks on GPUs: Strategies and Performance

    Simulating biological neural networks is an important task for computational neuroscientists attempting to model and analyze brain activity and function. As these networks become larger and more complex, the computational power required grows significantly, often requiring the use of supercomputers or compute clusters. An emerging low-cost, highly accessible alternative to many of these resources is the Graphics Processing Unit (GPU) - specialized massively-parallel graphics hardware that has seen increasing use as a general purpose computational accelerator thanks largely to NVIDIA's CUDA programming interface. We evaluated the relative benefits and limitations of GPU-based tools for large-scale neural network simulation and analysis, first by developing an agent-inspired spiking neural network simulator, then by adapting a neural signal decoding algorithm. Under certain network configurations, the simulator was able to outperform an equivalent MPI-based parallel implementation run on a dedicated compute cluster, while the decoding algorithm implementation consistently outperformed its serial counterpart. Additionally, the GPU-based simulator was able to readily visualize network spiking activity in real-time due to the close integration with standard computer graphics APIs. The GPU was shown to provide significant performance benefits under certain circumstances while lagging behind in others. Given the complex nature of these research tasks, a hybrid strategy that combines GPU- and CPU-based approaches provides greater performance than either separately.

    The Level-0 Muon Trigger for the LHCb Experiment

    A very compact architecture has been developed for the first level Muon Trigger of the LHCb experiment that processes 40 million proton-proton collisions per second. For each collision, it receives 3.2 kBytes of data and it finds straight tracks within a 1.2 microseconds latency. The trigger implementation is massively parallel, pipelined and fully synchronous with the LHC clock. It relies on 248 high density Field Programmable Gate Arrays and on the massive use of multigigabit serial link transceivers embedded inside FPGAs. Comment: 33 pages, 16 figures, submitted to NIM
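The figures in this abstract imply an aggregate input bandwidth and a pipeline depth, which the following sketch works out. It assumes 1 kByte = 1000 bytes and one collision per clock period; the constant and function names are illustrative and do not come from the paper.

```python
COLLISION_RATE_HZ = 40_000_000   # 40 million collisions per second (from the abstract)
BYTES_PER_COLLISION = 3_200      # 3.2 kBytes, assuming 1 kByte = 1000 bytes
LATENCY_NS = 1_200               # 1.2 microseconds (from the abstract)

def input_bandwidth_bytes_per_s(rate_hz, bytes_per_event):
    """Aggregate data rate the trigger has to absorb."""
    return rate_hz * bytes_per_event

def collisions_in_flight(rate_hz, latency_ns):
    """Collisions being processed concurrently in a fully pipelined design."""
    period_ns = 1_000_000_000 // rate_hz   # 25 ns between collisions at 40 MHz
    return latency_ns // period_ns

bandwidth = input_bandwidth_bytes_per_s(COLLISION_RATE_HZ, BYTES_PER_COLLISION)
in_flight = collisions_in_flight(COLLISION_RATE_HZ, LATENCY_NS)
print(bandwidth / 1e9, "GB/s")       # 128.0 GB/s
print(in_flight, "collisions deep")  # 48 collisions deep
```

Under these assumptions the trigger absorbs about 128 GB/s and holds 48 collisions in the pipeline at any moment, which helps explain why the design is described as massively parallel and pipelined.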