
    Computational Modeling of Biological Neural Networks on GPUs: Strategies and Performance

    Simulating biological neural networks is an important task for computational neuroscientists attempting to model and analyze brain activity and function. As these networks become larger and more complex, the computational power required grows significantly, often requiring the use of supercomputers or compute clusters. An emerging low-cost, highly accessible alternative to many of these resources is the Graphics Processing Unit (GPU) - specialized massively-parallel graphics hardware that has seen increasing use as a general-purpose computational accelerator, thanks largely to NVIDIA's CUDA programming interface. We evaluated the relative benefits and limitations of GPU-based tools for large-scale neural network simulation and analysis, first by developing an agent-inspired spiking neural network simulator and then by adapting a neural signal decoding algorithm. Under certain network configurations, the simulator was able to outperform an equivalent MPI-based parallel implementation run on a dedicated compute cluster, while the decoding algorithm implementation consistently outperformed its serial counterpart. Additionally, the GPU-based simulator was able to visualize network spiking activity in real time thanks to its close integration with standard computer graphics APIs. The GPU was shown to provide significant performance benefits under certain circumstances while lagging behind in others. Given the complex nature of these research tasks, a hybrid strategy that combines GPU- and CPU-based approaches provides greater performance than either alone.
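
    The per-neuron parallelism such a simulator exploits maps naturally onto CUDA's thread model. Below is a minimal sketch of a leaky integrate-and-fire update step with one thread per neuron; it is not the thesis's simulator, and all parameter names and constants are illustrative assumptions.

```cuda
// Minimal sketch: one Euler step of a leaky integrate-and-fire (LIF)
// population, one CUDA thread per neuron. Constants are illustrative,
// not taken from the thesis described above.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void lifStep(float *v, const float *iSyn, int *spiked,
                        int n, float dt, float tau, float vThresh,
                        float vReset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Euler step of dv/dt = (-v + iSyn) / tau.
    float vi = v[i] + dt * (-v[i] + iSyn[i]) / tau;
    spiked[i] = (vi >= vThresh);             // record spike events
    v[i] = spiked[i] ? vReset : vi;          // reset membrane on spike
}

int main() {
    const int n = 1 << 16;                   // 65,536 neurons (illustrative)
    float *v, *iSyn; int *spiked;
    cudaMallocManaged(&v, n * sizeof(float));
    cudaMallocManaged(&iSyn, n * sizeof(float));
    cudaMallocManaged(&spiked, n * sizeof(int));
    for (int i = 0; i < n; ++i) { v[i] = 0.0f; iSyn[i] = 1.5f; }

    int block = 256, grid = (n + block - 1) / block;
    for (int t = 0; t < 100; ++t)            // 100 simulation steps
        lifStep<<<grid, block>>>(v, iSyn, spiked, n,
                                 0.1f, 10.0f, 1.0f, 0.0f);
    cudaDeviceSynchronize();
    printf("v[0] = %f, spiked[0] = %d\n", v[0], spiked[0]);
    cudaFree(v); cudaFree(iSyn); cudaFree(spiked);
    return 0;
}
```

    Because the spike flags already live in GPU memory, feeding them to a graphics API for real-time visualization avoids any host round trip, which is the integration advantage the abstract mentions.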

    Multi-GPU Development of a Neural Networks Based Reconstructor for Adaptive Optics

    Aberrations introduced by atmospheric turbulence in large telescopes are compensated for using adaptive optics systems, where the use of deformable mirrors and multiple sensors relies on complex control systems. Recently, the development of larger-scale telescopes such as the E-ELT or TMT has created a computational challenge due to the increasing complexity of the new adaptive optics systems. The Complex Atmospheric Reconstructor based on Machine Learning (CARMEN) is an algorithm based on artificial neural networks, designed to compensate for atmospheric turbulence. In recent years, GPUs have proven to be an effective way to speed up the learning process of neural networks, and different frameworks have been created to ease their development. This paper presents the implementation of CARMEN in different multi-GPU frameworks, along with its development directly in CUDA, a language originally designed for GPUs. The CUDA implementation offers the best performance in all of the presented cases, although the advantage of using more than one GPU appears only in large networks.
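
    An artificial-neural-network reconstructor of this kind spends most of its time evaluating fully connected layers, which parallelize well across GPU threads. The sketch below shows one dense forward pass (y = sigmoid(Wx + b)) with one thread per output neuron; it is a hedged illustration, not the CARMEN code, and all names and sizes are assumptions.

```cuda
// Minimal sketch of a fully connected layer forward pass on the GPU:
// each thread computes one output neuron. Not the CARMEN implementation.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void denseForward(const float *W, const float *x, const float *b,
                             float *y, int nOut, int nIn) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nOut) return;
    float acc = b[row];
    for (int k = 0; k < nIn; ++k)           // dot product: row of W with x
        acc += W[row * nIn + k] * x[k];
    y[row] = 1.0f / (1.0f + expf(-acc));    // sigmoid activation
}

int main() {
    const int nIn = 1024, nOut = 512;       // illustrative layer sizes
    float *W, *x, *b, *y;
    cudaMallocManaged(&W, nOut * nIn * sizeof(float));
    cudaMallocManaged(&x, nIn * sizeof(float));
    cudaMallocManaged(&b, nOut * sizeof(float));
    cudaMallocManaged(&y, nOut * sizeof(float));
    for (int i = 0; i < nOut * nIn; ++i) W[i] = 0.001f;
    for (int i = 0; i < nIn; ++i)  x[i] = 1.0f;
    for (int i = 0; i < nOut; ++i) b[i] = 0.0f;

    denseForward<<<(nOut + 255) / 256, 256>>>(W, x, b, y, nOut, nIn);
    cudaDeviceSynchronize();
    printf("y[0] = %f\n", y[0]);            // sigmoid(1.024), about 0.736
    cudaFree(W); cudaFree(x); cudaFree(b); cudaFree(y);
    return 0;
}
```

    In a multi-GPU setting, layers or batches of inputs can be split across devices, but as the abstract notes, the transfer overhead only pays off once the network is large enough.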

    Workload-aware Automatic Parallelization for Multi-GPU DNN Training

    Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16) and show competitive training throughput compared with the state-of-the-art frameworks; we also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.

    Comment: This paper is accepted at ICASSP201
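
    The core idea, distributing work in proportion to workload and device characteristics rather than splitting it evenly, can be illustrated with a small CUDA sketch. The throughput proxy used below (SM count times clock rate) is an assumption made for illustration, not WAP's actual cost model.

```cuda
// Hedged sketch of workload-aware data parallelism: split a minibatch
// across GPUs in proportion to a crude per-device throughput estimate,
// instead of an even split. Not the WAP implementation.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int nGpus = 0;
    cudaGetDeviceCount(&nGpus);
    if (nGpus == 0) { printf("no CUDA device\n"); return 1; }
    if (nGpus > 16) nGpus = 16;

    const int batch = 256;                  // illustrative minibatch size
    float weight[16], total = 0.0f;
    for (int d = 0; d < nGpus; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        // Throughput proxy: SM count * clock rate (kHz). An assumption.
        weight[d] = (float)p.multiProcessorCount * p.clockRate;
        total += weight[d];
    }
    int assigned = 0;
    for (int d = 0; d < nGpus; ++d) {
        int share = (d == nGpus - 1)
                  ? batch - assigned        // last device takes remainder
                  : (int)(batch * weight[d] / total);
        assigned += share;
        printf("GPU %d gets %d of %d samples\n", d, share, batch);
        // cudaSetDevice(d); ... launch this device's shard here.
    }
    return 0;
}
```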

    GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform

    Deep convolutional neural networks achieve state-of-the-art performance in image classification. The computational and memory requirements of such networks are, however, huge, which is an issue on embedded devices given their resource constraints. Most of this complexity derives from the convolutional layers and in particular from the matrix multiplications they entail. This paper proposes a complete approach to image classification, providing the common layers used in neural networks. Namely, the proposed approach relies on a heterogeneous CPU-GPU scheme for performing convolutions in the transform domain. The Compute Unified Device Architecture (CUDA)-based implementation of the proposed approach is evaluated over three different image classification networks on a Tegra K1 CPU-GPU mobile processor. Experiments show that the presented heterogeneous scheme boasts a 50× speedup over the CPU-only reference and outperforms a GPU-based reference by 2×, while slashing the power consumption by nearly 30%.
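
    Performing convolutions in the transform domain means a convolution becomes a pointwise complex product between the transformed input and kernel. The cuFFT-based sketch below shows the idea for a 1-D signal; the paper's heterogeneous CPU-GPU scheme is more elaborate, and all sizes here are illustrative assumptions. Compile with nvcc and link against -lcufft.

```cuda
// Hedged sketch of transform-domain convolution with cuFFT: FFT both
// operands, multiply pointwise (with the 1/N inverse-FFT scale folded
// in), then transform back. Not the paper's implementation.
#include <cuda_runtime.h>
#include <cufft.h>
#include <cstdio>

__global__ void pointwiseMulScale(cufftComplex *a, const cufftComplex *b,
                                  int n, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cufftComplex x = a[i], y = b[i];
    a[i].x = (x.x * y.x - x.y * y.y) * scale;   // complex multiply with
    a[i].y = (x.x * y.y + x.y * y.x) * scale;   // the 1/N scale folded in
}

int main() {
    const int n = 1024, nFreq = n / 2 + 1;      // R2C output length
    float *sig, *ker;
    cufftComplex *sigF, *kerF;
    cudaMallocManaged(&sig, n * sizeof(float));
    cudaMallocManaged(&ker, n * sizeof(float));
    cudaMalloc(&sigF, nFreq * sizeof(cufftComplex));
    cudaMalloc(&kerF, nFreq * sizeof(cufftComplex));
    // Constant signal and a 3-tap moving-average kernel (illustrative).
    for (int i = 0; i < n; ++i) { sig[i] = 1.0f; ker[i] = (i < 3) ? 1.0f / 3 : 0.0f; }

    cufftHandle fwd, inv;
    cufftPlan1d(&fwd, n, CUFFT_R2C, 1);
    cufftPlan1d(&inv, n, CUFFT_C2R, 1);
    cufftExecR2C(fwd, sig, sigF);               // signal to frequency domain
    cufftExecR2C(fwd, ker, kerF);               // kernel to frequency domain
    pointwiseMulScale<<<(nFreq + 255) / 256, 256>>>(sigF, kerF, nFreq, 1.0f / n);
    cufftExecC2R(inv, sigF, sig);               // back to the spatial domain
    cudaDeviceSynchronize();
    printf("conv[0] = %f\n", sig[0]);           // circular convolution, ~1.0
    cufftDestroy(fwd); cufftDestroy(inv);
    cudaFree(sig); cudaFree(ker); cudaFree(sigF); cudaFree(kerF);
    return 0;
}
```

    Two-dimensional convolutional layers follow the same pattern with 2-D plans; the win over direct convolution grows with kernel size, which motivates splitting the work between CPU and GPU as in the paper's scheme.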