Optimization and parallelization of tensor and ODE/PDE computations on GPU
We propose a multi-level GPU-based parallelization algorithm to solve the multi-compartment
Hodgkin-Huxley (HH) model equations, which require solving a Hines matrix. We use
a 'parallel-in-time' algorithm (akin to the Parareal strategy) to obtain outer-level parallelism,
and an Exact Domain Decomposition (EDD) algorithm with fine decomposition for
inner-level parallelism. We show that our technique also applies to other differential
equations, such as the heat equation, that induce tridiagonal systems.
Typically, a solution of the HH equations runs for hundreds to tens of thousands of time steps,
solving a Hines matrix at each time step. Previous solutions to this problem, by Mascagni
et al. (1991) and Hines et al. (2008), have tackled only the parallel solution of the Hines
matrix itself.
Our approach uses the dynamic parallelism of CUDA to achieve multi-level parallelism
on GPUs. Our solution outperforms the sequential-in-time method on standard neuron morphologies
by up to 2.5×. We also show that the iterative part of the Parareal method converges in 5-7
iterations on average, to an accuracy of 10^-6.
We also propose a GPU optimization for the Higher Order Tensor Renormalization Group
(HOTRG) problem, where the tensor contraction operations inside HOTRG are optimized by a multi-
GPU implementation using the cuBLASXt API.
Simulating the behavior of the human brain on GPUs
The simulation of the behavior of the human brain is one of the most important challenges in computing today. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this kind of simulation needs, using current technology. In this sense, this work focuses on one of the main steps of such a simulation: computing the voltage on the neurons' morphology. This is carried out using the Hines algorithm and, although this algorithm is the optimal method in terms of number of operations, it requires non-trivial modifications to be efficiently parallelized on GPUs. We propose several optimizations to accelerate this algorithm on GPU-based architectures, exploring the limitations of both the method and the architecture in order to solve a high number of Hines systems (neurons) efficiently. Each of the optimizations is analyzed and described in depth. Two different approaches are studied: one for mono-morphology simulations (a batch of neurons with the same shape) and one for multi-morphology simulations (a batch of neurons where every neuron has a different shape). In mono-morphology simulations we obtain good performance using just a single kernel to compute all the neurons. However, this turns out to be inefficient for multi-morphology simulations, where a much more complex implementation is necessary to obtain good performance. In this case, we must execute more than one GPU kernel: in every execution (kernel call), one specific part of the batch of neurons is solved. These parts can be seen as multiple independent tridiagonal systems. Although the present paper is focused on the simulation of the behavior of the human brain, some of these techniques, in particular those related to solving tridiagonal systems, can also be used in multiple oil and gas simulations.
Our studies have proven that the optimizations proposed in the present work can achieve high performance on those computations with a high number of neurons, our GPU implementations being about 4× and 8× faster than the OpenMP multicore implementation (16 cores), using one and two NVIDIA K80 GPUs respectively. It is also important to highlight that these optimizations continue to scale, even when dealing with a very high number of neurons. This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 720270 (HBP SGA1),
from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), and from the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051). We thank NVIDIA for its support through the BSC/UPC NVIDIA GPU Center of Excellence, and the European Union's Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Grant Agreement No. 749516.
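The Hines algorithm is an O(n) LU-style elimination over the branched neuron tree; on an unbranched cable it reduces to the classic Thomas algorithm for tridiagonal systems, the same kernel the abstract relates to multi-morphology and oil-and-gas workloads. A minimal CPU sketch of that tridiagonal pattern (illustrative only, not the authors' GPU code):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system A x = d with the Thomas algorithm.

    a: sub-diagonal   (length n, a[0] unused)
    b: main diagonal  (length n)
    c: super-diagonal (length n, c[-1] unused)

    O(n) forward elimination plus back substitution; each sweep is
    inherently sequential, which is why GPU parallelism must come from
    solving many independent systems (neurons) at once.
    """
    n = len(b)
    cp = np.empty(n)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):               # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

In the batched (mono-morphology) setting, one GPU thread or warp would run this sweep per neuron; the multi-morphology case splits the branched tree into independent tridiagonal segments, as the abstract describes.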
Computational Modeling of Biological Neural Networks on GPUs: Strategies and Performance
Simulating biological neural networks is an important task for computational neuroscientists attempting to model and analyze brain activity and function. As these networks become larger and more complex, the computational power required grows significantly, often requiring the use of supercomputers or compute clusters. An emerging low-cost, highly accessible alternative to many of these resources is the Graphics Processing Unit (GPU): specialized, massively parallel graphics hardware that has seen increasing use as a general-purpose computational accelerator, thanks largely to NVIDIA's CUDA programming interface. We evaluated the relative benefits and limitations of GPU-based tools for large-scale neural network simulation and analysis, first by developing an agent-inspired spiking neural network simulator and then by adapting a neural signal decoding algorithm. Under certain network configurations, the simulator was able to outperform an equivalent MPI-based parallel implementation run on a dedicated compute cluster, while the decoding algorithm implementation consistently outperformed its serial counterpart. Additionally, the GPU-based simulator was able to visualize network spiking activity in real time thanks to its close integration with standard computer graphics APIs. The GPU was shown to provide significant performance benefits under certain circumstances while lagging behind in others. Given the complex nature of these research tasks, a hybrid strategy that combines GPU- and CPU-based approaches provides greater performance than either separately.
A Scalable Parallel Algorithm for the Simulation of Structural Plasticity in the Brain
The neural network in the brain is not hard-wired. Even in the mature
brain, new connections between neurons are formed and existing ones are
deleted, a process called structural plasticity. The dynamics of the
connectome are key to understanding how learning, memory, and healing after
lesions such as stroke work. However, with current experimental techniques,
even the creation of an exact static connectivity map, which is required
for various brain simulations, is very difficult.
One alternative is to use simulation based on network models to predict the
evolution of synapses between neurons based on their specified activity
targets. This is particularly useful as experimental measurements of the
spiking frequency of neurons are more easily accessible and reliable than
biological connectivity data. The Model of Structural Plasticity (MSP) by
Butz and van Ooyen is an example of this approach. In traditional models,
connectivity between neurons is fixed while plasticity merely arises from
changes in the strength of existing synapses, typically modeled as weight
factors. MSP, in contrast, models a synapse as a connection between an
"axonal" plug and a "dendritic" socket. These synaptic elements grow and
shrink independently on each neuron. When an axonal element of one neuron
connects to the dendritic element of another neuron, a new synapse is
formed. Conversely, when a synaptic element bound in a synapse retracts,
the corresponding synapse is removed. The governing idea of the model is
that plasticity in cortical networks is driven by the need of individual
neurons to homeostatically maintain their average electrical activity.
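The growth-and-pairing dynamics described above can be illustrated with a toy sketch; the constants and the uniform pairing rule below are hypothetical simplifications (MSP itself weights candidate pairs by a distance-dependent probability):

```python
import random

TARGET = 0.7   # homeostatic activity set-point (hypothetical units)
ETA = 0.1      # element growth rate (hypothetical)

class Neuron:
    def __init__(self, activity):
        self.activity = activity   # average electrical activity (normalized)
        self.axonal = 0.0          # vacant axonal plugs (continuous count)
        self.dendritic = 0.0       # vacant dendritic sockets

def grow_elements(n):
    # Elements grow when activity is below the set-point and shrink when
    # above it: the neuron homeostatically seeks more (or fewer) synapses.
    delta = ETA * (TARGET - n.activity)
    n.axonal = max(0.0, n.axonal + delta)
    n.dendritic = max(0.0, n.dendritic + delta)

def form_synapses(neurons):
    # Pair whole vacant axonal plugs with vacant dendritic sockets at
    # random (uniform stand-in for MSP's distance-dependent rule;
    # self-connections are not excluded in this toy version).
    axons = [n for n in neurons for _ in range(int(n.axonal))]
    dends = [n for n in neurons for _ in range(int(n.dendritic))]
    random.shuffle(axons)
    random.shuffle(dends)
    synapses = []
    for pre, post in zip(axons, dends):
        pre.axonal -= 1
        post.dendritic -= 1
        synapses.append((pre, post))
    return synapses
```

Retraction works symmetrically: when activity overshoots the set-point, bound elements shrink away and the corresponding synapses are removed.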
However, to predict which neurons connect to each other, the current MSP
model computes probabilities for all pairs of neurons, resulting in a
complexity of O(n^2). To enable large-scale simulations with millions of
neurons and beyond, this quadratic term is prohibitive. Inspired by
hierarchical methods for solving n-body problems in particle physics, this
dissertation presents a scalable approximation algorithm for simulating
structural plasticity based on MSP.
To scale MSP to millions of neurons, we adapt the Barnes-Hut algorithm as
used in gravitational particle simulations to a scalable solution for the
simulation of structural plasticity in the brain with a time complexity of
O(n log^2 n) instead of O(n^2). Then, we show through experimental
validation that the approximation underlying the algorithm does not
adversely affect the quality of the results. For this purpose, we compare
neural networks created by the original MSP with those created by our
approximation of it using graph metrics.
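The Barnes-Hut idea transferred here, treating a distant cluster of sources as a single mass at its centroid whenever cell-size/distance falls below an opening threshold θ, can be sketched in 2-D for a distance-dependent kernel. The Gaussian kernel, θ = 0.5, and all names below are illustrative choices, not the dissertation's actual code:

```python
import numpy as np

THETA = 0.5  # opening criterion: use the cluster if cell_size / distance < THETA

class Cell:
    """Quadtree cell storing total weight and weighted centroid."""
    def __init__(self, pts, weights, lo, hi):
        self.size = float(np.max(hi - lo))
        self.weight = float(weights.sum())
        self.centroid = (pts * weights[:, None]).sum(0) / self.weight
        self.children = []
        if len(pts) > 1 and self.size > 1e-6:
            mid = (lo + hi) / 2
            for qx in (0, 1):
                for qy in (0, 1):
                    m = ((pts[:, 0] >= mid[0]) == qx) & ((pts[:, 1] >= mid[1]) == qy)
                    if m.any():
                        c_lo = np.where([qx, qy], mid, lo)
                        c_hi = np.where([qx, qy], hi, mid)
                        self.children.append(Cell(pts[m], weights[m], c_lo, c_hi))

def kernel(d, sigma=1.0):
    # Distance-dependent connection kernel (Gaussian, illustrative).
    return np.exp(-(d / sigma) ** 2)

def attraction(cell, x):
    """Approximate sum_j kernel(|x - x_j|) * w_j over all sources in cell."""
    d = np.linalg.norm(x - cell.centroid)
    if not cell.children or (d > 0 and cell.size / d < THETA):
        return cell.weight * kernel(d)   # whole cell as one aggregated source
    return sum(attraction(c, x) for c in cell.children)
```

Each query costs O(log n) cells instead of O(n) pairwise terms, which is the mechanism behind the O(n log^2 n) overall complexity quoted above.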
Finally, we demonstrate that our scalable approximation algorithm can simulate
the dynamics of the connectome with 10^9 neurons: four orders of
magnitude more than the naive O(n^2) version, and two orders less
than the human brain. We present a scalable MPI-based implementation of
the algorithm, and our performance extrapolations predict that,
given sufficient compute resources, even with today's technology a
full-scale simulation of the human brain with 10^11 neurons is possible
in principle.
Until now, the largest structural plasticity simulations of MSP, in terms
of the number of neurons, corresponded to the scale of a fruit fly.
Our approximation algorithm goes a significant step further, reaching a
scale similar to that of a galago primate. Additionally, large-scale brain
connectivity maps can now be grown from scratch, and their evolution after
destructive events such as stroke can be simulated.