Analytic Performance Modeling and Analysis of Detailed Neuron Simulations
Big science initiatives are trying to reconstruct and model the brain by
simulating brain tissue at larger scales and with more biological detail than
previously thought possible. The exponential growth of
parallel computer performance has been supporting these developments, and at
the same time maintainers of neuroscientific simulation code have strived to
optimally and efficiently exploit new hardware features. Current
state-of-the-art software for the simulation of biological networks has been
developed using performance engineering practices, but a thorough analysis and
modeling of the computational and performance characteristics, especially in
the case of morphologically detailed neuron simulations, is lacking. Other
computational sciences have successfully used analytic performance engineering
and modeling methods to gain insight into the computational properties of
simulation kernels, aid developers in performance optimizations and eventually
drive co-design efforts, but to our knowledge a model-based performance
analysis of neuron simulations has not yet been conducted.
We present a detailed study of the shared-memory performance of
morphologically detailed neuron simulations based on the Execution-Cache-Memory
(ECM) performance model. We demonstrate that this model can deliver accurate
predictions of the runtime of almost all the kernels that constitute the neuron
models under investigation. The gained insight is used to identify the main
governing mechanisms underlying performance bottlenecks in the simulation. The
implications of this analysis on the optimization of neural simulation software
and eventually co-design of future hardware architectures are discussed. In
this sense, our work represents a valuable conceptual and quantitative
contribution to understanding the performance properties of biological network
simulations.
Comment: 18 pages, 6 figures, 15 tables
Meeting the Memory Challenges of Brain-Scale Network Simulation
The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10^5 neurons with up to 10^9 synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place.
As a consequence, development cycles can be shorter and less expensive. Applying the model to our freely available Neural Simulation Tool (NEST), we identify the software components dominant at different scales, and develop general strategies for reducing the memory consumption, in particular by using data structures that exploit the sparseness of the local representation of the network. We show that these adaptations enable our simulation software to scale up to the order of 10,000 processors and beyond. As memory consumption issues are likely to be relevant for any software dealing with complex connectome data on such architectures, our approach and our findings should be useful for researchers developing novel neuroinformatics solutions to the challenges posed by the connectome project.
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers
Simulation is a third pillar next to experiment and theory in the study of
complex dynamic systems such as biological neural networks. Contemporary
brain-scale networks correspond to directed graphs of a few million nodes, each
with an in-degree and out-degree of several thousands of edges, where nodes and
edges correspond to the fundamental biological units, neurons and synapses,
respectively. When considering a random graph, each node's edges are
distributed across thousands of parallel processes. The activity in neuronal
networks is also sparse. Each neuron occasionally transmits a brief signal,
called a spike, via its outgoing synapses to the corresponding target neurons.
This spatial and temporal sparsity represents an inherent bottleneck for
simulations on conventional computers: Fundamentally irregular memory-access
patterns cause poor cache utilization. Using an established neuronal network
simulation code as a reference implementation, we investigate how common
techniques to recover cache performance such as software-induced prefetching
and software pipelining can benefit a real-world application. The algorithmic
changes reduce simulation time by up to 50%. The study exemplifies that
many-core systems presented with an intrinsically parallel computational
problem can overcome the von Neumann bottleneck of conventional computer
architectures.
Neuronal computation on complex dendritic morphologies
When we think about neural cells, we immediately recall the wealth of electrical
behaviour which, eventually, brings about consciousness. Hidden deep in the
frequencies and timings of action potentials, in subthreshold oscillations, and in
the cooperation of tens of billions of neurons, are synchronicities and emergent behaviours
that result in high-level, system-wide properties such as thought and cognition.
However, neurons are even more remarkable for their elaborate morphologies,
unique among biological cells. The principal, and most striking, component of neuronal
morphologies is the dendritic tree.
Despite comprising the vast majority of the surface area and volume of a
neuron, dendrites are neglected in many neuron models due to their sheer
complexity. The vast array of dendritic geometries, combined with the
heterogeneous properties of the cell membrane, continues to challenge
scientists in predicting neuronal input-output relationships, even in the case
of subthreshold dendritic currents.
In this thesis, we will explore the properties of neuronal dendritic trees, and
how they alter and integrate the electrical signals that diffuse along them. After
an introduction to neural cell biology and membrane biophysics, we will review
Abbott's dendritic path integral in detail, and derive the theoretical convergence
of its infinite sum solution. On certain symmetric structures, closed-form solutions
will be found; for arbitrary geometries, we will propose algorithms using various
heuristics for constructing the solution, and assess their computational convergence
on real neuronal morphologies. We will demonstrate how generating terms for the
path integral solution in an order that optimises convergence is non-trivial, and how a computationally significant number of terms is required for reasonable accuracy.
We will, however, derive a highly-efficient and accurate algorithm for application to
discretised dendritic trees. Finally, a modular method for constructing a solution in
the Laplace domain will be developed.
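The path-integral solution referred to above can be given in outline only; the exact trip coefficients depend on the radii and boundary conditions at each branch point and terminal, so the following is a schematic of the construction, not a complete statement of it:

```latex
% Schematic form of the dendritic path integral (outline only).
% The Green's function on the tree is written as a sum over paths
% ("trips") from the input location y to the measurement location x:
G(x, y, t) \;=\; \sum_{\text{trips}} A_{\text{trip}}\,
                 G_\infty\!\left(L_{\text{trip}},\, t\right),
% where L_trip is the total electrotonic length of the trip, A_trip is
% a product of coefficients picked up at branch points and terminals,
% and G_\infty is the Green's function of an infinite uniform cable
% (in dimensionless electrotonic units):
G_\infty(L, t) \;=\; \frac{1}{\sqrt{4\pi t}}\; e^{-t - L^2/4t}.
```

The convergence questions treated in the thesis concern how quickly partial sums over trips, ordered by length or by other heuristics, approach this infinite sum.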