
The NEST neuronal network simulator: Performance optimization techniques for high performance computing platforms

Abstract

NEST (http://www.nest-initiative.org) is a spiking neural network simulator used in computational neuroscience to simulate the interaction dynamics between neurons. It runs small networks on local machines and large brain-scale networks on the world’s leading supercomputers. To reach both of these scales, NEST is hybrid-parallel: it uses OpenMP for shared-memory parallelism and MPI for distributed-memory parallelism. Extending simulations from short runs of 10⁹ neurons toward long runs of 10¹¹ neurons requires increased performance. That performance goal can only be achieved through a feedback loop between modeling of the software, profiling to identify bottlenecks, and improvement of the code base. HPCToolkit and the SCORE-P toolkit were used to profile performance on a standard benchmark, the balanced Brunel network. We have additionally developed a performance model of the simulation stage (the neural dynamics after network initialization), as well as proxy code that reduces the resources required to model production runs. We pursued a semi-empirical approach: a theoretical model whose free parameters are determined by fitting it to empirical data (see figure). This lets us extrapolate the scaling efficiency of NEST and, by comparing components, identify algorithmic bottlenecks and performance issues that only show up at large simulation sizes. The performance issues identified include: 1) buffering of random number generation led to extended wait times at MPI barriers; and 2) inefficiencies in the construction of time stamps consumed inordinate computational resources during spike delivery. Issue 1 appears primarily in smaller simulations, while issue 2 becomes apparent only at the current limit of neural network size on the largest supercomputers and can only be identified through profiling interpreted in light of clear computing models.
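A semi-empirical performance model of this kind can be sketched as a least-squares fit. In this minimal sketch the model form T(P) = a + b·log2(P) + c·P, the function names, and the timings are all hypothetical: the abstract does not state which model was used for NEST. Because the model is linear in its free parameters a, b and c, fitting reduces to solving the normal equations. The fitted model can then be evaluated at process counts beyond the measured range, which is what extrapolating scaling behavior amounts to.

```python
import math

# Hypothetical semi-empirical model (an assumption, not the study's model):
#   T(P) = a + b * log2(P) + c * P
# a: per-process work that stays constant under weak scaling
# b: a cost growing logarithmically with P (e.g. tree-based collectives)
# c: a cost growing linearly with P
def basis(p):
    return [1.0, math.log2(p), float(p)]

def solve(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit(procs, times):
    """Fit the free parameters by least squares via X^T X beta = X^T y."""
    rows = [basis(p) for p in procs]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(rows, times)) for i in range(k)]
    return solve(xtx, xty)

def predict(params, p):
    """Evaluate the fitted model at process count p (may lie outside the measured range)."""
    return sum(c * f for c, f in zip(params, basis(p)))

if __name__ == "__main__":
    # Synthetic timings generated from known parameters to stand in for
    # measurements; they are NOT NEST benchmark results.
    true_params = [2.0, 0.05, 1e-4]
    procs = [2 ** k for k in range(11)]  # 1 .. 1024 processes
    times = [predict(true_params, p) for p in procs]

    params = fit(procs, times)
    print("fitted a, b, c:", params)
    # Extrapolate beyond the measured range, here to 2**16 processes.
    print("predicted T(65536):", predict(params, 2 ** 16))
```

With real profiling data the residuals would be nonzero, and comparing the fitted terms component by component is what points to the part of the code whose cost grows fastest with P.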
By improving the underlying code, we have significantly improved NEST performance (on the order of 25% for each of the two issues) and improved weak scaling for simulations at HPC scales.
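Weak scaling keeps the work per process fixed while the process count P grows, so a common efficiency metric is E(P) = T(P_ref) / T(P), where E = 1 means perfect weak scaling. The following minimal sketch assumes a hypothetical cost model T(P) = t_local + c·P with made-up coefficients (not NEST data). It shows why trimming a cost that grows with P raises efficiency mainly at large P. The 25% cut applied to the growing term is an arbitrary illustrative choice and is not the measured improvement.

```python
# Hypothetical two-component cost model (illustrative assumption only):
#   T(P) = t_local + c_grow * P
# t_local: per-process work, constant under weak scaling
# c_grow:  coefficient of a cost that grows linearly with the process count P
def model_time(p, t_local, c_grow):
    return t_local + c_grow * p

def weak_scaling_efficiency(times):
    """E(P) = T(P_min) / T(P) for every process count in the dict."""
    p_min = min(times)  # smallest process count serves as the reference
    return {p: times[p_min] / t for p, t in times.items()}

procs = [1, 16, 256, 4096]
before = {p: model_time(p, t_local=1.0, c_grow=1e-4) for p in procs}
# Hypothetical optimization: the P-dependent term shrinks by 25%.
after = {p: model_time(p, t_local=1.0, c_grow=0.75e-4) for p in procs}

eff_before = weak_scaling_efficiency(before)
eff_after = weak_scaling_efficiency(after)
for p in procs:
    print(f"P={p:5d}  E_before={eff_before[p]:.3f}  E_after={eff_after[p]:.3f}")
```

At small P the two efficiency curves are nearly identical. The gap widens as the growing term starts to dominate, which is consistent with an issue that only becomes visible at the largest scales.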
