22 research outputs found

    Routing Brain Traffic Through the Von Neumann Bottleneck: Parallel Sorting and Refactoring.

    Get PDF
    Generic simulation code for spiking neuronal networks spends the major part of the time in the phase where spikes have arrived at a compute node and need to be delivered to their target neurons. These spikes were emitted over the last interval between communication steps by source neurons distributed across many compute nodes and are inherently irregular and unsorted with respect to their targets. For finding those targets, the spikes need to be dispatched to a three-dimensional data structure with decisions on target thread and synapse type to be made on the way. With growing network size, a compute node receives spikes from an increasing number of different source neurons until in the limit each synapse on the compute node has a unique source. Here, we show analytically how this sparsity emerges over the practically relevant range of network sizes from a hundred thousand to a billion neurons. By profiling a production code we investigate opportunities for algorithmic changes to avoid indirections and branching. Every thread hosts an equal share of the neurons on a compute node. In the original algorithm, all threads search through all spikes to pick out the relevant ones. With increasing network size, the fraction of hits remains invariant but the absolute number of rejections grows. Our new alternative algorithm equally divides the spikes among the threads and immediately sorts them in parallel according to target thread and synapse type. After this, every thread completes delivery solely of the section of spikes for its own neurons. Independent of the number of threads, all spikes are looked at only two times. The new algorithm halves the number of instructions in spike delivery which leads to a reduction of simulation time of up to 40 %. Thus, spike delivery is a fully parallelizable process with a single synchronization point and thereby well suited for many-core systems. Our analysis indicates that further progress requires a reduction of the latency that the instructions experience in accessing memory. The study provides the foundation for the exploration of methods of latency hiding like software pipelining and software-induced prefetching

    Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers

    Full text link
    Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed graphs of a few million nodes, each with an in-degree and out-degree of several thousands of edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. When considering a random graph, each node's edges are distributed across thousands of parallel processes. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called spike, via its outgoing synapses to the corresponding target neurons. This spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: Fundamentally irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance such as software-induced prefetching and software pipelining can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems assigned with an intrinsically parallel computational problem can overcome the von Neumann bottleneck of conventional computer architectures

    Source-based nomenclature for single-strand homopolymers and copolymers (IUPAC Recommendations 2016)

    Get PDF
    IUPAC recommendations on source-based nomenclature for single-strand polymers have so far addressed its application mainly to copolymers, non-linear polymers and polymer assemblies, and within generic source-based nomenclature of polymers. In this document, rules are formulated for devising a satisfactory source-based name for a polymer, whether homopolymer or copolymer, which are as clear and rigorous as possible. Thus, the source-based system for naming polymers is presented in a totality that serves as a user-friendly alternative to the structure-based system of polymer nomenclature. In addition, because of their widespread and established use, recommendations for the use of traditional names of polymers are also elaborated

    Execution Performance Analysis of the ABySS Genome Sequence Assembler using Scalasca on the K computer

    Get PDF
    Performance analysis of the ABySS genome sequence assembler (ABYSS-P) executing on the K computer with up to 8192 compute nodes is described which identified issues that limited scalability to less than 1024 compute nodes and required prohibitive message buffer memory with 16384 or more compute nodes. The open-source Scalasca toolset was employed to analyse executions, revealing the impact of massive amounts of MPI point-to-point communication used particularly for master/worker process coordination, and inefficient parallel file operations that manifest as waiting time at later MPI collective synchronisations and communications. Initial remediation via use of collective communication operations and alternate strategies for parallel file handling show large performance and scalability improvements, with partial executions validated on the full 82,944 compute nodes of the K computer

    Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers

    No full text
    Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed graphs of a few million nodes, each with an in-degree and out-degree of several thousands of edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. When considering a random graph, each node's edges are distributed across thousands of parallel processes. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called spike, via its outgoing synapses to the corresponding target neurons. This spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: Fundamentally irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance such as software-induced prefetching and software pipelining can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems assigned with an intrinsically parallel computational problem can overcome the von Neumann bottleneck of conventional computer architectures

    Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

    Get PDF
    State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems

    Including Gap Junctions into Distributed Neuronal Network Simulations

    No full text
    Contemporary simulation technology for neuronal networks enables the simulation of brain-scale networks using neuron models with a single or a few compartments. However, distributed simulations at full cell density are still lacking the electrical coupling between cells via so called gap junctions. This is due to the absence of efficient algorithms to simulate gap junctions on large parallel computers. The difficulty is that gap junctions require an instantaneous interaction between the coupled neurons, whereas the efficiency of simulation codes for spiking neurons relies on delayed communication. In a recent paper [15] we describe a technology to overcome this obstacle. Here, we give an overview of the challenges to include gap junctions into a distributed simulation scheme for neuronal networks and present an implementation of the new technology available in the NEural Simulation Tool (NEST 2.10.0). Subsequently we introduce the usage of gap junctions in model scripts as well as benchmarks assessing the performance and overhead of the technology on the supercomputers JUQUEEN and K computer

    Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

    No full text
    State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems
    corecore