
22 research outputs found

Routing Brain Traffic Through the Von Neumann Bottleneck: Parallel Sorting and Refactoring

Author: Diesmann Markus
Jordan Jakob
Kitayama Itaru
Kunkel Susanne
Pronold Jari
Wylie Brian J N
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2022

Generic simulation code for spiking neuronal networks spends most of its time in the phase where spikes have arrived at a compute node and need to be delivered to their target neurons. These spikes were emitted over the last interval between communication steps by source neurons distributed across many compute nodes and are inherently irregular and unsorted with respect to their targets. To find those targets, the spikes need to be dispatched to a three-dimensional data structure, with decisions on target thread and synapse type made on the way. With growing network size, a compute node receives spikes from an increasing number of different source neurons until, in the limit, each synapse on the compute node has a unique source. Here, we show analytically how this sparsity emerges over the practically relevant range of network sizes from a hundred thousand to a billion neurons. By profiling a production code, we investigate opportunities for algorithmic changes to avoid indirections and branching. Every thread hosts an equal share of the neurons on a compute node. In the original algorithm, all threads search through all spikes to pick out the relevant ones. With increasing network size, the fraction of hits remains invariant, but the absolute number of rejections grows. Our new algorithm divides the spikes equally among the threads and immediately sorts them in parallel according to target thread and synapse type. After this, every thread delivers only the section of spikes destined for its own neurons. Independent of the number of threads, every spike is looked at only twice. The new algorithm halves the number of instructions in spike delivery, which leads to a reduction of simulation time of up to 40 %. Thus, spike delivery is a fully parallelizable process with a single synchronization point and is therefore well suited for many-core systems.
Our analysis indicates that further progress requires a reduction of the latency that the instructions experience in accessing memory. The study provides the foundation for exploring methods of latency hiding such as software pipelining and software-induced prefetching.
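The contrast between the two delivery schemes in this abstract can be made concrete with a short sequential Python sketch. It is not the authors' NEST implementation: threads are simulated by plain loops, the spike record `(target_thread, synapse_type, source_id)` and the functions `deliver_scan_all` and `deliver_sorted` are hypothetical, and the "sort" is a bucket grouping by target thread and synapse type. What the sketch does reproduce is the inspection count: the original scheme looks at every spike once per thread, the sort-first scheme exactly twice, regardless of the thread count.

```python
from collections import defaultdict


def deliver_scan_all(spikes, n_threads):
    """Original scheme: every thread scans the whole buffer and keeps its hits."""
    delivered = {t: [] for t in range(n_threads)}
    inspections = 0
    for thread in range(n_threads):              # one full pass per thread
        for tgt_thread, syn_type, src in spikes:
            inspections += 1
            if tgt_thread == thread:             # hit; every other spike is rejected
                delivered[thread].append((syn_type, src))
    return delivered, inspections


def deliver_sorted(spikes, n_threads):
    """Sort-first scheme: split the buffer, bucket it, then deliver own sections."""
    inspections = 0
    size = -(-len(spikes) // n_threads)          # ceiling division: chunk length
    # buckets[worker][target_thread][synapse_type] -> list of source ids
    buckets = [defaultdict(lambda: defaultdict(list)) for _ in range(n_threads)]
    # Phase 1: each worker buckets its own equal share (first look at each spike).
    for worker in range(n_threads):
        for tgt_thread, syn_type, src in spikes[worker * size:(worker + 1) * size]:
            inspections += 1
            buckets[worker][tgt_thread][syn_type].append(src)
    # ---- single synchronization point: all buckets are complete ----
    # Phase 2: each thread reads only the sections addressed to it (second look).
    delivered = {t: [] for t in range(n_threads)}
    for thread in range(n_threads):
        for worker in range(n_threads):
            for syn_type in sorted(buckets[worker][thread]):
                for src in buckets[worker][thread][syn_type]:
                    inspections += 1
                    delivered[thread].append((syn_type, src))
    return delivered, inspections
```

With 1000 spikes and 4 threads, `deliver_scan_all` performs 4000 inspections and `deliver_sorted` 2000, while both hand each thread the same spikes. Only the order within a thread differs, because the sort-first variant groups spikes by synapse type.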

arXiv.org e-Print Archive

PubMed Central

Juelich Shared Electronic Resources

Bern Open Repository and Information System (BORIS)

Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers

Author: Diesmann Markus
Jordan Jakob
Kitayama Itaru
Kunkel Susanne
Pronold Jari
Wylie Brian J. N.
Publication venue: Elsevier BV
Publication date: 01/01/2021

Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed graphs of a few million nodes, each with an in-degree and out-degree of several thousand edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. When considering a random graph, each node's edges are distributed across thousands of parallel processes. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called a spike, via its outgoing synapses to the corresponding target neurons. This spatial and temporal sparsity represents an inherent bottleneck for simulations on conventional computers: fundamentally irregular memory-access patterns cause poor cache utilization. Using an established neuronal network simulation code as a reference implementation, we investigate how common techniques to recover cache performance, such as software-induced prefetching and software pipelining, can benefit a real-world application. The algorithmic changes reduce simulation time by up to 50%. The study exemplifies that many-core systems assigned with an intrinsically parallel computational problem can overcome the von Neumann bottleneck of conventional computer architectures.
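Software pipelining restructures a loop so that the memory-bound part of a later iteration (here: looking up which neuron a spike targets) overlaps with the compute part of the current iteration, which gives a prefetch time to complete before the data are needed. The Python sketch below only illustrates that loop transformation and checks that it leaves results unchanged. Python cannot issue prefetch instructions, so it brings no speed-up, and the names `targets`, `weights`, `state`, `distance` and the two-stage split are illustrative assumptions, not the data structures of the reference implementation.

```python
from collections import deque


def deliver_naive(targets, weights, state):
    """Reference loop: one indirect, irregular update per spike."""
    for i in range(len(targets)):
        state[targets[i]] += weights[i]
    return state


def deliver_pipelined(targets, weights, state, distance=4):
    """Two-stage software pipeline.

    Stage 1 (memory-bound) looks up target and weight of spike i + distance;
    stage 2 (compute) applies the update for spike i. The FIFO holds spikes
    whose stage 1 is done but whose stage 2 is still pending.
    """
    n = len(targets)
    in_flight = deque()
    # Prologue: run stage 1 for the first `distance` spikes.
    for i in range(min(distance, n)):
        in_flight.append((targets[i], weights[i]))
    # Kernel: per iteration, stage 1 for a later spike and stage 2 for spike i.
    # Once i + distance reaches n, only stage 2 runs (the epilogue drains the FIFO).
    for i in range(n):
        ahead = i + distance
        if ahead < n:
            # In C/C++, a prefetch hint for &state[targets[ahead]]
            # (e.g. GCC's __builtin_prefetch) would be issued here.
            in_flight.append((targets[ahead], weights[ahead]))
        tgt, w = in_flight.popleft()
        state[tgt] += w
    return state
```

The `distance` parameter plays the role of a prefetch distance: it must be large enough that a load issued in stage 1 has arrived by the time stage 2 consumes it, and small enough that the prefetched lines are not evicted again in the meantime.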

arXiv.org e-Print Archive

Juelich Shared Electronic Resources

Source-based nomenclature for single-strand homopolymers and copolymers (IUPAC Recommendations 2016)

Author: Hellwich Karl-Heinz
Hess Michael
Jenkins Aubrey D
Jones Richard G
Kahovec Jaroslav
Kitayama Tatsuki
Kratochvil Pavel
Mita Itaru
Mormann Werner
Ober Christopher
Penczek Stanislaw
Stepto Robert
Thurlow Kevin
Vohlídal Jiří
Wilks Edward
Publication venue: Walter de Gruyter GmbH
Publication date: 01/11/2016

IUPAC recommendations on source-based nomenclature for single-strand polymers have so far addressed its application mainly to copolymers, non-linear polymers and polymer assemblies, and within generic source-based nomenclature of polymers. In this document, rules are formulated for devising a satisfactory source-based name for a polymer, whether homopolymer or copolymer, that are as clear and rigorous as possible. Thus, the source-based system for naming polymers is presented in its entirety as a user-friendly alternative to the structure-based system of polymer nomenclature. In addition, because of their widespread and established use, recommendations for the use of traditional names of polymers are also elaborated.

Crossref

Kent Academic Repository

Execution Performance Analysis of the ABySS Genome Sequence Assembler using Scalasca on the K computer

Author: Kitayama Itaru
Maeda Toshiyuki
Wylie Brian J. N.
Publication date: 01/01/2015

This paper describes a performance analysis of the ABySS genome sequence assembler (ABYSS-P) executing on the K computer with up to 8192 compute nodes. The analysis identified issues that limited scalability to fewer than 1024 compute nodes and that required prohibitive message buffer memory with 16384 or more compute nodes. The open-source Scalasca toolset was employed to analyse executions, revealing the impact of massive amounts of MPI point-to-point communication, used particularly for master/worker process coordination, and of inefficient parallel file operations that manifest as waiting time at later MPI collective synchronisations and communications. Initial remediation via collective communication operations and alternative strategies for parallel file handling shows large performance and scalability improvements, with partial executions validated on the full 82,944 compute nodes of the K computer.
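Why replacing master/worker point-to-point coordination with collective operations can help at scale is visible from a simple round count. The sketch below is a generic model, not a description of ABYSS-P or of the K computer's MPI library: it assumes the master serially exchanges one message with each worker, and that the collective is implemented as a binomial tree, which is one common choice but is not mandated by the MPI standard. The function names are hypothetical.

```python
def master_worker_rounds(p):
    """Point-to-point coordination: the master serially exchanges one message
    with each of the p - 1 workers, so one coordination step costs p - 1 sends."""
    return p - 1


def binomial_tree_rounds(p):
    """Rounds of a binomial-tree broadcast: every informed rank forwards the
    message to one uninformed rank per round, so the informed set doubles."""
    rounds, informed = 0, 1
    while informed < p:
        informed *= 2
        rounds += 1
    return rounds
```

Under these assumptions, 8192 processes need 8191 serialized sends from the master but only 13 tree rounds, and the full 82,944 nodes need 17 rounds. The master's cost grows linearly with the process count, while the tree's grows logarithmically.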

Juelich Shared Electronic Resources

Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers

Author: Diesmann Markus
Jordan Jakob
Kitayama Itaru
Kunkel Susanne
Pronold Jari
Wylie Brian J. N.
Publication venue: arXiv
Publication date: 01/01/2021

Juelich Shared Electronic Resources

Corrigendum: Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

Author: Itaru Kitayama
Jakob Jordan
Jun Igarashi
Markus Diesmann
Mitsuhisa Sato
Moritz Helias
Susanne Kunkel
Tammo Ippen
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

Frontiers - Publisher Connector

Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

Author: Diesmann Markus
Helias Moritz
Igarashi Jun
Ippen Tammo
Jordan Jakob
Kitayama Itaru
Kunkel Susanne
Sato Mitsuhisa
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at the resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here, we present a two-tier connection infrastructure and a framework for directed communication among compute nodes that accounts for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code, and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance on smaller systems.
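The payoff of directed communication under sparse connectivity can be estimated with a toy model. The sketch assumes each neuron's `out_degree` targets are drawn uniformly at random (with replacement) and that neuron `j` lives on rank `j % n_ranks`; both are illustrative assumptions, as are the function names. It compares broadcasting each spike to every rank with sending it only to ranks that host at least one target, and checks the simulation against the standard occupancy formula P(1 - (1 - 1/P)^K) for the expected number of distinct ranks hit by K uniform draws over P ranks. It models only the communication side, not NEST's two-tier connection data structures.

```python
import random


def target_ranks(n_neurons, n_ranks, out_degree, rng=None):
    """For each source neuron, the set of ranks hosting at least one target.
    Assumes uniform random targets and round-robin placement (j -> j % n_ranks)."""
    rng = rng or random.Random(0)
    return [
        {rng.randrange(n_neurons) % n_ranks for _ in range(out_degree)}
        for _ in range(n_neurons)
    ]


def messages_per_spike(routing, n_ranks):
    """(broadcast cost, mean directed cost) in destination ranks per spike."""
    broadcast = n_ranks                               # every rank gets every spike
    directed = sum(len(r) for r in routing) / len(routing)
    return broadcast, directed


def expected_target_ranks(n_ranks, out_degree):
    """Expected number of distinct ranks hit by out_degree uniform draws."""
    return n_ranks * (1 - (1 - 1 / n_ranks) ** out_degree)
```

With 1000 ranks and an out-degree of 100, a broadcast addresses all 1000 ranks per spike, whereas directed sending needs about 95. Because the directed cost is bounded by the out-degree while the broadcast cost grows with the number of ranks, the gap widens as simulations scale up.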

Brage NMBU

Crossref

Frontiers - Publisher Connector

Publikationsserver der RWTH Aachen University

Juelich Shared Electronic Resources

NORA - Norwegian Open Research Archives

Corrigendum: Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

Author: Diesmann Markus
Helias Moritz
Igarashi Jun
Ippen Tammo
Jordan Jakob
Kitayama Itaru
Kunkel Susanne
Sato Mitsuhisa
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

Including Gap Junctions into Distributed Neuronal Network Simulations

Author: Bolten Matthias
Diesmann Markus
Frommer Andreas
Hahne Jan
Helias Moritz
Igarashi Jun
Kitayama Itaru
Kunkel Susanne
Wylie Brian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Contemporary simulation technology for neuronal networks enables the simulation of brain-scale networks using neuron models with a single or a few compartments. However, distributed simulations at full cell density still lack the electrical coupling between cells via so-called gap junctions. This is due to the absence of efficient algorithms to simulate gap junctions on large parallel computers. The difficulty is that gap junctions require an instantaneous interaction between the coupled neurons, whereas the efficiency of simulation codes for spiking neurons relies on delayed communication. In a recent paper [15], we describe a technology to overcome this obstacle. Here, we give an overview of the challenges of including gap junctions in a distributed simulation scheme for neuronal networks and present an implementation of the new technology available in the NEural Simulation Tool (NEST 2.10.0). Subsequently, we introduce the usage of gap junctions in model scripts, as well as benchmarks assessing the performance and overhead of the technology on the supercomputers JUQUEEN and K computer.
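The abstract does not spell out the method of [15]. One standard way to reconcile instantaneous coupling with delayed communication is waveform relaxation: within each communication interval, every cell integrates its own trajectory using its partner's trajectory from the previous iteration, and the exchange is repeated until the trajectories stop changing. The sketch below applies a Jacobi-style waveform relaxation to two hypothetical passive cells coupled by a gap-junction conductance `g`, integrated with forward Euler, and compares it with step-by-step coupled integration. All function names and parameter values are illustrative assumptions, not NEST's API.

```python
def rhs(v_self, v_other, i_ext, tau, g):
    """Passive membrane plus gap-junction current g * (v_other - v_self)."""
    return (-v_self + i_ext) / tau + g * (v_other - v_self)


def integrate_cell(v0, other_traj, i_ext, tau, g, h):
    """Forward-Euler trajectory of one cell, given the other cell's trajectory."""
    traj = [v0]
    for n in range(len(other_traj) - 1):
        traj.append(traj[-1] + h * rhs(traj[-1], other_traj[n], i_ext, tau, g))
    return traj


def waveform_relaxation(v1, v2, steps, i1, i2, tau=10.0, g=0.5, h=0.1,
                        tol=1e-12, max_iter=100):
    """Jacobi waveform relaxation over one communication interval of `steps` steps."""
    traj1 = [v1] * (steps + 1)   # initial guess: each cell stays at its start value
    traj2 = [v2] * (steps + 1)
    for iteration in range(1, max_iter + 1):
        # Both cells use only the partner's trajectory from the previous iterate,
        # so the two integrations could run on different processes.
        new1 = integrate_cell(v1, traj2, i1, tau, g, h)
        new2 = integrate_cell(v2, traj1, i2, tau, g, h)
        change = max(abs(a - b) for a, b in zip(new1 + new2, traj1 + traj2))
        traj1, traj2 = new1, new2
        if change < tol:
            break
    return traj1, traj2, iteration


def coupled_euler(v1, v2, steps, i1, i2, tau=10.0, g=0.5, h=0.1):
    """Reference: both cells advanced together, exchanging values every step."""
    t1, t2 = [v1], [v2]
    for _ in range(steps):
        a, b = t1[-1], t2[-1]
        t1.append(a + h * rhs(a, b, i1, tau, g))
        t2.append(b + h * rhs(b, a, i2, tau, g))
    return t1, t2
```

With forward Euler, each iteration makes at least one more step of the trajectory exact, so an interval of `steps` steps converges after at most `steps + 1` iterations. The iteration count, and hence the number of extra exchanges per communication interval, is the overhead this kind of scheme trades for keeping communication delayed.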

Crossref

Juelich Shared Electronic Resources

Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

Author: Itaru Kitayama
Jakob Jordan
Jun Igarashi
Markus Diesmann
Mitsuhisa Sato
Moritz Helias
Susanne Kunkel
Tammo Ippen
Publication venue: Frontiers Media SA
Publication date: 01/02/2018

Directory of Open Access Journals