Inference with Artificial Neural Networks on Analog Neuromorphic
  Hardware by Weis, Johannes et al.
Inference with Artificial Neural Networks on the
Analog BrainScaleS-2 Hardware
Johannes Weis, Philipp Spilger, Sebastian Billaudelle, Yannik Stradmann, Arne
Emmel, Eric Müller, Oliver Breitwieser, Andreas Grübl, Joscha Ilmberger,
Vitali Karasenko, Mitja Kleider, Christian Mauch, Korbinian Schreiber, and
Johannes Schemmel
Kirchhoff-Institute for Physics
Ruprecht-Karls-Universität Heidelberg, Germany
johannes.weis@kip.uni-heidelberg.de
Abstract. The neuromorphic BrainScaleS-2 ASIC comprises mixed-sig-
nal neurons and synapse circuits as well as two versatile digital micro-
processors. Primarily designed to emulate spiking neural networks, the
system can also operate in a vector-matrix multiplication and accumu-
lation mode for artificial neural networks. Analog multiplication is then
carried out in the synapse circuits, while the results are accumulated on
the neurons’ membrane capacitors. Designed as an analog, in-memory
computing device, it promises high energy efficiency. Fixed-pattern noise
and trial-to-trial variations, however, require the implemented networks
to cope with a certain level of perturbations. Further limitations are
imposed by the digital resolution of the input values (5-bit), matrix
weights (6-bit) and resulting neuron activations (8-bit). In this paper, we
discuss BrainScaleS-2 as an analog inference accelerator and present cal-
ibration as well as optimization strategies, highlighting the advantages
of training with hardware in the loop. Among other benchmarks, we
classify the MNIST handwritten digits dataset using a two-dimensional
convolution and two dense layers. We reach 98.0% test accuracy, closely
matching the performance of the same network evaluated in software.
Keywords: Analog Accelerator · Neural Network Processor · Neuro-
morphic Hardware · Convolutional Neural Networks · Machine Learning
· In-memory Computing · MNIST
1 Introduction
Artificial neural networks (ANNs) find application in a wide variety of fields
and problems. With networks growing in depth and complexity, the increase of
computational cost becomes more and more significant [1]. In fact, execution time
and power consumption often represent the crucial limiting factors in further
scaling and in the application of ANNs [2].
A large fraction of the computational cost for neural network-based inference
is spent on vector-matrix multiplications [3]. With their massive parallelization
of floating point calculations, GPUs already cut runtime significantly compared
to CPUs. Computational complexity can often be cut by representing and
processing data with reduced precision [4]. Specialized digital inference accelerators
have been presented [5, 6] that offer further efficiency improvements over
implementations on general-purpose hardware. Potentially even more efficient ASICs
could be based on mixed-signal circuit designs by exploiting physical processes
for computational purposes [7]. Drawbacks of such systems can include vulnerability
to fixed-pattern and trial-to-trial variations, resulting in distorted network
configurations and reduced reproducibility. Similar to digital solutions with
reduced precision, networks have to cope with limited weight resolution, which can
be as low as one bit [8].
arXiv:2006.13177v1 [cs.NE] 23 Jun 2020
[Figure 1: panels A and B; diagram labels: CADC, PPU, signed synapse]
Fig. 1: Overview of the BrainScaleS-2 system. A: Block diagram of the analog
core, showing synapse drivers (triangles), neurons (large circles), and synapses
(small circles in matrix). Signed weights are achieved by using two synapse rows
for positive and negative weights. Figure adopted from [12]. B: Chip photograph.
In this work, we demonstrate BrainScaleS-2 [9] as an analog inference accel-
erator. We describe the hardware configuration and operating principle of analog
vector-matrix multiplication on the ASIC and benchmark the system’s perfor-
mance by training and classifying the MNIST dataset of handwritten digits [10].
We further discuss calibration and the benefits of training with hardware in the
loop as strategies to counter chip-specific fixed-pattern variations [11, 12].
2 Methods
BrainScaleS-2 is a mixed-signal ASIC fabricated in a 65 nm CMOS process by
TSMC that has originally been designed as an accelerator for biologically plau-
sible spiking neural networks. It features analog circuits emulating neurons and
synapses as well as digital periphery for communication, parameter storage, and
realtime control. Recent additions to the system allow for in-memory compu-
tation of multiply-accumulate operations within the chip’s analog core, thereby
making the system applicable for inference with artificial neural networks [9].
Matrix multiplication can also be combined with spiking operation, thus allowing
seamless integration of partially spiking networks. The chip contains 512
analog neurons arranged in two hemispheres; each neuron receives input from
256 synapses. Thus, BrainScaleS-2 can be used to multiply a vector with 256
entries by a matrix comprising 512 rows. An architectural overview of the circuitry
for processing vector-matrix multiplications on BrainScaleS-2 is depicted
in fig. 1. Digitally encoded input vectors are injected from the left, converted to
the analog domain and multiplied within the central synapse array. Each neuron
accumulates values from its corresponding synapse column. The resulting vector
of neuron activations is read out in parallel via a columnar analog-to-digital
converter (CADC).
[Figure 2: A: schematic of current pulses I(t) integrated to a membrane voltage V(t); B: membrane voltage [V] (0.5 to 0.6) versus time [µs] (0 to 20), with the phases "integration" and "decay"]
Fig. 2: A: Illustration of a multiply-accumulate operation. The vector value
controls the length of the current pulses, the matrix weight their amplitude. The
currents are integrated on the neuron’s membrane. B: The recorded membrane
trace clearly shows the integration phase, during which the synaptic inputs are
accumulated and after which the result is digitized (dotted line). Afterwards, the voltage
decays exponentially towards the resting potential.
Multiplication in analog synapse circuits. Within the synapse array, the multiplication
of an input value with the synaptic weight is modelled as the electrical
charge Q = I · ∆t emitted during a current pulse of variable length and amplitude.
The current I is determined by a 6-bit weight stored locally in each synapse. The
time window ∆t during which that current is emitted is modulated by circuitry
in the synapse drivers (triangles on the left in fig. 1). The value is set by the
payload of input events, which is otherwise used to select a subset of synapses
from a row. More specifically, we use 5 bits of this label to encode the pulse length
∆t. The remaining label bit is still available to differentiate two sets of synapses,
so two different multiplications can be executed side by side.
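As a minimal sketch of this encoding, the following Python snippet models an ideal synapse column without noise, filtering or saturation. The unit current `I_LSB`, the unit pulse length `T_LSB` and both function names are illustrative assumptions and not part of the BrainScaleS-2 software.

```python
# Idealized model of the charge-based multiplication (no noise, no saturation).
I_LSB = 1.0  # hypothetical current per weight LSB (arbitrary units)
T_LSB = 1.0  # hypothetical pulse length per input LSB (arbitrary units)


def synapse_charge(weight, value):
    """Charge Q = I * dt emitted by one synapse for one input event."""
    if not 0 <= weight <= 63:
        raise ValueError("weight amplitude must fit into 6 bit")
    if not 0 <= value <= 31:
        raise ValueError("input value must fit into 5 bit")
    current = weight * I_LSB   # amplitude set by the locally stored weight
    duration = value * T_LSB   # pulse length set by the event payload
    return current * duration


def column_charge(weights, values):
    """Total charge a neuron collects from its synapse column."""
    return sum(synapse_charge(w, v) for w, v in zip(weights, values))
```

For instance, `column_charge([1, 2, 3], [4, 5, 6])` adds up 4 + 10 + 18 charge units in this idealized picture.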
Each row of synapses can be connected to the afferent neurons with either
positive or negative sign. To achieve signed weights, two rows of synapses can be
combined to represent a single logical row. This configuration, however, reduces
the number of available vector entries from 256 to 128.
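The pairing of rows for signed weights can be sketched as follows. The layout `weights[row][column]` and the function names are assumptions made for illustration; the subtraction stands in for the opposite-sign charge contributions of the two physical rows.

```python
def split_signed(weights):
    """Split signed weights in [-63, 63] into an excitatory and an inhibitory matrix."""
    excitatory = [[max(w, 0) for w in row] for row in weights]
    inhibitory = [[max(-w, 0) for w in row] for row in weights]
    return excitatory, inhibitory


def signed_mac(weights, vector):
    """Ideal result per neuron when every logical row drives two physical rows."""
    excitatory, inhibitory = split_signed(weights)
    n_neurons = len(weights[0])
    return [
        sum(v * (exc[j] - inh[j]) for v, exc, inh in zip(vector, excitatory, inhibitory))
        for j in range(n_neurons)
    ]
```

Each logical row occupies two physical rows in this scheme, which is why only 128 of the 256 rows remain available for vector entries.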
[Figure 3: flow diagram with the steps "set up matrix", "reset neurons", "send inputs" and "read activations", and the loops "resends" and "next vector"]
Fig. 3: Pattern for executing multiply-accumulate operations between a matrix
and a batch of vectors.
Neurons integrate synaptic currents. Each neuron uses its membrane to accumulate
the individual multiplication results from its respective synaptic column.
The neurons integrate the positive and negative charge contributions, as sketched in
fig. 2A. Motivated by spiking operation, the input signals are low-pass filtered
with a finite time constant. To speed up integration and to reduce saturation of
the synaptic inputs, the minimum time constant of approximately 1 µs was configured.
The neurons’ dynamics are based on a leaky integrator model commonly
used for spiking networks: a resistor continuously pulls the membrane voltage,
which is physically represented across a capacitor, towards a resting potential.
The resulting dynamics constitute a crucial part of the emulation of spiking
networks. In contrast, they can lead to distortions in the accumulation of
vector-matrix multiplication results, which do not contain implicit timing information.
In order to stabilize the accumulated voltages and reduce the effect of noise, we
configured a rather large but finite resistance. The leak resistance leads to an
exponential decay of the integrated charge, which can be seen in fig. 2B.
Digitization of results. The membrane potentials are digitized in parallel for all
256 neurons on a chip hemisphere, using the CADCs and stored via the on-chip
microprocessors. The resulting 8-bit values represent the neuron activations and
are the result of the multiply-accumulate (MAC) operation.
By aligning the choice of the resting potential with the dynamic range of the
ADC, two operating modes can be selected: In case the lower end of the ADC
range coincides with the resting potential, negative activations are cut off. In this
configuration, the neurons behave as hardware rectified linear units (ReLU).
In case the number of inputs a neuron receives cannot be directly mapped to
the synapse matrix, the network can be partitioned into smaller matrices which
are evaluated in a time-multiplexed fashion [13]. Since the activation function
needs to be applied after combining the individual results, negative activations
must be representable. For this purpose, the resting potential can be chosen
centered in the ADCs’ dynamic ranges.
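The need for signed partial results can be illustrated with a small sketch. The ADC ranges (0 to 255 in ReLU mode, -128 to 127 in centered mode), the `gain` factor and the function names are assumptions chosen for illustration, not measured properties of the chip.

```python
def adc(value, relu_mode):
    """Clip an ideal accumulation result to a hypothetical 8-bit ADC range."""
    low, high = (0, 255) if relu_mode else (-128, 127)
    return max(low, min(high, round(value)))


def partitioned_layer(weights, vector, rows_per_pass=128, gain=1.0):
    """Evaluate a layer with more inputs than synapse rows in several passes."""
    n_neurons = len(weights[0])
    totals = [0] * n_neurons
    for start in range(0, len(vector), rows_per_pass):
        chunk_w = weights[start:start + rows_per_pass]
        chunk_v = vector[start:start + rows_per_pass]
        for j in range(n_neurons):
            partial = gain * sum(v * row[j] for v, row in zip(chunk_v, chunk_w))
            # Centered mode keeps the sign of every partial result.
            totals[j] += adc(partial, relu_mode=False)
    # The ReLU is applied in software only after all passes are combined.
    return [max(total, 0) for total in totals]
```

If the hardware ReLU were applied to each pass instead, negative partial sums would be lost before they could cancel positive ones from other passes.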
2.1 Structure of a multiply-accumulate operation
To compile a MAC operation from the elements outlined above, the sequence
shown in fig. 3 can be applied.
1. To begin with, the weight matrix is written to the synapses. Writing all
256 × 512 values takes about 5 ms. Batched execution can minimize the amount
of expensive reconfiguration.
2. A reset of the membrane potentials removes any previous state accumulated
by the neurons. An immediate read of the voltages establishes a baseline
activation to suppress low frequency noise. Resetting all neurons takes
approximately 1 µs.
3. The inputs are sent sequentially to the chip. Between events, wait times
of 8–200 ns are inserted. These mitigate saturation effects in the neurons’
synaptic inputs, which can occur in case multiple inputs of large amplitude
are sent in a short period of time. To improve the signal-to-noise ratio, the
activations on the membranes can be increased by incorporating resends of
the input vectors within a single integration phase. Wait times as well as the
number of resends must be optimized considering the neurons’ decay times,
which limit the maximum integration time. Alternatively, the membrane
capacitance can be reduced, yielding higher activations, but also shorter
decay time constants. Zero suppression reduces the overall runtime, especially
in conjunction with ReLU activation functions.
4. The activations are digitized after the accumulation of charges. Considering
the finite time constant of the synaptic inputs, a waiting period of 2 µs is
inserted for the membrane potential to settle. The ADC conversion takes
1.5 µs.
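The durations quoted in steps 2 to 4 can be combined into a rough timing budget per input vector. This is only a sketch under stated assumptions: events are taken to be spaced by exactly the wait time, the baseline read is assumed to cost one ADC conversion, and zero entries are assumed to be skipped entirely. The function name and its parameters are hypothetical.

```python
# Timing constants quoted in the text (microseconds).
RESET_US = 1.0    # reset of all neurons
SETTLE_US = 2.0   # waiting period before digitization
ADC_US = 1.5      # one CADC conversion


def mac_duration_us(n_inputs, wait_ns, sends=1, zero_fraction=0.0):
    """Rough duration of steps 2 to 4 for one input vector."""
    if not 8 <= wait_ns <= 200:
        raise ValueError("wait time outside the 8-200 ns range from the text")
    # Zero suppression: only non-zero entries generate events.
    n_events = round(n_inputs * (1.0 - zero_fraction)) * sends
    send_us = n_events * wait_ns / 1000.0
    # reset + baseline read + input events + settling + final conversion
    return RESET_US + ADC_US + send_us + SETTLE_US + ADC_US
```

Under these assumptions, a 128-entry vector sent twice with 8 ns spacing occupies roughly 8 µs, which is small compared to the 5 ms needed for writing the matrix in step 1.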
2.2 Calibration
Transistor-level mismatch in the manufacturing process of an ASIC leads to in-
homogeneous electrical properties of the fabricated circuits. Due to the analog
nature of BrainScaleS-2, the resulting fixed-pattern variations cause each neuron
and synapse to behave differently when presented with similar input. Without
calibration, networks sensitive to such perturbations cannot perform up to their
full potential on the analog substrate, as weights and activations would be
distorted due to fixed-pattern noise. BrainScaleS-2 therefore provides a substantial
number of digital parameters that target equalization of all computational units
through calibration. While training with hardware in the loop can substitute
calibration for some parameters, the main goal is to match the dynamic ranges
of different components.
The neuron circuits require a certain degree of calibration. The operating
point of the individual components is determined by a set of internal parameters
and references. Most importantly, the synaptic currents need to be equalized
across neurons. Histograms of amplitudes observed on all neurons, before and
after calibration, are shown in the left part of fig. 4. This calibration, along
with finding other technical parameters, ensures comparable activations across
neurons for equivalent stimuli.
Calibration is also applied to the pulse generation circuits in the synapse
drivers. The pulse widths are modulated according to the 5-bit vector entries.
They are subject to offsets: some synapse drivers generate longer
pulses than others. On the current chip generation, the available calibration
range does not suffice to equalize all pulses. Thus, some additive mismatch of
the vector entries remains, which cannot be fully compensated by tuning the
multiplicative weight matrix.
A collection of calibration routines for BSS-2 was developed, which allow
calibration of individual parameters. All code is based on the Python API of
Müller et al. [14]. Usage is not limited to matrix multiplication mode as the
same parameters are used during spiking operation. It is possible to provide one
calibrated set of parameters for typical matrix multiplication usage.
2.3 Training with hardware in the loop
Training on hardware allows for compensating most of the fixed-pattern noise
still present after calibration by tuning the synaptic weights accordingly [11, 12].
The results of inference on chip are used to calculate weight updates on the host
computer. Gradients are obtained assuming linearity of synaptic weights. Spilger
et al. [13] describe the implementation in more detail.
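The principle can be illustrated with a toy example in which the forward pass runs on a stand-in for the chip while the gradient is computed on the host from an ideal linear model. The gain spread, learning rate, sample data and all names below are assumptions for illustration; the sketch ignores weight quantization and temporal noise and is not the implementation described in [13].

```python
import random


def make_chip(n_inputs, spread=0.3, seed=0):
    """Toy stand-in for the chip: every synapse has a fixed, unknown gain."""
    rng = random.Random(seed)
    gains = [1.0 + rng.uniform(-spread, spread) for _ in range(n_inputs)]

    def forward(weights, x):
        # "Measured" activation, distorted by fixed-pattern gain errors.
        return sum(g * w * xi for g, w, xi in zip(gains, weights, x))

    return forward


def train_in_the_loop(forward, samples, n_inputs, lr=0.1, epochs=200):
    """Gradient descent on 0.5 * error**2 with the forward pass on the chip."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            error = forward(weights, x) - target  # uses the measured output
            # Host-side gradient assumes the ideal linear model y = sum(w_i * x_i).
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


samples = [([1.0, 0.0, 0.0], 0.5), ([0.0, 1.0, 0.0], -0.3), ([0.0, 0.0, 1.0], 0.8)]
chip = make_chip(3)
trained_weights = train_in_the_loop(chip, samples, 3)
```

Because the error is measured on the distorted substrate, the learned weights absorb the unknown gains even though the gradient itself never models them.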
3 Results
3.1 Characterization
The performance of BSS-2’s matrix multiplication mode was first evaluated using
a synthetic test matrix. In the left half, weights increase linearly from left to
right: all synapses in column i are set to weight w = i − 63. In the right half,
each synapse is set to a random weight, drawn uniformly from -63 to 63. Multiple
homogeneous vectors of different amplitude were used to characterize the
linearity of both vector entries and matrix weights (fig. 4B).
For lower weights and inputs, the multiplication follows the expected linear
behavior. For higher activations, saturation occurs, which is most clearly
observable for the vector with entries of 15. However, we expect most real-world
networks to use sparser matrices with more balanced excitatory and inhibitory
weights than this test. In the right part of the matrix, the randomly weighted
sum is slightly positive or negative for individual columns, but close to zero.
The vector entry only changes the absolute value of the result, not its sign: for
a column with weights chosen predominantly negative, stronger inputs further
decrease activations, and vice versa. We conclude that excitatory and inhibitory
inputs are added correctly and have been tuned to the same strength.
The presented measurements indicate that our substrate can be used to per-
form MAC operations. To investigate its performance on common benchmarks,
the MNIST dataset is used. Further, the Human Activity Recognition dataset
was brought to the chip in Spilger et al. [13].
[Figure 4: A: histograms of the number of neurons versus positive and negative amplitudes [LSB]; B: amplitude [LSB] versus matrix weight [LSB] (on individual neurons) for vector entries 0, 3, 7 and 15, with a region of random weights]
Fig. 4: A: Histogram of amplitudes received on all neurons for equal inputs.
After calibration (colored) the width of the distribution is decreased compared
to the uncalibrated state (gray). B: Characterization of analog multiplication
results, sweeping both matrix weights and vector entries. The left half of the
synapse matrix was configured such that weights increased from a value of -63
in the leftmost column to +63 in the 127th column with an increment of one.
Within each column, all weights were set to the same value. The right half was
set to random weights for each synapse. We injected four constant input vectors,
each consisting of 128 entries of 0, 3, 7 and 15. Error bars indicate the standard
deviation within 30 runs.
3.2 MNIST benchmark
Models The MNIST dataset of handwritten digits [10] was classified using two
models. All layers but the last use ReLU activation functions, the final layer uses
a softmax. No layer uses a bias.
The convolutional model. The larger model starts with a zero-padding of
1 pixel to make the image shape (30, 30). It uses a 2D convolution with 20 filters,
a (10, 10) kernel, and strides (5, 5). The 500 results are fed into a dense layer
with 128 neurons; another dense layer with 10 neurons yields the final result.
The dense model. The smaller model only uses two dense layers. The first
one maps all 784 MNIST pixel inputs to 64 neurons. The second layer uses 10
neurons to yield the classification result.
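The layer sizes of both models follow from short arithmetic, sketched here in Python. The variable and function names are illustrative; the weight counts assume no bias terms, as stated above.

```python
def conv_positions(size, kernel, stride):
    """Number of kernel positions along one axis (no additional padding)."""
    return (size - kernel) // stride + 1


padded = 28 + 2 * 1                             # MNIST image with 1 pixel zero-padding
positions = conv_positions(padded, 10, 5) ** 2  # 5 * 5 kernel positions
kernel_inputs = 10 * 10                         # entries per flattened image patch
conv_outputs = positions * 20                   # kernel positions times filters

# Number of weights per model (no biases).
conv_model_weights = kernel_inputs * 20 + conv_outputs * 128 + 128 * 10
dense_model_weights = 784 * 64 + 64 * 10
```

The 25 kernel positions with 100 inputs each yield the 500 values fed into the first dense layer.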
Accuracy Both models were trained in TensorFlow [15] using 32-bit float
weights. While transforming them to 6-bit integer weights did impact the perfor-
mance, the quantized performance on CPU was still significantly higher than on
the chip (table 1). Training with hardware in the loop helped to restore accuracy
which dropped due to temporal noise and remaining fixed pattern noise after
calibration. Simply loading the weight matrix obtained from software yielded an
accuracy of 92.1% for the convolutional model, measured on the held-out test
set of 10 000 images. Retraining on hardware increased accuracy to 98.0%, just
below software level.
Table 1: MNIST classification accuracy in percent for two models in different
conditions. The networks were trained in software using 32-bit float weights.
Further digitization using signed 6-bit integer weights has little impact on the
performance. Taking the network to the chip, the performance drops, but can
be increased again when training with hardware in the loop.
                       software            hardware
                       32-bit    6-bit     calibration   trained
                       float     int       only          in the loop
convolutional model    98.29     98.10     92.13         98.01
dense model            97.43     97.36     92.46         96.30
[Figure 5: two confusion matrices, true label (0 to 9) versus obtained label (0 to 9), with a logarithmic color scale from 10^0 to 10^3]
Fig. 5: Confusion matrix of the dense network running MNIST. Left: Executed
pre-trained model on hardware, no re-training. Right: Results after training one
epoch with hardware in the loop. Note the logarithmic colorbar in both plots.
The difference between software and hardware accuracy is greater for the
dense network. We expect that with the smaller number of synapses involved,
the potential for correction is also lower as there is less redundancy. For the
dense network, confusion matrices are shown in fig. 5. Classification works for
all digits, no systematic misclassification is observable.
Energy consumption Fully utilizing the resources on hardware, vectors of up
to 256 entries can be multiplied with matrices of up to 512 rows. One vector-matrix
multiplication takes 5 ms, which, at a power consumption of
0.3 W [12], uses 1.5 mJ of energy. This one operation can be split into up to four
independent smaller vector-matrix multiplications executed in parallel, utilizing
both synapse matrices and the label bits to handle four different vectors. Each
vector can still have 256 entries, but the combined number of matrix rows per
synapse matrix must not exceed 256.
The convolutional network’s first layer comprises 25 vector-matrix multiplications
of 100 entries each. For this layer, parallel execution requires changes
during training, as the fixed pattern noise cannot be learned using the same
weights in all instances of the convolution matrix. The dense layers require 5
further vector-matrix multiplication operations, but can parallelize four of them
without special adjustments. Thus, 9 vector-matrix multiplications are required
per image, resulting in 45 ms of runtime and an energy consumption of 13.5 mJ
per image.
The dense network again allows for parallelization. This results in 3 vector-matrix
multiplications per image or 15 ms of runtime. The energy consumption
is therefore 4.5 mJ per image. Note that the presented networks have not been
optimized for power consumption.
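The quoted runtimes and energies can be reproduced with a few lines of arithmetic. The split into dependent stages is one reading that matches the stated operation counts: it assumes signed weights with 128 inputs per multiplication, so that the 500-input dense layer needs 4 and the 784-input dense layer needs 7 multiplications. All names are illustrative.

```python
import math

T_OP_MS = 5.0   # duration of one vector-matrix multiplication (from the text)
POWER_W = 0.3   # power consumption (from the text)
PARALLEL = 4    # multiplications that can share one operation


def sequential_ops(stages):
    """Operations needed when each dependent stage is parallelized on its own."""
    return sum(math.ceil(n / PARALLEL) for n in stages)


def cost(n_ops):
    """Return (runtime in ms, energy in mJ) for n_ops sequential operations."""
    runtime_ms = n_ops * T_OP_MS
    return runtime_ms, runtime_ms * POWER_W  # ms * W = mJ


# Assumed stages: convolution, 500 -> 128 dense, 128 -> 10 dense.
conv_ops = sequential_ops([25, 4, 1])
# Assumed stages: 784 -> 64 dense, 64 -> 10 dense.
dense_ops = sequential_ops([7, 1])
```

With these stage counts, `cost(conv_ops)` and `cost(dense_ops)` correspond to the 45 ms and 15 ms runtimes given above.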
These numbers are far off the optimum performance of BSS-2: a bug in the
current chip revision requires reconfiguring the synapse matrix for each input
vector, not only when changing weights. Runtime and therefore energy con-
sumption are expected to be reduced by two to three orders of magnitude for
the next chip revision. The exact numbers depend on input repetitions, which
could still be necessary to counter noise. Currently, we achieve the best results
at two vector sends and reduced membrane capacitance.
Calibration vs. learning Training with hardware in the loop compensates
fixed pattern noise by tuning weights accordingly. Therefore, even a less well
calibrated chip can reach a high accuracy. However, the weight resolution for
the network is decreased, as some range is spent to replace calibration. We
detuned the synaptic input amplitudes and the leak conductivity and observed
performance before and after training in the loop. While all technical parameters
were still calibrated, the distribution shown in fig. 4A was slowly shifted from
the narrow to the broad one.
We observed that the network adapts well to the changed conditions within
one epoch of training (fig. 6). While the results before training got worse as ex-
pected, the accuracy after training only dropped from 96.31% to 96.07%. With
the uncalibrated synaptic input amplitudes differing by up to a factor of 4, the
effective weight resolution was decreased by 2 bit. Decreasing the weight resolution
to 4 bit in software reduced accuracy from 97.36% to 96.99%. Considering that
we only use a subset of the neurons shown in fig. 4A, the distribution of
amplitudes actually in use is somewhat narrower, which brings the results in
line with the expectations.
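The relation between amplitude spread and lost weight resolution can be written down directly. The sketch below assumes that a gain spread by a factor s costs log2(s) bits and that an n-bit weight has 2^n - 1 magnitude levels plus a sign, in line with the weight range of -63 to 63. The function names and the rounding scheme are illustrative assumptions.

```python
import math


def effective_bits(nominal_bits, amplitude_spread):
    """Bits left for the network if a gain spread must be absorbed by the weights."""
    return nominal_bits - math.log2(amplitude_spread)


def quantize(weight, bits, w_max=1.0):
    """Round a float weight in [-w_max, w_max] to a signed integer grid."""
    levels = 2 ** bits - 1  # 63 for 6 bit, 15 for 4 bit
    clipped = max(-w_max, min(w_max, weight))
    return round(clipped / w_max * levels)
```

A spread of a factor of 4 thus leaves `effective_bits(6, 4)` bits, which is the 4-bit resolution used for the software comparison.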
4 Discussion
In this publication, we have shown that BrainScaleS-2 can be successfully used
in matrix multiplication mode to perform machine learning on analog hardware.
[Figure 6: left: test accuracy (0.80 to 1.00) versus decalibration rate (0.0 to 1.0), before and after training; right: training accuracy (0.80 to 1.00) versus epochs trained (0.00 to 1.00)]
Fig. 6: MNIST accuracy when weakening calibration, using the dense network.
Left: Accuracy before and after training one epoch with hardware in the loop.
Results show the mean and standard deviation of 10 runs classifying the 10 000
test images with unchanged parameters. Right: Accuracy per batch during the
one epoch of training. 200 images per batch, 300 batches per epoch. Colors
indicate the state of calibration, corresponding to the left plot.
Practical usage requires calibration and training with hardware in the loop, with
the latter being able to compensate calibration imperfections as well. While
some technical parameters always need to be calibrated, the precision and hence
runtime of the calibration can often be reduced. This however requires more
training on hardware and further limits the available weight precision, as a part
of the dynamic range is used for replacing the calibration. Also, the results of
calibration are applicable for all networks, requiring only a single run per chip,
while the training on hardware needs to happen for each network.
Energy consumption per vector-matrix multiplication is nowhere near com-
petitive on the current chip generation. This is caused by the necessity to re-
configure the synapse matrix before each vector (section 3.2). Also, the network
performance is affected by the noise from the synaptic inputs. This problem is
currently mitigated by sending the input vector multiple times during the
integration period to increase the signal. All these major issues are addressed in the
upcoming chip revision, which should increase energy efficiency by a factor of
100–1000.
5 Contributions
J. Weis developed calibration routines, conducted the presented experiments and
evaluations and wrote the initial manuscript. P. Spilger is the main developer of
the software extensions providing support for BrainScaleS’ non-spiking operation
mode. S. Billaudelle designed neuron and synapse driver circuits and contributed
to commissioning of the chip. Y. Stradmann contributed to hardware design and
commissioning and gave conceptual advice. A. Emmel contributed to experi-
ment code. E. Müller is the lead developer and architect of the BrainScaleS-2
software stack. C. Mauch and O. Breitwieser contributed to the software ar-
chitecture and implementation. A. Grübl was responsible for chip assembly and
implemented the digital front- and backend. J. Ilmberger contributed to host-side
communication infrastructure. V. Karasenko is the main developer of the FPGA
firmware and developed the communication infrastructure between FPGA and
ASIC. M. Kleider contributed to FPGA firmware development as well as ini-
tial commissioning of the system. K. Schreiber designed and implemented the
CADC and the physical ASIC test setup. J. Schemmel is the lead designer and
architect of the BrainScaleS-2 neuromorphic system. All authors discussed and
contributed to the manuscript.
Acknowledgments
The authors wish to thank all present and former members of the Electronic
Vision(s) research group contributing to the BrainScaleS-2 hardware platform,
software development as well as operation methodologies. We especially express
our gratitude to the late Karlheinz Meier who initiated and led the project
for most of its time.
This work has received funding from the EU ([H2020/2014-2020]) under grant
agreements 720270 (HBP) and 785907 (HBP) as well as from the BMBF (16ES1127
(HD-BIO-AI)).
References
1. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Win-
ter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J.,
Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D.: Language
Models are Few-Shot Learners. (2020). arXiv: 2005.14165 [cs.CL]
2. Schwartz, R., Dodge, J., Smith, N.A., and Etzioni, O.: Green AI. (2019). arXiv:
1907.10597 [cs.CY]
3. Oh, K.-S., and Jung, K.: GPU implementation of neural networks. Pattern Recog-
nition 37(6), 1311–1314 (2004)
4. Blott, M., Halder, L., Leeser, M., and Doyle, L.: QuTiBench: Benchmarking Neural
Networks on Heterogeneous Hardware. (2019). arXiv: 1909.05009 [cs.AR]
5. Jouppi, N.P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., Bates, S.,
Bhatia, S., Boden, N., Borchers, A.: In-datacenter performance analysis of a tensor
processing unit. In: Proceedings of the 44th Annual International Symposium on
Computer Architecture, pp. 1–12 (2017)
6. Yin, S., Ouyang, P., Tang, S., Tu, F., Li, X., Liu, L., and Wei, S.: A 1.06-to-
5.09 TOPS/W reconfigurable hybrid-neural-network processor for deep learning
applications. In: 2017 Symposium on VLSI Circuits, pp. C26–C27 (2017)
7. Boser: An Analog Neural Network Processor with Programmable Topology. IEEE
Journal of Solid-State Circuits 26, 2017–2025 (1991)
8. Yamaguchi, M., Iwamoto, G., Tamukoh, H., and Morie, T.: An Energy-efficient
Time-domain Analog VLSI Neural Network Processor Based on a Pulse-width
Modulation Approach. arXiv preprint (2019). arXiv: 1902.07707 [cs.ET]. https:
//arxiv.org/abs/1902.07707
9. Schemmel, J., Billaudelle, S., Dauer, P., and Weis, J.: Accelerated Analog Neuro-
morphic Computing. arXiv preprint (2020). arXiv: 2003.11996 [cs.NE]. https:
//arxiv.org/abs/2003.11996
10. LeCun, Y., and Cortes, C.: The MNIST database of handwritten digits (1998)
11. Schmitt, S., Klähn, J., Bellec, G., Grübl, A., Güttler, M., Hartel, A., Hartmann,
S., Husmann, D., Husmann, K., Jeltsch, S., Karasenko, V., Kleider, M., Koke,
C., Kononov, A., Mauch, C., Müller, E., Müller, P., Partzsch, J., Petrovici, M.A.,
Vogginger, B., Schiefer, S., Scholze, S., Thanasoulis, V., Schemmel, J., Legenstein,
R., Maass, W., Mayr, C., and Meier, K.: Classification With Deep Neural Networks
on an Accelerated Analog Neuromorphic System. Proceedings of the 2017 IEEE
International Joint Conference on Neural Networks (2017). doi: 10.1109/IJCNN.
2017.7966125. http://ieeexplore.ieee.org/document/7966125/
12. Cramer, B., Billaudelle, S., Kanya, S., Leibfried, A., Grübl, A., Karasenko, V.,
Pehle, C., Schreiber, K., Stradmann, Y., Weis, J., Schemmel, J., and Zenke, F.:
Training spiking multi-layer networks with surrogate gradients on an analog neu-
romorphic substrate. arXiv preprint (2020). arXiv: 2006.07239 [cs.NE]. https:
//arxiv.org/abs/2006.07239
13. Spilger, P., Müller, E., Emmel, A., Leibfried, A., Mauch, C., Pehle, C., Weis, J.,
Breitwieser, O., Billaudelle, S., Schmitt, S., Wunderlich, T.C., Stradmann, Y.,
and Schemmel, J.: hxtorch: PyTorch for ANNs on BrainScaleS-2. In: Proceed-
ings of the Workshop on IoT, Edge, and Mobile for Embedded Machine Learning
(ITEM)/ECML-PKDD 2020 (submitted) (2020)
14. Müller, E., Mauch, C., Spilger, P., Breitwieser, O.J., Klähn, J., Stöckel, D., Wun-
derlich, T., and Schemmel, J.: Extending BrainScaleS OS for BrainScaleS-2. arXiv
preprint (2020). arXiv: 2003.13750 [cs.NE]. http://arxiv.org/abs/2003.13750
15. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.,
Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving,
G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané,
D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner,
B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas,
F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.:
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
(2015). http://download.tensorflow.org/paper/whitepaper2015.pdf
