The Computational Cost of Asynchronous Neural Communication by Hitron, Yael et al.
The Computational Cost of Asynchronous Neural
Communication
Yael Hitron
Weizmann Institute of Science, Rehovot, Israel
yael.hitron@weizmann.ac.il
Merav Parter
Weizmann Institute of Science, Rehovot, Israel
merav.parter@weizmann.ac.il
Gur Perri
Weizmann Institute of Science, Rehovot, Israel
gur.perri@weizmann.ac.il
Abstract
Biological neural computation is inherently asynchronous due to large variations in neuronal spike
timing and transmission delays. So far, most theoretical work on neural networks assumes the
synchronous setting where neurons fire simultaneously in discrete rounds. In this work we aim at
understanding the barriers of asynchronous neural computation from an algorithmic perspective. We
consider an extension of the widely studied model of synchronized spiking neurons [Maass, Neural
Networks 97] to the asynchronous setting by taking into account edge and node delays.
Edge Delays: We define an asynchronous model for spiking neurons in which the latency values
(i.e., transmission delays) of non self-loop edges vary adversarially over time. This extends the
recent work of [Hitron and Parter, ESA’19] in which the latency values are restricted to be fixed
over time. Our first contribution is an impossibility result that implies that the assumption that
self-loop edges have no delays (as assumed in Hitron and Parter) is indeed necessary. Interestingly,
in real biological networks self-loop edges (a.k.a. autapses) are indeed free of delays, and the
latter has been noted by neuroscientists to be crucial for network synchronization.
To capture the computational challenges in this setting, we first consider the implementation of
a single NOT gate. This simple function already captures the fundamental difficulties in the
asynchronous setting. Our key technical results are space and time upper and lower bounds
for the NOT function; our time bounds are tight. In the spirit of the distributed synchronizers
[Awerbuch and Peleg, FOCS’90] and following [Hitron and Parter, ESA’19], we then provide a
general synchronizer machinery. Our construction is very modular and it is based on efficient
circuit implementation of threshold gates. The complexity of our scheme is measured by the
overhead in the number of neurons and the computation time, both of which are shown to be polynomial
in the largest latency value and the largest incoming degree ∆ of the original network.
Node Delays: We introduce the study of asynchronous communication due to variations in
the response rates of the neurons in the network. In real brain networks, the round duration
varies between different neurons in the network. Our key result is a simulation methodology
that allows one to transform the above-mentioned synchronized solution under edge delays into
a synchronized solution under node delays while incurring a small overhead w.r.t. space and time.
2012 ACM Subject Classification Theory of computation → Design and analysis of algorithms
Keywords and phrases asynchronous communication, asynchronous computation, spiking neurons,
synchronizers
Digital Object Identifier 10.4230/LIPIcs.ITCS.2020.48
Acknowledgements We are thankful to Roei Tell and Gil Cohen for helpful discussions on Boolean
Circuits. We also thank Yoram Moses for referring us to related work on asynchronous digital
circuits and discussing the connections between the two settings.
© Yael Hitron, Merav Parter, and Gur Perri;
licensed under Creative Commons License CC-BY
11th Innovations in Theoretical Computer Science Conference (ITCS 2020).
Editor: Thomas Vidick; Article No. 48; pp. 48:1–48:47
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
1 Introduction
Understanding how the brain works, as a computational device, is a central challenge of
modern neuroscience, artificial intelligence, and lately also of theoretical computer science
and distributed computing [18, 19, 20, 17, 16, 32, 6, 39, 37]. This line of work usually
assumes a simple synchronized model [23, 24] in which neurons fire simultaneously in discrete
rounds in response to their neighboring neurons that fired in the previous round. This model,
while being very convenient for algorithm design, does not take into account the inherent
asynchronous nature of neural communication. In the neuroscience literature it has been
noted that the asynchronous nature of these networks mostly stems from two independent
sources [29]: edge delays (known as response latency¹) [35, 5] and node delays (known as
refractory period) [34, 3]. In this paper, we aim at understanding the computational cost
incurred by such asynchronous communication. This cost is measured by the overhead
in the number of neurons and computation time required to compute a certain function.
We believe that understanding the computational power, the limitations, and the connections
between these models goes beyond the setting of spiking neurons, and might also be relevant
for the theory of digital logic design and circuit computation in general.
The Standard Synchronous Model [23, 24]. Before describing our asynchronous models,
we first review the standard synchronous model formally defined by Maass. In this model, the
network evolves in discrete, synchronous rounds as a Markov chain where each neuron u in
the network is a probabilistic threshold gate with a threshold (or bias) value b(u). In every
round t, the firing probability of neuron u only depends on the firing status of its incoming
neighbors in the preceding round t − 1. Formally, the potential pot_t(u) of neuron u in round
t is defined by the weighted sum over its incoming neighbors that fired in round t − 1. The
neuron u fires in round t with probability that depends on the quantity pot_t(u) − b(u).
1.1 Asynchronous Computation with Bounded Edge Delays
We define an asynchronous model with edge delays bounded² by some given integer L. The
dynamics of the network N is specified by a latency function ℓ : V × V × ℕ → ℕ_{≤L} interpreted
as follows: For every neuron u firing in round τ, its spike reaches its outgoing neighbor
v within ℓ(u, v, τ) rounds, where ℓ(u, v, τ) ∈ [1, L] might be chosen adversarially for every
u ≠ v and every round τ. The network solution N should then output the desired solution
for any adversarial choice of the latency function ℓ. Setting L = 1 yields the standard
synchronous model.
Asynchronous computation with edge delays was recently introduced by Hitron and
Parter [12]. Their model is similar to ours, except that in their model the edge latencies are
required to be fixed over time, whereas in our model the adversary is allowed to change them
from round to round. The model of Hitron and Parter includes an additional restriction
on the adversary (that sets the latency values) by requiring that self-spikes, i.e., of the
form 〈u, u, τ〉 have no latency and arrive within a single round to their destination. This
assumption is justified in [12] by the experimental evidence that self-spikes in brain networks
have almost no delays [14]. It is commonly believed in the neuroscience community that this
no-delay property of self-edges is in fact essential for network synchronization [33, 21, 40, 8].
1 Throughout, we use the terms edge-delay and latency interchangeably.
2 This bound is crucial as will be later implied by our lower bound results that depend on L. E.g., both
the computation time and the size of the network in this model must depend on L.
In this work, we provide a theoretical support for this hypothesis by showing that without
such an assumption, one cannot even implement a single AND gate in this model. This
impossibility result holds already in a setting where L = 2, the edge latencies are fixed over
time (as assumed in [12]), and the computation time and network size are allowed to be
unbounded. For this reason, we will mostly consider in this paper nice latency functions in
which all self-spikes have a latency value of one round.
1.2 Asynchronous Computation with Bounded Node Delays
We next turn to consider an alternative source for asynchronous communication due to
variations in the response timing of the individual neurons in the network. In real brain
networks, for every neuron u there is a predefined time interval between two consecutive
spikes of u. The length of this interval, which we call round, varies considerably among
different neurons in the network [34, 3]. This poses the challenge of creating a synchronized
response at the network level. To account for this behavior, we consider a model in which
the network’s evolution proceeds in seconds. A second in this context is simply the smallest
measurable time unit. For a given integer T ≥ 1, the dynamics is specified by a
node-delay function t : V → ℕ_{≤T} interpreted as follows: the round duration of each neuron
v consists of t(v) seconds. Specifically, the i-th round of v is defined by the time interval
R_i(v) = [(i − 1) · t(v) + 1, i · t(v)] for every i ≥ 1. The neuron v fires in second i · t(v) (i.e.,
at the end of its i-th round) only if the total potential due to spikes arriving in the interval
R_i(v) is sufficiently large. The network solution N should then output the desired solution
for any adversarial choice of the node-delay function t. The input parameter T sets a bound
on the differences between the round duration over all neurons in the network. Setting T = 1
yields the standard synchronous model.
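To make the round structure concrete, the following minimal Python sketch computes the interval R_i(v) and the index of the round that contains a given second. The helper names round_interval and round_of_second, as well as the example node-delay values, are illustrative assumptions and do not appear in the paper.

```python
def round_interval(i, t_v):
    """First and last second of the i-th round (i >= 1) of a neuron
    whose round duration is t_v seconds: [(i - 1) * t_v + 1, i * t_v]."""
    if i < 1 or t_v < 1:
        raise ValueError("i and t_v must be positive integers")
    return ((i - 1) * t_v + 1, i * t_v)


def round_of_second(s, t_v):
    """Index of the round (of a neuron with round duration t_v)
    that contains second s, for s >= 1."""
    if s < 1 or t_v < 1:
        raise ValueError("s and t_v must be positive integers")
    return (s - 1) // t_v + 1


# Hypothetical node-delay function with T = 3.
t = {"a": 1, "b": 2, "c": 3}
for v, t_v in t.items():
    print(v, [round_interval(i, t_v) for i in (1, 2, 3)])
```

With t(v) = 1 for every neuron, each round is a single second, which matches the remark that T = 1 yields the synchronous model.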
Observe that the edge and node delay models do not imply one another. In the edge
latency model, even though the spikes arrive in adversarially chosen rounds, all neurons in
round τ still depend only on the spikes arriving in round τ . Thus, the duration of a round for
all the neurons in the network is the same: a single tick (or a second) of the global clock. In
contrast, in the node delay model, the adversary selects the round duration for each neuron
which has two physical interpretations. First, it determines the time duration over which
the potential due to arriving spikes is accumulated. In addition, it also determines the time
interval between two consecutive spikes by the given neuron. In one of our most technically
involved results, we show a non-trivial reduction between the edge delay and the node delay
models, provided that the input network satisfies certain properties.
Finally, we note that this model has several variations which are in fact supported by
our simulation results (from edge delays to node delays). In particular, one can consider
a more elaborate setting in which the duration of each round per node varies in an
adversarial manner over time, i.e., the node-delay function in such a model is of the form
t : V × ℕ_{≥1} → ℕ_{≤T} interpreted as follows: the i-th round of each neuron v consists
of t(v, i) seconds. For example, in such a model the i-th round of node u can consist of 2
seconds while its (i + 1)-th round might consist of 100 seconds. In another variation, both edge
and node delays are combined and the dynamics is specified by an L-bounded edge latency
function ℓ : V × V × ℕ_{≥1} → ℕ_{≤L} and a T-bounded node-delay function t : V × ℕ_{≥1} → ℕ_{≤T}.
1.3 Synchronizers
In the spirit of Awerbuch and Peleg’s synchronizers for distributed networks [2] and the
recent work of [12], our primary goal with respect to upper bound results is to provide a
general simulation methodology that takes any n-neuron network N that solves the problem
in the standard synchronized setting (i.e., in which all spikes arrive within a single round)
and transforms it into an “analogous” network Sync(N ) in the edge delay setting while
incurring a small overhead in the number of neurons and the computation time (w.r.t the
base network N ).
For the setting in which the edge latencies are fixed over time and bounded by some integer
L, Hitron and Parter [12] showed such a simulation using their efficient constructions for
neural timers and counters. Their synchronized network solution Sync(N) has O(n + L log L)
neurons and O(r · L^3) rounds, where r is the computation time of the base network N. While
being quite efficient in terms of space and time, the synchronizers of [12] rely heavily
on the strong assumption that the latency values of the edges are fixed over time. In this
paper, we aim at understanding the edge latency setting in its most general form, and ask:
Can one provide a general synchronization scheme in a setting that allows the latency
of the network edges to vary in an adversarial manner in each round?
A priori it is not so clear if one can compute even basic Boolean functions without assuming
that the latency values are fixed. We answer this question in the affirmative by presenting
a modular synchronization scheme using a different approach than that taken in [12]. The
benefit of this approach is in its modularity. We start by understanding the implementation
of a single NOT gate in this model in terms of upper and lower bounds on the space and the
time of the computation. We then use this synchronized NOT solution as a building block in
the final synchronized network solution. Specifically, the synchronized NOT gates are used
to build synchronized circuits (and hence threshold gates), which are in turn combined into a
whole synchronized network solution. The space and time overheads incurred by our solution
are polynomial in L (the bound on the latency) and ∆, the maximal incoming degree in the
base network N .
We next turn to consider synchronizers for the node delay model. Our approach is based
on showing a simulation result that takes any synchronized solution sync_E(N, L) obtained by
the synchronizer in the L-bounded edge latency model, and transforms it into a synchronized
solution sync_V(N, T) that works in the T-bounded node delay model for L = Θ(T^2).
▶ Remark. It is noteworthy that in contrast to the distributed setting of Awerbuch and
Peleg [2] where the network size does not depend on the latencies, in the neural setting it is
not the case. As our lower bound constructions show, both the computation time and the network
size must depend (in fact, polynomially) on the largest edge latency. For this reason, for
any practical purposes, the study of asynchronous communication, in general, must assume
bounded delays.
1.4 Our Results
We study the cost and limitations of asynchronous neural computation in a biologically
plausible yet simple model of spiking neural networks. Our main focus is on the edge latency
model where the dynamics is specified by an L-bounded function ℓ : V × V × ℕ → ℕ_{≤L}. The
node delay model is considered only towards the end of the paper (Appendix C), as it is
handled via reduction to the edge delay setting. In the first part of the paper, we show
several negative results for the L-bounded model. These include an impossibility result for
delay on self-loop edges, as well as size and time lower bounds on an implementation of a NOT
gate in this model. In the second part, we consider the construction of synchronizers in this
generalized setting. We note that these constructions are self-contained and are technically
different from [13].
Negative Results. We first show that without assuming a minimal latency value on the
self-loop edges, one cannot compute AND(x, y) given two Boolean inputs x and y, even
when the edge latency values are fixed over time and the largest latency is L = 2.
▶ Theorem 1 (Impossibility for Arbitrary Latency Functions). There exists no network that
computes AND(x, y) in a setting that allows the adversary to pick latencies in {1, 2} for all
edges in the network.
The proof goes by showing that for any given candidate network solution N, there exists
a bad latency function ℓ under which N fails to compute AND(x, y). This holds even when
the latencies are fixed over time. From that point on, we restrict attention to nice latency
functions, which assign latency value 1 to the self-loop edges in the network.
▶ Definition 2. A latency function ℓ is nice if ℓ(u, u, τ) = 1 for every u ∈ V and round τ.
Our key technical contributions are lower bounds on the network size and the computation
time for computing the NOT(x) function in the L-bounded setting. Informally speaking, the
NOT(x) function appears to be a “complete” function for the purpose of synchronization in
this asynchronous model. Indeed, our NOT(x) implementation captures most of the essence
of the L-bounded model. To obtain the final synchronization scheme we mainly glue together
synchronized NOT units. For this reason, we devote much attention to understanding the
tightness of our constructions by providing nearly matching lower bound results.
▶ Lemma 3 (Size and Time Lower Bounds for Async. Computation of NOT(x)). Any network
that computes NOT(x) in the L-bounded asynchronous setting must use Ω(L) neurons and
Ω(L^3) time (the time lower bound is tight).
This should be compared with the size lower bound of Ω(log L) shown by [13] for their
simplified asynchronous setting.
Positive Results. Our end result is a synchronizer that given any network N in the standard
synchronous setting and an integer L, computes a network sync_E(N, L) that performs the
“same” computation as N in the L-bounded edge delay setting.
▶ Theorem 4 (Synchronizers for Edge Delays). There exists a synchronizer that given a
network N with n neurons, maximum in-degree ∆, and maximum edge latency L, constructs
a network sync_E(N, L) that has an “analogous” execution in the L-bounded edge-delay setting
with a total number of Õ(L^4 · poly(∆) · n) neurons and a time overhead³ of Õ(L^5 · log ∆).
Although the construction is inspired by the work of Awerbuch and Peleg [2], the imple-
mentation is very different as our neurons, unlike processors in a distributed network, are
memoryless. Thus, they cannot aggregate the incoming messages as in [2]. Our construction
is also different than that of [13], as the latter crucially depends on having fixed latencies
over time.
For the node delay model, in Appendix C we show that given a network N in the standard
synchronous setting and an integer T, one can compute an analogous network sync_V(N, T)
in the node-delay model with bounded node delay T by taking the following approach. Apply
the algorithm of Theorem 4 with N and L = Θ(T^2). This results in a network sync_E(N, L)
3 The Õ hides a factor of poly(log(n · r)), where r is the number of simulation rounds.
that performs the same computation as N in the L-bounded edge delay model. The desired
network sync_V(N, T) is then obtained by multiplying the edge weights of a carefully defined
edge subset in sync_E(N, L) by a factor of T. The quite delicate analysis then shows that
the network sync_V(N, T) indeed simulates the original network N upon any selection of
the node-delay function t : V → ℕ_{≤T}. By setting L = O(T^2) in Theorem 4, we show the
following for the node delay model:
▶ Theorem 5 (Synchronizers for Node Delays). There exists a synchronizer that given a
network N with n neurons, maximum in-degree ∆, and maximum node delay T, constructs a
network sync_V(N, T) that has an “analogous” execution in the T-bounded node-delay setting
with a total number of Õ(T^8 · poly(∆) · n) neurons and a time overhead of Õ(T^{10} · log ∆).
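As a quick sanity check of the exponents, substituting L = Θ(T^2) into the bounds of Theorem 4 yields the bounds stated in Theorem 5. The factors hidden by the Õ notation do not depend on L and are carried along unchanged:

```latex
\begin{align*}
\tilde{O}\bigl(L^{4} \cdot \mathrm{poly}(\Delta) \cdot n\bigr)
  &= \tilde{O}\bigl((T^{2})^{4} \cdot \mathrm{poly}(\Delta) \cdot n\bigr)
   = \tilde{O}\bigl(T^{8} \cdot \mathrm{poly}(\Delta) \cdot n\bigr), \\
\tilde{O}\bigl(L^{5} \cdot \log \Delta\bigr)
  &= \tilde{O}\bigl((T^{2})^{5} \cdot \log \Delta\bigr)
   = \tilde{O}\bigl(T^{10} \cdot \log \Delta\bigr).
\end{align*}
```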
We note that our preference to take a modular approach rather than an optimized one
inevitably leads to suboptimal space and time bounds in both Theorems 4 and 5. For example,
Theorem 5 is shown via a simulation result, which further deepens our understanding of the
connections between these models. We believe that by employing a more direct approach
for building synchronizers in the node-delay model, one should get a considerably improved
dependency in the delay bound T .
1.5 Our Approach in a Nutshell
We next provide the high level ideas for the key contributions. Throughout, unless stated
otherwise we consider the edge delay model where the dynamics is specified by an arbitrary
latency function.
Size and Time Lower Bound for Computing NOT(x). A network N with input neuron
x and an output neuron z computes the function NOT(x) in the asynchronous setting if
the following holds: when x = 0, the output z fires in at least one round regardless of the
latency function; and when x = 1 the output z never fires for any latency function. To show
a size lower bound of Ω(L) we take the following approach. First, we reduce any network
N that computes NOT(x) into a simpler and yet not larger network N_simple. In the latter
network the only inhibitor is the input x, which also has a self-loop of large positive weight
and outgoing edges of very large negative weight to all the excitatory neurons in N. The
second part of the proof shows a lower bound for N_simple using its specialized structure. We
assume towards a contradiction that the in-degree of each neuron in N_simple is less than
L and exhibit two conflicting latency functions ℓ_0, ℓ_1 that satisfy the following. If N_simple
computes NOT(x) with ℓ_0 and with x = 0, then it must fail to compute NOT(x) with the
function ℓ_1 and x = 1. To compute these latency functions, we partition the simulation into
blocks T_0, T_1, . . . , each containing L rounds. In each phase i, we set the latency values for all
the edges and all the rounds in block T_i. Throughout, we keep the invariant that there
exists no neuron that fires when x = 0 and with ℓ_0, but does not fire when x = 1 and with
ℓ_1. By the correctness of the network, the output neuron must fire at least once when x = 0,
thus leading to the contradiction. At a very high level, the fact that the in-degree of each
vertex is small is used in order to spread the at most L incoming spikes of each neuron
u in a balanced manner over the L rounds of the block. This will prevent the firing of a
neuron z when x = 0. Our lower bound is complemented by an upper bound of O(L^2) as
described next.
The Generalized Synchronization Scheme. The scheme is based on gradual steps.
Step I: Synchronization of NOT and OR Gates. We start by considering the asynchronous
computation of simple Boolean functions NOT(x) and OR(x_1, . . . , x_ℓ) with a small number
of neurons. The key challenge is in implementing the NOT gate. When x = 1, the output
gate is required not to fire (i.e., output 0) throughout the entire execution. In contrast, when
x = 0 the output gate should fire at least once during the execution. The construction is
combinatorial and uses a similar logic to the lower bound arguments. It contains a collection
of L + 1 neurons with outgoing edges to the output z. The above-mentioned lower bound
result shows that the incoming degree of z must be at least L − 2.
Step II: Synchronization of a Boolean Circuit. Any Boolean circuit A can be implemented
by NOT and OR gates. To simulate the computation of A in the asynchronous setting, we
replace each gate g_i in A by its synchronized implementation Sync(g_i) constructed in Step (I).
For a gate g_i in layer j with incoming gates g_{i,1}, . . . , g_{i,k}, the inputs to the sub-network Sync(g_i)
are the output neurons of the sub-networks Sync(g_{i,1}), . . . , Sync(g_{i,k}). The synchronization
between the layers of the circuit is governed by a directed chain of O(d · L^3) neurons. The
head of the chain fires in the first round of the simulation and activates the network. The
sub-networks Sync(g_i) of gates g_i in layer j are activated by the Θ(j · L^3)-th neuron in this chain.
These parameters are set so that the modules of layer j are activated
only after the spikes from the output neurons of the previous layer have reached the input
of this layer. Overall, the synchronized transformed network Sync(A) has O(d · L^3 + m · L^2)
neurons, where d is the depth of A and m is the number of gates. The time overhead of the
computation is O(d · L^4) rounds.
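Step II starts from the standard fact that NOT and OR gates form a complete basis. The minimal Python sketch below illustrates this for AND via De Morgan's law. The function names are illustrative, and the sketch deliberately ignores all timing and latency issues, which are the actual subject of the construction.

```python
from itertools import product


def NOT(x):
    """Boolean negation on {0, 1}."""
    return 1 - x


def OR(*xs):
    """Boolean OR of any number of bits."""
    return int(any(xs))


def AND(*xs):
    """AND built only from NOT and OR (De Morgan):
    AND(x_1, ..., x_k) = NOT(OR(NOT(x_1), ..., NOT(x_k)))."""
    return NOT(OR(*(NOT(x) for x in xs)))


# Print the truth table of the derived AND gate.
for x, y in product((0, 1), repeat=2):
    print(x, y, AND(x, y))
```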
Step III: Synchronization of a Single (Probabilistic) Threshold Gate. To synchronize a
single deterministic threshold gate, we use the fact that a threshold gate with incoming degree
∆ can be implemented by a Boolean circuit with poly(∆) neurons and depth O(log ∆). This
allows us to use the synchronized construction of the previous step. Turning to probabilistic
threshold gates, here it is much less clear how to implement such a gate by a Boolean circuit.
We take the following approach. First, we use the fact from [20] that a spiking neuron⁴ u
with bias b(u) is equivalent to a deterministic neuron u′ whose bias is sampled from the
Logistic distribution with mean b(u). Therefore our key challenge is in sampling a value from
a given Logistic distribution. To do that, we use a collection of k (input-less) spiking neurons,
each of which fires independently with probability 1/2. These neurons provide the random bits for
this sampling process. In fact, these fair coin tosses allow one to sample a value almost
uniformly at random in the range [0, 2^k]. We then use the method of inverse transform
sampling to convert this almost-uniform sampled value to a value that is sampled from the
Logistic distribution up to a small error in the sampling. Using Taylor expansion of the
natural log function, we implement this Uniform to Logistic transformation by a collection of
simple arithmetic operations applied on a collection of Boolean neurons. The total error in
our sampling is set to be small enough so that the output distribution of the Boolean circuit
is almost indistinguishable from that of the probabilistic threshold gate.
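The sampling idea of Step III can be sketched in ordinary floating-point Python as follows. This is only an illustration under stated assumptions: it calls math.log directly instead of the Taylor-expansion circuit used in the construction, it assumes a Logistic scale of 1 (matching λ = 1), and it maps the k bits to the midpoint of a cell in (0, 1) so that the quantile function is never evaluated at 0 or 1. The function names are hypothetical.

```python
import math
import random


def almost_uniform(bits):
    """Map k >= 1 fair bits to a point of the open interval (0, 1).
    The bits encode an integer m in [0, 2^k - 1]; returning the cell
    midpoint (m + 0.5) / 2^k is an assumption of this sketch."""
    k = len(bits)
    m = int("".join(str(b) for b in bits), 2)
    return (m + 0.5) / 2 ** k


def logistic_quantile(u, mean, scale=1.0):
    """Inverse CDF of the Logistic distribution with the given mean."""
    return mean + scale * math.log(u / (1.0 - u))


def sample_bias(mean, k=20, rng=random):
    """Inverse transform sampling: k fair coins -> almost-uniform value
    -> approximately Logistic-distributed bias."""
    bits = [rng.getrandbits(1) for _ in range(k)]
    return logistic_quantile(almost_uniform(bits), mean)
```

A deterministic threshold gate that compares its weighted input sum x against such a sampled bias fires with probability close to 1 / (1 + e^{−(x − b(u))}), which is the equivalence the text relies on.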
Grand Finale: Synchronization of a Spiking Neural Network. Finally, given an SNN
network N of (probabilistic) threshold gates, the synchronized network sync(N) is obtained as
follows. Each threshold gate g_i in N is replaced by its synchronized implementation Sync(g_i).
4 The probabilistic threshold gates of SNN.
Figure 1 A road-map for synchronizing spiking neural networks: NOT Gates (Lemma 14, Sec. 4.1) → Boolean Circuits (Lemma 15, Sec. 4.1) → Probabilistic Threshold Gates (Lemmas 16 and 17, Sec. 4.2) → Spiking Networks (Sec. 4.3).
The key challenge is in synchronizing these modules so that every neuron v in N (i.e., not
only the output neuron) has an equivalent neuron v′ in sync(N ) that simulates v for any
possible latency function throughout the entire simulation. See Fig. 1 for an illustration.
From Edge Delays to Node Delays. Given a network N to be simulated and an integer
T, our goal is to build a synchronizer sync_V(N, T) that simulates N in the T-bounded
node-delay model. To do that, we first compute a network sync_E(N, L) that simulates N
in the L-bounded edge-delay model for L = Θ(T^2). Then the output network sync_V(N, T)
is obtained by dividing some of the edge weights in sync_E(N, L) by a factor of T. The
correctness argument is based on showing that for every node-delay function t : V → ℕ_{≤T},
there exists an edge-delay function ℓ : V × V × ℕ_{≥0} → ℕ_{≤L} such that the execution of the
network sync_E(N, L) with the edge-delay function ℓ (in the edge-delay model) is similar to
the execution of the network sync_V(N, T) with the node-delay function t (in the node-delay
model). Since the network sync_E(N, L) simulates the original network N for any edge-delay
function, this implies that the network sync_V(N, T) simulates the original network N for
any node-delay function, as desired.
Additional Related Work. Asynchronous communication in spiking neural networks has
been studied in several settings. Maass [22, 25] considered a quite elaborate model for
deterministic neural networks with arbitrary response functions for the edges, and a vector
of firing times for all neurons. The approach of [22] mostly concerned the computational power
of this model upon choosing the best parameters for the network. I.e., showing feasibility
results for various functions. In contrast, in this work our goal is to bound the computation
time and the network size under this asynchronous setting. Khun et al. [15] studied the
asynchronous dynamics under the stochastic model of DeVille and Peskin [7].
Turning to the setting of logical circuits, there is a long line of work on the asynchronous
setting under various model assumptions [1, 11, 36, 4] that do not quite fit the memory-less
setting of spiking neurons. Work more closely related to our setting is by Martin, Manohar and
Moses [28, 26, 27] who studied the computational power of asynchronous digital circuits. In
particular, they characterize the necessary and sufficient conditions for a valid operation of a
given circuit in the asynchronous setting. For example, they showed that if all edges and
nodes suffer from an unbounded delay then the computational power of the circuit must be
very limited. The focus in our work is quite different. Instead of studying the computational
power of the asynchronous setting, we bound the computational overhead for solving concrete
problems.
2 The Synchronous and Asynchronous SNN Models
A deterministic neuron u is modeled by a deterministic threshold gate. Letting b(u) be
the threshold value of u, the neuron u outputs 1 if the weighted sum of its incoming neighbors
exceeds b(u). A spiking neuron is modeled by a probabilistic threshold gate which fires with
a sigmoidal probability that depends on the difference between its weighted incoming sum
and b(u).
Neural Network Definition. A Neural Network (NN) N = 〈X, Z, Y, w, b〉 consists of n input
neurons X = {x_1, . . . , x_n}, m output neurons Y = {y_1, . . . , y_m}, and k auxiliary neurons
Z = {z_1, . . . , z_k}. In a spiking neural network (SNN), the neurons can be either deterministic
threshold gates or probabilistic threshold gates. The directed weighted synaptic connections
between V = X ∪ Z ∪ Y are described by the weight function w : V × V → R. A weight
w(u, v) = 0 indicates that a connection is not present between neurons u and v. Finally,
for any neuron v, the value b(v) ∈ R is the threshold value (activation bias). The in-degree
of every input neuron xi is zero, i.e., w(u, x) = 0 for all u ∈ V and x ∈ X. Additionally,
each neuron is either inhibitory or excitatory: if v is inhibitory, then w(v, u) ≤ 0 and if v is
excitatory, then w(v, u) ≥ 0 for every u. This restriction arises from the biological structure
of the neurons.
Network Dynamics in the Synchronous Setting. The network evolves in discrete, syn-
chronous rounds as a Markov chain. The firing probability of every neuron in round τ
depends on the firing status of its neighbors in round τ − 1, via a standard sigmoid function,
with details given below. For each neuron u, and each round τ ≥ 0, let σ_τ(u) = 1 if u
fires (i.e., generates a spike) in round τ, and σ_τ(u) = 0 otherwise. Let σ_0(u) denote the
initial firing state of the neuron. The firing state of each input neuron x_j in each round is
the input to the network. For each non-input neuron u and every round τ ≥ 1, let pot(u, τ)
denote the membrane potential at round τ and p(u, τ) denote the firing probability
(Pr[σ_τ(u) = 1]), calculated as
\[ \mathrm{pot}(u, \tau) = \sum_{v \in V} w(v, u) \cdot \sigma_{\tau-1}(v) - b(u) \quad\text{and}\quad p(u, \tau) = \frac{1}{1 + e^{-\mathrm{pot}(u, \tau)/\lambda}}, \]
where λ > 0 is a
temperature parameter which determines the steepness of the sigmoid. Clearly, λ does not
affect the computational power of the network, thus we set λ = 1.
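A single synchronous round under these definitions can be sketched in Python as follows. The data layout (a dictionary of edge weights keyed by (v, u), a dictionary of biases for the non-input neurons) and the function names are assumptions made for illustration only.

```python
import math
import random


def firing_probability(pot, lam=1.0):
    """Sigmoid 1 / (1 + exp(-pot / lam)), evaluated in a numerically
    stable way so that very negative potentials do not overflow."""
    z = pot / lam
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sync_step(w, b, fired_prev, inputs_now, rng=random):
    """Return the set of neurons that fire in round tau.

    w           dict {(v, u): weight} of the edges v -> u (missing = 0)
    b           dict {u: bias} of the non-input neurons
    fired_prev  set of neurons that fired in round tau - 1
    inputs_now  set of input neurons firing in round tau (given externally)
    """
    fired_now = set(inputs_now)
    for u in b:
        # pot(u, tau) = sum of w(v, u) over v that fired in round tau - 1, minus b(u)
        pot = sum(w.get((v, u), 0.0) for v in fired_prev) - b[u]
        if rng.random() < firing_probability(pot):
            fired_now.add(u)
    return fired_now
```

Iterating sync_step and feeding each result back as fired_prev produces one sample path of the Markov chain described above.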
2.1 Network Dynamics in the Edge Delay Setting
The dynamics of the network are governed by a latency function ℓ : V × V × N → N interpreted
as follows. For every directed edge e = (u, v) and round τ, a spike generated by u in round τ
arrives at v after ℓ(u, v, τ) rounds. In the synchronous setting, ℓ(u, v, τ) = 1 for every u, v, τ.
For every neuron u and round τ, let A(u, τ) = {(v, τ′) | v ∈ V, τ′ + ℓ(v, u, τ′) = τ} denote
all the spike events that, if they occur, arrive at u in round τ. The state of u in round τ is given by:
$\mathrm{pot}(u,\tau) = \sum_{(v,\tau') \in A(u,\tau)} w(v,u) \cdot \sigma_{\tau'}(v) - b(u)$ and $\sigma_\tau(u) = 1$ iff $\mathrm{pot}(u,\tau) \ge 0$. (1)
If u is a probabilistic threshold gate then it fires with probability $p(u,\tau) = \frac{1}{1+e^{-\mathrm{pot}(u,\tau)}}$.
When ℓ(u, v, τ) = ℓ(u, v, τ′) for every u, v and τ′ ≠ τ, we may omit τ and write ℓ(u, v).
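As an illustration of Eq. (1), the following Python sketch simulates deterministic threshold neurons under a given latency function. The dictionary-based representation, the function name `simulate_edge_delays`, and the three-neuron example are assumptions made for this sketch only.

```python
from collections import defaultdict

def simulate_edge_delays(weights, bias, latency, initial, rounds):
    """Deterministic threshold neurons under Eq. (1).

    weights: {(v, u): w(v, u)}; bias: {u: b(u)};
    latency(v, u, t): delay of a spike sent on edge (v, u) in round t;
    initial: {u: sigma_0(u)}. Returns {t: set of neurons firing in round t}.
    """
    incoming = defaultdict(float)  # (u, t) -> total weight of spikes arriving at u in round t
    fired = {0: {u for u, s in initial.items() if s == 1}}

    def emit(v, t):
        # Schedule the spike fired by v in round t on each outgoing edge.
        for (src, dst), w in weights.items():
            if src == v:
                incoming[(dst, t + latency(v, dst, t))] += w

    for v in fired[0]:
        emit(v, 0)
    for t in range(1, rounds + 1):
        fired[t] = {u for u in bias if incoming[(u, t)] - bias[u] >= 0}
        for v in fired[t]:
            emit(v, t)
    return fired

# Hypothetical example: a chain a -> b -> c with delays 2 and 3.
example = simulate_edge_delays(
    weights={("a", "b"): 1, ("b", "c"): 1},
    bias={"a": 1, "b": 1, "c": 1},
    latency=lambda v, u, t: 2 if (v, u) == ("a", "b") else 3,
    initial={"a": 1},
    rounds=6,
)
```

In the example, the spike of a sent in round 0 reaches b in round 2, and the spike of b reaches c in round 5.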
▶ Definition 6 (The L-bounded Edge-Delay Setting). Given is a network N and an integer
L. It is assumed that the network contains a special neuron, the starter, that fires in the first
round of the simulation. The dynamics are determined by a latency function ℓ. This function ℓ
can be chosen arbitrarily among all L-bounded nice functions.
▶ Definition 7 (Computation of a Boolean Function in the L-bounded Edge-Delay Setting). Let
f : {0, 1}^n → {0, 1}^k be a Boolean function. A network N with n input neurons x1, . . . , xn
and k output neurons z1, . . . , zk computes f in this setting if for every nice L-bounded function
ℓ and for every fixed possible assignment to the input neurons b1, . . . , bn the following holds:
(i) If fi(b1, . . . , bn) = 1, then there exists a round in which zi fires, where fi(·) is the ith bit in
the output of f. (ii) If fi(b1, . . . , bn) = 0, then zi does not fire throughout the entire execution.
Furthermore, the network N computes the function f in r rounds if N computes f, and for
every nice L-bounded function ℓ, input bits b1, . . . , bn and index i such that fi(b1, . . . , bn) = 1,
zi fires in some round τ ≤ r.
Note that by this definition, for a network N that computes a Boolean function f within
r rounds, to evaluate the output of the function it is sufficient to inspect the state of the
output neurons over the first r rounds of the network’s simulation. Furthermore, as edge delays
are allowed to be chosen in an adversarial manner, one cannot hope to have all output
neurons fire in the exact same round. One mechanism that we use to make the output
neurons fire simultaneously is self-loop edges whose latency value is fixed to be 1.
Synchronizers. A synchronizer ν is an algorithm that gets as input a network N and
outputs a network N ′ = sync(N ) that contains all the neurons of N , plus additional auxiliary
neurons. One of the auxiliary neurons in N ′ is a starter neuron that fires in the first round of
the simulation. The network N ′ works in the asynchronous setting and should have similar
execution to N in the sense that for every neuron v ∈ V (N ), the firing pattern of v in the
asynchronous network should be similar to the one in the synchronous network. The output
network N ′ simulates each round of the network N by a phase.
▶ Definition 8 (Phases). We partition the execution of N′ into phases 1, 2, . . ., using a
function r : V(N) × N → N, where r(v, p) defines the beginning of phase p for the neuron v, i.e., the pth phase is the
round interval [r(v, p), r(v, p + 1)).
▶ Definition 9 (Similar Executions (Deterministic Networks)). The synchronous execution Π
of a deterministic network N is specified by a list of states Π = {σ1, σ2, . . .} where each σi is
a binary vector describing the firing status of the neurons in round i. The asynchronous
execution of the network N′ = syncE(N, L) with a latency function ℓ, denoted by Π′(ℓ), is
defined analogously, only when applying the asynchronous dynamics of Eq. (1). The execution
Π′(ℓ) is divided into phases according to a function r : V(N) × N → N.
The network N and the pair ⟨N′, ℓ⟩ have a similar execution if V(N) ⊆ V(N′), and
in addition, a neuron v ∈ V(N) fires in round p in the execution Π iff v fires during phase p
in Π′(ℓ). The networks N and N′ are similar if N and ⟨N′, ℓ⟩ have a similar execution for
every nice latency function ℓ.
Note that specifically, if a synchronous network N computes a Boolean function f by round
r and N and N ′ are similar, then N ′ computes f by phase r. Therefore, if we know that
each phase is of at most q rounds, we get that N ′ computes f in r · q rounds.
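The notion of similar executions can be checked mechanically once the firing rounds and the phase boundaries are known. The sketch below is illustrative only: the representation of executions as sets of firing rounds and the names `fires_in_phase` and `similar_execution` are assumptions of this example.

```python
def fires_in_phase(async_rounds, start, p):
    """True if some firing round lies in the interval [start(p), start(p + 1))."""
    return any(start(p) <= t < start(p + 1) for t in async_rounds)

def similar_execution(sync_fires, async_fires, phase_start, num_phases):
    """Check the condition of Definition 9 for phases 1..num_phases.

    sync_fires[v]: rounds in which v fires in the synchronous execution;
    async_fires[v]: rounds in which v fires in the asynchronous execution;
    phase_start(v, p): plays the role of r(v, p).
    """
    return all(
        (p in sync_fires[v])
        == fires_in_phase(async_fires.get(v, ()), lambda q, v=v: phase_start(v, q), p)
        for v in sync_fires
        for p in range(1, num_phases + 1)
    )

# Hypothetical phases of q = 4 rounds each: phase p is [4(p-1)+1, 4p+1).
q = 4
start = lambda v, p: (p - 1) * q + 1
```

With phases of at most q rounds, phase r ends by round r · q, which is the bound used in the note above.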
Finally, we note that the extension to randomized networks with probabilistic gates is
quite straightforward if one simply fixes the random coins used by the neurons over the
simulation. That is, to be able to faithfully compare the simulation of two random networks,
one has to fix the random coins of both of the simulations to be the same. For this reason,
given an input randomized network N in the synchronized model, we maintain all the random
coins generated by the neurons of the network over the simulation. These random coins
are then fed to the network N′ (i.e., the network obtained by applying our synchronizers). Since we
compare two randomized networks that use the same set of random coins, we can treat
these networks as deterministic. In Appendix C we provide the analogous definitions for the
T-bounded node-delay model. Throughout the main paper, we consider only the edge-delay
model and, to avoid cumbersome notation, the synchronized network solutions for this model
are denoted by sync(N) (rather than syncE(N, L)).
3 Negative Results
Impossibility Result for Arbitrary Latency Functions. We start by considering Theorem 1,
and show that if the latency values are allowed to be set in an adversarial manner in {1, 2},
then there exists no network that computes the AND of two Boolean inputs. In Appendix A,
we show:
▶ Lemma 10. Given input neurons x, y and an output neuron z, there is no network
computing AND(x, y) under every latency function ℓ : V × V → {1, 2}.
At a high level, we show that one can set the latency values such that all the spikes that
depend on the value of x (resp., y) arrive at odd (resp., even) rounds. Therefore, at any
round, there is no neuron that fires as a function of both x and y.
Size and Time Lower Bound
In this section we prove Lemma 3. Here we focus on the size lower bound,
although the high-level proof strategy for the time lower bound is quite similar. The time
lower bound is presented in Appendix A.1. Our proof strategy is as follows. First, we reduce
any network N that computes NOT(x) in the asynchronous setting to a network Nsimple
with a simpler structure that is easier to reason about. The second part of
the argument shows the lower bound for simple networks. All missing proofs of this section
are in Appendix A.
▶ Definition 11 (Strong Neurons and Simple Networks). A neuron u is strong in a given
network if w(u, u) ≥ b(u), and otherwise it is weak. Note that specifically, an excitatory
neuron u with b(u) ≤ 0 is strong. Given a single input neuron x, we say that a network
N is simple if the following hold: (i) x is a strong neuron and has outgoing edges of
infinite negative weight to all other neurons in the network; and (ii) all other neurons are
excitatory.
We note that the simple network is not a legally defined neural network: the input neuron
has an incoming edge (self-loop), and it is an inhibitor with a positive self-loop. However,
this network definition is only for the sake of the analysis and as such, it is not restricted to
follow any rule.
Reduction to Simple Networks. Given a network N with an input neuron x, define
Nsimple as follows. Exclude all the inhibitory neurons from Nsimple and take all edges
between excitatory neurons to be as in N. Then, add a self-loop of infinite weight to the
input neuron x, and connect it to every neuron with infinite⁵ negative weight.
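The reduction can be written as a short transformation on a weight dictionary. The sketch below reflects one reading of the description above (edges leaving x are replaced rather than kept, and `math.inf` stands in for the "large enough" weight); the function names and the data layout are hypothetical.

```python
import math

def is_strong(weights, bias, u):
    """A neuron u is strong if w(u, u) >= b(u); a missing self-loop has weight 0."""
    return weights.get((u, u), 0.0) >= bias[u]

def to_simple(neurons, weights, bias, inhibitory, x):
    """Build N_simple: keep x and the excitatory neurons, then rewire x."""
    kept = [v for v in neurons if v == x or v not in inhibitory]
    # Edges between the remaining excitatory neurons are copied unchanged.
    simple_w = {(u, v): w for (u, v), w in weights.items()
                if u in kept and v in kept and u != x and v != x}
    simple_w[(x, x)] = math.inf           # self-loop of infinite weight
    for v in kept:
        if v != x:
            simple_w[(x, v)] = -math.inf  # infinite negative weight to every other neuron
    simple_b = {v: bias[v] for v in kept}
    return kept, simple_w, simple_b
```

In the resulting network, x is strong by construction, and every other neuron is excitatory.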
▶ Lemma 12. If N computes NOT(x) within r rounds starting with the initial state σ̄,
then Nsimple also computes it within r rounds, when starting with the initial states as in σ̄
restricted to the vertices of Nsimple.
The proof goes by claiming that for any latency function ℓsimple for Nsimple, we can show
the existence of a latency function ℓ for N whose performance is only worse than that of
Nsimple with ℓsimple. That is, when x = 0 (resp., x = 1) then the potential of all neurons in
⟨Nsimple, ℓsimple⟩ is not decreased (resp., increased) when compared to ⟨N, ℓ⟩. Since the network
N computes NOT(x) with the latency function ℓ within r rounds, we conclude that
Nsimple also computes it with the latency function ℓsimple within at most r rounds.
Fix a NOT(x) network N. For an integer r, a latency function ℓ for N is r-good with the
initial configuration σ̄ if the network computes NOT(x) within r rounds. I.e., when x = 0,
the output of N fires in some round τ ≤ r, and when x = 1 it never fires, when all latencies
are given based on ℓ. If ℓ is r-good for some integer r we say it is good, and otherwise the
latency function is bad (the network fails to compute NOT(x)). Note that in order for a
network to compute NOT(x) within r rounds, it is required that any latency function is
r-good for a fixed initial configuration.
5 By infinite we mean large enough so that when the spike by x arrives at some neuron v, v would not fire.
Lower Bound for Simple Networks. Assume towards contradiction that there exists a
simple network N = Nsimple with maximum in-degree ∆in < L − 2 that computes NOT(x).
I.e., there exists an initial configuration σ̄ for all neurons but x such that every latency
function ℓ is good for ⟨N, σ̄⟩. In what follows, we define two conflicting latency functions ℓ0
and ℓ1, such that if ℓ0 is good when the initial state of x is 0, then ℓ1 is bad
when the initial state of x is 1.
Defining the Latency Functions ℓ0 and ℓ1. Recall that for every b ∈ {0, 1}, σ̄b = [b, σ̄] is
the initial state vector where x has the initial state b and the initial states of all other neurons
are specified by the vector σ̄. The construction of ℓ0, ℓ1 is inductive. To avoid cumbersome
notation, we start the simulation in round −1 rather than in round 0. For this first round
−1, let ℓb(v, u, −1) = 1 for v ≠ x, and ℓb(x, u, −1) = L for every u and b ∈ {0, 1}. Thus, the
positive spikes (by any v ≠ x) fired in round −1 arrive in round 0, and the negative spikes of
x arrive in round L − 1.
To define the latency of the edges in the remaining rounds τ ≥ 0, we partition them into
blocks, each of size L rounds, where the ith block is Ti = [iL, iL + (L − 1)] for every i ≥ 0. We
continue in steps i = 1, . . . where in step i, the latency values of ℓ0(e, τ), ℓ1(e, τ) are defined
for every edge e in the network N, and for every round τ ∈ Ti. For every b ∈ {0, 1} and a
block Ti, define Ai,b as the set of neurons that fire (hence active) in the first round of Ti
when executing N with the initial configuration σ̄b, and the latency function ℓb. Throughout
the process of defining the latency functions, we maintain these invariants at the beginning
of step i:
(I1) All the positive spikes generated at any round before the interval Ti arrive to their
destination by the first round of Ti. Furthermore, all negative spikes generated at any
round before the interval Ti arrive to their destination either by the first round of Ti or
on the last round of Ti, namely, round iL+ (L− 1).
(I2) $A_{i,0} \setminus \bigcup_{i' \le i} A_{i',1} = \emptyset$.
We define the latency for the rounds in Ti, and then show that the invariants are maintained.
Defining the latency function ℓ1 for Ti. For every self-loop edge e and every τ ∈ Ti, let
ℓ1(e, τ) = 1. For every edge e = (x, u) where u ≠ x, and τ ∈ Ti \ {iL + (L − 1)}, let
ℓ1(e, τ) = (iL + (L − 1)) − τ, i.e., the spike of x arrives at u in the last round of the interval
Ti. For e = (x, u) and τ = iL + (L − 1), let ℓ1(e, τ) = L so that the spike arrives in the last
round of the next block, i.e., round (i + 1)L + (L − 1). For every other edge e = (v, u) with
v ≠ x, let ℓ1(e, τ) = (i + 1)L − τ, i.e., the spike arrives at the first round of the next
block Ti+1.
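The definition of ℓ1 translates directly into code, which makes it easy to confirm that all latencies stay within [1, L] and that spikes arrive in the rounds stated above. The function name `ell1` and the use of the string "x" to identify the input neuron are assumptions of this sketch.

```python
def ell1(src, dst, t, L, x="x"):
    """Latency of a spike sent on edge (src, dst) in round t >= 0 under l_1."""
    i = t // L                 # index of the block T_i containing round t
    last = i * L + (L - 1)     # last round of T_i
    if src == dst:
        return 1               # self-loops always have latency 1
    if src == x:
        # Spikes of x arrive in the last round of T_i; a spike sent in that
        # last round arrives in the last round of the next block instead.
        return L if t == last else last - t
    return (i + 1) * L - t     # all other spikes arrive in the first round of T_{i+1}
```

For example, with L = 5 a spike of x sent in round 6 arrives in round 9, while a spike of any other neuron sent in round 6 arrives in round 10.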
Defining the latency function ℓ0 for Ti. As for ℓ1, for a self-loop edge e, we set ℓ0(e, τ) = 1.
For an edge e = (x, u) we set ℓ0(e, τ) arbitrarily (since x = 0, those values are meaningless).
We now fix a neuron u, and set the latency values of all its incoming edges (v, u). Since we
have already defined the latency values of all edges up to block Ti, at the beginning of step i,
the sets Ai,0, Ai,1 can be computed. Let g1, . . . , gω be the weak incoming neighbors of u in
Ai,0, and h1, . . . , hs be the strong incoming neighbors of u in Ai,0. We consider two cases.
The neuron u is said to have a dominant neighbor if it has a neighbor with a sufficiently
large incoming weight, where the precise weight threshold depends on whether the incoming
neighbor is weak or strong. Specifically, it has a dominant neighbor if it has either a weak
neighbor gj with w(gj, u) ≥ b(u), or a strong neighbor hj with w(hj, u) ≥ b(u)/(L − 1).
Case 1: u has a dominant neighbor. Let ℓ0(e, τ) = (i + 1)L − τ for every incoming
edge e = (v, u). That is, we schedule all the incoming spikes of u in this block to arrive
at u in the first round of the next block Ti+1.
Case 2: u has no dominant neighbor. Since deg(u) < L − 2, we have that ω + s < L − 2,
and in particular ω ≤ L − 2. For each weak neuron gj, set ℓ0(gj, u, iL) = j + 1. That
is, the spike from gj in round iL is scheduled to arrive at u in round iL + (j + 1). For
each strong neighbor hj, we split all the spikes generated by hj during the ω + 1 rounds
iL, . . . , iL + ω in a balanced manner over L − (ω + 1) rounds. Specifically, we set the
latency values of the at most ω + 1 spikes by hj during the rounds iL, . . . , iL + ω such
that in each round τ ∈ [iL + (ω + 2), (i + 1)L], u receives at most (ω + 1)/(L − (ω + 1))
spikes from hj⁶. For every τ ∈ [iL + (ω + 1), iL + (L − 1)], let ℓ0(hj, u, τ) = 1, i.e., the
spike arrives one round later. The latency of all the remaining edges e and rounds τ in
Ti is set to ℓ0(e, τ) = (i + 1)L − τ, so that it arrives in round (i + 1)L.
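The case distinction and the schedule of the weak neighbors in Case 2 can be illustrated as follows. This sketch covers only the dominant-neighbor test and the arrival rounds of the weak spikes; the balanced schedule for strong neighbors is omitted, and both function names are hypothetical.

```python
def has_dominant_neighbor(weak_weights, strong_weights, bias, L):
    """weak_weights / strong_weights: incoming weights w(g_j, u) and w(h_j, u)."""
    return (any(w >= bias for w in weak_weights)
            or any(w >= bias / (L - 1) for w in strong_weights))

def weak_arrival_rounds(i, L, omega):
    """Arrival rounds at u of the spikes sent by g_1..g_omega in round iL (Case 2)."""
    return [i * L + (j + 1) for j in range(1, omega + 1)]
```

Since each weak neighbor gets its own arrival round, and none of them is dominant in Case 2, no single weak spike is enough to make u fire.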
In Appendix A.1.2, we prove that the invariants hold by induction on the number of rounds.
Since the output z is required to fire when x = 0 but must not fire when x = 1, we get the
desired contradiction. In Appendix A.1, we show the time lower bound of Ω(L^3) rounds.
This bound is tight, and the construction, while following similar high-level ideas, is slightly
more involved than that of the size lower bound.
4 Upper Bounds
4.1 Synchronization of Logic Gates and Boolean Circuits
First observe that the simple implementation of an OR gate also works in the asynchronous
setting.
▶ Observation 13 (OR gate). Given input neurons x1, . . . , xn and output neuron z, there
exists a deterministic network ORsync with no auxiliary neurons that computes the OR gate
of x1, . . . , xn using L rounds. I.e., it holds that: (i) If σ0(x1) ∨ . . . ∨ σ0(xn) = 0, then σt(z) = 0
for every t, and (ii) If σ0(x1) ∨ . . . ∨ σ0(xn) = 1, then there exists a round t ∈ [1, L] such
that σt(z) = 1. Moreover, if an input neuron fires in round τ, the output neuron z fires in
some round t ∈ [τ + 1, τ + L].
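A small experiment illustrates why the OR gate is insensitive to delays. The sketch assumes the natural implementation (an edge of weight 1 from every input to z and bias b(z) = 1), which is not spelled out in the text, and it replaces the adversary by random latencies in [1, L]; the function name is hypothetical.

```python
import random

def or_gate_firing_rounds(inputs, L, rng):
    """Rounds in which z fires when the inputs fire only in round 0.

    Assumed implementation: w(x_i, z) = 1 for every input and b(z) = 1.
    Each spike gets an independent latency in [1, L].
    """
    arrivals = {}
    for bit in inputs:
        if bit == 1:
            t = rng.randint(1, L)
            arrivals[t] = arrivals.get(t, 0) + 1
    bias = 1
    # z fires in round t iff the summed incoming weight reaches the bias.
    return sorted(t for t, spikes in arrivals.items() if spikes * 1 - bias >= 0)
```

A single arriving spike already reaches the threshold, so z fires in the first round in which any spike arrives, and that round is at most L.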
We next consider the more technically involved setting of synchronizing a NOT gate.
▶ Lemma 14 (NOT gate). There is a network NOTsync of size O(L^2) with input neuron x
and output z, that computes NOT(x) within O(L^3) rounds. I.e., it holds that: (i) If σ0(x) = 1,
then σt(z) = 0 for every t, and (ii) If σ0(x) = 0, then there exists a round t ∈ [L, Θ(L^3)]
such that σt(z) = 1.
The following synchronous implementation assumes that the network contains a special
starter neuron v∗ that fires at the beginning of the simulation, regardless of the input value
of x. Later on in Section 4.3, when presenting the complete synchronization scheme, this
neuron v∗ will receive the starting firing signal from the global pulse generator.
6 For simplicity, we assume that (ω + 1) divides (L− (ω + 1))
Network Description. The network consists of the following components, see Figure 2.
1. A chain C = [c_0 = v∗, . . . , c_{5L^2}] containing 5L^2 + 1 neurons. The head of the chain is
the starter neuron that fires in the first round. For every i ≥ 0, the neuron c_i has bias
b(c_i) = 1. Moreover, for every i ≥ 1 the neuron c_i has an incoming edge from c_{i−1} with
weight 1.
2. A memory neuron m that remembers the initial state of x. The memory neuron has a
positive incoming edge from x, as well as a self-loop, both with weight 1, and bias b(m) = 1.
3. A reset inhibitory neuron r with an edge from m of weight w(m, r) = 1, and bias b(r) = 1.
4. A collection of L + 1 intermediate neurons v_0, . . . , v_L that are connected to the output
neuron z, where each v_i has an incoming edge from the neuron c_{5iL} ∈ C with weight
w(c_{5iL}, v_i) = 1, a self-loop of weight 1 and bias b(v_i) = 1. In addition, each v_i has a
negative incoming edge from the reset neuron r with weight w(r, v_i) = −∞. Finally, each
v_i has an edge to z with weight w(v_i, z) = 1, and the output neuron z has bias b(z) = L + 1.
The correctness of the construction and the proof of Lemma 14 are deferred to Appendix B.1.
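The four components can be assembled into explicit weight and bias tables, which also makes the O(L^2) size easy to verify. The sketch below only builds the network and does not simulate it; the neuron names ("c0", "m", "r", "v0", ...) and the function `build_not_sync` are hypothetical.

```python
import math

def build_not_sync(L):
    """Return (weights, bias) for the NOT_sync network described above."""
    w, b = {}, {}
    # 1. Chain c_0 = starter, ..., c_{5L^2}: bias 1, weight-1 edge from the predecessor.
    for i in range(5 * L * L + 1):
        b[f"c{i}"] = 1
        if i >= 1:
            w[(f"c{i - 1}", f"c{i}")] = 1
    # 2. Memory neuron m: edge from x and self-loop, both of weight 1.
    b["m"] = 1
    w[("x", "m")] = 1
    w[("m", "m")] = 1
    # 3. Reset (inhibitory) neuron r driven by m.
    b["r"] = 1
    w[("m", "r")] = 1
    # 4. Intermediate neurons v_0..v_L feeding the output z, which has bias L + 1.
    b["z"] = L + 1
    for i in range(L + 1):
        v = f"v{i}"
        b[v] = 1
        w[(f"c{5 * i * L}", v)] = 1
        w[(v, v)] = 1
        w[("r", v)] = -math.inf
        w[(v, "z")] = 1
    return w, b
```

Excluding the input x and the output z, the network has (5L^2 + 1) + 2 + (L + 1) = 5L^2 + L + 4 auxiliary neurons, in line with the O(L^2) bound of Lemma 14.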
Synchronization of a Boolean Circuit. Given the synchronized sub-networks of
Observation 13 and Lemma 14, we now show how to synchronize a Boolean circuit that contains OR
and NOT gates.
▶ Lemma 15. Given a Boolean circuit A of OR and NOT gates with n inputs, k outputs,
m gates and depth d, there exists a deterministic network N = sync(A) with input neurons
x1, . . . , xn, output neurons z1, . . . , zk and O(dL^3 + mL^2) auxiliary neurons, that computes A
in O(dL^4) rounds. I.e., it holds that (i) If [A(σ0(x1), . . . , σ0(xn))]i = 0 then σt(zi,N) = 0
for every t; and (ii) If [A(σ0(x1), . . . , σ0(xn))]i = 1, then⁷ there exists t ∈ [1, O(dL^4)] such
that σt(zi,N) = 1.
At a high level, the network N = sync(A) is obtained by replacing each gate gi with its
synchronized module Sync(gi). The input neurons to the gate modules in layer j of A are the
output neurons of the gate modules of layer j − 1 in A. The network then contains a chain
of length O(d · L^3) to control the synchronization between layers: the modules of layer j are
activated only after the modules of the previous layer have completed their computation.
See Appendix B.2.
4.2 Synchronization of a Single Threshold Gate
Deterministic Threshold Gate. Given a deterministic threshold gate g with ∆ inputs,
one can implement g using a Boolean Circuit with poly(∆) gates and depth O(log ∆) (see
Appendix B.3). Combining with the construction described in Lemma 15 we show the
following:
▶ Lemma 16. Given a weighted threshold gate g = f(x1, . . . , x∆), there exists a network N =
Sync(g) with ∆ input neurons x1, . . . , x∆, an output neuron z, and O(log ∆ · L^3 + poly(∆) · L^2)
auxiliary neurons that computes f within O(log ∆ · L^4) rounds. I.e., the output z fires in
some round τ ∈ [2, O(log ∆ · L^4)] if and only if f(σ0(x1), . . . , σ0(x∆)) = 1.
7 For a vector of n bits x ∈ {0, 1}n, let [x]i denote the ith bit of x. I.e., if x = (x1, . . . , xn), then [x]i = xi.
Probabilistic Threshold Gate. We next turn to consider the more challenging setting of
probabilistic threshold gates. To synchronize such gates, we first describe how to implement
them by using a Boolean circuit A that contains two types of gates: deterministic threshold
gates, and input-less gates which output 1 with probability 1/2. We hereafter refer to
the latter gates as uniformly random gates⁸. The output distributions of the probabilistic
threshold gate and the output gate of A will be very close, up to a small additive error of ε ∈
(0, 1). The synchronized probabilistic gate will be obtained by applying the synchronization
scheme of Lemma 15 to the circuit A.
Our key result might be of independent interest in the context of Boolean circuits:
▶ Lemma 17. Given a probabilistic threshold gate g with ∆ inputs, and an error parameter
ε ∈ (0, 1), there exists a Boolean circuit with depth poly(log ∆, log(1/ε)) and a total of
poly(∆, log(1/ε)) deterministic gates. In addition, there is a collection of O(log(1/ε)) uniformly
random gates (each outputs 1 independently w.p. 1/2), and an output gate g′ that
approximates g in the following sense. Letting p(x̄), p′(x̄) be the probabilities that g, g′ output 1
given input x̄, it holds that |p(x̄) − p′(x̄)| ≤ Θ(ε) for any fixed assignment of input x̄.
Our starting point is the following useful fact from [20]:
▶ Observation 18. Let g1 be a probabilistic gate with an incoming weighted sum W and
bias b1. Let g2 be a deterministic threshold gate with incoming weighted sum W and bias
b2, where b2 is sampled from the Logistic distribution with mean b1 and scale 1. Then
Pr[g1 = 1] = Pr[g2 = 1] = 1/(1 + e^{−(W−b1)}).
The observation holds as the cumulative distribution function of the Logistic distribution is a
Sigmoidal function. Since we already know how to implement a deterministic threshold
gate using a Boolean circuit, the key challenge is in sampling a value from the Logistic
distribution using a small number of uniformly random gates (i.e., fair coins). This is done
in two key steps. First, using O(log 1/ε) uniformly random gates, we sample a value from an
ε^4-discretization of the uniform distribution⁹. Then, we use the method of inverse transform
sampling to sample from a distribution that is Θ(ε)-close (in L1 norm) to the Sigmoidal
distribution. For a value r sampled u.a.r. in [0, 1], a sample from the Logistic distribution with
mean b and scale 1 is given by b + ln(r/(1 − r)). To compute the expression b + ln(r/(1 − r))
using a Boolean circuit, we approximate the ln(x) function using the first O(log 1/ε) terms
of the Taylor expansion. The almost-Logistic sample will serve as the bias of a deterministic
threshold gate and will be fed to the Boolean circuit of Lemma 16. The full description is
given in Appendix B.4.
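Observation 18 and the inverse transform step can be checked numerically. The sketch below replaces the random value r by an evenly spaced grid of midpoints in (0, 1), which is an assumption of this example and not the discretization used in the paper, and it uses the exact logarithm rather than a truncated Taylor expansion; all function names are hypothetical.

```python
import math

def logistic_sample(b, r):
    """Inverse transform: maps r in (0, 1) to a Logistic(mean b, scale 1) value."""
    return b + math.log(r / (1.0 - r))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def firing_fraction(W, b1, steps):
    """Fraction of grid values r for which the deterministic gate g2 fires.

    g2 has weighted input W and bias b2 = logistic_sample(b1, r); it fires iff W - b2 >= 0.
    """
    fired = 0
    for i in range(steps):
        r = (i + 0.5) / steps          # midpoint grid over (0, 1)
        b2 = logistic_sample(b1, r)
        if W - b2 >= 0:
            fired += 1
    return fired / steps
```

The gate g2 fires exactly when ln(r/(1 − r)) ≤ W − b1, i.e., when r is at most the sigmoid of W − b1, so the fraction of firing grid points approaches the firing probability of g1 as the grid is refined.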
We can then synchronize the Boolean circuit of Lemma 17 using the scheme of Lemma 15.
▶ Corollary 19. Given a probabilistic threshold gate g with ∆ inputs, and an error parameter
ε ∈ (0, 1), there exists a network N = Sync(g) with ∆ input neurons x1, . . . , x∆, an output
neuron z, and poly(∆, log 1/ε) · L^3 auxiliary neurons such that z approximates the gate
g within poly(log ∆, log 1/ε) · L^4 rounds in the following sense. For any fixed input x̄,
with probability at least 1 − Θ(ε), it holds that g outputs 1 iff z fires in some round in
[1, poly(log ∆, log 1/ε) · L^4].
8 A uniformly random gate is a fair coin, in contrast to probabilistic threshold gate that outputs 1 based
on a Sigmoidal distribution.
9 Our sample is equivalent to sampling a value from the uniform distribution and then rounding it to the
closest value of the form i · ε^4 for some integer i.
[Figure 2, three panels. Labels recovered from the figure: (left) NOTsync with input x, neurons m and r, intermediate neurons v0, . . . , vL, an internal chain from c0 = v∗ (starter) to c_{5L^2}, and output NOT(x); (middle) a Boolean circuit of OR and NOT gates over inputs x1, . . . , x4, the corresponding ORsync and NOTsync modules, and a global chain with starter v∗; (right) a pulse generator (PG) chain c0 = v∗, . . . , ck with k = O(L^4 log ∆), a threshold gate vi with its sub-network sync(vi), and the module ANDi.]
Figure 2 Left: synchronized network of a single NOT gate. Middle: A synchronized network for
a Boolean circuit. Right: The transformation of a single neuron vi in the synchronized network for
the given SNN.
4.3 The Complete Synchronization Scheme
The complete synchronization scheme and the proof of Theorem 4 are given in Appendix B.5.
At a high level, the construction has two parts: a global pulse generator, and a specific
adaptation of the given network N into a network sync(N), see Figure 2.
The pulse generator is implemented by a directed cycle of length k = Õ(L^4 log ∆). The
input layer and output layer in sync(N) are exactly as in N. Let V be the neurons of N. For
each auxiliary neuron vi ∈ V, we add its synchronized sub-network Sync(vi) from Lemma
16 and Corollary 19. Recall that each neuron in N implements either a threshold gate or a
probabilistic threshold gate. For each such vi ∈ V, we also add an AND module ANDi, which
receives input from the sub-network Sync(vi) and the pulse generator. The neuron vi is set
to be the output neuron of this ANDi module.
References
1 Douglas B Armstrong, Arthur D Friedman, and Premachandran R Menon. Design of asynchronous
circuits assuming unbounded gate delays. IEEE Transactions on Computers, 100(12):1110–1120,
1969.
2 Baruch Awerbuch and David Peleg. Network Synchronization with Polylogarithmic Overhead.
In 31st Annual Symposium on Foundations of Computer Science, St. Louis, Missouri, USA,
October 22-24, 1990, Volume II, pages 514–522, 1990.
3 Michael J Berry II and Markus Meister. Refractoriness and neural precision. In Advances in
Neural Information Processing Systems, pages 110–116, 1998.
4 Tobias Bjerregaard and Shankar Mahadevan. A survey of research and practices of network-
on-chip. ACM Computing Surveys (CSUR), 38(1):1, 2006.
5 Sami Boudkkazi, Edmond Carlier, Norbert Ankri, Olivier Caillard, Pierre Giraud, Laure
Fronzaroli-Molinieres, and Dominique Debanne. Release-dependent variations in synaptic
latency: a putative code for short-and long-term synaptic dynamics. Neuron, 56(6):1048–1060,
2007.
6 Chi-Ning Chou, Kai-Min Chung, and Chi-Jen Lu. On the Algorithmic Power of Spiking
Neural Networks. In 10th Innovations in Theoretical Computer Science Conference, ITCS
2019, January 10-12, 2019, San Diego, California, USA, pages 26:1–26:20, 2019.
7 RE Lee DeVille and Charles S Peskin. Synchrony and asynchrony in a fully stochastic neural
network. Bulletin of mathematical biology, 70(6):1608–1633, 2008.
8 Huawei Fan, Yafeng Wang, Hengtong Wang, Ying-Cheng Lai, and Xingang Wang. Autapses
promote synchronization in neuronal networks. Scientific reports, 8(1):580, 2018.
9 Martin Fürer. Faster integer multiplication. SIAM Journal on Computing, 39(3):979–1005,
2009.
10 Johan Håstad. On the Size of Weights for Threshold Gates. SIAM J. Discrete Math.,
7(3):484–492, 1994. doi:10.1137/S0895480192235878.
11 Scott Hauck. Asynchronous design methodologies: An overview. Proceedings of the IEEE,
83(1):69–93, 1995.
12 Yael Hitron and Merav Parter. Counting to Ten with Two Fingers: Compressed Counting
with Spiking Neurons. ESA, 2019.
13 Yael Hitron and Merav Parter. Counting to Ten with Two Fingers: Compressed Counting
with Spiking Neurons. CoRR, abs/1902.10369, 2019. arXiv:1902.10369.
14 Kaori Ikeda and John M Bekkers. Autapses. Current Biology, 16(9):R308, 2006.
15 Fabian Kuhn, Joel Spencer, Konstantinos Panagiotou, and Angelika Steger. Synchrony
and asynchrony in neural networks. In Proceedings of the twenty-first annual ACM-SIAM
symposium on Discrete algorithms, pages 949–964. SIAM, 2010.
16 Robert A. Legenstein, Wolfgang Maass, Christos H. Papadimitriou, and Santosh Srinivas
Vempala. Long Term Memory and the Densest K-Subgraph Problem. In 9th Innovations in
Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA,
USA, pages 57:1–57:15, 2018.
17 Nancy Lynch and Cameron Musco. A Basic Compositional Model for Spiking Neural Networks.
arXiv preprint, 2018. arXiv:1808.03884.
18 Nancy Lynch, Cameron Musco, and Merav Parter. Computational Tradeoffs in Biological
Neural Networks: Self-Stabilizing Winner-Take-All Networks. In Proceedings of the 8th
Conference on Innovations in Theoretical Computer Science (ITCS), 2017.
19 Nancy Lynch, Cameron Musco, and Merav Parter. Spiking Neural Networks: An Algorithmic
Perspective. In 5th Workshop on Biological Distributed Algorithms (BDA 2017), July 2017.
20 Nancy A. Lynch, Cameron Musco, and Merav Parter. Neuro-RAM Unit with Applications
to Similarity Testing and Compression in Spiking Neural Networks. In 31st International
Symposium on Distributed Computing, DISC 2017, October 16-20, 2017, Vienna, Austria,
pages 33:1–33:16, 2017.
21 Jun Ma, Xinlin Song, Wuyin Jin, and Chuni Wang. Autapse-induced synchronization in a
coupled neuronal network. Chaos, Solitons & Fractals, 80:31–38, 2015.
22 Wolfgang Maass. Lower Bounds for the Computational Power of Networks of Spiking Neurons.
Electronic Colloquium on Computational Complexity (ECCC), 1(19), 1994. URL:
http://eccc.hpi-web.de/eccc-reports/1994/TR94-019/index.html.
23 Wolfgang Maass. On the computational power of noisy spiking neurons. In Advances in Neural
Information Processing Systems 8 (NIPS), 1996.
24 Wolfgang Maass. Networks of spiking neurons: the third generation of neural network models.
Neural Networks, 10(9):1659–1671, 1997.
25 Wolfgang Maass. Paradigms for computing with spiking neurons. In Models of Neural Networks
IV, pages 373–402. Springer, 2002.
26 Rajit Manohar and Yoram Moses. Analyzing Isochronic Forks with Potential Causality. In
21st IEEE International Symposium on Asynchronous Circuits and Systems, ASYNC 2015,
Mountain View, CA, USA, May 4-6, 2015, pages 69–76, 2015. doi:10.1109/ASYNC.2015.19.
27 Rajit Manohar and Yoram Moses. The eventual C-element theorem for delay-insensitive
asynchronous circuits. In 2017 23rd IEEE International Symposium on Asynchronous Circuits
and Systems (ASYNC), pages 102–109. IEEE, 2017.
28 Alain J Martin. The limitations to delay-insensitivity in asynchronous circuits. In Beauty is
our business, pages 302–311. Springer, 1990.
29 Robert Miller. Time and the brain. CRC Press, 2000.
30 Saburo Muroga. Threshold logic and its applications. Wiley, 1971.
31 Yu P Ofman. On the algorithmic complexity of discrete functions. In Doklady Akademii Nauk,
volume 145, pages 48–51. Russian Academy of Sciences, 1962.
32 Christos H. Papadimitriou and Santosh S. Vempala. Random Projection in the Brain and
Computation with Assemblies of Neurons. In 10th Innovations in Theoretical Computer
Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, pages
57:1–57:19, 2019. doi:10.4230/LIPIcs.ITCS.2019.57.
33 Huixin Qin, Jun Ma, Chunni Wang, and Ying Wu. Autapse-induced spiral wave in network of
neurons under noise. PloS one, 9(6):e100849, 2014.
34 Alexa Riehle, Sonja Grün, Markus Diesmann, and Ad Aertsen. Spike synchronization and rate
modulation differentially involved in motor cortical function. Science, 278(5345):1950–1953,
1997.
35 BL Sabatini and WG Regehr. Timing of synaptic transmission. Annual review of physiology,
61(1):521–542, 1999.
36 Jens Sparsø. Asynchronous circuit design-a tutorial. In Chapters 1-8 in “Principles of
asynchronous circuit design-A systems Perspective”. Kluwer Academic Publishers, 2001.
37 Lili Su, Chia-Jung Chang, and Nancy Lynch. Spike-Based Winner-Take-All Computation:
Fundamental Limits and Order-Optimal Circuits. Neural Computation, 2019.
38 Christopher S. Wallace. A Suggestion for a Fast Multiplier. IEEE Trans. Electronic Computers,
13(1):14–17, 1964. doi:10.1109/PGEC.1964.263830.
39 Barbeeba Wang and Nancy Lynch. Integrating Temporal Information to Spatial Information
in a Neural Circuit. arXiv preprint, 2019. arXiv:1903.01217.
40 Ergin Yilmaz, Mahmut Ozer, Veli Baysal, and Matjaž Perc. Autapse-induced multiple
coherence resonance in single neurons and neuronal networks. Scientific Reports, 6:30914,
2016.
A Missing Proofs for the Negative Results
Impossibility Result, Proof of Lemma 10. Assume towards contradiction there exists a
network N = (V, {x, y}, {z}, w, b) that computes the AND gate of the initial states x0, y0
of the inputs x and y. It is then required that if x0 ∧ y0 = 1, then there exists a round
in which z fires, and if x0 ∧ y0 = 0 then z is idle throughout the execution. Our goal is
to show the existence of a bad assignment of latency values to the edges of N . Such bad
assignment exists even if we fix the latency of each edge to the same value throughout the
entire execution. We begin with some quick observations.
The state of a neuron v in round τ, namely στ(v), is fully determined by the network, the
latency function, the input and the initial state σ0; that is, στ(v) = H(N, ℓ, σ0, x0, y0, v, τ)
for some function H.
Given the network N and a latency function ℓ, the state of a neuron v in round τ is a
function of the previous states of its incoming neighbors, denoted by u1, . . . , uk:
\[ \sigma_\tau(v) = F_v\bigl(\sigma_{\tau-\ell(u_1,v)}(u_1), \ldots, \sigma_{\tau-\ell(u_k,v)}(u_k)\bigr). \]
▶ Definition 20. For a neuron v and round τ, we say that the state στ(v) is x-independent
(y-independence is defined analogously) if its value does not depend on the initial state of x, i.e., if
H(N, ℓ, σ0, x0 = 0, y0, v, τ) = H(N, ℓ, σ0, x0 = 1, y0, v, τ).
▶ Observation 21. A composition of x-independent functions is also x-independent. Specifically,
for a round τ and a neuron v with incoming neighbors u1, . . . , uk, if the state of each ui in
round τ − ℓ(ui, v) is x-independent, then στ(v) is also x-independent.
Given the network N, we set the edge latencies as follows:
\[ \ell(u,v) = \begin{cases} 1 & \text{if either } u = x \text{ or } v = x, \text{ but not both,} \\ 2 & \text{otherwise.} \end{cases} \]
We next show that for every neuron v ∈ V, its firing state in each round is either x-independent
or y-independent. Specifically, the firing state of z in each round does not depend on both
x0 and y0. This will contradict the assumption that z computes an AND gate of x0 and y0.
▷ Claim 22. For every round τ ≥ 1 it holds that: (1) for every v ∈ V \ {x}, the firing state
στ(v) is x-independent if τ is even and y-independent if τ is odd; (2) στ(x) is x-independent
if τ is odd and y-independent if τ is even.
Proof. By induction on the round τ. For τ = 1, since all outgoing edges from y have latency
2 (except for the edge (y, x), if it exists), in round 1 no neuron v ∈ V \ {x} receives a spike
from y, and therefore σ1(v) is y-independent. Because the self-loop of x has latency 2
and the edge from y to x has latency 1, in round 1 the neuron x can receive a signal from y
but not from itself, and therefore σ1(x) is x-independent. For τ = 2 and v ≠ x, since the
edges from x have latency 1 and all other edges have latency 2, it holds that
\[ \sigma_2(v) = F_v\bigl(\sigma_1(x), \sigma_0(y), \sigma_0(u_1), \ldots, \sigma_0(u_k)\bigr), \]
where u1, . . . , uk are the neighbors of v in V \ {x}. Because σ1(x) is x-independent, and
σ0(y) and σ0(ui) are initial states that do not depend on x0, by Observation 21 we can
conclude that σ2(v) is x-independent.
Next, for the input neuron x, since all its incoming edges (except for the self-loop) have
latency 1, it holds that
\[ \sigma_2(x) = F_x\bigl(\sigma_0(x), \sigma_1(u_1), \ldots, \sigma_1(u_k)\bigr). \]
Because σ0(x) = x0 does not depend on y0 and σ1(v) is y-independent for all v ≠ x, we
conclude that σ2(x) is y-independent as well.
Assume the claim holds for every round τ′ < τ; we show that it holds for
round τ as well. For v ≠ x with incoming neighbors u1, . . . , uk in V \ {x}, by the definition of
the latencies it holds that
\[ \sigma_\tau(v) = F_v\bigl(\sigma_{\tau-1}(x), \sigma_{\tau-2}(u_1), \ldots, \sigma_{\tau-2}(u_k)\bigr). \]
If τ is even, so is τ − 2, and by the induction assumption στ−2(ui) are x-independent. Because
τ − 1 is odd, στ−1(x) is x-independent. Hence, by Observation 21 we conclude that στ(v) is
also x-independent. Similarly, if τ is odd, then by the induction assumption στ−2(ui) and
στ−1(x) are y-independent, and therefore στ(v) is also y-independent. Then, for the neuron
x, it holds that
\[ \sigma_\tau(x) = F_x\bigl(\sigma_{\tau-2}(x), \sigma_{\tau-1}(u_1), \ldots, \sigma_{\tau-1}(u_k)\bigr). \]
If τ is odd, by the induction assumption στ−2(x) as well as στ−1(u1), . . . , στ−1(uk) are
x-independent and therefore στ(x) is x-independent. On the other hand, if τ is even, then
by the induction assumption στ−2(x) and στ−1(u1), . . . , στ−1(uk) are y-independent and
therefore στ(x) is also y-independent. ◁
Since in each round the output neuron z is either x-independent or y-independent, this
contradicts the assumption and Lemma 10 follows.
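The parity pattern of Claim 22 can be observed on small random instances. The following is a minimal sketch, not the authors' construction: it assumes a plain threshold rule (a neuron fires when the weighted sum of arriving spikes reaches its bias), treats states in rounds before round 0 as silent, and uses hypothetical names (`latency`, `simulate`, `check_claim`). Neuron 0 plays the role of x and neuron 1 the role of y.

```python
import itertools
import random

def latency(u, v, x):
    """Latency from the proof: 1 if exactly one endpoint is x, else 2."""
    return 1 if (u == x) != (v == x) else 2

def simulate(n, weights, bias, init, rounds, x=0):
    """Return states[tau][v] for tau = 0..rounds under the fixed latencies.

    Assumption of this sketch: v fires in round tau iff the total weight of
    spikes arriving in round tau is at least bias[v].
    """
    states = [list(init)]
    for tau in range(1, rounds + 1):
        row = []
        for v in range(n):
            total = 0.0
            for u in range(n):
                src_round = tau - latency(u, v, x)
                if src_round >= 0 and states[src_round][u]:
                    total += weights[u][v]
            row.append(1 if total >= bias[v] else 0)
        states.append(row)
    return states

def check_claim(n=6, rounds=12, seed=0):
    """Check the parity pattern of Claim 22 on one random network."""
    rng = random.Random(seed)
    x = 0
    weights = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    bias = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    rest = [rng.randint(0, 1) for _ in range(n - 2)]
    runs = {
        (x0, y0): simulate(n, weights, bias, [x0, y0] + rest, rounds, x)
        for x0, y0 in itertools.product((0, 1), repeat=2)
    }
    for tau in range(1, rounds + 1):
        for v in range(n):
            x_indep = all(runs[(0, y0)][tau][v] == runs[(1, y0)][tau][v]
                          for y0 in (0, 1))
            y_indep = all(runs[(x0, 0)][tau][v] == runs[(x0, 1)][tau][v]
                          for x0 in (0, 1))
            even = tau % 2 == 0
            if v == x:
                ok = y_indep if even else x_indep
            else:
                ok = x_indep if even else y_indep
            if not ok:
                return False
    return True
```

Calling `check_claim(seed=s)` for several seeds exercises different weight and bias choices; since the claim holds for any state-update functions Fv, the particular threshold rule used here is only one convenient instance.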
A.1 Size Lower Bound for Computing NOT (x)
A.1.1 Reduction to Simple Networks, Proof of Lemma 12
The reduction is based on the following notion of domination between two configurations.
Domination. Given a network N, a latency function ℓ, and a vector σ̄ of starting states
for all neurons but the input x, for b ∈ {0, 1} define potb(u, τ, N, ℓ, σ̄) as the potential of
neuron u in round τ in the simulation of N with the initial state vector [b, σ̄], i.e., with the
initial state of x being b and all other initial states as in σ̄. When σ̄ is clear from
the context, we may omit it and simply write potb(u, τ, N, ℓ). Given networks N1, N2 with
vertex sets V1, V2 and latency functions ℓ1, ℓ2 respectively, we say that ⟨N1, σ̄1⟩ and ⟨N2, σ̄2⟩ are
compatible if V1 ⊆ V2 and σ̄1 and σ̄2 agree on the vertices of V1, i.e., σ̄1(u) = σ̄2(u)
for every u ∈ V1.
In our arguments, we consider a pair of compatible configurations ⟨N1, σ̄1⟩ and ⟨N2, σ̄2⟩
along with latency functions ℓ1, ℓ2 for these configurations. We say that ⟨N1, σ̄1, ℓ1⟩ dominates
⟨N2, σ̄2, ℓ2⟩ if ⟨N1, σ̄1⟩ and ⟨N2, σ̄2⟩ are compatible and in addition:
(i) pot1(u, τ, N1, ℓ1, σ̄1) ≤ pot1(u, τ, N2, ℓ2, σ̄2) for every u ∈ V1 \ {x} and τ ≥ 0.
(ii) pot0(u, τ, N1, ℓ1, σ̄1) ≥ pot0(u, τ, N2, ℓ2, σ̄2) for every u ∈ V1 \ {x} and τ ≥ 0.
Let V, Vsimple be the vertex sets of N and Nsimple respectively. Let σ̄simple be the initial
state vector that agrees with σ̄ on all vertices in Vsimple. Thus, ⟨Nsimple, σ̄simple⟩ and ⟨N, σ̄⟩
are compatible. For a number of rounds r, a latency function ℓ is r-good for ⟨N, σ̄⟩ if N
computes NOT(x) within r rounds under ℓ when starting from the initial state vector σ̄.
Our proof strategy is as follows. We show that every latency function ℓsimple is r-good
for ⟨Nsimple, σ̄simple⟩ by showing that there exists a latency function ℓ for N such that
⟨Nsimple, σ̄simple, ℓsimple⟩ dominates ⟨N, σ̄, ℓ⟩.
Given a latency function ℓsimple for the network Nsimple, let ℓ be a latency function for N
that agrees with ℓsimple on excitatory neurons and gives every inhibitory neuron the latency
values of the neuron x. That is, ℓ(v, u, τ) = ℓsimple(v, u, τ) for every pair of excitatory
neurons v, u. In addition, ℓ(v′, u, τ) = ℓsimple(x, u, τ) for every inhibitory neuron v′ and
every neuron u ∈ Vsimple. All remaining latency values (i.e., those of the incoming edges to
the inhibitors of N) can be chosen arbitrarily.
We show by induction on the round τ that (i) pot1(u, τ, Nsimple, ℓsimple) ≤ pot1(u, τ, N, ℓ)
for every u ∈ Vsimple \ {x} and τ ≥ 0, and (ii) pot0(u, τ, Nsimple, ℓsimple) ≥ pot0(u, τ, N, ℓ) for
every u ∈ Vsimple \ {x} and τ ≥ 0. For τ = 0, this is true as the potential values in round τ = 0 are
simply the initial states, and the initial state vectors σ̄ and σ̄simple are compatible. For
the induction step, let τ ≥ 1, and assume correctness for all τ′ ≤ τ − 1. We prove the
claims for round τ. Let u be a neuron in Vsimple \ {x}, and let v1, . . . , vk be its incoming
excitatory neighbors.
The initial state of x is 0. By the induction assumption, it holds that pot0(v, τ′, Nsimple,
ℓsimple) ≥ pot0(v, τ′, N, ℓ) for every round τ′ ≤ τ − 1 and every neuron v ∈ {v1, . . . , vk}.
Thus every excitatory neuron vi that fires in round τ′ in the simulation of N also fires in
round τ′ in the simulation of Nsimple, for every τ′ ≤ τ − 1. Combining this with the definition
of the latency function ℓ, we get that each spike from vi that arrives at u in round τ of
the simulation of N also arrives at u in round τ in the simulation of Nsimple. Let ω be an
inhibitory incoming neighbor of u in N; then ω does not exist in Nsimple. Also note that
since σ0(x) = 0, a negative spike from x never arrives at u in the simulation of the network
Nsimple. Therefore, no negative spikes arrive at u in Nsimple. Summing over the positive and
negative spike weights, we get that pot0(u, τ, Nsimple, ℓsimple) ≥ pot0(u, τ, N, ℓ) for every
u ∈ Vsimple \ {x}.
The initial state of x is 1. By the induction assumption, it holds
that pot1(v, τ′, Nsimple, ℓsimple) ≤ pot1(v, τ′, N, ℓ) for every round τ′ ≤ τ − 1 and every
v ∈ {v1, . . . , vk}. Thus every vi that fires in round τ′ in the simulation of Nsimple also fires
in round τ′ in the simulation of N. Combining this with the definition of the latency function ℓ,
we get that each spike from vi that arrives at u in round τ of the simulation of Nsimple also
arrives at u in round τ in the simulation of N.
Let ω be an inhibitory incoming neighbor of u in N. By the definition of the latency
function ℓ, and the fact that x fires in every round, for each spike that ω fires and that arrives at
u in round τ in N, there is a spike from x that arrives at u in round τ in Nsimple. Since the
edges from x have weight −∞, the total negative spike weight arriving at u in round
τ in Nsimple is at least as large (in absolute value) as the total negative spike weight in N. Thus,
summing up both the positive and negative spike weights, we get that pot1(u, τ, Nsimple, ℓsimple) ≤
pot1(u, τ, N, ℓ) for every u ∈ Vsimple \ {x}. This proves the induction step for round τ. We
get that ⟨Nsimple, σ̄simple, ℓsimple⟩ dominates ⟨N, σ̄, ℓ⟩.
Finally, we show that if ⟨N1, σ̄1, ℓ1⟩ dominates ⟨N2, σ̄2, ℓ2⟩ and ℓ2 is r-good for ⟨N2, σ̄2⟩,
then ℓ1 is also r-good for ⟨N1, σ̄1⟩.
Consider the simulation of ⟨N2, σ̄2, ℓ2⟩, and assume that the initial state of x is 0. Since ℓ2
is r-good for N2, there is a round τ ≤ r in which z fires in N2. Since ⟨N1, σ̄1, ℓ1⟩ dominates
⟨N2, σ̄2, ℓ2⟩, we can apply the condition of domination for z and τ, and get that z also fires in
round τ in N1. Now, assume that the initial state of x is 1. Since ℓ2 is r-good for N2, there
is no round τ in which z fires in N2. Since ⟨N1, σ̄1, ℓ1⟩ dominates ⟨N2, σ̄2, ℓ2⟩, we can apply
the condition of domination for z and every τ, and get that z also never fires in N1. Hence,
ℓ1 is r-good for N1. This completes the proof of the lemma.
A.1.2 Size Lower Bound for Simple Networks
We show that the latency values defined in step i satisfy the invariants at the beginning
of step i + 1.
Proving that the Invariants Hold. For step i = 0, invariant (I1) holds
since in the first round τ = −1 all positive spikes are set to arrive in round 0 under both ℓ0 and ℓ1,
while a spike from x arrives in round τ = L − 1. As for invariant (I2),
note that both simulations are identical for all neurons in V \ {x}, and, again, a spike from x
arrives only in round L − 1. Therefore the same neurons are active in round τ = 0. Hence
A0,0 = A0,1 and (I2) holds. We now show that the invariants are preserved
after each step. Assume that the invariants hold at the beginning of each step j ≤ i, and
consider now the beginning of step i + 1.
(I1) By construction, in all cases the values we set for ℓ0, ℓ1 in step i are such
that all spikes generated in Ti, except the spike from x generated in round iL + (L − 1),
arrive in some round τ ≤ (i + 1)L, i.e., by the first round of Ti+1. Furthermore,
the spike from x generated in round iL + (L − 1) is set to arrive in round (i + 1)L + (L − 1).
The invariant holds by combining this with the correctness for all steps i′ ≤ i.
(I2) We start by proving the following auxiliary claim.
▷ Claim 23. Consider the simulation of N with initial state σ̄b and latency function ℓb,
and let τ be a round in Ti. Then:
1. For a strong neuron u ∈ Ai,1, u fires iff τ ∈ [iL, iL + (L − 2)] ⊆ Ti. For a strong neuron
u ∈ Ai,0, u fires for every τ ∈ Ti.
2. For a weak neuron u ∈ Ai,b, u fires iff τ = iL.
3. For u ∉ Ai,b, u is not active in round τ.
Proof.
Case b = 1: We start by showing that all three claims hold for b = 1. By the definition of
ℓ1, the only positive spikes received by any neuron in a round τ ∈ [iL + 1, iL + (L − 1)]
are self-loop spikes. Since a strong active neuron u ∈ Ai,1 receives an inhibiting spike
from x in round iL + (L − 1), it is not active in this last round. For a weak neuron u′,
the spike from its self-loop is not strong enough to make it active. Lastly, for u ∉ Ai,1,
since no negative spikes arrive at u in round iL, we have that b(u) > 0. Since
no spikes arrive at u in round τ, we get that u stays inactive.
Case b = 0: Claim (1) holds since a strong u ∈ Ai,0 never gets inhibited, as x never
fires. We now consider claims (2) and (3) for a neuron u that is either a weak
neuron, or a strong neuron that is not in Ai,0. By Invariant (I1), all the positive spikes
from the previous blocks arrive by the first round of Ti. Thus if u fires in any round
τ ∈ [iL + 1, iL + (L − 1)] (i.e., any round which is not the first one in Ti), this must
be due to the incoming spikes generated in the first round of Ti. We now prove
by induction on the round τ that u does not fire in any round in [iL + 1, iL + (L − 1)].
Induction Base, Round τ = iL + 1: By the definition of the latency function ℓ0, no
spike arrives at u from an incoming neighbor in that round. Therefore, if u is weak,
then a spike from itself will not make it active in round τ. In addition, if u ∉ Ai,0
then u did not fire in round iL, and thus receives no self-spike in round τ. Since no
negative spikes arrive at u in round iL, for every u ∉ Ai,0 it must hold that b(u) > 0,
so u remains inactive in round τ.
Induction Step τ ≥ iL + 2. Assume that claims (2) and (3) hold up to round τ − 1 and
consider round τ ≥ iL + 2. Consider first the case that u has either a weak neighbor gj
with w(gj, u) ≥ b(u), or alternatively a strong neighbor hj with w(hj, u) ≥ b(u)/(L − 1).
Then by the definition of ℓ0 (Case I in our definition), all the spikes fired by the incoming
neighbors of u are scheduled to arrive in the first round of Ti+1. Therefore, u does not
receive any spike in round τ, and remains inactive.
Next, consider the complementary case where all the weak neighbors gj satisfy
w(gj, u) < b(u), and all the strong neighbors hj satisfy w(hj, u) < b(u)/(L − 1).
Case (1): τ ∈ [iL + 2, iL + ω + 1]. By the induction assumption on τ − 1, u did not
fire in round τ − 1. By the definition of ℓ0, u receives a spike from at most one weak
neighbor gj in round τ, and since w(gj, u) < b(u), it does not fire in this round.
Case (2): τ ∈ [iL + (ω + 2), iL + (L − 1)]. Let hj be a strong active neighbor of
u. By the definition of ℓ0, in round τ, u receives at most (ω + 1)/(L − (ω + 1)) of
the spikes fired by hj during the interval [iL, iL + ω]. Furthermore, there is one
additional spike that hj fired in round τ − 1 and that arrives at u in round τ. Note that
u does not receive spikes from weak neighbors in round τ, since all spikes from weak
neighbors arrive in an earlier round τ′′ ∈ [iL + 2, iL + (ω + 1)]. In addition, since u
did not fire in round τ − 1, it also does not get any self-spike in round τ. Overall, u
receives at most ((ω + 1)/(L − (ω + 1))) + 1 spikes from each strong neighbor in round τ,
and no other spikes (from a weak neighbor or from u itself). Since the spikes from the strong
neighbors have weight at most b(u)/(L − 1), and there are s strong active neighbors,
the overall weighted sum of the spikes received in round τ is at most
\[ s \cdot \Bigl(\frac{\omega+1}{L-(\omega+1)} + 1\Bigr) \cdot \frac{b(u)}{L-1} < s \cdot \Bigl(\frac{\omega+1}{s} + 1\Bigr) \cdot \frac{b(u)}{L-1} = (s+\omega+1) \cdot \frac{b(u)}{L-1} < b(u), \tag{2} \]
where both inequalities follow as s + ω < L − 2. Therefore, u does not get activated in
round τ, and claims (2) and (3) follow. ◁
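The chain in Eq. (2) can be checked with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, b(u) is normalized to 1 by default, and s ≥ 1 is assumed so that the middle term is well defined.

```python
from fractions import Fraction

def eq2_holds(L, s, omega, b=Fraction(1)):
    """Check the chain of Eq. (2) exactly for one choice of (L, s, omega)."""
    per_weight = b / (L - 1)  # upper bound on the weight of a strong spike
    lhs = s * (Fraction(omega + 1, L - (omega + 1)) + 1) * per_weight
    mid = s * (Fraction(omega + 1, s) + 1) * per_weight
    rhs = (s + omega + 1) * per_weight
    return lhs < mid and mid == rhs and rhs < b

def check_range(max_L=40):
    """Check Eq. (2) for every s >= 1, omega >= 0 with s + omega < L - 2."""
    for L in range(4, max_L + 1):
        for s in range(1, L):
            for omega in range(0, L):
                if s + omega < L - 2 and not eq2_holds(L, s, omega):
                    return False
    return True
```

The first inequality only needs s < L − (ω + 1), while the last one needs s + ω + 1 < L − 1; both follow from s + ω < L − 2, which is why the check is restricted to that range.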
We are now ready to prove the induction step for (I2). Assume towards contradiction
that there exists a neuron u with
\[ u \in A_{i+1,0} \setminus \bigcup_{i' \le i+1} A_{i',1}. \]
Since u is active in the first round
of Ti+1, using Claim 23(3), it must have an incoming neighbor that fires in the first round
of the previous block. Let Ai,0(u) = Γin(u) ∩ Ai,0 be those neighbors.
▷ Claim 24. For every strong neuron v ∈ Ai,0(u), it holds that w(v, u) < b(u)/(L − 1).
Proof. Consider such a strong neuron v ∈ Ai,0(u). By Invariant (I2) for the beginning of step
i, there exists an index j ≤ i such that v ∈ Aj,1. When running N with initial state
σ̄1 and the latency function ℓ1, by Claim 23(1), v fires in all of the L − 1 rounds of
[jL, jL + (L − 2)]. By the construction of ℓ1, all these L − 1 spikes arrive in round (j + 1)L.
Therefore, if (L − 1) · w(v, u) ≥ b(u), then u is activated in round (j + 1)L, i.e., u ∈ Aj+1,1,
in contradiction to the definition of u. Therefore (L − 1) · w(v, u) < b(u) for every strong
neuron v ∈ Ai,0(u). ◁
▷ Claim 25. For every weak neuron v ∈ Ai,0(u), it holds that w(v, u) < b(u).
Proof. Consider such a weak neuron v ∈ Ai,0(u). As in the proof of Claim 24, by Invariant (I2)
there exists an index j ≤ i such that v ∈ Aj,1. When running N with initial state σ̄1
and latency function ℓ1, by Claim 23(2), v fires once in the interval Tj, namely in its first
round jL. By the construction of ℓ1 this spike arrives in round (j + 1)L. Therefore,
if w(v, u) ≥ b(u), then u is activated in round (j + 1)L, implying that u ∈ Aj+1,1,
in contradiction to the definition of u. We therefore conclude that w(v, u) < b(u) for every
weak neuron v ∈ Ai,0(u). ◁
Let ω and s be the numbers of weak and strong neurons in Ai,0(u), respectively. By Claims 25 and
23(2), all active weak neurons in Ti fire only in the first round of that block. By the
definition of the latency function, all these spikes are scheduled to the first ω rounds in
Ti, and therefore none of them is scheduled to arrive in the first round of Ti+1. This
implies that u fires in that round due to the spikes generated by its strong neighbors.
By Claim 24, w(v, u) < b(u)/(L − 1) for each such strong neighbor v of u. This in
particular implies that u is a weak neuron, and by Claim 23 it did not fire in the last
round of Ti. By the definition of the latency function, the spikes generated by such strong
neighbors are divided almost evenly among L − ω rounds, up to the first round of Ti+1.
Each round gets at most s · (ω/(L − ω) + 1) such spikes, whose total weight is strictly less
than b(u) by Eq. (2). This contradicts the assumption that u ∈ Ai+1,0.
Since ℓ1 is a good latency function when starting with x = 1, we have that z never fires
and thus z ∉ ⋃i Ai,1. By applying invariant (I2) to the output neuron z for every round
τ ≥ 0, we get that z ∉ ⋃i Ai,0. By using Claim 23, we get that z never fires with ℓ0 and
x = 0, in contradiction to the fact that N solves NOT(x).
A.2 Time Lower Bound for Computing NOT (x)
In this section we show the following.
▶ Lemma 26. Every network N that computes NOT(x) in the L-bounded asynchronous
setting requires Ω(L^3) rounds.
By Lemma 12, we restrict attention to a simple network N = Nsimple with one input neuron
x that computes NOT(x). Similarly to the size lower bound, we define two conflicting
latency functions ℓ0 and ℓ1, such that if ℓ1 is good when x0 = 1, then the output neuron z of
N fires only after Ω(L^3) rounds in the simulation with the latency function ℓ0 and x0 = 0.
The simulation with the latency function ℓ0 is partitioned into consecutive blocks of L
rounds, Ti = [iL, iL + (L − 1)] for every i ∈ N.
The simulation with the latency function ℓ1 is based on the notion of important and
unimportant rounds. Consider the L-round interval Tk = [k · L, k · L + (L − 1)] for k ∈ N.
Among the first L/2 rounds, there is an important round once every 16 rounds, and the rest
are unimportant. Furthermore, each of the last L/2 rounds of the interval is unimportant.
That is, the important rounds in the interval are {kL + 16j | 16j < L/2, j ∈ N}. Denote by
τi the ith important round in the simulation. Note that by definition τi+1 − τi ≤ L/2.
In our arguments, the configuration of the network in the ith important round τi of the
simulation with ℓ1 and x0 = 1 will be compared against the configuration in round iL
(i.e., the first round of the block Ti) in the simulation with ℓ0 and x0 = 0.
Active subsets of neurons: For every i ∈ N, let A0,i be the set of neurons that fire (and are
hence active) in round i · L (the first round of the block Ti) in the simulation of ⟨N, σ0, ℓ0⟩.
Similarly, let A1,i be the set of neurons that fire in round τi of the simulation of ⟨N, σ1, ℓ1⟩.
Also define
\[ A'_{b,i} = A_{b,i} \setminus \bigcup_{j \le i-1} A_{b,j}, \]
the set of neurons that fire for the first time in “round” i.
For every neuron u, b ∈ {0, 1} and i ∈ N, let Ab,i(u) = Ab,i ∩ Nin(u) and A′b,i(u) = A′b,i ∩ Nin(u),
where Nin(u) is the set of incoming neighbors of u.
For a subset of neurons V′ ⊆ V and a neuron u, let
\[ w(V', u) = \sum_{v \in V'} w(v, u). \]
Moreover, let S(V′) and W(V′) be the subsets of strong¹⁰ and weak neurons of V′, respectively.
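The block structure and the set of important rounds defined above can be enumerated directly. This is a small sketch with hypothetical helper names (`block`, `important_rounds`, `is_important`); it simply transcribes the set {kL + 16j | 16j < L/2} and assumes rounds τ ≥ 0.

```python
def block(i, L):
    """Rounds of block T_i = [iL, iL + (L - 1)]."""
    return range(i * L, (i + 1) * L)

def important_rounds(k, L):
    """Important rounds of T_k: {kL + 16j : 16j < L/2, j = 0, 1, ...}."""
    return [k * L + 16 * j for j in range(L) if 16 * j < L / 2]

def is_important(tau, L):
    """True if round tau >= 0 is an important round."""
    offset = tau % L  # position of tau inside its block
    return offset % 16 == 0 and offset < L / 2
```

For instance, with L = 64 the important rounds of T0 are rounds 0 and 16, and those of T1 are rounds 64 and 80; when L is a multiple of 32, every block contains exactly L/32 important rounds.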
A.2.1 Defining the latency functions `0 and `1
Throughout, a spike event is represented by a triplet ⟨v, u, τ⟩ where v ∈ Nin(u) fires in round
τ. Since the functions are nice, the latency values of the self-spikes ⟨u, u, τ⟩ for every u and
τ are set to 1. For technical reasons, it is more convenient to start the simulations in round
−1, rather than in round 0. For this first round −1, let ℓb((v, u), −1) = 1 for every u and
every v ≠ x, and ℓb((x, u), −1) = L for every u and b ∈ {0, 1}. As a result, the positive spikes
(by any v ≠ x) fired in round −1 arrive at their destination in round 0, and the negative
spikes of x arrive in round L − 1. We now define the latency values for the remaining spikes.
Defining the function ℓ0. Note that when x0 = 0, x never fires and thus there is no need
to define ℓ0 values for the spikes of x. We define ℓ0 iteratively in a block by block manner.
Here we do not accumulate spikes: spikes generated in the ith block Ti arrive by
the first round of the (i + 1)th block Ti+1. Fix a block Ti = [iL, iL + (L − 1)] for i ≥ 0 and
assume that the latency values ℓ0 for all prior spikes in rounds τ < iL have already been
fixed. Thus the active set A0,i can be determined. First, for a neuron u, the algorithm checks if
there is a way to spread all the spikes generated towards u in the rounds of Ti over the interval
[iL + 2, (i + 1)L] in a way that guarantees that u does not fire in any of the rounds of this
interval. In particular, no spike is scheduled to arrive in round iL + 1. Otherwise, all spikes
generated in this block are scheduled to arrive in round (i + 1)L (the first round of the
(i + 1)th block).
¹⁰ Recall that a neuron u is strong if w(u, u) ≥ b(u) and it is weak otherwise.
Defining the function ℓ1. The definition of the function ℓ1 is more involved. Unlike the
function ℓ0, in which all spikes generated in block Ti are scheduled by round (i + 1)L, here
the setting is slightly more sensitive. Specifically, the scheduling algorithm of ℓ1 makes
sure that non-self spikes arrive at their destination only in important rounds.
Spikes by the input (inhibitory neuron) x: All spikes from x are scheduled to arrive in the
last round of the blocks Ti, namely, in rounds of the form i · L + (L − 1). Formally,
for every spike ⟨x, u, τ⟩ where τ = i · L + (L − 1) for some i ∈ N, let ℓ1((x, u), τ) = L,
thus arriving in round τ + L = (i + 1)L + L − 1. For every τ ∈ [iL, iL + (L − 2)], let
ℓ1((x, u), τ) = (iL + (L − 1)) − τ, thus arriving in round iL + (L − 1) as desired.
Spikes by v ≠ x: The latency values are defined in a round by round fashion, such that
in every important round τi, every neuron u gets activated if possible. Otherwise, the
arrival of the spikes towards u is postponed (when possible) to the next important
round τi+1. The spikes generated in unimportant rounds are always delayed to the
next important round. This is always possible as the distance to the next important
round is at most L/2. For a subset of spikes S, let
\[ w(S) = \sum_{\langle v,u,\tau\rangle \in S} w(v,u) \]
be the total weight of the spikes in S. For every important round τi, we maintain a list of
pending spikes Rτi(u) towards u that were not yet scheduled. In every step τ ≥ 0, the algorithm
schedules the spikes generated in this round. If the round τ is important, then the
algorithm also makes decisions regarding the set of pending spikes Rτi(u).
We keep the invariant that at the beginning of step τ, the latency values of all spikes
scheduled to arrive by round τ have already been determined. As we will see, the non-self
spikes are always scheduled to arrive in important rounds. As a result, a neuron u
fires in an unimportant round τ iff u is strong and it fired in round τ − 1. Initially, for
every neuron u, the algorithm adds every non-self spike ⟨v, u, −1⟩ to Rτi(u). For every
τ ≥ 0, we consider the following algorithm.
All self-spikes ⟨u, u, τ′⟩ are given a latency value of ℓ(u, u, τ′) = 1.
Handling important rounds τi. Consider a neuron u. If u fired in round τi − 1,
add the self-spike ⟨u, u, τi − 1⟩ to the pending spike set Rτi(u). If the total weight of
the pending spikes towards u is sufficiently large to make u fire, all the non-self spikes
are scheduled to arrive in τi. Formally, if w(Rτi(u)) ≥ b(u), schedule all these spikes
to round τi by setting ℓ(v, u, τ′) = τi − τ′ for every spike ⟨v, u, τ′⟩ ∈ Rτi(u).
Otherwise, if the total weight of pending spikes is small, i.e., w(Rτi(u)) < b(u), the
non-self spikes are deferred to the next important round τi+1 if possible (i.e., if the
latency does not exceed its upper bound L). Formally, for every non-self pending spike
⟨v, u, τ′⟩ ∈ Rτi(u), if τi+1 − τ′ > L then let ℓ((v, u), τ′) = L (i.e., ⟨v, u, τ′⟩ cannot be
further deferred). Otherwise, add ⟨v, u, τ′⟩ to the pending spike set Rτi+1(u) of the
next important round τi+1.
Finally, all spikes generated in round τi are also (safely) added to the pending list
Rτi+1(u).
Handling unimportant rounds. The non-self spikes towards u generated in round
τ are added to the pending spike set Rτi+1(u) of the next important round τi+1 (after
round τ).
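The decision made for a single neuron u at an important round can be sketched as follows. This is a simplified illustration rather than the full scheduling algorithm: it handles only the non-self pending spikes of one neuron, and the names `Spike` and `handle_important_round` as well as the dictionary layout of the result are assumptions of this sketch.

```python
from typing import NamedTuple

class Spike(NamedTuple):
    src: str        # firing neuron v (hypothetical string identifier)
    fired_at: int   # round tau' in which v fired
    weight: float   # edge weight w(v, u)

def handle_important_round(pending, threshold, tau_i, tau_next, L):
    """Process the pending non-self spikes towards u at important round tau_i.

    Returns (latencies, deferred): latencies maps (src, fired_at) to the
    latency fixed in this step, and deferred lists the spikes moved to the
    pending set of the next important round tau_next.
    """
    latencies = {}
    deferred = []
    if sum(s.weight for s in pending) >= threshold:
        # Enough weight to make u fire: deliver everything in round tau_i.
        for s in pending:
            latencies[(s.src, s.fired_at)] = tau_i - s.fired_at
    else:
        for s in pending:
            if tau_next - s.fired_at > L:
                # Cannot be deferred further without exceeding the bound L.
                latencies[(s.src, s.fired_at)] = L
            else:
                deferred.append(s)
    return latencies, deferred
```

With pending spikes of weights 0.5 and 0.6 fired in rounds 0 and 10, a threshold of 1.0 and tau_i = 16, both spikes are delivered in round 16 (latencies 16 and 6). With a threshold of 2.0, tau_next = 32 and L = 24, the first spike receives latency L and the second is deferred.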
First observe that the function ℓ1 is valid: all self-spikes have a latency value of 1. Moreover,
the non-self spikes have a latency value in [1, L]. To see this, observe that for an unimportant
round τ, a non-self spike ⟨v, u, τ⟩ is added to the pending list Rτi+1(u) where τi+1 is the next
important round after τ. Due to the fact that τi+1 − τi ≤ L/2, this assignment is valid. In
addition, the pending spikes ⟨v, u, τ⟩ ∈ Rτi(u) are deferred to τi+1 only if τi+1 − τ ≤ L.
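The same kind of validity check applies to the spikes of the input neuron x under ℓ1. The sketch below transcribes the formula for ℓ1((x, u), τ) from the definition and verifies, for rounds τ ≥ 0, that the latency lies in [1, L] and that every spike arrives in the last round of some block; the function names are hypothetical.

```python
def x_spike_latency(tau, L):
    """Latency assigned by l1 to a spike of x fired in round tau."""
    i = tau // L                # index of the block T_i containing tau
    last = i * L + (L - 1)      # last round of T_i
    return L if tau == last else last - tau

def check_x_spikes(L, blocks=5):
    """Check latency bounds and arrival rounds over the first few blocks."""
    for tau in range(blocks * L):
        lat = x_spike_latency(tau, L)
        arrival = tau + lat
        if not (1 <= lat <= L and arrival % L == L - 1):
            return False
    return True
```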
A.2.2 Proof of Lemma 26
The key lemma that establishes Lemma 26 is the following:
▶ Lemma 27. For every neuron u ≠ x with u ∈ A′0,i for i < L^2/1024, there exists some
i′ ≤ i such that u ∈ A′1,i′.
By the correctness of the simple network N, the output neuron z should not fire when x0 = 1
and with the latency function ℓ1. In other words, z ∉ A1,i′ for any i′. By Lemma 27, we get
that z can only be in A′0,j for some j ≥ L^2/1024. Since block Tj starts in round
jL ≥ L^3/1024, z fires when x0 = 0 only after Ω(L^3)
rounds. We start with the following simple observation.
▶ Observation 28. In the simulation of ⟨N, σ̄0⟩ with ℓ0, it holds for every i that: (i) each
strong neuron s ∈ A0,i fires in every round of block Ti; (ii) each weak neuron ω ∈ A0,i which
is not x fires only in the first round of block Ti; and (iii) every neuron v ∉ A0,i does not fire
in any round of block Ti.
Proof. (i). In the simulation with ℓ0 and x0 = 0 there are no inhibiting spikes, and if a
strong neuron s fires in some round, it keeps on firing for the rest of the simulation.
(ii). By the definition of the latency function ℓ0, no spikes from incoming neighbors of
the weak neuron ω arrive in round iL + 1, the second round of block Ti. We prove by
induction on τ ∈ [iL + 1, iL + (L − 1)] that ω does not fire in round τ. For the base of the
induction, since ω ≠ x is excitatory and weak, it holds that 0 ≤ w(ω, ω) < b(ω), thus ω does
not fire in round iL + 1. Assume that the claim holds up to round τ ≥ iL + 1 and consider
round τ + 1. Since ω did not fire in round τ by the induction assumption, it does not receive
a self-spike in round τ + 1. By the definition of the function ℓ0, the non-self spikes that arrive
in round τ + 1 < (i + 1)L cannot make ω fire. Thus ω does not fire in τ + 1 and (ii) holds.
(iii). Let v ∉ A0,i, i.e., v did not fire in round iL. Since v does not receive negative spikes
in round iL (as the spikes of x are always scheduled to the last round of the blocks), we can
conclude that b(v) > 0. Since in round iL + 1 it receives no self-spike and no other
spike, it also does not fire in round iL + 1. The argument then follows inductively in the same
manner as in (ii). ◀
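The strong/weak distinction behind parts (i) and (ii) can be illustrated on a single neuron. The sketch below is a simplification: it assumes the neuron is active in the first round of the block, that only its own self-loop spike (latency 1) reaches it afterwards, and that it fires when the arriving weight is at least its bias b(u) > 0. The function name is hypothetical.

```python
def block_firing_pattern(w_self, bias, L):
    """Firing pattern over one block of L rounds for a neuron that is active
    in the first round and afterwards receives only its own self-loop spike."""
    fired = [True]
    for _ in range(1, L):
        # The self-spike fired in the previous round arrives now (latency 1).
        potential = w_self if fired[-1] else 0.0
        fired.append(potential >= bias)
    return fired
```

A strong neuron (w(u, u) ≥ b(u)) keeps firing in every round of the block, whereas a weak neuron (w(u, u) < b(u)) fires only in the first round.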
We next state the following claim which is crucial to complete the key lemma.
▷ Claim 29. Fix a neuron u ∈ A′0,i such that for every v ∈ A0,i−1 it holds that w(v, u) < b(u).
Then the total weight of spikes fired towards u in block Ti−1 is at least L · b(u)/8.
We first complete the proof of Lemma 27 and only then prove Claim 29.
Proof of Lemma 27. The proof is by induction on the block i. For the base case
i = 0, note that the initial states and the latency functions for the neurons V \ {x} in both
simulations are the same, and that spikes from x (that exist only in the simulation with ℓ1)
arrive only in round L − 1. This implies that in both simulations the same neurons (except
for x) are active in round τ = 0, hence A0,0 = A1,0 \ {x}. Now consider the block Ti for
1 ≤ i < L^2/1024. Let u ∈ A′0,i, i.e., u fires for the first time in round iL in the simulation
with ℓ0 and x0 = 0.
Case 1: There exists a previously firing dominant incoming neighbor: First assume that
u has some incoming neighbor v ∈ A0,i−1 with w(v, u) ≥ b(u). By definition v ∈ A′0,j(u)
for some j ≤ i − 1, and then by the induction assumption v ∈ A′1,i′ for some i′ ≤ j ≤ i − 1.
By the definition of the latency function ℓ1, since w(v, u) ≥ b(u), the total weight of spikes
from incoming neighbors is sufficient to activate u in the next important round,
τi′+1. Therefore, u ∈ A1,i′+1, which implies u ∈ A′1,i′′ for some i′′ ≤ i′ + 1. Since
i′′ ≤ i′ + 1 ≤ i, the condition holds.
Case 2: All previously firing incoming neighbors are not dominant: By applying Claim 29
to u and block Ti, we get that the total weight of spikes fired towards u in block Ti−1 is
at least L · b(u)/8. Due to Observation 28, we get
\[ L \cdot w(S(A_{0,i-1}(u)), u) + w(W(A_{0,i-1}(u)), u) \ge \frac{L}{8} \cdot b(u). \]
By the definition of A′0,j and the induction assumption, it holds that
\[ A_{0,i-1} \subseteq \bigcup_{j \le i-1} A'_{0,j} \subseteq \bigcup_{i' \le i-1} A'_{1,i'}. \]
We now consider the simulation with x0 = 1 and the latency function ℓ1, and partition all
the rounds until τi into k blocks of L rounds (except perhaps the last one). Formally, for
every j ≤ k − 2, let Bj = [jL, jL + (L − 1)] and let Bk−1 = [(k − 1)L, τi−1 + 15]. Denote
by S(Bj) and W(Bj) the strong and weak (respectively) incoming neighbors of u that
fire in some round of Bj. Using this notation, we can write
\[ \sum_{j=0}^{k-1} \bigl( L \cdot w(S(B_j), u) + w(W(B_j), u) \bigr) \ge \frac{L}{8} \cdot b(u). \]
Case 2.1: Most of the weight is in the last block. We first assume that
\[ L \cdot w(S(B_{k-1}), u) + w(W(B_{k-1}), u) \ge \frac{L}{16} \cdot b(u). \]
Consider the algorithm that defines ℓ1, and recall that Rτi(u) is the set of pending spikes
that were not yet scheduled when the algorithm considered the important round τi. The
interesting case is when u did not fire in any round of Bk−1. In such a case, all the spikes
generated towards u in the rounds of Bk−1 were added to the pending list Rτi(u). Note
that each strong neuron v ∈ S(Bk−1) fires at least 16 spikes in Bk−1, since τi − τi−1 = 16.
Furthermore, each v ∈ W(Bk−1) fires at least one spike in Bk−1. Moreover, the gap
between any τi′ ∈ Bk−1 and τi is at most L rounds, so these spikes do not exceed the maximal
latency when scheduled to τi. Altogether, we get that
\[ w(R_{\tau_i}(u)) \ge 16 \cdot w(S(B_{k-1}), u) + w(W(B_{k-1}), u) \ge \frac{16}{L} \cdot \bigl( L \cdot w(S(B_{k-1}), u) + w(W(B_{k-1}), u) \bigr) \ge b(u). \tag{3} \]
Therefore u fires in τi and u ∈ A1,i′ for some i′ ≤ i as desired.
Case 2.2: Most of the weight is in the first k − 1 blocks. It remains to consider the
complementary case where
\[ \sum_{j=0}^{k-2} \bigl( L \cdot w(S(B_j), u) + w(W(B_j), u) \bigr) \ge \frac{L}{16} \cdot b(u). \]
Since i < L^2/1024 and each block Bj for j ≤ k − 2 consists of L/32 important rounds,
we have
\[ k \le \frac{L^2/1024}{L/32} = \frac{L}{32}. \]
Therefore, by an averaging argument there exists a block Bj with
j ≤ k − 2 satisfying
\[ L \cdot w(S(B_j), u) + w(W(B_j), u) \ge 2 \cdot b(u). \tag{4} \]
First observe that every strong neuron s ∈ S(Bj) fires for at least L/2 rounds in this
block. The reason is that there is a gap of L/2 rounds between the last important rounds
of Bj and the round where the inhibiting spike from x arrives. During this time interval
every strong neuron in S(Bj) keeps on firing. Now, assume that u does not fire in any
round of Bj , and denote the first important round of Bj+1 by τi′ . Again, consider the
algorithm that defines ℓ1. Since u did not fire in any round of the block Bj , all the
spikes that are fired towards u in Bj are in the residual set Rτi′ (u). Therefore by Eq. (4),
we get that w(Rτi′ (u)) ≥ (L/2) · w(S(Bj), u) + w(W(Bj), u) ≥ b(u), and u fires in τi′ .
Therefore, we get that u fires either in some important round of Bj or in τi′ . In both
cases there is a round τi′′ with i′′ ≤ i such that u ∈ A1,i′′ . This implies u ∈ A′1,i′′ for
i′′ ≤ i, and the condition holds. ◀
Finally, it remains to prove Claim 29.
Proof of Claim 29. Recall that S(A0,i−1(u)) and W(A0,i−1(u)) are the strong and weak
(respectively) incoming neighbors of u that fire in block Ti−1. If w(S(A0,i−1), u) ≥ b(u)/8,
then by Observation 28 the total spike weight fired in block i− 1 is at least L · b(u)/8, and
we are done. Therefore, it remains to consider the case where w(S(A0,i−1), u) < b(u)/8
and w(W(A0,i−1)) < L · b(u)/8. We will show that in this case, there is a way to schedule
all spikes fired towards u in block Ti−1 to arrive in rounds [(i − 1)L + 2, iL], such that u
does not get activated in any of these rounds. By the definition of ℓ0, we get that u does
not get activated in any of the rounds [(i− 1)L+ 2, iL], and in particular u /∈ A′0,i, thus a
contradiction.
First observe that b(u) > 0 since u did not fire in round (i − 1)L (as u /∈ A0,i−1) and
it did not receive any negative spike in that round (as all negative spikes arrive in the last
rounds of the blocks). We next show that all the spikes generated in block Ti−1 can be
scheduled in rounds [(i− 1)L+ 2, iL] without making u fire in any of these rounds. Since
the scheduling algorithm of ℓ0 works in this manner, we will get a contradiction to the fact
that u ∈ A0,i.
Scheduling spikes from weak neighbors. Let FW = {〈v, u, (i− 1)L〉 | v ∈W(A0,i−1)} be
the spikes of weak neighbors fired in the block Ti−1. Recall that by Observation 28, these
weak neurons fire only in the first round. Since these spikes are fired in round (i− 1)L,
they can arrive in any of the rounds [(i− 1)L+ 2, iL]. As the total weight of the weak
spikes is at most Lb(u)/8, we show that we can schedule them in a greedy manner into at
most L/2−2 rounds while keeping the total weight in each such round to strictly less than
b(u). We traverse the weak spikes one by one, and start throwing them into rounds in
[(i− 1)L+ 2, iL− 1]. We add a spike to round τ as long as the total weight of weak spikes
scheduled to it is at most b(u)/2. If the addition of the next weak spike raises the weight
to above b(u) it is deferred to the next round τ + 1. Let τ ′ be the last round to which the
weak spikes are scheduled. Since in each τ ∈ [(i− 1)L+ 2, τ ′ − 1] the total weight of weak
spikes is at least b(u)/2, we get that τ ′ ≤ (i− 1)L+ L/4 + 3 ≤ (i− 1)L+ L/2 as desired.
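The greedy rule for the weak spikes can be sketched in a few lines. This is an illustrative reading of the paragraph above, not the authors' code: the function name is hypothetical, and it assumes that every weak spike weighs less than b(u)/2 (the definition of a weak neighbor lies outside this section), which is what keeps each round strictly below b(u).

```python
def schedule_weak_spikes(weights, b, first_round):
    """Greedily assign weak spikes to consecutive rounds.

    A round keeps accepting spikes while its load is at most b/2. Once the
    load exceeds b/2 the round is closed and the next spike opens the
    following round. Assumption: every weight is < b/2, so no round reaches b.
    """
    schedule = {}              # round -> list of spike weights placed there
    current = first_round
    load = 0
    for w in weights:
        if load > b / 2:       # current round is closed: defer to the next one
            current += 1
            load = 0
        schedule.setdefault(current, []).append(w)
        load += w
    return schedule


# Example with b(u) = 8 and L = 16: total weak weight 15 <= L * b / 8 = 16.
example = schedule_weak_spikes([3, 3, 3, 3, 3], b=8, first_round=2)
```

Every closed round carries more than b(u)/2, so a total weight of at most L · b(u)/8 closes fewer than L/4 rounds, matching the bound on τ′ above.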
Scheduling spikes from strong neighbors. We next turn to show that also the strong spikes
can be scheduled in a balanced manner in the remaining L/2 slots of the block Ti without
activating the neuron u. Let FS = {〈v, u, τ〉 | v ∈ S(A0,i−1), τ ∈ T0,i−1} be the spikes of
strong neighbors fired in block Ti−1. For a spike 〈v, u, τ〉 ∈ FS with τ ≤ (i−1)L+(L/2−1),
schedule 〈v, u, τ〉 to arrive in round τ+L/2+1. For 〈v, u, τ〉 ∈ FS with τ ≥ (i−1)L+L/2,
schedule 〈v, u, τ〉 to arrive in round τ + 1. In this way, due to Observation 28, u receives
two spikes from each v ∈ S(A0,i−1) in each round τ ∈ [(i − 1)L + L/2 + 1, iL]. Since
w(S(A0,i−1)) < b(u)/8, we get that the total weight of spikes that u receives in each
of these rounds is less than b(u)/4, and therefore u does not get activated. Overall, all
spikes generated in the block Ti−1 are scheduled by ℓ0 without activating the neuron u in
any of the rounds [(i − 1)L + 2, iL], a contradiction to the fact that u ∈ A0,i. The claim
follows. ◁
B Missing Proofs for the Positive Results
B.1 Synchronization of Boolean Gates
Proof of Observation 13. The network is as follows: connect each input neuron xi to the
output neuron z by an edge of weight w(xi, z) = 1, and let the bias of z be b(z) = 1. First
note that if none of the input neurons xi fired in round 0, then pot(z, τ) = −1 for all τ , and z
will not fire. If a neuron xi fires in round τ , since the latency of each edge is at most L, there
is a round τ ′ ∈ [τ + 1, τ + L] in which the spike from xi arrives at z. Thus, in round τ ′, the
weighted incoming sum to z is at least 1, therefore pot(z, τ ′) ≥ 0 and z fires in round τ ′. ◀
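As a small illustration of this argument, the following sketch simulates the OR network with unit weights and bias 1 under arbitrary latencies in [1, L]. The function name and the use of random latencies are assumptions made for the example; the proof itself holds for every latency assignment.

```python
import random


def simulate_or_gate(inputs, L, rng):
    """Return the set of rounds in which z fires when inputs fire in round 0.

    Each firing input sends one spike over an edge of weight 1 whose latency
    is drawn from [1, L]. The output z has bias 1 and fires in a round iff
    (number of arriving spikes) - bias >= 0.
    """
    arrivals = {}                                # round -> spikes arriving at z
    for fired in inputs:
        if fired:
            latency = rng.randint(1, L)          # adversary replaced by randomness
            arrivals[latency] = arrivals.get(latency, 0) + 1
    bias = 1
    return {t for t, count in arrivals.items() if count - bias >= 0}


rng = random.Random(0)
silent = simulate_or_gate([0, 0, 0], L=5, rng=rng)   # no input fired
active = simulate_or_gate([1, 0, 1], L=5, rng=rng)   # at least one input fired
```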
The Complete Proof of Lemma 14. We analyze the correctness of the network NOTsync.
We begin by proving the following auxiliary claim.
▷ Claim 30. If all the intermediate neurons v0, . . . , vL fire starting from round τ for at least
L(L + 1) rounds, then there exists a round τ ′ ∈ [τ + 1, τ + L(L + 1)] in which the output
neuron z fires (i.e., regardless of the latencies of the edges).
Proof. For every i ∈ {0, 1 . . . , L(L+ 1)}, denote the L-length interval Ti = [τ + i · L, τ + (i+
1)L− 1]. In addition, define T˜ = T0 ∪ ... ∪ TL+1. Let qi be the number of spikes that were
fired in the interval of Ti but received by z in the next interval Ti+1. Note that since the
maximum edge latency is L, in the worst case the spikes of interval Ti must arrive to z by
the end of the next interval Ti+1. We next prove by induction on i that either z fires by the
end of the interval Ti, or qi+1 ≥ (i + 1) · L. For the base of the induction, consider i = 0. If z did
not fire in any round during T0, we claim that q1 ≥ L. Since all the L + 1 neurons fire
in every round during the interval, overall L(L + 1) spikes were fired. By the fact
that z did not fire during T0, in each of these rounds it received at most L
spikes. This implies that z received at most L^2 spikes during T0, and therefore at least
q1 ≥ L spikes will be received by z in the interval T1. Assume that the claim holds
up to i − 1 and consider the ith interval. If z fired by the end of the ith interval Ti, we are
done. Otherwise, by the induction assumption for i − 1, we have that qi ≥ i · L. In addition,
all these qi spikes must be received at z during the interval Ti. Moreover, in interval Ti we
again have a total of L(L + 1) fresh spikes by the neurons v0, . . . , vL. This creates a total of
L(L + 1) + i · L spikes. As z did not fire in Ti, it received at most L^2 spikes, leaving at
least qi+1 ≥ L(L + 1) + i · L − L^2 = (i + 1)L spikes for the next interval. This completes the
proof of the induction step.
Overall, for i = L, we have that either z fired by the end of the interval TL, or that
qL+1 ≥ L(L + 1). In the latter case, since all these spikes must arrive during the last interval
TL+1, by the pigeonhole principle there must be a round in this interval in which z receives
at least L + 1 spikes and fires. This completes the proof of the claim. ◁
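The counting behind Claim 30 can be checked mechanically. The sketch below (function names are hypothetical) tracks the lower bound on the number of spikes carried into each interval while z stays silent, and then applies the pigeonhole step to the last interval.

```python
def pending_lower_bounds(L):
    """Lower bounds q[0..L+1] on spikes carried into each interval.

    While z is silent it absorbs at most L spikes per round, i.e. at most
    L*L per interval, whereas the L+1 neurons emit L*(L+1) fresh spikes.
    """
    q = [0]                                  # nothing is pending before T_0
    for _ in range(L + 1):
        fresh = L * (L + 1)                  # spikes fired during the interval
        absorbed_at_most = L * L             # z silent => at most L per round
        q.append(q[-1] + fresh - absorbed_at_most)
    return q


def forced_round_load(L):
    """Pigeonhole: some round of the last interval receives this many spikes."""
    q_last = pending_lower_bounds(L)[-1]
    return -(-q_last // L)                   # ceiling of q_last / L


bounds = pending_lower_bounds(3)
```

For every L the forced load is at least L + 1, which is exactly the bias of z.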
Proof of Lemma 14. Due to Claim 30, it remains to show that if x did not fire in round
0, then there must be a starting round τ from which all the neurons v0, . . . , vL fire for at least
L(L + 1) rounds.
If x did not fire, then neither the memory neuron m nor the reset neuron r fires during
the execution. Since we assume that v∗ fires in round 0, it must hold that all these
neurons fire starting from some round τ ∈ [5L^2, 5L^3 + 2L]. This holds due to the self-loops on
the neurons v0, . . . , vL, and the chain of length 5L^2. By Claim 30, there exists a round
τ ′ ∈ [τ + 1, τ + L(L + 1)] in which z fires.
We next show that if x fired in round 0, then z would not fire in any round. The key
observation is as follows:
▶ Observation 31. In order for z to fire in some round τ , it must receive spikes from at
least two different vi neurons.
Proof. To see this, note that since the maximum edge latency is L, in round τ , z can receive
spikes only from the L previous rounds τ − L, . . . , τ − 1. In particular, a single neuron can
account for at most L spikes received by z in a given round. Finally, since the
bias of z is L + 1, and all edge weights are 1, we conclude that z must receive spikes from at
least two neurons in order to fire. ◀
When x fires in round 0, the memory neuron m fires from round τm ∈ [1, L + 1] onward, due
to its self-loop. Hence, starting from round τr ∈ [2, 2L + 2], the reset neuron r fires at
least once in every interval of 2L rounds. Recall that each vi gets a negative spike from the
inhibitor r and a positive spike from the neuron c5iL ∈ C. We next show that each neuron vi
gets inhibited at least L rounds before the activation of the neuron vi+1. As a result, at any
point of time, there will be no two neurons vi and vj such that z received both of their spikes
in the same round. By induction on i, the first intermediate neuron v0 has an incoming
edge from c0 = v∗, and thus it begins to fire in some round τ ′ ∈ [0, L]. Due to the negative
edge from the reset neuron r, it stops firing before round 3L+ 2. Since v1 has an incoming
edge from c5L, it starts firing only after round 5L+ 1, and therefore z starts receiving spikes
from v1 only starting round 5L+ 2. Assume the claim is correct for neurons v0, . . . , vi−1 and
consider neuron vi. If the neuron vi starts firing in round τi, by round τi + 2L it is inhibited
by r. Because vi starts to fire after receiving a spike from c5·i·L and vi+1 starts firing after
receiving a spike from c5(i+1)L, neuron vi+1 begins to fire only after round τi + 4L, at least
L rounds after vi is inhibited.
Hence, z cannot receive input from two different neurons vi, vj in the same round, and
the claim follows by combining this with Observation 31. Finally, the next observation plays a role in
the subsequent constructions.
▶ Observation 32. The correctness still holds even if the chain starts to fire at some round
τ > 0. In this case the output neuron z fires in some round t ∈ [τ + 1, τ + Θ(L^3)].
Proof of Observation 32. The correctness of the observation follows from the fact that the
input neuron x activates the memory neuron m, which keeps on firing (i.e., presenting the
state of x) due to its self-loop. Thus all arguments in Lemma 14 still hold in case the chain
starts to fire in any later round. ◀
B.2 Synchronization of a Boolean Circuit, Proof of Lemma 15
The Construction. Given a Boolean circuit A of OR / NOT gates g1, . . . , gm of depth d, we
describe a construction of an analogous neural network N with a similar execution. For every
gi, let Sync(gi) be the synchronized sub-network of the gate gi. Specifically, for a NOT gate
(resp., OR) gi, the sub-network Sync(gi) is taken from Lemma 14 (resp., Observation 13).
Recall that for a NOT gate gi, its synchronized sub-network Sync(gi) contains a chain of
neurons where the head of the chain c0 will be denoted hereafter by v∗i . The network N
consists of the following components:
1. Input neurons x1, . . . , xn, and output neurons z1, . . . , zk, that serve as the input and the
output for the network N .
2. A chain C = [c0, . . . , cq] containing q + 1 = αdL^3 + 1 neurons, where α is a constant
satisfying that αL^3 ≥ 5L^3 + L. For every i ≥ 0, the neuron ci has bias b(ci) = 1.
Moreover, for every i ≥ 1 the neuron ci has a positive incoming edge from ci−1 with weight
w(ci−1, ci) = 1. Our simulation starts with neuron c0 firing.
3. A Sync(gi) network (using Lemma 14 and Observation 13 respectively) for every gate gi.
The connections between these components are as follows:
1. For every gate gi in the first layer, the input for its synchronized sub-network Sync(gi) is
given by xi,1, . . . , xi,ki , namely, the input bits of the gate gi in the circuit A.
2. For every gate gi in layer j ≥ 2, denote by gi,1, . . . , gi,ki the input of the gate gi in
the circuit A. In the network N , the input to the sub-network Sync(gi) are the output
neurons of the sub-networks Sync(gi,1), . . . , Sync(gi,ki).
3. The output gates of the network N are the output neurons of the sub-networks
Sync(o1), . . . ,Sync(ok), where o1, . . . , ok are the output gates of the circuit A.
4. Finally, the synchronized sub-networks of the NOT gates are connected to the chain C as
follows. For each NOT gate gi in every layer j, the (jαL^3)th neuron c_{jαL^3} in the chain
has an outgoing edge to v∗i with weight 1 (where v∗i is the head of the internal chain in
Sync(gi)). Since the bias of v∗i is 1, a spike from c_{jαL^3} makes v∗i fire.
Figure 3 illustrates the construction for a circuit with 4 NOT and OR gates of depth 3. We
note that one can shave an L-factor in the size and time overhead of Lemma 15, by reusing
the synchronization chain for all the Boolean gates in the network.
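To make the bookkeeping of the global chain concrete, the following sketch computes the chain size and the chain index that triggers the NOT gates of each layer, following the indices stated in the construction. The function name is hypothetical and α = 6 is only one admissible choice (it satisfies αL^3 ≥ 5L^3 + L for every L ≥ 1).

```python
def chain_layout(depth, L, alpha=6):
    """Return (number of chain neurons, {layer j: index of its trigger neuron}).

    Assumption: alpha = 6, which satisfies alpha * L**3 >= 5 * L**3 + L
    whenever L >= 1. Any larger constant would work as well.
    """
    assert alpha * L**3 >= 5 * L**3 + L
    segment = alpha * L**3                     # chain neurons per layer
    chain_size = depth * segment + 1           # q + 1 = alpha * d * L^3 + 1
    triggers = {j: j * segment for j in range(1, depth + 1)}
    return chain_size, triggers


size, triggers = chain_layout(depth=3, L=2)
```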
Correctness. Let V be the total set of neurons in N , and let ℓ : V × V × N → [1, L] be a
fixed (arbitrary) nice latency function. First note that in the global chain C, each of the
neurons fires once, in a sequential manner. Recall that we assume that the starter c0 fires
in round τ0 = 0. For every j ∈ {1, . . . , d}, let τj be the round in which c_{jαL^3} fires (i.e., the
spike from c_{jαL^3−1} is received at c_{jαL^3} in round τj − 1).
For every gate gi in the circuit A in layer j ≥ 1, denote by out(gi,A) the final state of gi
after receiving its inputs in the circuit A. In addition, let qi be the output neuron in the
sub-network Sync(gi), and let σt(qi,N ) be the state of the neuron in round t when simulating
the network N . Our goal is to show that for every gi, its corresponding output qi in the
network N , has the same “output” as gi in the circuit A.
▷ Claim 33. For every layer j ∈ {1, . . . , d} and every gate gi in layer j of circuit A, it holds
that: (i) If out(gi,A) = 0, then σt(qi,N ) = 0 for every t, and (ii) If out(gi,A) = 1, then
there exists t ∈ [τj−1 + 1, τj ] such that σt(qi,N ) = 1.
[Figure 3: left panel, the original circuit of NOT and OR gates over the inputs x1, x2, x3, x4; right panel, the network built from NOTsync and ORsync sub-networks over the same inputs, together with the global chain and the starter v∗.]
Figure 3 The transformation of the circuit on the left with 4 inputs and 3 layers. For each
gate we add the corresponding synchronized sub-network, where we connect the input and output
neurons of the sub-network according to the original circuit. In addition we introduce a global chain
that activates the sub-networks in each layer after the previous layers have already finished the
computation. The first neuron in the global chain is set to be the starter neuron which fires in the
beginning of the simulation.
Proof. We prove by induction on the layer j. For j = 1, recall that the input neurons
xi,1, . . . , xi,ki of the sub-network Sync(gi) are the input neurons of the network N . Therefore,
in round 0 in the simulation of N , the sub-network Sync(gi) has the same input as gate gi
in the circuit A. Assume first that gi is a NOT gate. Then the spike of the starter neuron
c0 arrived at the head of the chain v∗i by round L. Combining with Observation 32 we get that
if out(gi,A) = 1 then σt(qi,N ) = 1 for some t ∈ [L, L + 5L^3] ⊆ [1, αL^3]. In addition, if
out(gi,A) = 0 then σt(qi,N ) = 0 for every t. The case where gi is an OR gate is even simpler
and follows by Observation 13. Since the path from c0 to c_{αL^3} in the chain C is of length
αL^3, we have that τ1 ≥ αL^3. Therefore [1, αL^3] ⊆ [τ0 + 1, τ1], and the claim holds for j = 1.
For the induction step, let j ≥ 2, and assume correctness up to layer j − 1. We now
prove the claim for layer j. Let gi be a gate in layer j. By Observation 32, the important
thing to take care of regarding a NOT gate g is to make sure that its inputs have the correct
states (i.e., as the corresponding states in A) by the time that the head of the chain v∗i in
Sync(g) has received the spike from c_{(j−1)αL^3}. Denote by qi,1, . . . , qi,ki the output neurons
of the sub-networks Sync(gi,1), . . . ,Sync(gi,ki). By the induction assumption, for each gi,h, if
out(gi,h,A) = 0 then σt(qi,h) = 0 for every t, and otherwise σt(qi,h) = 1 for some t ≤ τj−1.
Since the neurons qi,1, . . . , qi,ki are the input neurons of g, it holds that the sub-network
Sync(g) gets the same input as the input of g in A, by round τj−1 of the simulation of N .
Now, assume that gi is a NOT gate. Then, by round τj−1 + L, the head of the chain v∗i
has received the spike from c_{(j−1)αL^3}. Combining with Observation 32, when out(gi,A) = 1,
it holds that σt(qi,N ) = 1 for some t ∈ [τj−1 + 1, τj−1 + L + 5L^3]. In addition, when
out(gi,A) = 0 then σt(qi,N ) = 0 for every t. Again, since the path from c_{(j−1)αL^3} to c_{jαL^3}
in the chain C is of length αL^3, we have that τj ≥ τj−1 + L + 5L^3, and the claim follows.
The case where gi is an OR gate follows in a similar way by Observation 13. ◁
Lemma 15 follows by using Claim 33 with j = d, and noting that each output neuron zi in
N is the output neuron qi′ for some sub-network Sync(gi′) where gi′ is a gate in layer d of A.
This completes the correctness and the bound on the time overhead. We finally bound the
size of the network. The network N consists of a chain of O(dL^3) neurons, and a Sync(gi)
sub-network of size O(L^2) for each gate gi in A. Therefore, there are overall O(dL^3 + mL^2)
auxiliary neurons.
B.3 Synchronization of a Single Deterministic Threshold Gate
We now turn to consider the synchronized implementation of a single deterministic threshold
gate and prove Lemma 16.
Thanks to a result of [30], we can assume without loss of generality that the weights and
bias values can be represented using binary vectors of length ⌈∆ log ∆⌉. Håstad [10] also
showed that this bound is tight. In addition, we can also assume without loss of generality
that b(z) ≥ 0. The key part is to implement the single threshold gate by a Boolean circuit.
This requires small adaptations from existing results in the area, specifically we will use the
following known facts.
▶ Fact 34 (Iterated Addition [31, 38]). Given two input binary vectors x̄ = [x1, . . . , x∆]
and ȳ = [y1, . . . , y∆], there exists a Boolean circuit with poly(∆) gates and O(1) depth that
outputs the binary representation of dec(x̄) + dec(ȳ).
▶ Corollary 35 (Multiple Iterated Addition). Given ∆ input binary vectors x̄1, . . . , x̄∆ where
x̄i ∈ {0, 1}^m for some integer m ≥ 1, there exists a Boolean circuit with poly(∆, m) gates
and O(log ∆) depth that outputs the binary representation of Σ_i dec(x̄i).
▶ Observation 36 (Comparison). Given two input binary vectors x̄ = [x1, . . . , x∆] and
ȳ = [y1, . . . , y∆], there exists a Boolean circuit with poly(∆) gates and O(1) depth that
outputs 1 iff dec(x̄) ≥ dec(ȳ).
We are now ready to implement a threshold gate by a small depth Boolean circuit of
polynomial size. This lemma explains the dependency in the largest in-degree ∆ of the final
synchronized solution.
▶ Lemma 37. Given a threshold gate g with Boolean inputs x1, . . . , x∆ with weights
w1, . . . , w∆, and an output neuron z with bias b(z), there exists a Boolean circuit that
computes g (i.e., outputs 1 iff Σ_i wi · xi ≥ b(z)) using poly(∆) gates and depth O(log ∆).
Proof. Each input xi is connected to ℓ = ⌈∆ · log ∆⌉ neurons wi,1, . . . , wi,ℓ, where the edge
weight w(xi, wi,j) is 1 if the jth bit in wi is 1 and 0 otherwise. We set the bias values to
be b(wi,j) = 1. Thus, the outgoing edge weights of xi encode the binary representation
of the weight wi. As a result, once xi fires in round τ , after at most L rounds, wi,j fires
iff the jth bit in the representation of wi is 1. As we will see, those ∆^2 · log ∆ neurons
w1,1, . . . , w∆,ℓ will serve as the input layer to the circuit. In addition, we also represent the
bias of z using ℓ neurons b1, . . . , bℓ that encode the binary representation of b(z): the bias
of bj is 1 if the jth bit in b(z) is 0, and −1 otherwise. Let x̄pos = {xi | wi ≥ 0}
and x̄neg = {xi | wi < 0}. In the same manner, let Wpos = Σ{wi | wi ≥ 0} and
Wneg = Σ{|wi| | wi < 0}. We will use the Multiple Iterated Addition circuit of Corollary
35 to compute the binary representation of Wpos and Wneg + b(z). Finally, we use the
Comparison circuit of Observation 36 to compare those values, such that the output will be
1 iff Wpos ≥ Wneg + b(z), hence computing the function of the threshold gate. ◀
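The arithmetic behind this reduction can be illustrated as follows. The sketch compares the direct threshold test with the sign-split test Wpos ≥ Wneg + b(z); it assumes that the sums range over the inputs that actually fire and that all weights are integers, and the function names are hypothetical.

```python
from itertools import product


def threshold_direct(x, w, b):
    """Output of the threshold gate: 1 iff sum_i w_i * x_i >= b."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= b)


def threshold_split(x, w, b):
    """Same gate using only non-negative sums and a single comparison."""
    w_pos = sum(wi for wi, xi in zip(w, x) if xi and wi >= 0)
    w_neg = sum(-wi for wi, xi in zip(w, x) if xi and wi < 0)
    return int(w_pos >= w_neg + b)      # both sides are non-negative when b >= 0


def agree_on_all_inputs(w, b):
    """Check both formulations on every Boolean input vector."""
    return all(threshold_direct(x, w, b) == threshold_split(x, w, b)
               for x in product([0, 1], repeat=len(w)))


ok = agree_on_all_inputs([3, -2, 5, -4], b=2)
```

Working with two non-negative sums is what lets the circuit use unsigned addition and comparison.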
The final synchronous implementation of g is obtained by applying Lemma 15 on C, i.e.,
Sync(g) ← Sync(C). The construction uses a total of O(log ∆ · L^3 + poly(∆) · L^2) auxiliary
neurons, and computation time of O(L^4 · log ∆) rounds. This completes the proof of Lemma 16.
B.4 Probabilistic Threshold Gate
B.4.1 Description of the Boolean Circuit
The construction of the Boolean circuit A approximating a probabilistic threshold gate is
achieved using two main steps. First we sample an almost uniform random variable, then we
use the sampled value in order to approximate a sample from the Logistic distribution.
Step 1: Sampling from the Almost Uniform Distribution. We introduce k = 4 log(1/ε)
uniformly random gates, denoted as r1, . . . , rk. Hence, dec(r̄) encodes an integer number that
is uniformly sampled between 0 and ⌈(1/ε)^4⌉. In addition, we introduce k input bits (with
fixed value) a1, . . . , ak such that dec(ā) = ⌈(1/ε)^4⌉. Thus, the value r′ = dec(r̄)/dec(ā) is
sampled uniformly at random from the set {0, ε^4, 2ε^4, 3ε^4, . . . , 1}.
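A minimal sketch of Step 1, under the simplifying assumption that 1/ε is a power of two, say 1/ε = 2^m, so that (1/ε)^4 = 2^(4m) and k = 4m random bits are used. The helper names and the most-significant-bit-first encoding are choices made for this example only.

```python
import random


def dec(bits):
    """Decode a most-significant-bit-first list of 0/1 values into an integer."""
    value = 0
    for bit in bits:
        value = 2 * value + bit
    return value


def sample_r_prime(m, rng):
    """Sample r' as a multiple of eps**4, assuming 1/eps = 2**m (an assumption)."""
    k = 4 * m                                   # k = 4 * log2(1/eps) random bits
    r_bits = [rng.randint(0, 1) for _ in range(k)]
    return dec(r_bits) / 2 ** k                 # dec(r) / (1/eps)**4


r_prime = sample_r_prime(2, random.Random(1))   # eps = 1/4, grid step 1/256
```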
Step 2: Sampling from the Almost Logistic Distribution. Next, we transform the sample
r′ from Step 1 into a sample from an almost Logistic distribution. This is done by using
the method of inverse transform sampling. In our context, for a sample r u.a.r. in [0, 1], the
value b + ln(r/(1 − r)) is a sample from the Logistic distribution with mean b and scale 1. To
compute the expression b + ln(r′/(1 − r′)) using a Boolean circuit, we approximate the ln(x)
function (up to ±poly(ε)) using the first O(log 1/ε) terms of the Taylor expansion around a
point x0 where 0 ≤ x0 − x ≤ 1/2.
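The inverse transform step can be sketched directly in floating point (the circuit works with integers instead; the function names below are illustrative). Plugging the sample back into the Logistic CDF recovers r, which is the defining property of the method.

```python
import math


def logistic_sample(r, b):
    """Map r in (0, 1) to a Logistic(mean b, scale 1) sample via the inverse CDF."""
    return b + math.log(r / (1.0 - r))


def logistic_cdf(x, b):
    """CDF of the Logistic distribution with mean b and scale 1."""
    return 1.0 / (1.0 + math.exp(-(x - b)))


median = logistic_sample(0.5, 3.0)                           # r = 1/2 maps to b
round_trip = logistic_cdf(logistic_sample(0.2, 1.0), 1.0)    # close to 0.2
```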
▶ Definition 38 (ε-Approximation of the ln(x) Function). Given x > 0 and a positive integer
k, let l̂nk(x) be the ln-approximation of x obtained by computing the first k terms of the
Taylor expansion around a point x0, where 0 ≤ x0 − x ≤ 1/2. When k is clear from the
context we may omit it and simply write l̂n(x).
The task of sampling from the (almost) Logistic distribution then boils down to computing
f(r′) = l̂nk(r′/(1 − r′)) with k = ⌈4 log 1/ε⌉. We first use a Boolean circuit to distinguish
between the case where r′ ≤ 1 − r′, and the complementary case. Using the vectors r̄ and ā, this
can be done using integer operations and comparison as r′/(1 − r′) = dec(r̄)/(dec(ā) − dec(r̄)).
When r′ > 1 − r′, we calculate f(1 − r′), which approximates ln((1 − r′)/r′) = − ln(r′/(1 − r′)).
The computed value is then added to the bias b in the first case and subtracted from it in the second.
In what follows, assume that r′ ≤ 1 − r′, and therefore r′/(1 − r′) ∈ [0, 1]. To pick
the point x0 around which the Taylor approximation is expanded, we let x0 = 1/2 when
r′/(1 − r′) ≤ 1/2, and x0 = 1 otherwise. This latter condition can also be easily checked
with a Boolean circuit.
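As an illustration of the truncated expansion, the sketch below evaluates ln(x0) plus the first k terms of the Taylor series of ln around x0. The function name is hypothetical, and the error check is restricted to points with |x − x0| ≤ x0/2, where the tail of the series is at most 2^(−k).

```python
import math


def ln_taylor(x, x0, k):
    """ln(x0) plus the first k terms of the Taylor series of ln around x0.

    Uses ln(x) = ln(x0) + ln(1 + t) with t = (x - x0) / x0 and
    ln(1 + t) = sum_{n >= 1} (-1)**(n + 1) * t**n / n.
    """
    t = (x - x0) / x0
    total = math.log(x0)
    for n in range(1, k + 1):
        total += (-1) ** (n + 1) * t ** n / n
    return total


# Points with |x - x0| <= x0 / 2, for which the truncation error is <= 2**-k.
err_a = abs(ln_taylor(0.75, 1.0, 20) - math.log(0.75))
err_b = abs(ln_taylor(0.30, 0.5, 10) - math.log(0.30))
```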
To finally be able to compute the function l̂nk(x) using a Boolean circuit, we must ensure
that all our operations are applied on integers. Therefore, instead of computing f(r′), we will
actually compute q · f(r′) for some large enough factor q that guarantees that q · f(r′)
is an integer. Specifically, letting q = (dec(ā) − dec(r̄))^k does the job as the function l̂nk(x) is
a polynomial of degree k. This factor of q does not affect the correctness of the computation
as it will be canceled out later on. Using the circuit for iterated addition [38, 31] and fast
multipliers [9], we can compute q · f(r′) using only integer addition and multiplication. The
output of the final Boolean circuit is ȳ where dec(ȳ) = q · b + q · f(r′). In the analysis section,
we show that dec(ȳ)/q is sampled from a distribution that is poly(ε)-close to the Logistic
distribution with mean b.
Putting all Together: The Output Circuit. Let w1, . . . , w∆ be the weights of the probabilistic
threshold gate g. To cancel out the multiplication by q in the output bias value from the
previous step, we multiply all the incoming weights by q as well. We can then use the construction
of Lemma 16 for a deterministic threshold gate with weights w′1 = q · w1, . . . , w′∆ = q · w∆
and bias b′′ = dec(ȳ). This completes the description of the construction.
B.4.2 Analysis and Proof of Lemma 17
We now turn to prove Lemma 17 and start with several auxiliary claims.
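▷ Claim 39. Let r1, r2 ∈ [0, 1] be such that |r1 − r2| ≤ ε^2 and ε ≤ r1 ≤ 1 − ε. Then
| ln(r1/(1 − r1)) − ln(r2/(1 − r2))| ≤ 2ε.
Proof. By the definition of r1 and r2 we get the following inequalities:
\begin{align*}
\left| \ln\frac{r_1}{1 - r_1} - \ln\frac{r_2}{1 - r_2} \right| &= \left| \ln(r_1) - \ln(1 - r_1) - \ln(r_2) + \ln(1 - r_2) \right| \\
&\le \left| \ln\frac{r_1 + \varepsilon^2}{r_1} \right| + \left| \ln\frac{1 - r_1 + \varepsilon^2}{1 - r_1} \right| \\
&\le \left| \ln(1 + \varepsilon^2 / r_1) \right| + \left| \ln(1 + \varepsilon^2 / (1 - r_1)) \right| \\
&\le 2 \ln(1 + \varepsilon) \le 2 \cdot \varepsilon ,
\end{align*}
where the last inequality is due to the Taylor expansion of ln(1 + x) around 0. ◁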
Recall that given x > 0 and an integer k > 0, l̂nk(x) is the ln-approximation of x
obtained by computing the first k terms of the Taylor expansion around a point x0 where
0 ≤ x0 − x ≤ 1/2.
▷ Claim 40. Fix r1, r2 ∈ [0, 1] such that |r1 − r2| ≤ ε^2 and ε ≤ r1 ≤ 1 − ε, and denote
b̂1 = b + ln(r1/(1 − r1)) and b̂2 = b + l̂n(r2/(1 − r2)). Then, |b̂1 − b̂2| ≤ 3ε.
Proof. Fix x ∈ (0, 1). Since l̂n(x) is obtained by using the first k terms in the Taylor expansion
of ln(x) around x0, we have that
\[ |\ln(x) - \widehat{\ln}(x)| = \frac{1}{x_0^k} \cdot \frac{(x - x_0)^k}{k \cdot \eta^k}, \]
where η ∈ [x, x0]. Since x0 ≥ x, also x0 ≥ η. As |x − x0| ≤ 1/2, we get that | ln(x) − l̂n(x)| ≤ (1/2)^k.
By plugging k = Θ(log 1/ε), we have that | ln(x) − l̂n(x)| ≤ ε for every x.
Thus, combining with Claim 39 we conclude the following:
\begin{align*}
|\hat{b}_1 - \hat{b}_2| &= \left| b + \ln\frac{r_1}{1 - r_1} - b - \widehat{\ln}\frac{r_2}{1 - r_2} \right| \tag{5} \\
&\le \left| \ln\frac{r_1}{1 - r_1} - \ln\frac{r_2}{1 - r_2} + \varepsilon \right| \\
&\le \varepsilon + \left| \ln\frac{r_1}{1 - r_1} - \ln\frac{r_2}{1 - r_2} \right| \le 3 \cdot \varepsilon .
\end{align*} ◁
▷ Claim 41. Consider two threshold gates g1, g2 with the same weighted sum and bias values
b1 ≤ b2 such that b2 − b1 ≤ ε. Then |Pr[g1 = 1] − Pr[g2 = 1]| ≤ √ε.
Proof. Let W be the weighted incoming sum to both g1 and g2. The probability that g1
outputs 1 is 1/(1 + e^{−(W−b1)}), and the probability that g2 outputs 1 is 1/(1 + e^{−(W−b2)}).
The following holds:
\begin{align*}
\Pr[g_2 = 1] &= \frac{1}{1 + e^{-(W - b_2)}} \ge \frac{1}{1 + e^{-(W - b_1 - \varepsilon)}} = \frac{1}{1 + e^{\varepsilon} e^{-(W - b_1)}} \\
&\ge \frac{1}{e^{\varepsilon} \cdot (1 + e^{-(W - b_1)})} \ge \frac{1}{(1 + \sqrt{\varepsilon}) \cdot (1 + e^{-(W - b_1)})} \\
&\ge (1 - \sqrt{\varepsilon}) \cdot \frac{1}{1 + e^{-(W - b_1)}} \ge \frac{1}{1 + e^{-(W - b_1)}} - \sqrt{\varepsilon} = \Pr[g_1 = 1] - \sqrt{\varepsilon} .
\end{align*}
In the third inequality we use the fact that e < (1 + √ε)^{1/ε} and thus e^ε < 1 + √ε. On the
other hand, since b2 ≥ b1 it holds that
Pr[g2 = 1] = 1/(1 + e^{−(W−b2)}) ≤ 1/(1 + e^{−(W−b1)}) = Pr[g1 = 1] .
Hence, we conclude that |Pr[g2 = 1] − Pr[g1 = 1]| ≤ √ε as required. ◁
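The statement of Claim 41 is easy to probe numerically. The sketch below (illustrative names, floating-point arithmetic) measures the largest gap between the firing probabilities of two gates whose biases differ by ε over a grid of weighted sums W.

```python
import math


def fire_probability(W, b):
    """Probability that a probabilistic threshold gate with bias b outputs 1."""
    return 1.0 / (1.0 + math.exp(-(W - b)))


def max_gap(eps, W_values, b1):
    """Largest |Pr[g1 = 1] - Pr[g2 = 1]| over the grid, with b2 = b1 + eps."""
    return max(abs(fire_probability(W, b1) - fire_probability(W, b1 + eps))
               for W in W_values)


grid = [x / 10 for x in range(-100, 101)]
gap = max_gap(0.01, grid, b1=0.0)
```

On such grids the gap is in fact bounded by ε/4, since the slope of the logistic function never exceeds 1/4; this is well within the √ε bound of the claim.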
Analysis of Step 1. In the first step of the construction, since each uniformly random gate ri
is 1 with probability 1/2, the value dec(r̄) is a uniform sample in {0, 1, . . . , (1/ε)^4}. Therefore,
r′ = dec(r̄)/dec(ā) = ε^4 · dec(r̄) is sampled uniformly at random from {0, ε^4, 2ε^4, 3ε^4, . . . , 1}.
By a simple coupling argument, sampling r′ is equivalent to the process of sampling a uniform
random variable r1 ∈ [0, 1] and rounding it to the closest value of the form i · ε^4 for some
integer i. In this manner, these two samples have an additive distance of at most ε^4.
Analysis of Step 2. Denote the probability z outputs 1 by q, and the probability u outputs
1 by p. Recall that g is the probabilistic gate and g′ is the output gate of the Boolean circuit
that approximates g.
In the second step, we compute dec(ȳ) = q · (b + f(r′)) where q = (dec(ā) − dec(r̄))^k and
f(r′) = l̂n(r′/(1 − r′)). Then g′ outputs 1 iff dec(ȳ) ≤ W · q, or simply iff b + f(r′) ≤ W .
Given that r′ ∈ [2ε^2, 1 − 2ε^2], by Claim 40, b′ = b + f(r′) satisfies |b∗ − b′| ≤ 3ε^2, where b∗
is a true sample from the Logistic distribution with mean b. Therefore, the following holds.
\begin{align*}
\Pr[g' = 1 \mid r' \in [2\varepsilon^2, 1 - 2\varepsilon^2]] &= \Pr[W \ge b' \mid r' \in [2\varepsilon^2, 1 - 2\varepsilon^2]] \\
&\le \Pr[W + 3\varepsilon^2 \ge b^* \mid r' \in [2\varepsilon^2, 1 - 2\varepsilon^2]] \\
&= 1/(1 + e^{-(W - b + 3\varepsilon^2)}),
\end{align*}
and in addition
\[ \Pr[g' = 1] \ge \Pr[W - 3\varepsilon^2 \ge b^* \mid r' \in [2\varepsilon^2, 1 - 2\varepsilon^2]] = 1/(1 + e^{-(W - b - 3\varepsilon^2)}). \]
Recall that Pr[g = 1] = 1/(1 + e^{−(W−b)}). By Claim 41 we conclude that |Pr[g′ = 1] − Pr[g = 1]| ≤ 3ε.
We note that r′ ∈ [2ε^2, 1 − 2ε^2] with probability at least 1 − 4ε^2. Hence, we conclude that:
\[ \Pr[g' = 1] \le \Pr[g' = 1 \mid r' \in [2\varepsilon^2, 1 - 2\varepsilon^2]] + 4\varepsilon^2 \le \Pr[g = 1] + 3\varepsilon + 4\varepsilon^2 = p + \Theta(\varepsilon) , \]
and on the other hand:
\begin{align*}
\Pr[g' = 1] &\ge (1 - 4\varepsilon^2) \Pr[g' = 1 \mid r' \in [2\varepsilon^2, 1 - 2\varepsilon^2]] \\
&\ge (1 - 4\varepsilon^2)(\Pr[g = 1] - 3\varepsilon) \ge \Pr[g = 1] - \Theta(\varepsilon) .
\end{align*}
Thus, |Pr[g = 1] − Pr[g′ = 1]| = O(ε) as required.
Complexity. We assume that the bias and weights of the given probabilistic threshold gate g
are polynomial in 1/ε. We first claim that with high probability of 1 − Θ(ε), the approximate
bias sampled from the almost Logistic distribution is also bounded by poly(1/ε).
▷ Claim 42. Given that |µ| = poly(1/ε), for a random variable x drawn from the Logistic
distribution with mean µ it holds that |x| = poly(1/ε) with probability greater than 1 − ε.
Proof. By the definition of the Logistic CDF it holds that
\[ \Pr[x > 2\ln(1/\varepsilon) + \mu] = 1 - \frac{1}{1 + e^{-2\ln(1/\varepsilon) - \mu + \mu}} = \frac{\varepsilon^2}{1 + \varepsilon^2} < \varepsilon^2 . \]
On the other hand
\[ \Pr[x \le -2\ln(1/\varepsilon) + \mu] = \frac{1}{1 + e^{2\ln(1/\varepsilon) - \mu + \mu}} = \frac{1}{1 + 1/\varepsilon^2} < \varepsilon^2 . \] ◁
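Both tail bounds can be verified numerically from the Logistic CDF. The sketch below uses illustrative names and floating-point arithmetic; it evaluates the two tails at distance 2 ln(1/ε) from the mean.

```python
import math


def logistic_cdf(x, mu):
    """CDF of the Logistic distribution with mean mu and scale 1."""
    return 1.0 / (1.0 + math.exp(-(x - mu)))


def tail_probabilities(eps, mu):
    """Return (Pr[x > mu + 2 ln(1/eps)], Pr[x <= mu - 2 ln(1/eps)])."""
    cut = 2.0 * math.log(1.0 / eps)
    upper = 1.0 - logistic_cdf(mu + cut, mu)
    lower = logistic_cdf(mu - cut, mu)
    return upper, lower


upper, lower = tail_probabilities(eps=0.1, mu=5.0)
```

Both values equal ε^2/(1 + ε^2) up to rounding, so together the two tails have mass below 2ε^2.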
Thus we can assume from now on that all integers can be represented using
O(poly(1/ε)) bits. Using circuits for fast integer multiplication as described in [9] and
iterated addition [38, 31], there exists a Boolean circuit computing W · q as well as b · q using
poly(∆, log(1/ε)) gates and poly(log ∆, log(1/ε)) depth. When computing the polynomial
q · l̂n(r′/(1 − r′)) (of total degree 2k), calculating each term requires O(log k) multiplication operations.
Since we have k summands, in total we use k · log k multiplication operations, each requiring
O(k · log k · 2^{O(log∗ k)}) gates (and depth), and log k addition operations. The comparison
circuit uses poly(log 1/ε) gates and depth, and the final threshold gate circuit requires
poly(∆, log 1/ε) gates and depth poly(log ∆, log 1/ε). We conclude that the Boolean circuit
has poly(∆, log(1/ε)) gates and depth of poly(log ∆, log(1/ε)).
B.4.3 Synchronizing a Probabilistic Threshold Gate
In order to construct a synchronized neural network computing the Boolean Circuit described
in Lemma 17, we use the construction for synchronized Boolean circuits as described in
Lemma 15. We are left with describing the implementation of the random bits r̄ and the
constant bits ā.
In order to represent ā, we introduce k neurons a1, . . . , ak. If the ith bit in the binary
representation of dec(ā) = (1/ε)^4 equals 1 we set the bias of ai to be b(ai) = −1 and
otherwise we set the bias to be b(ai) = 1. As a result, the neurons that represent the bits
that are 1 in the binary representation fire in every round, and the other neurons idle
throughout the execution.
In order to represent r̄ we introduce k spiking neurons r1, . . . , rk. For the computation
to succeed, we need to sample each random variable ri only once. Therefore, each neuron
ri has a very large bias b(ri) = poly(1/ε) and an incoming edge from the starter neuron
v∗ with weight w(v∗, ri) = b(ri). As a result, as long as ri did not receive a spike from v∗,
with high probability it does not fire. On the other hand, when neuron ri receives a spike
from the starter neuron v∗ it fires with probability 1/2.
B.5 The Complete Synchronization Scheme
Finally, we describe the synchronizer for a given neural network and prove Theorem 4. We
start by describing the construction for a network of deterministic threshold gates. The
adaptation to a network of spiking neurons is quite straightforward as discussed in the
end of the section. The construction has two parts: a global pulse generator that can
be used to synchronize many networks, and an adaptation of the given network N into a
network Sync(N ), see Figure 2. The pulse generator is implemented by a directed cycle
PG = [c0, . . . , ck] of length k = O(L^4 log ∆). All neurons in PG have bias b(ci) = 1. In
addition, for every i ≥ 1 neuron ci has an incoming edge from neuron ci−1 with weight
w(ci−1, ci) = 1, and the first neuron c0 has an incoming edge from the last neuron ck with
weight w(ck, c0) = 1. The last neuron of the chain ck will declare the end of each phase. We
assume throughout that the simulation starts by a spike of the starter v∗ = c0.
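To see why a single spike keeps circulating in PG, here is a minimal Python sketch of the cycle under the simplifying assumption of unit delays (the synchronous case). In the asynchronous model each hop may take up to L rounds, so the period is longer, but the order of firing is the same. The function name is hypothetical.

```python
def simulate_pulse_generator(k, rounds):
    """Return the index of the firing neuron of the cycle c_0..c_k in each round."""
    n = k + 1
    fired = [False] * n
    fired[0] = True                      # the starter c_0 spikes in round 0
    history = [0]
    for _ in range(rounds):
        nxt = [False] * n
        for i in range(n):
            pred = (i - 1) % n           # c_0 listens to c_k, c_i listens to c_{i-1}
            potential = (1 if fired[pred] else 0) - 1   # weight 1, bias 1
            nxt[i] = potential >= 0
        fired = nxt
        history.append(fired.index(True))
    return history


trace = simulate_pulse_generator(3, 8)
```

In this run the spike visits c0, c1, c2, c3 and returns to c0, so ck fires once every k + 1 rounds.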
Modifications to the Network Sync(N ). The input layer and output layer in Sync(N )
are exactly as in N . We will now focus on the set of auxiliary neurons V in N . The network
Sync(N ) contains the vertices V of the original network N , and in addition, for each neuron
vi ∈ V we add the following components to the network:
• A synchronized sub-network Sync(vi), obtained using Lemma 16, that implements the threshold gate
defined by neuron vi. The input neurons of the sub-network Sync(vi) are the incoming
neighbors of vi in N. The first neuron v_i^∗ in the internal chain of the sub-network Sync(vi)
has bias b(v_i^∗) = 1 and an incoming edge of weight 1 from the Lth neuron of the PG cycle, namely cL.
Denote the output of the sub-network Sync(vi) by v_i^out.
• An AND module ANDi whose output neuron is vi. This module is implemented by a
three-layer circuit of ORsync and NOTsync gates (using De Morgan's rule). The
ANDi module receives input from the neuron v_i^out and from the (αL^4)th neuron in PG,
c_{αL^4}, where α is a large enough constant. The internal chains of the ANDi circuit receive
input from neuron cβ in PG, where β = αL^4 + L, making sure that the circuit begins its
execution only after receiving all its inputs (see footnote 11).
Modifications to the Circuit Synchronization of Sec. 4.1. So far, we handled the synchronization
of circuits. In order to handle general networks (e.g., networks that contain self-loops
and recurrent edges), we need to apply small adaptations to the synchronized sub-networks of
Sec. 4.1. Specifically, unlike in circuits, in the execution of a network certain neurons (or gates)
might be activated several times. To be able to reuse the synchronized sub-networks throughout
the execution, we need to reset the states kept by their self-loops.
We therefore adapt the synchronized sub-networks presented in Section 4.1
so that they reset themselves at the end of their computation. For each NOTsync gate, we augment
its internal chain by 3 · L^2 neurons, and the last neuron of this chain is connected to an
inhibitor neuron vr. The inhibitor vr has outgoing edges of weight −∞ to all neurons in the
sub-network. Due to Claim 30, the inhibition by vr (i.e., the round in which
vr fires) occurs after the output neuron has already fired. Observe that the timing of the
inhibition by vr is set in a way that guarantees that all gates in the sub-network will be idle
from that point on (i.e., no delayed spikes arrive after this inhibition). For
the sub-network ORsync, which does not contain self-loops, no adaptation is needed.
B.6 Correctness
Throughout, we fix a synchronous execution Π_sync and an asynchronous execution Π_async.
For every neuron v and phase p, define the beginning of phase p of v in the asynchronous
execution, r(v, p), as the round in which the pth spike of c0 is fired. That is, the pth phase of v
is the time interval [r(v, p), r(v, p + 1)). For every round p, let V^+_sync(p) be the set of neurons
that fire in round p in Π_sync (i.e., the neurons with positive entries in σp). Similarly, let
V^+_async(p) be the set of neurons that fire during phase p.
▶ Lemma 43. The networks Sync(N) and N have similar executions.
In order to show that the networks Sync(N) and N have similar executions, we show by induction
on the round (resp., phase) p that V^+_sync(p) = V^+_async(p). For p = 1, let V^+_sync(0) be the neurons
that fired at the beginning of the simulation in round 0. We will show that every neuron
vi ∈ V fires in phase 1 iff vi ∈ V^+_sync(1). For vi ∈ V, the spikes from its incoming neighbors
in V^+_sync(0) reach the sub-network Sync(vi) by round L. The global chain in Sync(vi) is then
activated by the neuron cL in some round τ ∈ [L, L^2]. Therefore, by Lemma 16 there exists
a constant γ such that v_i^out fires in some round τi ∈ [2, τ + γ · log ∆ · L^4] iff the output of the
threshold function corresponding to vi is 1, meaning that vi ∈ V^+_sync(1). We next note that
the first layer of the sub-network ANDi consists of two NOTsync sub-networks with inputs
from c_{αL^4} and v_i^out. Hence, by Observation 32, as long as ANDi receives the information
from c_{αL^4} and v_i^out before the activation of the global chain of ANDi in some
round τ∗, its output neuron fires by round τ∗ + O(L^4) iff both v_i^out and c_{αL^4} fired.
The global chain of ANDi is activated by neuron cβ for β = α · L^4 + L and therefore is
indeed activated after ANDi receives the spike from c_{αL^4}. In addition, we choose α such that
αL^4 > L^2 + γ · log ∆ · L^4. Therefore the neuron cβ fires after round τi + L, i.e., after ANDi
has received the spike from v_i^out as well. We conclude that vi fires in some round τ′′ ∈ [β, O(L^4)]
iff v_i^out fires in round τi. We choose k to be large enough to make sure that ck fires after
round τ′′, and therefore all neurons in V^+_sync(1) fire during the first phase.
11 We say that the circuit receives its input if every gate in the first layer has received the signals from its
incoming inputs.
Next, we assume that V^+_sync(p) = V^+_async(p) and consider phase p + 1. Let τp be the round
in which c0 fired at the beginning of phase p and let τ_{p+1} be the round in which c0 fired at
the beginning of phase p + 1. In addition, we denote the round in which c_{αL^4} fired during
phase p by τα. By the induction assumption, neuron vi fires between round τp and round
τ_{p+1} iff vi ∈ V^+_sync(p). Moreover, since the activation of the sub-network ANDi is performed
by neuron cβ, every vi ∈ V^+_sync(p) fires after round τα. We choose α to be large enough
such that by round τα, all sub-networks Sync(vi) have been reset due to the modification
in the circuit synchronization. Hence, for every neuron vi ∈ V, the spikes from its incoming
neighbors in V^+_async(p) reach Sync(vi) after the sub-network has already been reset. Thus,
when the global chain of the sub-network Sync(vi) is activated by the neuron cL in round
τL ∈ [τ_{p+1} + L, τ_{p+1} + L^2], the sub-network Sync(vi) has received the spikes from the incoming
neighbors of vi in V^+_async(p). Combining this with Lemma 16, we conclude that v_i^out fires in round
τi ∈ [τ_{p+1} + L, τ_{p+1} + L^2 + γ · log ∆ · L^4] iff vi ∈ V^+_sync(p + 1). Thus, when neuron cβ fires in
phase p + 1, the sub-network ANDi has received the spikes from both v_i^out and c_{αL^4}. Since
the global chain of ANDi is activated by the neuron cβ, we conclude that vi fires in some
round τ∗ ∈ [τ_{p+1} + β, τ_{p+1} + Θ(L^4)] iff vi ∈ V^+_sync(p + 1). Choosing k to be large enough, τ∗
occurs before ck fires and ends the phase.
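The proof only needs α to satisfy αL^4 > L^2 + γ · log ∆ · L^4 and sets β = αL^4 + L. As a hedged arithmetic sketch, the Python helper below picks the smallest such integer α. The constant γ is not specified in the text and is passed as a parameter; the base-2 logarithm and both function names are assumptions of this example.

```python
import math


def choose_alpha(L, max_degree, gamma):
    """Smallest integer alpha with alpha*L^4 > L^2 + gamma*log2(max_degree)*L^4."""
    bound = L**2 + gamma * math.log2(max_degree) * L**4
    alpha = math.floor(bound / L**4) + 1
    assert alpha * L**4 > bound          # guard against floating-point surprises
    return alpha


def and_trigger_index(L, alpha):
    """beta = alpha*L^4 + L: index of the PG neuron that activates the AND module."""
    return alpha * L**4 + L


alpha = choose_alpha(L=2, max_degree=8, gamma=1)
beta = and_trigger_index(2, alpha)
```

For L = 2, ∆ = 8 and γ = 1 the bound is 4 + 3 · 16 = 52, so α = 4 is the smallest valid choice (α = 3 gives 48, which is too small) and β = 66.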
Synchronization of a Spiking Neural Network. We next explain the adaptation of the
construction to a given network of spiking neurons N. Let n be the number of auxiliary neurons
in N and let t be the number of rounds. Each spiking neuron, implemented by a probabilistic
threshold gate, can be synchronized using Cor. 19 with an error parameter of
1/poly(n, t). Thus, the network Sync(N) consists of poly(∆, log(n · t)) · L^4 · n auxiliary
neurons and uses poly(log ∆, log(n · t)) · L^5 rounds.
To compare the simulation of the given spiking neural network N with that of the synchronized
network Sync(N), we fix the randomness used by N throughout the simulation and use the same
coins when simulating the network Sync(N). For a neuron v ∈ V and a round τ ≥ 1, by Cor. 19,
with probability at least 1 − 1/poly(n · t) it holds that v ∈ V^+_sync(τ) iff v ∈ V^+_async(τ). By
applying the union bound over all n neurons and t rounds of the simulation, we conclude
that with high probability N and Sync(N) have similar executions.
C Synchronization in the Node-Delay Model
C.1 Network Dynamics in the Node Delay Setting
Network evolution proceeds in seconds, namely, a sufficiently small time unit. For a given
integer T ≥ 1, the dynamics is specified by a node-delay function t : V → N≤T interpreted
as follows: the round duration on each neuron v consists of t(v) seconds. Specifically, the ith
round of v is defined by the time interval R_i(v) = [(i − 1) · t(v) + 1, i · t(v)] for every i ≥ 1. All
spikes are assumed to arrive with a delay of a single second (see footnote 12). For a neuron v and an integer
i, the set of spikes received at v during its ith round is given by
\[ A(v, i) = \{(u, j \cdot t(u)) \mid j \cdot t(u) + 1 \in R_i(v)\}. \]
The state of v in its ith round (i.e., at the second i · t(v)) is given by:
\[ \mathrm{pot}(v, i) = \sum_{(u,\, j \cdot t(u)) \in A(v,i)} w(u, v) \cdot \sigma_j(u) - b(v) \quad\text{and}\quad \sigma_i(v) = 1 \text{ iff } \mathrm{pot}(v, i) \ge 0. \tag{6} \]
If v is a probabilistic threshold gate then it fires in second i · t(v) with probability
\[ p(v, i) = \frac{1}{1 + e^{-\mathrm{pot}(v,i)}}. \]
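The dynamics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries `t`, `in_neighbors`, `w`, `b` and `sigma` are hypothetical containers, `sigma[(u, j)]` records whether u fired at the end of its round j, and j = 0 (second 0) is taken to denote the initial configuration.

```python
def round_interval(t_v, i):
    """R_i(v) = [(i-1)*t(v) + 1, i*t(v)] as an inclusive pair of seconds."""
    return (i - 1) * t_v + 1, i * t_v


def received_spikes(v, i, t, in_neighbors):
    """A(v, i): pairs (u, j*t(u)) whose arrival second j*t(u) + 1 lies in R_i(v)."""
    lo, hi = round_interval(t[v], i)
    spikes = []
    for u in in_neighbors[v]:
        j = (lo - 1 + t[u] - 1) // t[u]      # smallest j with j*t(u) + 1 >= lo
        while j * t[u] + 1 <= hi:
            spikes.append((u, j * t[u]))
            j += 1
    return spikes


def potential(v, i, t, in_neighbors, w, b, sigma):
    """pot(v, i): weighted sum of the received spikes minus the bias b(v)."""
    total = sum(w[(u, v)] * sigma.get((u, sec // t[u]), 0)
                for u, sec in received_spikes(v, i, t, in_neighbors))
    return total - b[v]


t = {"u": 2, "v": 3}
in_neighbors = {"v": ["u"]}
w = {("u", "v"): 1.0}
b = {"v": 1.0}
sigma = {("u", 0): 1, ("u", 1): 0, ("u", 2): 1}
first_round = received_spikes("v", 1, t, in_neighbors)
pot_1 = potential("v", 1, t, in_neighbors, w, b, sigma)
```

In the example, v has round length 3 and u has round length 2, so v's first round [1, 3] collects the firing events of u at seconds 0 and 2, and its second round [4, 6] collects only the event at second 4.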
▶ Definition 44 (The T-bounded Node-Delay Setting). We are given a network N and an
integer T. It is assumed that the network contains a special neuron, the starter, that fires in
the first round of the simulation. The dynamics are determined by a node-delay function
t : V → ℕ_{≤T}, which can be chosen arbitrarily.
▶ Definition 45 (Computation of a Boolean Function in the T-bounded Node-Delay Setting).
Let f : {0, 1}^n → {0, 1}^k be a Boolean function. A network N with n input neurons x1, . . . , xn
and k output neurons z1, . . . , zk computes f in this setting if for every T-bounded function
t : V → ℕ_{≤T} and for every fixed assignment b1, . . . , bn to the input neurons the
following holds: (i) If fi(b1, . . . , bn) = 1, then there exists a round in which zi fires, where
fi(·) is the ith bit in the output of f. (ii) If fi(b1, . . . , bn) = 0, then zi does not fire throughout
the entire execution.
Synchronizers for the Node-Delay. A synchronizer ν is an algorithm that gets as input a
network N and integer T , and outputs a network N ′ = syncV (N , T ) that contains all the
neurons of N , plus additional auxiliary neurons. One of the auxiliary neurons in N ′ is a
starter neuron that fires in the first round of the simulation. The network N ′ works in the
asynchronous setting and should have similar execution to N in the sense that for every
neuron v ∈ V (N ), the firing pattern of v in the asynchronous network should be similar to
the one in the synchronous network. The output network N ′ simulates each round of the
network N by a phase.
▶ Definition 46 (Phases). We partition the execution of N′ into phases 1, 2, . . ., using a
function r : V(N) × ℕ → ℕ that defines the beginning of each phase. Hence, the pth phase of a neuron v is
the round interval [r(v, p), r(v, p + 1)).
▶ Definition 47 (Similar Executions (Deterministic Networks)). The synchronous execution
Π of a deterministic network N is specified by a list of states Π = {σ1, σ2, . . .} where each σi
is a binary vector describing the firing status of the neurons in round i. The asynchronous
execution of the network N′ = syncV(N, T) with a node-delay function t : V → ℕ_{≤T}, denoted
by Π′(t), is defined analogously, except that the asynchronous dynamics are applied. The execution
Π′(t) is divided into phases according to a function r : V(N) × ℕ → ℕ.
The network N and the pair ⟨N′, t⟩ have a similar execution if V(N) ⊆ V(N′), and
in addition, a neuron v ∈ V(N) fires in round p in the execution Π iff v fires during phase p
in Π′(t).
The networks N and N′ are similar if N and ⟨N′, t⟩ have a similar execution for every
node-delay function t.
12 As discussed in the introduction, this model can be generalized to support both edge delays and
node delays. To isolate the node-delay effect, we assume that all edges have a latency of 1.
As for the edge-delay model, the extension for randomized networks is made by fixing
the random bits in the simulation of the input network.
C.2 Reduction to the Edge-Delay Model: A Simulation Result
Given a neural network N and an integer parameter T, our goal is to construct a network
NR = syncV(N, T) in the T-bounded node-delay model that behaves similarly to N, i.e.,
such that N and NR are similar according to Definition 47.
Given the network N and the delay bound T, we start by computing the network
NL = syncE(N, L) with L = 5T^2. The desired NR = syncV(N, T) is obtained by changing some
of the edge weights in NL. Our proof of correctness is based on a notion of similarity between a network
in the node-delay model and a network in the edge-delay model.
Similarity between the networks NR and NL. Fix integer parameters T, L. Given an
edge-delay network NL, a latency function ℓ : E(NL) × ℕ → ℕ_{≤L}, a node-delay network NR
on the same neuron set and a node-delay function t : V(NR) → ℕ_{≤T}, we want to define
similarity between the simulations ⟨NL, ℓ⟩ and ⟨NR, t⟩, where both simulations use the same
initial configuration.
This notion of similarity is based on defining different time scales in each of the simulations.
Specifically, for every i ≥ 1 and neuron u ∈ V, the time window R_i(u) is the time in which u
collects spikes for its round i in the simulation of ⟨NR, t⟩. Moreover, for every i ≥ 0 the time
window L_i(u) corresponds to the firing period of round i of u in the simulation of ⟨NL, ℓ⟩,
where
\[ R_i(u) = [(i-1) \cdot t(u) + 1,\; i \cdot t(u)] \quad\text{and}\quad L_i(u) = [i \cdot T \cdot t(u),\; i \cdot T \cdot t(u) + (T \cdot t(u) - 1)]. \]
Furthermore, for every second τR in the simulation of ⟨NR, t⟩ we have the corresponding
block B_{τR} = [τR · T, τR · T + (T − 1)] in the simulation of ⟨NL, ℓ⟩. For the simulation of
⟨NL, ℓ⟩ define for every neuron u and i ≥ 0:
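\[
\sigma_i(u, N_L) =
\begin{cases}
1 & u \text{ is strong and } u \text{ fires in every } \tau_L \in L_i(u), \\
1 & u \text{ is weak and } u \text{ fires in } \tau_L \in L_i(u) \text{ only for } \tau_L = i \cdot T \cdot t(u), \\
0 & u \text{ never fires in } L_i(u), \\
\emptyset & \text{otherwise.}
\end{cases}
\]
For the simulation of ⟨NR, t⟩ define for every neuron u and i ≥ 0:
\[
\sigma_i(u, N_R) =
\begin{cases}
1 & u \text{ fires in round } i \text{ of } u \text{ (i.e., in the second } i \cdot t(u)\text{)}, \\
0 & \text{otherwise.}
\end{cases}
\]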
▶ Definition 48. The simulations ⟨NR, t⟩ and ⟨NL, ℓ⟩ are similar, denoted by ⟨NR, t⟩ ∼ ⟨NL, ℓ⟩,
if for every neuron u and i ≥ 0 it holds that σ_i(u, NL) = σ_i(u, NR).
A network NL in the L-bounded edge-delay model and a network NR in the T-bounded
node-delay model are similar, denoted by NL ∼ NR, if for every node-delay function t : V(NR) →
ℕ_{≤T} there exists a latency function ℓ : E(NL) × ℕ → ℕ_{≤L} such that ⟨NR, t⟩ ∼ ⟨NL, ℓ⟩.
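The two time scales can be made concrete with a short Python sketch. It computes the windows R_i(u), L_i(u) and the blocks B_τ, and classifies the behaviour of a neuron inside L_i(u) according to the case definition of σ_i(u, NL). All function names are hypothetical, and `None` stands in for the undefined value ∅.

```python
def collect_window(i, t_u):
    """R_i(u): seconds of round i of u in the node-delay simulation."""
    return (i - 1) * t_u + 1, i * t_u


def firing_window(i, t_u, T):
    """L_i(u): rounds of the edge-delay simulation that represent round i of u."""
    start = i * T * t_u
    return start, start + T * t_u - 1


def block(tau_R, T):
    """B_tau: the T edge-delay rounds that represent node-delay second tau_R."""
    return tau_R * T, tau_R * T + T - 1


def sigma_edge(fired_rounds, i, t_u, T, strong):
    """sigma_i(u, N_L) given the set of rounds in which u fired."""
    lo, hi = firing_window(i, t_u, T)
    window = set(range(lo, hi + 1))
    fired = window & set(fired_rounds)
    if not fired:
        return 0
    if strong and fired == window:
        return 1                 # strong neuron: fires in every round of L_i(u)
    if not strong and fired == {lo}:
        return 1                 # weak neuron: fires only in the first round
    return None                  # any other pattern is undefined


window = firing_window(2, 3, 4)
```

For T = 4 and t(u) = 3, the window L_2(u) = [24, 35] is exactly the union of the blocks B_6, B_7 and B_8, which matches the seconds 6, 7, 8 that lie between the ends of rounds 2 and 3 of u.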
The key simulation lemma used in the synchronization scheme is as follows:
▶ Lemma 49. Let NL be a network in the L-bounded edge-delay model such that:
1. b(u) > 0 for every neuron u.
2. Every weak neuron v has no self-loop.
3. There is no edge from a strong neuron to a strong neuron.
4. Every negative edge has weight −∞.
5. For every neuron u, either every excitatory incoming neighbor of u is weak, or every excitatory
incoming neighbor of u is strong.
6. Let v be a strong incoming neighbor of a neuron u, and let f be an inhibitor. Then if f
has an edge to v, it also has an edge to u.
Then there exists a network NR in the T-bounded node-delay model with T ≤ √(L/5) and
V(NR) = V(NL) such that NR and NL are similar.
Defining the node-delay network NR. The network NR is exactly as NL, up to a small
adaptation of the weights. Denote by w_L : V × V → ℝ the weight function of the network NL.
Define the weight function w_R of NR as
\[
w_R(v, u) =
\begin{cases}
T \cdot w_L(v, u) & v \neq u \text{ and } v \text{ is strong}, \\
w_L(v, u) & \text{otherwise.}
\end{cases}
\]
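A minimal sketch of this re-weighting in Python, assuming the weights are stored in a dictionary keyed by (source, target) and the strong neurons are given as a set (both representations, and the neuron names in the example, are illustrative choices):

```python
def node_delay_weights(w_L, strong, T):
    """w_R: scale by T every non-self-loop edge that leaves a strong neuron."""
    return {
        (v, u): (T * weight if v != u and v in strong else weight)
        for (v, u), weight in w_L.items()
    }


w_L = {("m", "r"): 1.0, ("m", "m"): 2.0, ("x", "r"): 1.0}
w_R = node_delay_weights(w_L, strong={"m"}, T=3)
```

Only the edge from the strong neuron `m` to `r` is scaled. The self-loop of `m` and the edge leaving the weak neuron `x` keep their weights.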
Correctness. We will show that NL and NR are similar. Fix a node-delay function t : V →
ℕ_{≤T}. First, we define the corresponding latency function ℓ and prove that it is valid, i.e., that ℓ is
nice and ℓ(v, u, τ) ∈ [1, L] for all neurons v, u and every round τ. Then, we restate Lemma 49
in order to prove its correctness by induction on the round.
Definition of the latency function ℓ. First, set the latency of self-spikes to 1.
For a neuron u, we say that u is weak-incoming if every excitatory incoming neighbor of u
is weak, and we say that u is strong-incoming if every excitatory incoming neighbor of u is
strong. Note that by property 5, every neuron u is either weak-incoming or strong-incoming.
For a strong-incoming neuron u, an inhibitor v and τL ≥ 0, set ℓ(v, u, τL) = 2T^2 + 1. Now
consider the remaining spikes, which are either positive spikes, or spikes to a weak-incoming
neuron u.
For every τL ≥ 0 define the latency value for the spike event ⟨v, u, τL⟩ as follows. Let j
be an integer satisfying τL ∈ L_j(v), and let i be such that j · t(v) + 1 ∈ R_i(u), hence
(v, j · t(v)) ∈ A(u, i).
If v is weak, then for τL = j · T · t(v) set ℓ(v, u, τL) = i · T · t(u) − τL. That is, the
spike ⟨v, u, τL⟩ is scheduled to arrive in the first round of L_i(u). For τL > j · T · t(v), set
ℓ(v, u, τL) = 1. Otherwise, if v is strong, we distinguish between three cases. For every round
τL in the edge-latency simulation, let τR be the second in the node-delay simulation such
that τL ∈ B_{τR}.
• Case (I): there exists a second in [τR + 1, τR + 2T] in which u fires in the node-delay
simulation. Let τ′R be the first such second. Set ℓ(v, u, τL) = τ′R · T − τL, that is, schedule
⟨v, u, τL⟩ to arrive in round τ′R · T.
• Case (II): case (I) does not apply, and there is an inhibitor f which is an incoming
neighbor of u, and a second τ′R ∈ [τR − T, τR + 2T] such that f fires in τ′R in the node-delay
simulation. Then for such τ′R, set ℓ(v, u, τL) = τ′R · T + (2T^2 + 1) − τL, that is,
schedule ⟨v, u, τL⟩ to arrive in round τ′R · T + (2T^2 + 1).
• Case (III): neither case (I) nor case (II) applies. Set ℓ(v, u, τL) = 1.
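The three cases for a spike sent by a strong neuron can be sketched as a small Python function. This is an illustration only: the sets of firing seconds are assumed to be known in advance from the node-delay simulation, and in case (II) the sketch picks the earliest admissible second τ′R, whereas the text allows any such second.

```python
def strong_spike_latency(tau_L, T, u_fire_seconds, inhibitor_fire_seconds):
    """Latency of a spike <v, u, tau_L> from a strong neuron v, cases (I)-(III).

    u_fire_seconds: seconds in which u fires in the node-delay simulation.
    inhibitor_fire_seconds: seconds in which some inhibitor f with an edge
    to u fires in the node-delay simulation.
    """
    tau_R = tau_L // T                   # tau_L lies in the block B_{tau_R}
    # Case (I): u fires within the next 2T seconds; aim at the first such second.
    for s in range(tau_R + 1, tau_R + 2 * T + 1):
        if s in u_fire_seconds:
            return s * T - tau_L
    # Case (II): an inhibitor of u fires nearby; arrive together with its spike.
    for s in range(tau_R - T, tau_R + 2 * T + 1):
        if s in inhibitor_fire_seconds:
            return s * T + (2 * T * T + 1) - tau_L
    # Case (III): nothing to align with; deliver in the next round.
    return 1


case_1 = strong_spike_latency(7, 2, {5}, set())
case_2 = strong_spike_latency(7, 2, set(), {2})
case_3 = strong_spike_latency(7, 2, set(), set())
```

With T = 2 and τL = 7 (so τR = 3), a firing of u at second 5 yields latency 3 and arrival in round 10 = 5 · T. An inhibitor firing at second 2 yields latency 6 and arrival in round 13 = 2 · T + 2T^2 + 1.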
The intuition is as follows. For a positive spike in the edge-delay simulation, we look for a round
among the next 2T^2 rounds in which u is supposed to fire. If we cannot find one, we want to
send the spike to a round in which we know it will not activate u. This is a round in which u
receives a negative spike (since negative spikes have weight −∞). If such a round does
not exist either, it implies that the total weight of the positive incoming neighbors of u that fired in
round τL is low, and we can schedule all these spikes to arrive together in round τL + 1 without
activating u. We next show that ℓ is valid.
▷ Claim 50. ℓ is a valid latency function for NL.
Proof. First, since all self-spikes have latency value 1, ℓ is nice. For a negative spike ⟨v, u, τL⟩
such that u is strong-incoming, it holds that ℓ(v, u, τL) = 2T^2 + 1 < L. Therefore we are
left to show validity for positive spikes, and for negative spikes that are fired towards a
weak-incoming neuron. Consider such a spike ⟨v, u, τL⟩, and let j be an integer satisfying
τL ∈ L_j(v). Furthermore, let i be an integer such that j · t(v) + 1 ∈ R_i(u).
First, assume that v is weak. We distinguish between two cases depending on whether τL is
the first round of the block L_j(v) or not. For τL = j · T · t(v) we have
ℓ(v, u, τL) = i · T · t(u) − τL = T · (i · t(u) − j · t(v)).
Recall that R_i(u) = [(i − 1) · t(u) + 1, i · t(u)], thus j · t(v) + 1 ≤ i · t(u), and
ℓ(v, u, τL) ≥ T ≥ 1. Furthermore, j · t(v) + 1 ≥ (i − 1) · t(u) + 1, hence
i · t(u) − j · t(v) ≤ t(u) ≤ T, and ℓ(v, u, τL) = T · (i · t(u) − j · t(v)) ≤ T^2 ≤ L. Otherwise, i.e., for
τL > j · T · t(v), it holds that ℓ(v, u, τL) = 1, and thus ℓ(v, u, τL) ∈ [1, L].
It remains to consider the case where v is strong. Let τR be the second in the node-delay
simulation such that τL ∈ B_{τR}. Consider the definition of ℓ for a spike ⟨v, u, τL⟩.
In case (I), we have ℓ(v, u, τL) = τ′R · T − τL, and since τ′R ∈ [τR + 1, τR + 2T] it holds
that 1 ≤ τ′R · T − τL ≤ 2T^2 < L. In case (II), since τ′R ∈ [τR − T, τR + 2T], we have
that ℓ(v, u, τL) = τ′R · T + (2T^2 + 1) − τL ∈ [1, 5T^2]. Finally, in case (III) we simply have
ℓ(v, u, τL) = 1. Hence, in all cases it holds that ℓ(v, u, τL) ∈ [1, L]. ◁
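The bounds in this proof are easy to sanity-check by brute force. The following Python sketch enumerates, for a given T, the latency values that can arise in the weak case and in cases (I) and (II), and checks that they lie in the claimed ranges with L = 5T^2. It is only a numerical check over a finite range chosen for the example, not a replacement for the proof.

```python
def latency_bounds_hold(T):
    """Check the ranges used in the proof of Claim 50 for L = 5*T^2."""
    L = 5 * T * T
    for tau_R in range(T, 3 * T + 1):                 # a few node-delay seconds
        for tau_L in range(tau_R * T, tau_R * T + T):  # every round of B_{tau_R}
            # Case (I): target second in [tau_R + 1, tau_R + 2T].
            for s in range(tau_R + 1, tau_R + 2 * T + 1):
                if not 1 <= s * T - tau_L <= 2 * T * T:
                    return False
            # Case (II): inhibitor second in [tau_R - T, tau_R + 2T].
            for s in range(tau_R - T, tau_R + 2 * T + 1):
                if not 1 <= s * T + 2 * T * T + 1 - tau_L <= L:
                    return False
    # Weak sender: latency T*(i*t(u) - j*t(v)) whenever j*t(v) + 1 is in R_i(u).
    for t_u in range(1, T + 1):
        for t_v in range(1, T + 1):
            for i in range(1, 4):
                lo, hi = (i - 1) * t_u + 1, i * t_u
                for j in range(0, 4 * T):
                    if lo <= j * t_v + 1 <= hi:
                        latency = T * (i * t_u - j * t_v)
                        if not T <= latency <= T * T:
                            return False
    return True


all_ok = all(latency_bounds_hold(T) for T in range(1, 6))
```

The extreme values match the proof: case (I) ranges over [1, 2T^2], and case (II) ranges over [T^2 − T + 2, 4T^2 + 1], which is contained in [1, 5T^2] for every T ≥ 1.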
In order to show that ⟨NR, t⟩ ∼ ⟨NL, ℓ⟩, we restate the condition for similarity in the following
lemma. We then prove the lemma by induction on the round τL.
▶ Lemma 51 (Restating Lemma 49). For every round τL ≥ 0 of the simulation ⟨NL, ℓ⟩ and
for every neuron u, let i be such that τL ∈ L_i(u). Then:
1. If σ_i(u, NR) = 1:
   • If τL = i · T · t(u) then u fires in τL.
   • If τL > i · T · t(u) then u fires in τL iff u is strong.
2. If σ_i(u, NR) = 0 then u does not fire in τL.
For the base case τL = 0, the correctness follows from the fact that both simulations have the
same starting configuration. Now, let τL ≥ 1 and assume correctness for every τ′L ≤ τL − 1.
Fix a neuron u and let i be an integer such that τL ∈ L_i(u). We start with a useful auxiliary
claim.
▷ Claim 52. Let u be a weak-incoming neuron, v an incoming neighbor of u, and τ′L ≥ 0.
Furthermore, let j be such that τ′L ∈ L_j(v), and i such that j · t(v) + 1 ∈ R_i(u). Then the
spike ⟨v, u, τ′L⟩ occurs and arrives at u in round τL = i · T · t(u) in the simulation ⟨NL, ℓ⟩ iff
τ′L = j · T · t(v) and the spike ⟨v, u, j · t(v)⟩ occurs and arrives at u in R_i(u) in the simulation
⟨NR, t⟩.
Proof of Claim 52. Since u is weak-incoming, v is weak. Thus, by the induction assumption for
τ′L and the definition of ℓ, the spike event ⟨v, u, τ′L⟩ occurs and arrives in round τL iff
τ′L = j · T · t(v) and σ_j(v, NR) = 1. This happens iff in the simulation of
⟨NR, t⟩ the spike event ⟨v, u, j · t(v)⟩ occurs and arrives at u in R_i(u). ◁
We split the proof of Lemma 51 into two cases.
Case 1: u is weak-incoming. Assume τL = i · T · t(u). We want to show that u fires in round
τL iff σ_i(u, NR) = 1. By Claim 52, we get that the mapping ⟨v, u, j · T · t(v)⟩ ↦ ⟨v, u, j · t(v)⟩
is a bijection between the set of non-self spikes that u receives in τL in the simulation
⟨NL, ℓ⟩ and the set of non-self spikes that u receives in R_i(u) in the simulation ⟨NR, t⟩.
As for self-spikes, note that if u is weak it has no self-loop. If u is strong, then by the
induction assumption u fires in τL − 1 iff σ_{i−1}(u, NR) = 1. Thus, u receives the self-spike
⟨u, u, τL − 1⟩ in τL iff it receives the self-spike ⟨u, u, (i − 1) · t(u)⟩ in R_i(u). Since
w_L(v, u) = w_R(v, u) for every weak neuron v and for v = u, we get that the total spike
weight that u receives in τL equals the total spike weight it receives in R_i(u). Thus, u
fires in round τL iff σ_i(u, NR) = 1.
Now, assume τL > i · T · t(u) and that either u is weak, or u is strong and σ_i(u, NR) = 0.
We want to show that u does not fire. Note that if u is weak then it has no self-loop, and
if u is strong and σ_i(u, NR) = 0 then by the induction assumption for τL − 1, u does not
fire in τL − 1. Thus, in both cases u does not receive a self-spike in τL. Furthermore, u
has no strong excitatory incoming neighbors, and therefore by Claim 52 u does not receive any positive spikes
from incoming neighbors. Since b(u) > 0, u does not fire in τL.
Finally, assume τL > i · T · t(u), u is strong and σ_i(u, NR) = 1. We want
to show that u fires. Note that by Claim 52, u does not receive a negative spike in τL.
Furthermore, since σ_i(u, NR) = 1, by the induction assumption for τL − 1, u fires in τL − 1
and therefore u receives a self-spike in τL. Since w_L(u, u) ≥ b(u), u fires in τL.
Case 2: u is strong-incoming. By the properties of NL there is no edge between strong
neurons, and weak neurons have no self-loop. Hence u is weak and has no self-loop. We
handle the following sub-cases separately:
Case 2.1: σ_i(u, NR) = 1 and τL = i · T · t(u). We want to show that u fires in τL.
Let ⟨v, u, j · t(v)⟩ be a positive spike in the simulation ⟨NR, t⟩ that arrives at u in
R_i(u), and let τ′L be one of the T rounds in [j · T · t(v), j · T · t(v) + (T − 1)]. Since v is
strong, by the induction assumption for τ′L, v fires in τ′L, and therefore the spike
event ⟨v, u, τ′L⟩ occurs in the simulation ⟨NL, ℓ⟩. We now show that ⟨v, u, τ′L⟩ arrives
at u in τL, according to the definition of ℓ for spikes from strong neurons.
Since σ_i(u, NR) = 1, u fires in the second rR = i · t(u) in the simulation ⟨NR, t⟩.
Note that j · t(v) + 1 ∈ R_i(u) implies that i · t(u) − j · t(v) ≤ T. Hence in particular
i · t(u) ∈ [j · t(v) + 1, j · t(v) + 2T]. Let r′R ∈ [j · t(v) + 1, j · t(v) + 2T] with r′R < i · t(u).
Note that r′R ∈ R_i(u), and therefore r′R is not the end of a round of u. Hence u does not fire
in r′R. Therefore the second rR = i · t(u) is the first second in [j · t(v) + 1, j · t(v) + 2T]
in which u fires, and by the definition of ℓ the spike ⟨v, u, τ′L⟩ arrives in round τL.
Now, let f be an inhibitory incoming neighbor of u. By the definition of ℓ, a spike
from f to u can arrive only in a round of the form τ′R · T + 2T^2 + 1 for some second τ′R,
which is not a multiple of T. Note that τL = i · T · t(u) is a multiple of T. Thus
u does not receive a negative spike in τL.
We get that in round τL, u receives only positive spikes, and for every positive spike
⟨v, u, j · t(v)⟩ that arrives at u in R_i(u) and every τ′L ∈ [j · T · t(v), j · T · t(v) + (T − 1)],
u receives the spike ⟨v, u, τ′L⟩. Since w_R(v, u) = T · w_L(v, u) for every strong v and
[j · T · t(v), j · T · t(v) + (T − 1)] contains T rounds, the total spike weight
that u receives in τL is at least the total spike weight it receives in R_i(u) in the
node-delay simulation. Since σ_i(u, NR) = 1, u receives in R_i(u) a spike weight of at
least b(u), which implies the same for round τL in the edge-delay simulation. Thus u
fires in round τL.
Case 2.2: σ_i(u, NR) = 0 or τL > i · T · t(u). We want to show that u does not fire
in τL. Towards a contradiction, assume that it does. First note that if u receives no
positive spikes in τL, then since b(u) > 0, u does not fire in τL. Otherwise, let ⟨v, u, τ′L⟩
be a positive spike that arrives at u in round τL. Recall that since v is strong, there
are three cases for defining the latency value of ⟨v, u, τ′L⟩.
We will now show that ⟨v, u, τ′L⟩ belongs to case (III). It does not belong to case (I),
since it does not hold that σ_i(u, NR) = 1 and τL = i · T · t(u). If we are in case (II), then there
exists an inhibitor f connected to u that fired in second τ′R in the node-delay
simulation and whose spike arrives in τL, i.e., such that τL = τ′R · T + (2T^2 + 1). By the induction
assumption for τ′R · T, f fires in round τ′R · T in the edge-delay simulation, and since
u is strong-incoming, by the definition of ℓ the spike ⟨f, u, τ′R · T⟩ arrives at u in
round τ′R · T + (2T^2 + 1) = τL. Since negative spikes have weight −∞, u does not fire
in τL, a contradiction. Therefore, ⟨v, u, τ′L⟩ belongs to case (III).
By the definition of case (III), ⟨v, u, τ′L⟩ was generated in round τ′L = τL − 1. Let v be an
excitatory incoming neighbor of u that fires in τL − 1, let j be such that τL − 1 ∈ L_j(v),
and let τR be such that τL − 1 ∈ B_{τR}. Our goal is to show that v fires in [τR + 1, τR + T]
in the node-delay simulation, by showing that it receives enough positive spikes from
its neighbors in this interval.
Let f be an inhibitor that has an edge to v. By the network properties f also
has an edge to u, and since we are not in case (II) in the definition of ℓ, f does
not fire in the interval [τR − T, τR + 2T] in the node-delay simulation. This implies
that v does not receive a negative spike in [τR − T + 1, τR + 2T + 1]. Notice that
τR ∈ [j · t(v), (j + 1) · t(v) − 1], and since t(v) ≤ T we get
\[ R_{j+1}(v) = [j \cdot t(v) + 1,\; (j+1) \cdot t(v)] \subseteq [\tau_R - T + 1,\; \tau_R + 2T + 1]. \]
Therefore, v does not receive a negative spike in R_{j+1}(v).
By the induction assumption for τL − 1 we have σ_j(v, NR) = 1. Together with the fact
that v is strong and receives no negative spikes in R_{j+1}(v), we get that σ_{j+1}(v, NR) = 1,
i.e., v fires in the node-delay simulation in the second (j + 1) · t(v). This implies that u
receives a spike from v in (j + 1) · t(v) + 1, which is inside the interval [τR + 1, τR + T].
Now, let W be the total weight of the incoming neighbors of u that fired in round τL − 1.
Since u fires in round τL, it holds that W ≥ b(u). We will show this implies that u
fires in some round in [τR + 1, τR + 2T], which contradicts the fact that none of the
arriving spikes belongs to case (I).
We showed that for every neuron v that fires in τL − 1 in the edge-latency simulation, u
receives a spike from v in some second τ′R ∈ [τR + 1, τR + T] in the node-delay simulation.
By the definition of w_R it holds that w_R(v, u) = T · w_L(v, u), and therefore we get
\[ \sum_{\tau'_R = \tau_R + 1}^{\tau_R + T} W_{\tau'_R} \ge T \cdot W, \]
where W_{τ′R} denotes the total spike weight that u receives in second τ′R of the node-delay simulation.
By an averaging argument there is a second τ′R ∈ [τR + 1, τR + T] with W_{τ′R} ≥
(T · W)/T = W.
Let i′ be an integer such that τ′R ∈ R_{i′}(u). Therefore u receives in R_{i′}(u) a total positive
spike weight of at least W ≥ b(u). Furthermore, since no spike belongs to case (II),
u does not receive a negative spike in R_{i′}(u) ⊆ [τR + 1, τR + 2T]. Thus, u fires in the
second i′ · t(u) ∈ [τR + 1, τR + 2T], a contradiction.
C.3 The Complete Synchronization Scheme
We are now ready to complete the proof of Theorem 5. We consider a neural network N
and an integer parameter T. Set L = 5T^2 and let NL = syncE(N, L) be the synchronized
network of N in the L-bounded edge-delay model. We will now show that NL satisfies the
conditions of Lemma 49.
Showing that NL satisfies the properties of Lemma 49. Note that by the definition of the
edge-delay synchronization scheme given in Section 4.3, every neuron u ∈ NL is contained in
one of the following modules: (1) an ORsync or NOTsync sub-network (Section 4.1), (2) a chain
of a threshold gate which is implemented as a Boolean circuit sub-network (Section B.3), or
(3) the chain of the global pulse generator (Section 4.3). By the definitions of these modules,
properties 1 and 2 hold. Moreover, together with the fact that edges between the modules
connect only weak excitatory neurons, we also get property 3. Furthermore, note that the
only inhibitors in the network are the neurons r and vr (the latter is added later, in Section 4.3) in the
NOTsync module, and all their edges have weight −∞. Therefore, property 4 is satisfied.
The remaining properties 5 and 6 are relevant only for strong neurons. Therefore, consider
the NOTsync module (Lemma 14), which is the only module that contains strong neurons.
By the module definition, there are two possible types of strong neurons: (i) the memory
neuron m, that is only connected to the reset neuron r; and (ii) an intermediate neuron vi, that
is only connected to the output neuron z. In case (i), the strong neuron has only one incoming inhibitor,
which is the neuron vr that resets the whole network after it finishes. Thus vr also has an
edge to r. In case (ii), the strong neuron has two incoming inhibitors, vr and r, which both have an edge to
z. Therefore property 6 holds. Furthermore, both r and z have no edges from weak neurons.
Hence, property 5 holds.
Thus, NL satisfies the conditions of Lemma 49, and therefore there exists a network
NR in the T-bounded node-delay model which is similar to NL. We are left to show the
transitivity of similarity, i.e., that if N ∼ NL and NL ∼ NR, then also N ∼ NR.
Showing transitivity of similarity. Let t be a node-delay function for NR. First, by the
similarities of the networks we get V(N) ⊆ V(NL) = V(NR). Moreover, by the definition of
NL ∼ NR there exists a latency function ℓ for NL such that ⟨NL, ℓ⟩ ∼ ⟨NR, t⟩. Let Π be the
execution of N, ΠL be the execution of ⟨NL, ℓ⟩, and ΠR be the execution of ⟨NR, t⟩. Let the
interval [rL(v, p), rL(v, p + 1)) be the pth phase of ΠL, and define [rR(v, p), rR(v, p + 1)) as
the pth phase of ΠR, where rR(v, p) is defined as follows. Let L∗_p(v) be the earliest
block L_i(v) whose first round τ∗_p is contained in phase p of ΠL; then rR(v, p) = τ∗_p / T. We
wish to prove the following claim.
▷ Claim 53. For every neuron v and p ≥ 0, v fires in round p of Π iff v fires in phase p
of ΠR.
First, note that by the construction of NL = syncE(N, L), every neuron v ∈ V(N) can fire
only after the chain neuron c_{αL^4+L} fires. Since αL^4 > t(v) · T, this implies that v does not
fire in the first t(v) · T rounds of each phase in ΠL. We prove the two directions of the claim.
• Assume neuron v fires in round p in Π. Because N ∼ ⟨NL, ℓ⟩ there is a round τL in
phase p of ΠL in which v fires. Let j be such that τL ∈ L_j(v). Since v does not fire in the first
t(v) · T rounds of each phase we have τL ≥ rL(v, p) + t(v) · T. Since L_j(v) consists of t(v) · T rounds, the first
round of L_j(v) is in phase p. Therefore j · t(v) · T ≥ τ∗_p, and hence j · t(v) ≥ rR(v, p).
Furthermore, j · t(v) is not in phase p + 1. Hence also j · t(v) < rR(v, p + 1),
i.e., j · t(v) is in phase p of ΠR. Due to the similarity ⟨NL, ℓ⟩ ∼ ⟨NR, t⟩, since v fires in
L_j(v) it also fires in the second j · t(v). Hence v fires in phase p of ΠR.
• Assume that v fires in phase p in ΠR, in some second τR = j · t(v). Then τR ≥ τ∗_p / T.
Thus j · T · t(v) ≥ τ∗_p ≥ rL(v, p). Furthermore, j · t(v) < τ∗_{p+1} / T implies that round j · T · t(v)
is before phase p + 1 of ΠL. Therefore j · T · t(v) is in phase p of ΠL. By the similarity
⟨NL, ℓ⟩ ∼ ⟨NR, t⟩ we have that v fires in round j · T · t(v) in ΠL. Hence v fires in phase p
of ΠL. By the similarity N ∼ ⟨NL, ℓ⟩ we get that v fires in round p of Π.
