University of Pennsylvania

ScholarlyCommons
Departmental Papers (BE)

Department of Bioengineering

July 2004

A burst-mode word-serial address-event link--II: receiver design
Kwabena A. Boahen
University of Pennsylvania, boahen@seas.upenn.edu

Follow this and additional works at: https://repository.upenn.edu/be_papers

Recommended Citation
Boahen, K. A. (2004). A burst-mode word-serial address-event link--II: receiver design. Retrieved from
https://repository.upenn.edu/be_papers/4

Copyright 2004 IEEE. Reprinted from IEEE Transactions on Circuits and Systems--I: Regular Papers, Volume 51,
Issue 7, July 2004, pages 1281-1291.
Publisher URL: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isNumber=29094&puNumber=8919
This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply
IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this
material is permitted. However, permission to reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing
to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws
protecting it.
This paper is posted at ScholarlyCommons. https://repository.upenn.edu/be_papers/4
For more information, please contact repository@pobox.upenn.edu.

A burst-mode word-serial address-event link--II: receiver design
Abstract
We present a receiver for a scalable multiple-access inter-chip link that communicates binary activity
between two-dimensional arrays fabricated in deep submicron CMOS. Recipients are identified by row
and column addresses but these addresses are not communicated simultaneously. The row address is
followed sequentially by a column address for each active cell in that row; this cuts pad count in half
without sacrificing communication capacity. Column addresses are decoded as they are received but
cells are not written individually. An entire burst is written to a row in parallel; this increases
communication capacity with integration density. Rows are written one by one but bursts are not
processed one at a time. The next burst is decoded while the last one is being written; this increases
capacity further. We synthesized an asynchronous implementation by performing a series of program
decompositions, starting from a high-level description. Links using this design have been implemented
successfully in three generations of submicron CMOS technology.

Keywords
asynchronous logic synthesis, event-driven communication, neuromorphic systems, pipelining, pixel-level
quantization, serial-to-parallel conversion


This journal article is available at ScholarlyCommons: https://repository.upenn.edu/be_papers/4

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 7, JULY 2004


A Burst-Mode Word-Serial Address-Event Link—II:
Receiver Design
Kwabena A. Boahen


I. SCALING TWO-DIMENSIONAL ARRAYS

EVENT-DRIVEN demultiplexers are used to deliver binary
signals to arrays of parallel-processing cells. Traditionally,
clock-driven demultiplexers were used for this purpose. However, updating each and every cell regularly is wasteful if activity is sparse, either in time or in space. In that case, it is more
efficient to update a cell only when its input changes, which
may be accomplished simply by delivering the cell’s address to
the array. A decoder then selects the cell and either toggles its
input (level coded) or drives the input high briefly (pulse coded).
This address-event representation has been used to communicate pulse-coded outputs of silicon retinae [1]–[4] and cochleas
[5], to drive arrays of silicon neurons [1], [2], [6], [7], and to interface these neuromorphic chips with computers [5]. Interest
in event-driven multiplexer-demultiplexer links is increasing,
driven by the trend toward quantizing signals inside the array
(e.g., active pixel sensors [8]–[10] and pulse-coded neural networks [11], [12]).
We recently developed an event-driven multiplexer that
boosts capacity by reading an entire row of cells in parallel
[13]. As feature sizes shrink, it takes longer to cycle the row and column lines because faster logic (minimum-sized inverter chain) is neutralized by larger load (cells per row or column). This bottleneck limits prior multiplexer designs, where a single active cell is read at a time [11], [14]–[16]. We broke the bottleneck by exploiting parallelism: reading an entire row simultaneously. Communication capacity is not compromised when we serially encode the addresses of active cells in that row because we use devices much larger than those in the array. As communication capacity is boosted without sizing up devices inside the array, our multiplexer design can exploit the high integration densities deep submicron processes offer.

Manuscript received January 3, 2002; revised November 2002. This work was supported in part by the Whitaker Foundation and in part by the National Science Foundation’s LIS/KDI and CAREER programs under Grant ECS9874463 and Grant ECS00-93851. This paper was recommended by Associate Editor G. Cauwenberghs.
The author is with the Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104-6392 USA (e-mail: boahen@seas.upenn.edu).
Digital Object Identifier 10.1109/TCSI.2004.830702

In this paper, we describe a complementary event-driven demultiplexer that writes an entire row of cells in parallel. Prior demultiplexer designs write a single cell at a time [1], [2], similar
to prior multiplexer designs. Hence, they also face a bottleneck.
Our new demultiplexer design provides a scalable solution when
paired with our recently developed multiplexer design to form
a parallel read–write link. The increase in parallelism as the
array gets denser enables the communication capacity to keep
increasing, despite the fact that faster logic is neutralized by
larger load. Our demultiplexer requires large devices only in the
periphery, where serial-to-parallel conversion occurs. The entire
parallel read–write link is implemented asynchronously—the
same approach adopted by prior link designs—to facilitate its
use in large heterogeneous multichip systems.
Similar to prior designs, our event-driven multiplexer-demultiplexer link provides virtual connectivity between cells in the
same array or in different arrays, which need not be on the
same chip. That is, the multiplexer, or transmitter, uses an encoder to generate an address that uniquely identifies an event’s
place of origin. Conversely, the demultiplexer, or receiver, uses
a decoder to recreate the event at the destination [1], [5], [14].
These virtual wires can be rerouted by using a look-up table
to translate an incoming address into one or more outgoing addresses [7], [17], [18]. Events may be fanned out to multiple receivers by using splits and merges [6], or with a shared bus [18],
[19]. Thus, in addition to providing point-to-point communication for parallel distributed processing in multi-chip systems, the
single-transmitter-single-receiver address-event link described
here can serve a wide variety of purposes when augmented appropriately.

Fig. 1. Receiver architecture: a two-way demultiplexer (U) directs row addresses to one latch (D) and column addresses to another (E). When the burst ends, the row’s address and its decoded column addresses (P) are written to a second set of latches (E and M). As the row address is decoded and the column data is written to that row (R), the next burst is received and its column addresses are decoded.

Fig. 2. Receiver specification: the receiver inputs an -bit address on port A and communicates on the corresponding one of its dataless ports P .

However, unlike prior designs, our event-driven multiplexer–demultiplexer link communicates row and column addresses serially, rather than in parallel. By going serial, we cut the pad count in half without sacrificing communication capacity. There is no loss in communication capacity because the multiplexer does not retransmit the row address if the next event is from the same row. These events are communicated in a burst: the row address, a column address for each active cell, and a termination signal. It is obvious which events are from the same row since an entire row of cells is read out in parallel. Thus, parallel readout makes it possible to eliminate redundant row addresses and thereby communicate addresses serially without sacrificing capacity.
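The burst format just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple tags `"ROW"`, `"COL"`, and `"END"` and the function names are hypothetical, and the termination signal is modeled as a word in the stream even though the link may signal it differently.

```python
from itertools import groupby

def encode_bursts(events):
    """Turn a list of (row, col) events into a word-serial stream of bursts."""
    words = []
    # groupby merges only consecutive events that share a row, so the row
    # address is sent once per burst and never repeated within it.
    for row, group in groupby(events, key=lambda event: event[0]):
        words.append(("ROW", row))                       # row address first
        words.extend(("COL", col) for _, col in group)   # one word per active cell
        words.append(("END", None))                      # hypothetical termination word
    return words

def decode_bursts(words):
    """Recover the (row, col) events from the word-serial stream."""
    events, row = [], None
    for kind, value in words:
        if kind == "ROW":
            row = value
        elif kind == "COL":
            events.append((row, value))
        elif kind == "END":
            row = None
    return events

events = [(2, 0), (2, 5), (2, 7), (4, 1)]
words = encode_bursts(events)
```

Here four events need only two row-address words, and decoding the stream returns the original event list.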
The paper is divided into four sections. In Section II, we
present a high-level specification for the receiver, and decompose it into a hierarchy of concurrent processes. In Section III,
we present the final handshaking sequences and the resulting
asynchronous logic circuits; intermediate synthesis steps can be
found in the Appendix. Section IV concludes the paper. A parallel-read burst-mode transmitter and analysis and test results
are presented in companion papers [13], [20].
II. RECEIVER DESIGN
Our goal is to optimize the address-event receiver’s row–column architecture in three ways. For an $N$-cell array, the following hold.
1) Multiplexing row and column addresses cuts pad count in half.
2) Pipelining serial-to-parallel conversion increases communication capacity.
3) Writing a row’s events in parallel boosts communication capacity by up to $\sqrt{N}$.
As alluded to in Section I, we realize Optimization 1 by eliminating redundant row addresses; thus the drop in (peak) capacity is only 1 in $\sqrt{N}$. We realize Optimization 2 by decoding the new row’s column addresses even while we are writing the previous row’s events into the array. Finally, we realize Optimization 3 by writing up to $\sqrt{N}$ events in parallel, assuming a square array.
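The arithmetic behind the first and third optimizations can be checked with a short sketch. It assumes a square array whose side is a power of two, counts only address pads (request and acknowledge lines are ignored), and uses hypothetical function names.

```python
import math

def address_pads(n_cells, serial):
    """Address pads needed for a square array of n_cells cells."""
    side = math.isqrt(n_cells)                    # cells per row (and per column)
    bits_per_coordinate = max(1, (side - 1).bit_length())
    # A parallel link sends row and column addresses side by side;
    # a word-serial link reuses the same pads for both.
    return bits_per_coordinate if serial else 2 * bits_per_coordinate

def words_per_event(burst_length):
    """Average words sent per event when one row address heads each burst."""
    return (burst_length + 1) / burst_length

pads_parallel = address_pads(4096, serial=False)
pads_serial = address_pads(4096, serial=True)
overhead_full_row = words_per_event(64)
```

For a 64 x 64 array the pad count drops from 12 to 6, and a full-row burst spends 65 words on 64 events, which is the one-in-a-row-length overhead mentioned in the text.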
A preview of the receiver architecture we developed is
shown in Fig. 1. In this section, we derive programs that
describe the behavior of each of these blocks by following a
synthesis methodology for asynchronous digital VLSI systems
developed by Martin [21] (tutorial examples are provided in
[22]). His methodology involves applying a series of program

transformations, starting from a high-level specification. As
each step preserves the logic of the original program, the resulting circuit is correct by induction. Thus, it is unnecessary to
deduce how these processes behave when executed in parallel,
which is extremely difficult. After decomposing the receiver
specification into a set of concurrent programs, we compile
these one-line programs into hardware in Section III.
A. High-Level Specification
We start by writing a high-level specification in the concurrent hardware processes (CHP) language, a hardware description language for asynchronous systems [21]. In CHP, logic circuits “execute” concurrent programs, for example

The program or process is named and its argument is named ; process and argument names are always set in upper and lower case sans-serif font, respectively. As we are describing hardware here, you should think of as a call to a silicon compiler that lays out a circuit with, for instance, an -bit-wide datapath. denotes infinite repetition; this demarcates the body of the program. Semicolons (;) denote sequential execution. inputs data from a port named and stores it in a local variable named ; port and variable names are always set in italicized upper and lower case roman font, respectively. Similarly, outputs the data stored in on port . is a dataless communication on port ; its only effect is to synchronize the two processes whose ports are connected together. That is, this process waits until the other one gets to the corresponding point in its program, or vice versa. In the text, we will write “port ” to distinguish the port itself from a communication performed on that port, which we write simply as “ .” There is no such ambiguity in the code, as only communications can appear in the body of the program.
A high-level block diagram of the address-event receiver
is shown in Fig. 2. We use selection,
, to choose the recipient. This program
construct picks a guard
that is true and executes the
corresponding program segment .1 In our case, the guard
is a single bit of , an -bit word obtained by decoding an
-bit address received on port , the receiver’s input, where
. And the program segment communicates on one
of the receiver’s dataless ports, , to signal the occurrence
of an event. Thus, we have

where is the th bit of , which is assigned (:=) the value returned by calling with the address read from port . This function converts a binary code ( -bit) into a one-hot one ( -bit).

1If all the guards are false, it waits for one to become true; they must be mutually exclusive.

Fig. 3. Row–column organization. (a) The k th column communicates with all the rows ( l) as well as the column decoder ( ). (b) The lth row communicates with the row decoder ( ) and all the columns ( k). It services event recipients l + 1 through (l + 1) with its P ports.

Alternatively, the receiver process may be described succinctly using CHP’s replication construct: , where each is a program-segment and is any operator that can be concatenated. As the selection operator can be concatenated, we have

The next step in the synthesis procedure is to decompose this high-level specification into a hierarchy of concurrent processes. These processes’ ports are then connected together by channels. We present this connectivity information pictorially. These figures also give the names of instances (e.g., specifies an instance of named ) and their ports’ data types (e.g., specifies that port outputs bytes). Ports that are defined neither as input nor output are dataless by default. Port names that appear inside a box are local to that instance; those outside are local to the process within which that instance occurs.
B. Reorganizing Into Rows and Columns
Here, we decompose into separate row, column, and decoder processes, named , , and , respectively. These processes are connected as shown in Fig. 3.
This decomposition is accomplished through three program transformations. For the first transformation, we reorganize ’s dataless ports into rows and columns and replace its input, port , with two inputs, ports and , that accept row and column addresses, respectively. We use a 1-in- decoder to select a row and a 1-in- decoder to select a cell in that row. Thus, we have

where and . Parallel lines denote parallel execution. The decoded addresses are stored in local -bit and -bit words named and , respectively.
For the second transformation, we implement address decoding in a separate process, . This 1-in- decoder uses a -bit input, port , for the address, a local -bit word for the decoded address, and dataless ports , for selection. That is

Two instances of , with or , are used for row and column decoding, respectively.
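The binary-to-one-hot conversion performed by each decoder instance can be illustrated in Python. This is a behavioral sketch only; the function names and the list representation of the one-hot word are assumptions made for illustration, not the paper's CHP.

```python
def decode_one_hot(address, n_bits):
    """Convert an n_bits-wide binary address into a one-hot word of 2**n_bits bits."""
    size = 1 << n_bits
    if not 0 <= address < size:
        raise ValueError("address does not fit in n_bits")
    # Exactly one bit is set, so the guards driven by this word
    # are mutually exclusive by construction.
    return [1 if index == address else 0 for index in range(size)]

def select_cell(row_address, col_address, n_bits):
    """Use two decoder instances, one per coordinate, to pick a single cell."""
    row_select = decode_one_hot(row_address, n_bits)
    col_select = decode_one_hot(col_address, n_bits)
    return row_select.index(1), col_select.index(1)

one_hot = decode_one_hot(2, 2)
cell = select_cell(3, 1, 2)
```

Splitting the address into two coordinates means each decoder drives only as many select lines as the array has rows or columns, rather than one line per cell.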
Having removed the decoders, we are left with a process containing just the array of dataless ports, which we call . This process uses dataless ports and dataless ports to communicate with the row and column decoders, respectively. It probes these ports to find out which of its rows or columns has been selected. The probe, , evaluates to true when there is a communication pending on port (i.e., the other process is waiting). Having found the selected row and column, communicates on the corresponding dataless port and communicates with the column and row decoders to acknowledge its selection. Thus, we have

For our third and final transformation, we break up into column processes and row processes. As shown in Fig. 3, communicates with the column decoder using its port while communicates with the row decoder using its port; these ports are dataless. also performs a column-wide broadcast on its ports, which are connected to the rows’ ports. A broadcast, denoted by the circle , waits for at least one recipient to respond.2 Thus, programs for the column and row processes read

The second communication is included to prevent the row decoder from selecting another row until the column communication is finished. A second communication must be added to as well to reflect this. With mutual exclusion guaranteed, it is no longer necessary to synchronize the row and column decoders; they can read and decode addresses independently.
This serial-write architecture, where cells are written individually, was first implemented in [1]; it was pipelined by adding latches to the decoder inputs in [23] and [24]. We design a parallel-write version in the next subsection.

2This construct is not supported by CHP; its use is discouraged as there is no delay-insensitive implementation. The reason is that communication signals sent to inactive recipients are not acknowledged, and therefore, we cannot tell if they have been cleared.

C. Writing the Array in Parallel
Here, we modify and to write all events destined for the same row in parallel, an innovation introduced in this work. These events are received in a burst, the row address followed by a column address for each active cell [13]. Hence, we must demultiplex the row and column addresses and detect when the burst ends, signaled by a reserved address named ; only then can the parallel write begin. We introduce a process named to perform these tasks, where . Predecoded column addresses are stored in a -bit latch, named . These processes are connected as shown in Fig. 4.

Fig. 4. Burst reception. (a) Latch ( ) is written to by the column decoder (decc) and read by the demultiplexer ( ). (b) Demultiplexer ( ) relays words from the latch ( ) to the bus ( ). It also receives row and column addresses from the receiver’s global A port and sends them to the appropriate decoder ( and ).

For [see Fig. 4(a)], we simply set the th bit ( or ) of its -bit word when the th column is selected and clear every bit ( or ) after the latch is read . Thus, its program reads

is read by at the end of the burst, after all the column addresses have been decoded [see Fig. 4(a)]. Hence, we assumed that the communication does not overlap with any of the communications, allowing us to include it in the same selection statement. This mutual exclusion can be guaranteed by , as we shall show next.
For [see Fig. 4(b)], we read bursts from the receiver’s word-serial input (port ), direct the first address to the row decoder (port ) and direct subsequent addresses to the column decoder (port ), until we receive . When that happens, we transfer ’s data to the array and repeat the procedure. Thus, we have

where and are local -bit words. We chose to delay passing on the row address until the burst ends . This rearrangement allows us to use a single control signal to initiate row selection as well as column data transfer , since, at this point, all the column addresses have been decoded. Note that the row address is the one that follows ; it is stored in . Upon initialization, execution must begin at this communication.
To write ’s data into the selected row, we combine our processes into a single -bit-wide bus, named , as shown in Fig. 5(a). And we make compatible by combining its dataless ports into a single -bit input, port , as shown in Fig. 5(b). We use concurrency, , to update all the cells selected in that row. This construct executes (concurrently) all program segments, , whose guards, , are true.3 Thus, we have

where is a -bit word that is local to each process.

Fig. 5. Parallel write-in. (a) Bus ( ) transfers -bit words handed off by the demultiplexer ( ) to the selected row (rowl). (b) Row’s C ports have been combined into a single port C that inputs -bit words.

In summary, we have decomposed into five concurrent processes: one instance of , two of , one of , one of , and of . These processes’ ports are connected together as shown in Figs. 4 and 5. The next step in the synthesis procedure is to compile these CHP programs into hardware.
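The overall behavior of the decomposed receiver (hold the row address, accumulate decoded column addresses in a latch, then write the whole row at once when the burst ends) can be mimicked by a sequential Python sketch. The word tags, the function name, and the list-based latch are hypothetical, and the model deliberately ignores concurrency and pipelining.

```python
def receive(words, n_rows, n_cols):
    """Behavioral model: one parallel row write per burst."""
    array = [[0] * n_cols for _ in range(n_rows)]
    latch = [0] * n_cols          # one bit per column, set as addresses are decoded
    row = None                    # row address is held until the burst ends
    writes = []                   # log of (row, word) parallel writes
    for kind, value in words:
        if kind == "ROW":
            row = value
        elif kind == "COL":
            latch[value] = 1      # decode now, write later
        elif kind == "END":
            for col, bit in enumerate(latch):
                array[row][col] |= bit        # every selected cell in the row is updated
            writes.append((row, tuple(latch)))
            latch = [0] * n_cols  # clear the latch once it has been read
            row = None
    return array, writes

stream = [("ROW", 1), ("COL", 0), ("COL", 2), ("END", None),
          ("ROW", 0), ("COL", 3), ("END", None)]
array, writes = receive(stream, n_rows=2, n_cols=4)
```

Three events arrive, but the array is written only twice, once per burst; this is the source of the capacity gain that grows with row length.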
III. RECEIVER IMPLEMENTATION
Electrically, processes set or clear an output signal , or wait for an input signal to become true or false ; tilde denotes logical complement. To communicate, they must perform complementary four-phase sequences of actions and waits: on an active port and for its passive counterpart , where * denotes repetition, just like in CHP. We always append i and o to the port’s name to indicate its input and output signals, respectively. Such signal names are always set in lower case typewriter font. The active port’s output signal is commonly called Request; the passive port’s is the so-called Acknowledge. At the signal level, we refer to as and to as , request first and acknowledge second in both cases.
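A four-phase handshake between an active and a passive port can be simulated with two Python generators that share a request wire and an acknowledge wire. This is a sketch for intuition only: the signal names `req` and `ack`, the round-robin scheduler, and the function names are assumptions, not the paper's notation.

```python
def active(sig):
    """Active side: raise request, wait for acknowledge, lower request, wait again."""
    while True:
        sig["req"] = 1
        yield "req+"
        while not sig["ack"]:
            yield None            # waiting for acknowledge to be set
        sig["req"] = 0
        yield "req-"
        while sig["ack"]:
            yield None            # waiting for acknowledge to be cleared

def passive(sig):
    """Passive side: the complementary sequence of waits and actions."""
    while True:
        while not sig["req"]:
            yield None
        sig["ack"] = 1
        yield "ack+"
        while sig["req"]:
            yield None
        sig["ack"] = 0
        yield "ack-"

def run(n_transitions):
    sig = {"req": 0, "ack": 0}
    processes = [active(sig), passive(sig)]
    trace = []
    while len(trace) < n_transitions:
        for process in processes:          # simple round-robin scheduling
            event = next(process)
            if event is not None and len(trace) < n_transitions:
                trace.append(event)
    return trace

trace = run(8)
```

Each communication costs four transitions, and the second pair merely returns both wires to their initial state, which is what later makes reshuffling possible.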

3This construct is not supported by CHP. Its use is discouraged because termination cannot always be guaranteed, but that is not the case here. Concurrency waits for at least one guard to become true if necessary, just like selection does.

We have three choices of signal representations for data.
1) Bundled-data requires a single line per bit, in addition to the request and acknowledge signals. The data is valid when the request signal is set; otherwise it is invalid.
2) Straight-data dispenses with the request signal. Instead, all zeroes signifies invalid data; any other word is considered valid. Both representations require matched delays for data as well as request.
3) Dual-rail achieves delay-insensitive operation by encoding each bit using two lines: bit is true and bit is false (denoted by appending t or f). The data is invalid when both are cleared; setting either transmits a one or a zero.
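The dual-rail representation lends itself to a small Python illustration. The pair ordering (true rail, false rail) and the function names are assumptions chosen for this sketch.

```python
def dual_rail_encode(bits):
    """Encode each bit on two lines: (true rail, false rail)."""
    return [(1, 0) if bit else (0, 1) for bit in bits]

def dual_rail_decode(rails):
    """Return the bits once every pair is valid, or None while data is invalid."""
    if any(true_rail and false_rail for true_rail, false_rail in rails):
        raise ValueError("both rails of a bit are set")
    if not all(true_rail or false_rail for true_rail, false_rail in rails):
        return None               # some pair is still cleared: word not yet valid
    return [1 if true_rail else 0 for true_rail, _ in rails]

encoded = dual_rail_encode([1, 0, 1])
decoded = dual_rail_decode(encoded)
neutral = dual_rail_decode([(0, 0), (0, 0)])
partial = dual_rail_decode([(1, 0), (0, 0)])
```

Because validity is carried by the data lines themselves, the receiver can detect when a whole word has arrived without relying on a matched delay for a separate request line.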
Handshaking expansion (HSE) is the procedure whereby
each communication in our CHP programs is fleshed out into
a full four-phase request-acknowledge sequence. Following
Martin’s synthesis procedure [21], we make two choices when
we perform HSE. First, we make output ports active and input
ports passive.4 The only exception is a port that is probed
.
must be passive, as the probe is implemented simply as
Symmetric links—dataless ports that are not probed on either
end—are dealt with on a case by case basis. Second, we use the
second half of the four-phase handshake to implement a second
communication on the same port—a two-phase handshake—if
these communications always occur in pairs. This optimization
is possible because the second half just returns the signals to
their initial state. So, we are free to clear them whenever it is
convenient to do so, a process known as reshuffling.
The final step in Martin’s synthesis procedure is compiling HSE sequences into production-rule sets (PRS), which are straightforward to implement with CMOS transistors. A production rule clears a bit when a boolean expression becomes true. We write to set the bit when the expression is false. An n-type field-effect transistor (nFET) implements the former rule while a p-type field-effect transistor (pFET) implements the latter; the two rules together correspond to an inverter. Logical and and or (denoted by & and |, respectively, in PRS, or HSE) are implemented by connecting FETs in series and in parallel. If both pull-up and pull-down chains may be inactive at the same time, a weak feedback-inverter must be added to overcome their leakage currents. Such outputs are said to be state holding, as opposed to combinational; the feedback-inverter is called a staticizer. Active-low signals are allowed in PRS and at the circuit level; their names have an underscore prepended (e.g., ).
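The distinction between combinational and state-holding outputs can be made concrete with a toy Python model of a node driven by a pull-up and a pull-down chain. The Muller C-element used as the state-holding example is a standard textbook circuit chosen for illustration, not one of the paper's blocks, and all names here are hypothetical.

```python
def drive(pull_up, pull_down, held):
    """Resolve a node from its pull-up and pull-down chains."""
    if pull_up and pull_down:
        raise ValueError("both chains conduct: the rules interfere")
    if pull_up:
        return 1                  # pFET chain sets the bit
    if pull_down:
        return 0                  # nFET chain clears the bit
    return held                   # neither conducts: the staticizer keeps the old value

def inverter(a):
    """Combinational: exactly one chain conducts for every input."""
    return drive(pull_up=not a, pull_down=bool(a), held=None)

def c_element(a, b, held):
    """State holding: set when both inputs are high, cleared when both are low."""
    return drive(pull_up=bool(a and b), pull_down=bool(not a and not b), held=held)

rising = c_element(1, 1, held=0)
holding = c_element(1, 0, held=rising)
falling = c_element(0, 0, held=holding)
```

The inverter never needs the `held` argument, whereas the C-element relies on it whenever its inputs disagree; that is the case in which a staticizer is required.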
We present only the final HSE sequences and the synthesized
circuits in this section. Details of how we arrived at these reshufflings and how we compiled them into PRS are in the Appendix.
We recommend that you refer to Fig. 6 to see how these circuits
interact as you read their descriptions. To facilitate this, we include the block labels in this figure in HSE sequences and in
subsequent figure captions.
A. Demultiplexing
The CHP program for
[see Section II-C and
Fig. 4(b)] requires us to read addresses from the receiver’s
input (port ), store the row address, pass column addresses
on to the column decoder (port ), and then pass on the row
address (port ) and transfer the row word (port to port )
when the burst ends. These operations are implemented in this section; we implement here as well.

4Our choice is arbitrary; the direction that data flows is not necessarily restrictive.

Fig. 6. Receiver schematic. consists of a two-way demultiplexer (U) that parses b-bit row and column addresses (Y; X), and a latch (controlled by block D) that holds the row address till the burst ends. (two instances) includes a latch (controlled by block E) that drives its decoding logic (represented by discs) with dual-rail encoded data (2b lines). ’s bit cells consist of a buffer (P) and a memory (M). includes a set of wires that connect ’s cells (do; di), to the array (so; si), and to a controller (ho; hi). These wires broadcast column data to ’s cells (R), thereby implementing as well.

We use the passive counterpart of the transmitter’s three-wire protocol [13] for port . That is, we use bundled-data but we distinguish row and column addresses using separate request lines, ari and aci, respectively; they share the same acknowledge, ao. Fig. 6 shows the correspondence between these signals and the receiver’s Ry, Rx, and Ack signals mentioned earlier. The sequence of waits and actions on these three lines is
*[[ari];ao+;[~ari];ao-]
∥ *[[aci];ao-;[~aci];ao+]

where ∥ denotes parallel execution, just like in CHP. The first sequence’s first half receives the row address, while the second half receives the burst-termination signal . Multiple column addresses are received by executing the second sequence as many times as desired, halfway through the first sequence. This three-wire handshake is illustrated in Fig. 7. The address switches back to the row address because the transmitter’s address mux is controlled by the column request (i.e., aci). Note that ari remains high throughout.

Fig. 7. Three-wire handshake: when ari becomes high, receiver reads the row address y0 : b and raises ao. When aci becomes low, the receiver reads the column address x0 : b and lowers ao. This column communication is completed by taking aci high (the address reverts back to y0 : b), followed by ao. After a second column address is received, the burst is terminated by taking ari low, followed again by ao.

Fig. 8. Three-to-four-wire converter [U]. Requests ari and aci are relayed to the decoders on ro and co, while acknowledges ri and ci are merged onto ao.

To demultiplex addresses received on ’s port (passive), we augment the three-wire handshaking sequence above with communications on ports and (both active), which are connected to the row and column decoder, respectively [see Fig. 4(b)]. Thus, we obtained the following reshuffled HSE sequence:
# demux (U) #
*[[ari];ro+;[ri];ao+;[~ari];ro-;[~ri];ao-]
∥ *[[aci];co+;[ci];ao-;[~aci];co-;[~ci];ao+].

These sequences convert the three-wire handshake into a four-wire one, with separate acknowledges for row and column addresses. The second two-phase communication on port now signals the end of a burst.
Compiling this HSE into PRS (see Appendix part A) yielded the circuit shown in Fig. 8. When ari becomes high, ro goes high, which prompts ri to become high. Both of the AND gate’s inputs are now high, so it drives ao high. Row-address reception is now complete. Column-address reception starts with aci becoming low. Both of the NOR gate’s inputs are now low, so it drives co high, which prompts ci to become high. Hence, the inverter’s output goes low, which forces the AND gate’s output ao low. Column-address reception is now complete. Now, initial states must be restored. co is cleared once aci goes back high and ao returns to the high state once ci becomes low. ro is cleared next, once ari swings low, and ao is cleared once ri becomes low.
We do not decode the row address, which is read on the first (two-phase) communication, until the second communication occurs (see above). This delay is realized by the following reshuffled HSE sequence (see Appendix part B), which communicates with the three-to-four-wire converter on its port (passive) and communicates with the row decoder on its port (active), as shown in Fig. 6
# delay (D) #
*[[gi];go+;[~gi & ~pi];po+;[pi];go-;po-].

Fig. 9. Row-address delay [D]. Transfers row address from three-to-four-wire
converter ( gi; go) to row decoder (po; pi), after holding it in a latch (go) till
the burst ends.

As required, does not begin until the second half of starts,
which signals the end of a burst. At this point, all the column
addresses have been decoded, and a parallel write may begin as
soon as the row address is decoded.
Compiling this HSE into PRS (see Appendix part B) yielded the circuit shown in Fig. 9. Initially, is low, so goes high as soon as becomes low. After goes back high, swings low, since both inputs to the NAND gate are now high. becoming low will make go high, provided is low. also strobes the latch that holds the row address (described in Section III-B). The latch becomes opaque when is high. It does not become transparent again until goes low, after becomes high, which signals that the address has been read. becoming low also clears by forcing high.
The parallel read–write operation is realized by the following reshuffled HSE sequence, which uses a straight-data representation. Two ports, with signals (do, di) and (so, si), are both active; a third port, with signals (hi, ho), which is passive and dataless, gives the go signal.
# transfer #
*[[hi]; do+; [di]; so+; [si]; ho+;
[~hi]; do-; [~di]; so-; [~si]; ho-].

This sequence is implemented by three wires: tie hi to do, di to so, and si to ho.
may also be implemented with wires [see
Fig. 5(a)]: these connect its passive input (port ) to its active
outputs (ports ). Thus, the parallel write is realized simply by
extending the lines (one per bit) into the array ( wires in all),
connecting them to every row.
The triangular communication that implements the parallel read–write operation is shown in Fig. 6. It is controlled by the delay circuit (D), whose output port (po, pi) drives the passive, dataless port (hi, ho). Thus, column data transfer and row-address decoding are initiated simultaneously. The E block in this figure was inserted to pipeline the decoder; this is described in the next subsection (Section III-B). Ignoring E for the moment, and imagining that its input is tied directly to its output, we observe a triangular path connecting the delay circuit (D), the column data latch (M), and the receiving row (R). The receiving row’s acknowledge, which signals column parallel write as well as row selection (see Section III-C), is fed to an OR gate (part of the decoder), with one input per row, that combines row acknowledges into a single array-level acknowledge.
as well as
This completes our implementation of
, which we have essentially folded into
.

BOAHEN: BURST-MODE WORD-SERIAL ADDRESS-EVENT LINK—II

Fig. 10. Address latch [E]. Interfaces three-to-four-wire converter or row-address delay (li; lni; lo) with decoder-logic (ent; enf; ei). The latch is
opaque when its control signal (s) is high. Dual-rail encoding is used for data
output.

B. Decoding
We now turn our attention to the CHP program for the decoder (see Section II-B). In addition to the combinational logic required to implement the binary-to-one-hot function, our decoder design includes a latch, to support pipelined operation, and a bundled-data-to-dual-rail converter, to guarantee delay-insensitive operation. Here we describe the latch and the converter; the combinational logic was described in [22]. There is also a multi-input OR gate (mentioned above), built from a tree of two-input ORs, which combines the acknowledges.
We made the latch’s input port (li, lo) passive and its output port (eo, ei) active and used the following reshuffled HSE sequence:
# pipeline (E) #
*[[~ei & li]; eo+, lo+; [ei & ~li]; eo-, lo-].

This reshuffling allows the decoder to acknowledge receiving the address before it has finished decoding it, thereby increasing throughput [23], [24]. The compiled circuit, called a C-element [25], is shown in Fig. 10, which also includes the memory cell. Following the sequence above, the C-element sets eo and lo together once li is high and ei is low. Thus, the write is acknowledged and a read is initiated at the same time. It clears them together once li is low and ei is high, thus enabling a new write and terminating the old read at the same time.
The latch’s memory cell consists of two inverters and a mux,
whose select signal is driven by the C-element. When goes
high, is driven high and is driven low. Thus, the latch becomes opaque. It does not become transparent again until
goes low, which happens after
is lowered, indicating that
the address has been decoded.
The data converter consists of two AND gates that combine the control signal with the stored bit and its complement. Thus, one of the two rails (ent or enf) is forced high, depending on the stored bit’s value, when the control signal goes high. One or the other of these dual rails is connected to the multi-input AND gates that generate the decoder’s (one-hot) outputs, depending on that particular address [22]. Thus, when all bits are valid, an output will become active. If even just one bit is invalid (i.e., both rails are low), all the outputs will remain inactive.
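The dual-rail behavior described above can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions, not a description of the fabricated circuit: the function names (`to_dual_rail`, `decode_one_hot`), the least-significant-bit-first ordering, and the use of an `enable` flag to stand for the control signal are all hypothetical choices made for the example.

```python
from typing import List, Optional, Tuple

Rails = Tuple[int, int]  # (true rail, false rail); (0, 0) means "invalid"


def to_dual_rail(bit: Optional[int], enable: bool) -> Rails:
    """Model the two AND gates that combine the control signal with the
    stored bit and its complement (names are hypothetical)."""
    if not enable or bit is None:
        return (0, 0)
    return (1, 0) if bit else (0, 1)


def decode_one_hot(rails: List[Rails]) -> List[int]:
    """Map n dual-rail bits (least-significant bit first, an assumption)
    to 2**n one-hot outputs, one n-input AND per output."""
    n = len(rails)
    outputs = []
    for address in range(2 ** n):
        active = 1
        for k, (true_rail, false_rail) in enumerate(rails):
            wants_one = (address >> k) & 1
            active &= true_rail if wants_one else false_rail
        outputs.append(active)
    return outputs


# All three bits valid: exactly one output (address 0b101 = 5) is active.
valid = [to_dual_rail(b, enable=True) for b in (1, 0, 1)]
print(decode_one_hot(valid))

# One bit invalid (both rails low): every output stays inactive.
stalled = [to_dual_rail(1, True), to_dual_rail(0, False), to_dual_rail(1, True)]
print(decode_one_hot(stalled))
```

Because every output ANDs one rail from each bit, a single invalid bit holds all outputs low without any separate enable input.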
Previously, we drove the decoding logic with bundled-data instead of dual-rail. That is, we connected one or the other of the memory cell’s complementary outputs directly to each multi-input AND gate, which had an extra input driven by the request that served as an enable [22]. We abandoned this scheme as we found that delays on this enable line produced glitches, because the AND gates would remain enabled for some time after the request was driven low. Since the latch becomes transparent at this point, new data would propagate straight through and drive the AND gates until the delayed enable was cleared, hence the glitch. Glitching is prevented by using dual rail, which eliminates the extra line and logic level as well.
We use the same pipelined decoder design for both the row
and column addresses. For the row decoder, this means that the
parallel write is also pipelined, as the C-element’s signal initiates column data transfer as well as row selection (see Fig. 6).
Hence, when the delay circuit passes on the row address at the
end of the burst, it will be acknowledged at the same time we
begin transferring the data into the array. This simultaneity allows the delay circuit to complete its communication with the
three-to-four-wire converter early (see Section III-A). Hence,
the next burst can be received, and its column addresses decoded, even while we are performing the parallel write. The
column data latch, which is presented in the next subsection
(Section III-C), must be designed to ensure that the old data is
not overwritten.
C. Writing
We implement
and
in this section.
must complete each and every
communication in its entirety
before it has even begun one communication [see Fig. 4(a)].
This ability for communications on one port to get out of step with communications on another is referred to as slack. For instance, in a regular left–right buffer, like that used to pipeline the decoders (see Section III-B), the second phase of the left port’s communication starts at the same time as the first phase of the right port’s. Thus, a little more than a quarter cycle of slack is provided. For these cells, a full cycle of slack is required to allow another burst to be read while we are writing the preceding one into the array.
To obtain a full cycle of slack, we implement the cell in two stages called column buffer and data latch, each of which contributes at least half a cycle of slack. We made the port that column buffer uses to communicate with the decoder passive (see Section II-C). We made the port it uses to communicate with data latch active (see Fig. 6). Buffer’s reshuffled bit-level HSE sequence reads (see Appendix part C)
# buffer (P) #
*[[ci]; co+; [~qi]; qo+; [~ci]; co-; [qi]; qo-].


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 7, JULY 2004

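The decoder-side communication (ci, co) finishes before the request to the second stage has even been acknowledged (i.e., before [qi]). In fact, as [~qi] is from the end of the previous cycle, only the first of the latch-side communication’s four phases (qo+) overlaps with the decoder-side communication. Hence, buffer provides a slack of three-quarters.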
Compiling buffer’s HSE into PRS (see Appendix part C) yielded the circuit shown in Fig. 11(a). co goes high as soon as ci becomes high, provided qo is low. co becoming high will make qo go high, provided qi is low. And a high qo will clear co after ci goes back down. With co low, qi becoming high is all that is required to clear qo. Essentially, this circuit is just two serially connected C-elements, like those used to control the address-latch (see Fig. 10).
For the second stage, data latch, we made the port it uses to communicate with buffer passive as well as the port it uses to communicate with the array (see Section II-C). The array-side port (vi, vo) must be passive to make it possible for the row decoder’s C-element to initiate the parallel write. Latch’s reshuffled bit-level HSE sequence reads (see Appendix part C)
# data latch (M) #
*[[bi]; bo+; [vi]; vo+; [~bi]; bo-; [~vi]; vo-].

The array-side communication (vi, vo) starts when the buffer-side communication (bi, bo) is halfway; hence, the latch provides a slack of half. This amount is clearly sufficient, since buffer only needs to see bo+, which corresponds to its [qi], to complete its communication. The buffer can even get halfway through its next communication, but it must wait for bo- to happen, which corresponds to its [~qi], to proceed to completion. Thus, the buffer can complete a second communication while the array-side communication is only halfway, before even the write-request has been cleared (i.e., [~vi]). Consequently, this reshuffling gives us a half-cycle more slack than we need.5
Compiling the latch’s HSE into PRS (see Appendix part C) yielded the circuit shown in Fig. 11(b). bo goes high as soon as bi becomes high, provided vi is low. Now vi must become high before bo can drive vo high. When bi becomes low it cannot clear bo unless vo is high. When that happens, and bo is cleared, vi also must become low before vo is cleared. This sequencing ensures that data cannot be overwritten before it is read, even though clearing bo enables new data to be presented. Inactive cells require extra care, as explained in Appendix part C.
Finally, to implement the row cell [see Fig. 5(b)], we made the ports it uses to communicate with the row decoder (ri, ro) and the data latch (ci) both passive and made the port it uses to communicate with the event recipient (po, pi) active (see Section II-C). The reshuffled bit-level HSE for the row reads (see Appendix part D)
# row cell (R) #
*[[ri & ci]; po+; [pi]; ro+; [~ri & ~ci]; po-;
[~pi]; ro-]

where we treat ci as data that arrives with the row request. Thus, there is no need for a separate acknowledge for the column data; ro acknowledges the latch as well as the decoder. Compiling this HSE into PRS (see
5 The receiver will hang if the same column address appears thrice, because,
while the first goes in the latch and the second is held in the buffer, there is
nowhere to put the third; the address-latch provides less than a half-cycle.

Fig. 11. Column buffer [P] and data latch [M]. (a) Interfaces column decoder (ci; co) with data latch (qo; qi). (b) Interfaces column buffer (bi; bo) with array (vo) and row decoder’s C-element (vi). The zero-tag indicates that the inverter’s output is forced low during reset; this clears the pipeline.

Fig. 12. Event-recipient interface [R] and acknowledge-OR. (a) Interfaces
data latch ( ci) and row decoder ( ri; ro) with event-recipient (po; pi).
(b) Broadcasts a request li, to all n ports and ORs their acknowledges
r1i; r2i; . . . or rni, together to create a single one lo.

Appendix part D) yielded the circuit shown in Fig. 12(a), a C-element (the staticizer was eliminated to save area). Its output goes high when its two inputs are both low, and it goes low when they are both high. Thus, the cell makes sure its row select and column data lines are both clear before clearing po, its request. Then, it simply copies pi, the acknowledge, to ro.
We combine the cells’ bit-level acknowledges to generate a single acknowledge for that row using the staticized wired OR gate shown in Fig. 12(b). This gate produces an output when at least one cell acknowledges. Nevertheless, the remaining cells have sufficient time to read the column data, as the row-spanning wired OR line is slow due to its large capacitance. The OR gate’s output, lo, is cleared when all the cells’ acknowledges are clear. Until this is the case, the output cannot clear because the pFET is sized to be weaker than the nFETs.


IV. SUMMARY AND CONCLUSION
We have described an address-event receiver that writes all
events destined for a particular row in parallel. While these
events are being written, it decodes the next burst’s column addresses, in preparation for the next parallel write. Bursts consist of a sequence of addresses: one for the row and additional
ones for the column of each active cell in that row, plus a termination signal. They are communicated using a three-wire handshake: a row request, a column request, and a common acknowledge. In return for the extra request line, input pads are cut by
50% without sacrificing throughput as the row address is not repeated.
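To make the burst format concrete, the following Python sketch groups a word-serial stream into one parallel write per row. It is a behavioral illustration only: the tags `ROW`, `COL`, and `END`, the tuple representation of a word, and the function name `receive_bursts` are assumptions introduced for the example and do not correspond to signals in the design. The sketch also ignores timing, pipelining, and the repeated-address limitation noted in footnote 5.

```python
from typing import Iterable, List, Optional, Tuple

ROW, COL, END = "row", "col", "end"  # hypothetical tags for the three word types
Word = Tuple[str, Optional[int]]


def receive_bursts(words: Iterable[Word]) -> List[Tuple[int, List[int]]]:
    """Return one (row, columns) parallel write per burst."""
    writes: List[Tuple[int, List[int]]] = []
    row: Optional[int] = None
    latched: List[int] = []
    for tag, value in words:
        if tag == ROW:
            row, latched = value, []          # row address opens a burst
        elif tag == COL:
            latched.append(value)             # decoded and latched on arrival
        elif tag == END:
            if row is None:
                raise ValueError("termination without a row address")
            writes.append((row, latched))     # whole row written at once
            row, latched = None, []
        else:
            raise ValueError(f"unknown word tag: {tag!r}")
    return writes


stream = [(ROW, 3), (COL, 7), (COL, 2), (END, None),
          (ROW, 5), (COL, 1), (END, None)]
print(receive_bursts(stream))
```

In this toy stream, three events are conveyed with two row addresses and three column addresses, since the row address is sent once per burst rather than once per event.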
In terms of cell area, the cost of the parallel-write design
is practically the same as prior receiver designs. Prior designs
require a four-transistor NAND gate to combine the row- and column-select signals (this can be reduced to three if an NMOS-style design is used [22]), similar to our four-transistor C-element [see Fig. 12(a)]. They require an additional transistor to pull down the acknowledge line, similar to our staticized wired-OR [see Fig. 12(b)]. However, we made our design efficient by eliminating the C-element’s staticizer, which would have added four more transistors. It is acceptable to cut corners here because the output is in the high-impedance state only briefly. Thus, the increased throughput and scalability parallelism offers [20] is attained at no cost in hardware.
We also illustrated how to synthesize an asynchronous implementation starting from a high-level specification by way
of a concrete example. The result was six asynchronous logic
circuits that, together, can be used to implement a burst-mode
word-serial address-event receiver of any desired size. These
circuits include an improved decoder that eliminates glitches by
using dual-rail encoding. We have laid out a library of cells (in
MOSIS DEEP_SUBM rules) for these circuits and written a silicon compiler to tile them to fit any desired pixel or array size.
Thus far, this tool has successfully compiled receivers for three generations of chips, fabricated in 0.6-, 0.4-, and 0.25-μm technology [20].
APPENDIX
LOGIC SYNTHESIS
When compiling HSE into PRS, we perform two passes. On the first pass, we make the wait before an action the guard for its production rule. For example, *[[pi]; po+; [~pi]; po-], the passive port’s sequence, is realized by the set pi -> po+, ~pi -> po-, which is implemented by a wire. On the second pass, we strengthen guards that can become true at some other point in the sequence by ANDing with another boolean variable. If all signals are in exactly the same state at these two points, we add a state variable to distinguish them, setting it after we pass the first point and clearing it after we pass the second point, or vice versa.
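The first pass can be sketched mechanically in Python. The function below is a hypothetical helper (`first_pass` is not a tool from this work) that takes the body of a handshaking expansion, written as semicolon-separated waits and actions, and pairs each action with the wait or action that immediately precedes it. It assumes single-signal actions and omits the second pass entirely, so guards that need strengthening are left as they are.

```python
from typing import List, Optional, Tuple


def first_pass(hse_body: str) -> List[Tuple[str, str]]:
    """Turn 'wait; action; ...' into (guard, action) production rules."""
    rules: List[Tuple[str, str]] = []
    guard: Optional[str] = None
    for token in (t.strip() for t in hse_body.split(";")):
        if token.startswith("[") and token.endswith("]"):
            guard = token[1:-1].replace(" ", "")   # a wait supplies the guard
        elif token.endswith(("+", "-")):
            if guard is None:
                raise ValueError(f"no preceding wait or action for {token!r}")
            rules.append((guard, token))
            # A preceding action can serve as the guard for the next action.
            signal = token[:-1]
            guard = signal if token.endswith("+") else "~" + signal
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return rules


for guard, action in first_pass("[pi]; po+; [~pi]; po-"):
    print(f"{guard} -> {action}")
```

For the four-phase passive sequence, the two rules produced make po follow pi, which is why a wire suffices.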
For example, a CHP process that communicates first on a passive port (signals pi, po) and then on an active port (signals ai, ao) could be augmented with the state variable s as follows:
*[[pi]; po+; [~pi]; s+; po-;
ao+; [ai]; s-; ao-; [~ai]].

s’s state now distinguishes the point where the passive communication ends and the active one begins from the point where the active communication ends and the passive one repeats. Alternatively, the ambiguous state can be eliminated if we begin the active communication before the passive one ends. For instance
*[[pi]; po+; ao+; [ai]; [~pi]; po-; ao-; [~ai]]
which we used for the decoder (see Section III-B and Fig. 10), is unambiguous. This reshuffling, if acceptable, is cheaper to implement, as it does not require a state variable.
We can often avoid adding state variables by reshuffling sequences in this way, provided the change in sequencing is benign. When compiling such sequences into PRS, multiple preceding waits are ANDed together, and preceding actions become guards too. Another goal of reshuffling is symmetry: clearing signals in the same order that you set them. Such symmetry makes the signals that appear in the pull-up and the pull-down the same. This duplication usually results in a simpler implementation, as the pull-up is disabled when the pull-down is active, and vice versa.
It is sometimes possible to convert a state-holding gate into a combinational one, thereby avoiding the need for a staticizer. That is, to make the gate’s pull-up and pull-down guards complementary. Such conversion is typically done by ORing terms with the pull-up and ANDing terms with the pull-down, or vice versa. For example, a gate whose pull-up and pull-down guards do not OR together to form an identity requires a staticizer, whereas a gate whose guards do form an identity is combinational; a NAND gate is such a case. These added terms must have a benign effect: at all points in the sequence, the weakened guard must evaluate the same as the original guard.
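Whether a pair of guards is complementary can be checked exhaustively for small gates. The Python sketch below is an illustration under assumptions: the helper `is_combinational` and the two example gates (a two-input C-element and a two-input NAND) are chosen here for the example and are not taken from the compiled circuits.

```python
from itertools import product
from typing import Callable

Guard = Callable[..., bool]


def is_combinational(pull_up: Guard, pull_down: Guard, n_inputs: int) -> bool:
    """True if exactly one of the two guards holds for every input pattern,
    i.e. the pull-up is the complement of the pull-down."""
    return all(
        pull_up(*bits) != pull_down(*bits)
        for bits in product((False, True), repeat=n_inputs)
    )


# Two-input C-element: neither guard holds when the inputs disagree,
# so the output must be held by a staticizer.
def c_up(a: bool, b: bool) -> bool:
    return a and b


def c_down(a: bool, b: bool) -> bool:
    return (not a) and (not b)


# Two-input NAND: the guards are complementary, so no staticizer is needed.
def nand_up(a: bool, b: bool) -> bool:
    return (not a) or (not b)


def nand_down(a: bool, b: bool) -> bool:
    return a and b


print(is_combinational(c_up, c_down, 2))
print(is_combinational(nand_up, nand_down, 2))
```

The check enumerates all input patterns, including ones the handshake never reaches, so it is stricter than the "benign at all points in the sequence" condition.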
A. Demux
Making the converter’s input port (ari, aci, ao) passive and its row and column output ports (ro, ri and co, ci) active (see Section III-A) yielded these HSE sequences for the three-to-four-wire converter
*[[ari]; ao+; ro+; [ri]; [~ari]; ao-; ro-; [~ri]]
∥ *[[aci]; ao-; [~aci]; ao+; co+; [ci]; co-; [~ci]]

where two-phase handshakes communicate the row address and the end-of-burst signal. To avoid storing the addresses locally, we moved the output communications into the middle of the input communications. This way, ro+ occurred immediately after [ari], passing on the address immediately. And ao+ did not occur until [ri], stalling till the address was read. Downward transitions follow the same sequence. We reshuffled the column-address communications in the same way, even though the second halves of these handshakes are meaningless. Thus, we obtained the sequences presented in Section III-A.
We compiled these reshuffled sequences into the following PRS:
ari -> ro+
~ari -> ro-

ari & ~_aci -> co+
~ari | _aci -> co-

ri & ~ci -> ao+
~ri | ci -> ao-.


We strengthened the guard for co+ with ari; otherwise, it would fire at start-up, before we have the chance to set _aci high. We weakened the complementary guard to make the gate combinational. The corresponding circuit is shown in Fig. 8.
B. Delay
To delay delivering row addresses to the row decoder until the burst ends (see Section III-A), our initial choice for an HSE sequence was
*[[gi]; go+; [~gi]; go-; po+; [pi]; po-; [~pi]]

which has a passive input port (gi, go) and an active output port (po, pi). However, we postponed go- to avoid creating an ambiguous state, which would have required us to introduce a state variable. However, go- must occur before po-; otherwise, another ambiguous state would occur. We also postponed [~pi] as long as we could to maximize the time we have to decode the address and write data to that row. Relocating this wait to just before po+ occurs in the next cycle gave us the reshuffling presented in Section III-A.
In compiling the reshuffled HSE, we introduced a local variable, u, that is cleared when gi becomes false and set when go becomes false. Hence, the sequence we implemented was
*[[gi]; go+; [~gi]; u-; [~pi]; po+; [pi]; go-;
u+; po-].

The following PRS resulted:
~po & gi -> go+
po & pi -> go-

go & ~gi -> u-
gi | ~go -> u+

~u & ~pi -> po+
u -> po-.

We strengthened the guard of go+ with ~po to disable it once po+ has fired. And we weakened the guard of u+ to make the gate combinational. Notice that introducing u avoids a three-transistor chain in po’s pull-up, which would compromise performance. The corresponding circuit is shown in Fig. 9.
C. Buffered Data Latch
Making the cell’s input port passive and its output port also passive (see Section III-C) yielded the following bit-level HSE sequence:
*[[ci]; co+; [~ci]; co-;
[vi]; vo+; [~vi]; vo-].
We introduced an intermediate communication in order to obtain more slack, and thereby expanded this sequence into two concurrent processes
*[[ci]; co+; [~ci]; co-; qo+; [qi]; qo-; [~qi]]
*[[bi]; bo+; [~bi]; bo-; [vi]; vo+; [~vi]; vo-]

where the intermediate communication uses signals (qo, qi) in the first process and (bi, bo) in the second one (i.e., the active port of the first is tied to the passive port of the second).
Reshuffling the first process gave us the buffer HSE sequence presented in Section III-C. Specifically, we advanced qo+ to the middle of the (ci, co) communication to eliminate the ambiguous state created by returning to the initial state halfway through the sequence. We restored symmetry by postponing [~qi] to the next cycle, placing it immediately before qo+’s new location. Reshuffling the second process gave us the data-latch HSE sequence presented in Section III-C. All we did was to postpone the second half of the (bi, bo) communication to the middle of the (vi, vo) communication.
We compiled the reshuffled buffer sequence into the following PRS:
~ qo&ci->co+
qo& ~ ci->co-

co& ~ qi->qo+
~ co&qi->qo-.

The corresponding circuit is shown in Fig. 11(a).
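As a sanity check on these production rules, they can be stepped in software against a simple environment. The Python sketch below is an assumption-laden illustration, not the simulation flow used for the chip: the environment rules (a four-phase sender driving ci and a four-phase receiver driving qi), the all-low reset state, and the fixed rule-firing order are choices made for the example, and only one of the possible interleavings is explored.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, bool]
Rule = Tuple[Callable[[State], bool], str, bool]  # (guard, signal, new value)

# Column-buffer production rules, transcribed from the PRS above.
BUFFER: List[Rule] = [
    (lambda s: not s["qo"] and s["ci"], "co", True),
    (lambda s: s["qo"] and not s["ci"], "co", False),
    (lambda s: s["co"] and not s["qi"], "qo", True),
    (lambda s: not s["co"] and s["qi"], "qo", False),
]

# Assumed environment: the decoder side raises ci while co is low and lowers
# it once co is high; the latch side simply echoes qo on qi.
ENVIRONMENT: List[Rule] = [
    (lambda s: not s["co"], "ci", True),
    (lambda s: s["co"], "ci", False),
    (lambda s: s["qo"], "qi", True),
    (lambda s: not s["qo"], "qi", False),
]


def simulate(rules: List[Rule], steps: int) -> List[str]:
    """Fire the first enabled rule at each step and log the transition."""
    state: State = {name: False for name in ("ci", "co", "qi", "qo")}
    trace: List[str] = []
    for _ in range(steps):
        for guard, signal, value in rules:
            if state[signal] != value and guard(state):
                state[signal] = value
                trace.append(signal + ("+" if value else "-"))
                break
        else:
            break  # no rule enabled: the system has stopped
    return trace


trace = simulate(BUFFER + ENVIRONMENT, steps=40)
outputs = [t for t in trace if t[:2] in ("co", "qo")]
print(outputs[:8])
```

Filtering the trace down to the buffer's own outputs lets the order of co and qo transitions be compared directly with the reshuffled HSE sequence.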
We compiled the reshuffled data-latch sequence into the following PRS:
~ vi&bi->bo+
vo& ~ bi->bo-

bo&vi->vo+
~ bo& ~ vi->vo-.

We guarded bo+ with ~vi instead of ~vo to prevent inactive cells from setting bo after vi becomes high. Otherwise, bo+ would fire when bi becomes true, since ~vo is true for inactive cells. It would be followed by vo+, since vi also is true. Thus, the next burst’s events would end up in the row that the current burst is being written to. Using the global signal, vi, instead of the local signal, vo, prevents this scenario, postponing bo+ in inactive cells until the on-going write is completed, at which point vi becomes false. The corresponding circuit is shown in Fig. 11(b).
However, blocking bo+ with vi may also lock out the last event in the current burst. Since the pipelined column decoder provides almost half-a-cycle of slack, the three-to-four-wire converter’s communication finishes while the decoder’s communication is only halfway through (see Fig. 6). As the end of the converter’s communication triggers the row-address-delay block to issue the write-request, correct operation requires sufficient delay to ensure that the last event’s bo becomes true first. Fortunately, this timing assumption is easily satisfied, as the buffer-to-data latch path involves just one channel, while the other path involves several, and includes signaling off-chip.
D. Row Cell
Making the row cell’s row and column ports (ri, ro and ci, co) passive and its output port (po, pi) active yielded this bit-level HSE sequence
*[[ri]; ro+; [ci]; co+; [~ci]; co-;
po+; [pi]; po-; [~pi]; [~ri]; ro-]
where two-phase handshakes implement the pair of row communications (see Section II-C). We chose to run the column communication in lock-step with the row communication, which made co redundant. And we moved the first and second halves of the output communication to the middle of the first and second row communications, respectively. Keeping ro- at the end guaranteed mutual exclusion. Hence, we obtained the sequence presented in Section III-C.
We compiled the reshuffled sequence into the following PRS:
ri&ci->po+
~ ri& ~ ci->po-

pi->ro+
~ pi->ro-.

The corresponding circuit is shown in Fig. 12(a).
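The state-holding behavior of these rules can be mimicked with a few lines of Python. This is a behavioral sketch only: the class name `RowCell` and its `update` method are hypothetical, signals are treated as active-high booleans exactly as they appear in the PRS, and the dynamic (unstaticized) storage of the real gate is idealized as a stored attribute.

```python
from typing import Tuple


class RowCell:
    """Behavioral model of ri&ci -> po+, ~ri&~ci -> po-, pi -> ro+, ~pi -> ro-."""

    def __init__(self) -> None:
        self.po = False  # request to the event recipient

    def update(self, ri: bool, ci: bool, pi: bool) -> Tuple[bool, bool]:
        if ri and ci:
            self.po = True          # both inputs set: raise the request
        elif not ri and not ci:
            self.po = False         # both inputs clear: lower the request
        # otherwise the C-element holds its previous output
        ro = pi                     # acknowledge is copied straight through
        return self.po, ro


cell = RowCell()
for ri, ci, pi in [(True, False, False), (True, True, False),
                   (True, True, True), (False, True, True),
                   (False, False, True), (False, False, False)]:
    print((ri, ci, pi), "->", cell.update(ri, ci, pi))
```

A cell whose column data never arrives keeps po low even though its row is selected, which is how inactive cells in the selected row are skipped.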
ACKNOWLEDGMENT
The author would like to thank K. Zaghloul for reviewing the
design and K. Hynna for invaluable help simulating and refining
it. He would also like to thank one of the anonymous reviewers
for his/her comments, which markedly improved the paper.
REFERENCES
[1] M. Mahowald, An Analog VLSI Stereoscopic Vision System. Boston,
MA: Kluwer Academic, 1994.
[2] K. A. Boahen, “The retinomorphic approach: Pixel-parallel adaptive amplification, filtering, and quantization,” Analog Integr. Circuits Signal
Process., vol. 13, pp. 53–68, 1997.
[3] E. Culurciello, R. Etienne-Cummings, and K. A. Boahen, “A biomorphic digital image sensor,” IEEE J. Solid-State Circuits, vol. 38, pp.
281–294, Feb. 2003.
[4] J. Kramer, “An on/off transient imager with event-driven asynchronous
read-out,” in Proc. IEEE Int. Symp. Circuits and Systems, vol. 2, 2002,
pp. II-165–II-168.
[5] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie, “Silicon auditory processors as computer peripherals,” IEEE Trans.
Neural Networks, vol. 4, pp. 523–528, May 1993.
[6] S. P. DeWeerth, G. N. Patel, M. F. Simoni, D. E. Schimmel, and R.
L. Calabrese, “A VLSI architecture for modeling intersegmental coordination,” in Proc. 17th Conf. Advanced Research in VLSI, 1997, pp.
182–200.
[7] C. M. Higgins and C. Koch, “Multi-chip motion processing,” in Proc.
Conf. Advanced Research in VLSI, vol. 20, 1999, pp. 309–322.
[8] W. Yang, “A wide-dynamic range low-power photosensor array,” in
Proc. Int. Solid-State Circuits Conf., vol. 37, 1994, p. 230.
[9] B. Fowler, A. E. Gamal, and D. Yang, “A CMOS area image sensor with
pixel-level A/D conversion,” in Proc. IEEE Int. Solid-State Circuits
Conf. (ISSCC’94), vol. 37, San Francisco, CA, 1994, pp. 226–227.
[10] L. G. McIlrath, “A low-power low-noise ultrawide-dynamic-range CMOS
imager with pixel-parallel A/D conversion,” IEEE J. Solid-State Circuits, vol. 36, pp. 846–853, May 2001.
[11] A. Murray and L. Tarassenko, Analogue Neural VLSI: A Pulse Stream
Approach. London, U.K.: Chapman and Hall, 1994.
[12] W. Maass and C. M. Bishop, Eds., Pulsed Neural Networks. Boston,
MA: MIT Press, 1999.
[13] K. A. Boahen, “A burst-mode word-serial address-event link—I: Transmitter design,” IEEE Trans. Circuits Syst. I , vol. 51, pp. 1269–1280,
July 2004.


[14] M. Sivilotti, “Wiring considerations in analog VLSI Systems, with
application to field-programmable networks,” Ph.D. dissertation, Dept.
Comp. Sci, California Inst. of Technol., Pasadena, CA, 1991.
[15] A. Mortara, E. Vittoz, and P. Venier, “A communication scheme for
analog VLSI perceptive systems,” IEEE J. Solid-State Circuits, vol. 30,
pp. 660–669, June 1995.
[16] A. Abusland, T. S. Lande, and M. Hovin, “A VLSI communication architecture for stochastically pulse-encoded analog signals,” in Proc. IEEE
Int. Symp. Circuits and Systems, vol. 3, May 1996, pp. 401–404.
[17] J. G. Elias, “Artificial dendritic trees,” Neur. Comput., vol. 5, pp.
648–663, 1993.
[18] S. R. Deiss, R. J. Douglas, and A. M. Whatley, “A pulse-coded communications infrastructure for neuromorphic systems,” in Pulsed Neural
Networks, W. Maass and C. M. Bishop, Eds. Boston, MA: MIT Press,
1999, ch. 6, pp. 157–178.
[19] J. P. Lazzaro and J. Wawrzynek, “A multi-sender asynchronous extension to the address-event protocol,” in Proc. 16th Conf. Advanced Research in VLSI, 1995, pp. 158–169.
[20] K. A. Boahen, “A burst-mode word-serial address-event link—III:
Analysis and test results,” IEEE Trans. Circuits Syst. I, vol. 51, pp.
1292–1300, July 2004.
[21] A. Martin, “Programming in VLSI: From communicating processes to delay-insensitive circuits,” in Proceedings of UT Year of
Programming Institute on Concurrent Programming. Reading, MA:
Addison-Wesley, 1990, pp. 1–64.
[22] K. A. Boahen, “Point-to-point connectivity between neuromorphic
chips using address-events,” IEEE Trans. Circuits Syst. II, vol. 47, pp.
416–434, May 2000.
[23] K. A. Boahen, “Retinomorphic vision systems II: communication channel design,” in Proc. IEEE Int. Symp. Circuits and Systems, May 1996, pp.
14–17.
[24] K. A. Boahen, “Communicating neuronal ensembles between neuromorphic
chips,” in Neuromorphic Systems Engineering: Neural networks in
Silicon, T. S. Lande, Ed. Boston, MA: Kluwer Academic, 1998, ch.
11, pp. 229–262.
[25] I. E. Sutherland, “Micropipelines,” Commun. ACM, vol. 32, no. 6, pp.
720–738, 1989.

Kwabena A. Boahen received the B.S. and M.S.E.
degrees in electrical and computer engineering
from The Johns Hopkins University, Baltimore,
MD, in the concurrent masters-bachelors program,
both in 1989, and the Ph.D. degree in computation
and neural systems from the California Institute of
Technology, Pasadena, in 1997.
He is an Associate Professor in the Bioengineering Department at the University of
Pennsylvania, Philadelphia, where he holds a
secondary appointment in electrical engineering.
His current research interests include mixed-mode multichip VLSI models of
biological sensory and perceptual systems, and their epigenetic development,
and asynchronous digital interfaces for interchip connectivity.
Dr. Boahen was awarded a Packard Fellowship in 1999, a National Science
Foundation CAREER Grant in 2001, and an Office of Naval Research YIP Grant
in 2002. He is a member of Tau Beta Kappa and has held a Sloan Fellowship
for Theoretical Neurobiology at the California Institute of Technology.

