PCI-AER interface for Neuro-inspired Spiking Systems
R. Paz-Vicente, A. Linares-Barranco, D. Cascado, MA. Rodriguez, G. Jimenez, A. Civit, JL. Sevillano.
Dpto. de Arquitectura y Tecnologia de Computadores. Universidad de Sevilla, SPAIN.
rpaz@atc.us.es
Abstract- Address Event Representation (AER) is a neuromorphic interchip communication protocol that allows real-time connectivity between huge numbers of neurons located on different chips. By exploiting high-speed digital communication circuits (nanoseconds), synaptic neural connections can be time multiplexed (milliseconds). When building multi-chip, multi-layered AER systems it is absolutely necessary to have a computer interface that allows (a) reading AER interchip traffic and (b) injecting a sequence of events into the AER structure. This paper presents a PCI to AER interface that dispatches a sequence of events with timing information. It is able to recover from the possible delays introduced by the AER bus. It has been implemented in real-time hardware using VHDL and tested on a PCI-AER board, developed by the authors, that is currently capable of sending and receiving events at a peak rate of 16 Mev/sec and a typical rate of 10 Mev/sec.

I. INTRODUCTION

Address-Event-Representation (AER) was proposed in 1991 by Sivilotti [1] for transferring the state of an array of neurons from one chip to another. It uses mixed analog and digital principles and exploits pulse density modulation for coding information. The state of the neurons is a continuous time-varying analog signal.

Figure 1 explains the principle behind AER. The emitter chip contains an array of cells (like, for example, a camera or artificial retina chip) where each pixel shows a continuously varying, time-dependent state that changes with a slow time constant (in the order of milliseconds). Each cell or pixel includes a local oscillator that generates digital pulses of minimum width (a few nanoseconds). The density of pulses is proportional to the state or intensity of the pixel. Each time a pixel generates a pulse (which is called an "event"), it communicates with the array periphery and a digital word representing its code or address is placed on the external inter-chip digital bus (the AER bus). Additional handshaking lines (Acknowledge and Request) are also used for completing the asynchronous communication.

In the receiver chip the pulses are directed to the pixels or cells whose code or address was on the bus. This way, pixels with the same code or address in the emitter and receiver chips will "see" the same pulse stream. The receiver cell integrates the pulses and reconstructs the original low-frequency continuous-time waveform. Pixels that are more active access the bus more frequently than those that are less active.

Figure 1. AER inter-chip communication scheme.

Transmitting the pixel addresses allows performing extra operations on the images while they travel from one chip to another. For example, inserting properly coded memories (e.g. EEPROM) allows transformation (e.g. shifting and rotation) of images. Also, the image transmitted by one chip can be received by many receiver chips in parallel, by properly handling the asynchronous communication protocol. The peculiar nature of the AER protocol also allows for very efficient convolution operations within a receiver chip [2].

There is a growing community of AER protocol users for bio-inspired applications in vision and audition systems, as demonstrated by the success in recent years of the AER group at the Neuromorphic Engineering Workshop series [3]. The goal of this community is to build large multi-chip and multi-layer hierarchically structured systems capable of performing complicated array data processing in real time. The power of these systems can be exploited from computer-based systems through co-processing. This purpose strongly depends on the availability of robust and efficient AER interfaces [4]. One such tool is a PCI-AER interface that allows not only reading an AER stream into a computer's memory and displaying it on screen in real time, but also the opposite: generating, from images available in the computer's memory, a synthetic AER stream in a similar manner as a dedicated VLSI AER emitter chip would [1][5][6].

In Section II we discuss the problems behind AER sequencing and monitoring. In Sections III and IV we present a hardware architecture for the CAVIAR PCI-AER interface developed under the European project CAVIAR. In Section V we present experimental results. Finally, Section VI presents the conclusions.
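The AER principle described in the Introduction (Figure 1) can be illustrated with a small software model. This is only a sketch under simplifying assumptions that are not part of the paper: time is counted in abstract ticks, each pixel fires a number of evenly spaced events equal to its intensity level within one period, and the handshake lines are ignored. The function names `emit_events` and `integrate_events` are hypothetical.

```python
def emit_events(intensities, period_ticks):
    """Rate-code pixel intensities as a time-ordered list of (tick, address).

    Assumption for illustration: a pixel with level n fires n evenly
    spaced pulses during one period of `period_ticks` ticks.
    """
    events = []
    for address, level in enumerate(intensities):
        for k in range(level):
            # Centre each pulse in its own slot of the period.
            tick = (2 * k + 1) * period_ticks // (2 * level)
            events.append((tick, address))
    events.sort()  # the shared bus carries events in time order
    return events


def integrate_events(events, n_pixels):
    """Receiver side: count the events seen for each address."""
    counts = [0] * n_pixels
    for _tick, address in events:
        counts[address] += 1
    return counts


pixels = [0, 3, 1, 5]                      # emitter pixel intensities
bus = emit_events(pixels, period_ticks=1000)
reconstructed = integrate_events(bus, len(pixels))
```

In this model the more active pixels (address 3 here) occupy the bus more often, and counting events per address over the period recovers the original intensities.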
0-7803-9390-2/06/$20.00 ©2006 IEEE 3161 ISCAS 2006
II. SEQUENCING AND MONITORING AER EVENTS

To be useful for debugging, an AER tool should be able to receive and also send a long sequence of events while interfering as little as possible with the system under test. Let us start by explaining the meaning of interfacing in this context.

As neurons have their information coded in the frequency (or timing) of their spikes, the pixels that transmit their address through an AER bus also have their information coded in the frequency of appearance of those addresses on the bus. Therefore, inter-spike intervals (ISIs) are critical for this communication mechanism. Thus, a well designed tool should not modify the ISIs of the AER stream.

The ISIs may be difficult to preserve depending on the nature of the emitter and/or receiver chips. Suppose an AER emitter chip is connected to an AER receiver chip and we want to debug their communication. In principle, there are two possibilities: connecting an AER sniffer element to the bus, or introducing a new AER element in between the emitter and the receiver.
- The sniffer element consists of an AER receiver that captures the address and stores it with a timestamp in memory for each request that appears on the AER bus. The problem in this case is that the speed of the emitter and receiver protocol lines could be faster than the maximum speed supported by the sniffer (15 ns per event in some existing chips), causing events to be lost. Another typical problem could be that the throughput of the AER bus (unknown in principle) is so high that the interface memory cannot be downloaded to the computer's memory in time. This also implies that events are lost.
- The other possibility is to introduce a new AER element between the two chips. In this case the emitter sends the event to the AER element and the AER element sends the same event to the receiver chip. The problem now is that the new AER element will always introduce a delay in the protocol lines, and may also block the emitter if it is not able to keep up with its throughput. Therefore, ISIs are not conserved. However, the behaviour will be the same as if the emitter were connected to a slower receiver.

The throughput problem requires using very fast PC interfaces, and the problem of very fast emitter or receiver protocols can be reduced by using a very high frequency clock for the stages that interface with the AER protocols.

III. PCI-AER INTERFACE: CONSIDERATIONS AND PCB

Before the development of our tools the only available PCI-AER interface board was developed by Dante at ISS-Rome [3]. This board is very interesting as it embeds all the requirements mentioned above: AER generation, remapping and monitoring. However, its performance is limited to approximately 1 Mevent/s. In realistic experiments software overheads reduce this value even further. In many cases these values are acceptable, but currently many address event chips can produce (or accept) much higher spike rates.

As the computer interfacing elements are mainly a monitoring and testing feature in many address event systems, the instruments used for these purposes should not delay the neuromorphic chips in the system. Thus, speed requirements are at least 10 times higher than those of the original PCI-AER board. Several alternatives are possible to meet these goals: (a) extended PCI buses, (b) bus mastering and (c) hardware based Frame to AER and AER to Frame conversion.

When the development of the CAVIAR PCI-AER board was started, using 64 bit/66 MHz PCI seemed an interesting alternative as computers with this type of bus were popular in the server market. When we had to make implementation decisions the situation had altered significantly. Machines with extended PCI buses had almost disappeared and, on the other hand, serial LVDS based PCI Express was clearly emerging as the future standard but almost no commercial implementations were on the market. Therefore, the most feasible solution was to stay with the common PCI implementation (32 bit bus at 33 MHz).

The previously available PCI-AER board uses polled I/O to transfer data to and from the board. This is possibly the main limiting factor on its performance. To increase PCI throughput, bus mastering is the only alternative. The hardware and driver architecture of a bus mastering capable board is significantly different from, and more complex than, a polling or interrupt based implementation.

Hardware based frame to AER conversion does not increase PCI throughput but, instead, reduces PCI traffic. First some important facts have to be explained. It is well known that some AER chips, especially grey level imagers where pulse density is proportional to the received light intensity, require a very large bandwidth. This is also the case for many other chips when they are not correctly tuned.

For example, let us consider a grey level 128*128 imager with 256 grey levels (1 byte per pixel). In a digital frame based uncompressed 25 fps format, it would require a bandwidth of 128*128*25 = 409,600 bytes/s, about 0.39 MBytes/s. The maximum requirement for an "equivalent" system that outputs AER, supposing the number of events in a frame period is equal to the grey level and considering the worst case where all pixels spike at the maximum rate, is:

2 bytes/event * 256 events/pixel * (128*128) pixels / (1/25 s frame period) = 209,715,200 bytes/s, about 200 MBytes/s

The meaning of this figure should be carefully considered. A well designed AER system, which produces events only when meaningful information is available, can be very efficient, but an AER monitoring system should be prepared to support the bandwidth levels that can be found in some real systems. These include systems that have not been designed carefully or that are under adjustment. Currently the available spike rates, even in these cases, are far from the value shown above, but some current AER chips may exceed 40 Mevents/s in extreme conditions.

The theoretical maximum PCI32/33 bandwidth is around 133 Mbytes/s. This would allow for approximately
33 Mevent/s, considering 2 bytes per address and 2 bytes for timing information. Realistic figures in practice are closer to 15 Mevents/s. Thus, in those cases where the required throughput is higher, a possible solution is to transmit the received information by hardware based conversion to/from a frame based representation. Although this solution is adequate in many cases, there are circumstances where the developers want to know precisely the timing of each event, thus both alternatives should be preserved.

Implementing AER to Frame conversion is a relatively simple task as it basically requires counting the events over the frame period. Producing AER from a frame representation is not trivial and several conversion methods have been proposed [4].

The theoretical event distribution would be that where the number of events for a specific pixel is equal to its associated grey level and those events are equally distributed in time. The normalized mean distance from the theoretical pixel position in time to the resulting pixel timing with the different methods is an important comparison criterion. In [4] it is shown that, in most circumstances, the behavior of the methods is similar and, thus, hardware implementation complexity is an important selection criterion. From the hardware implementation viewpoint the random, exhaustive and uniform methods are especially attractive.

Figure 2. CAVIAR PCI-AER board.

As a result of these considerations the CAVIAR PCI-AER board was designed and implemented with bus mastering. The hardware based frame to AER conversion has been developed for the CAVIAR USB-AER board [10].

The design is implemented in VHDL for an FPGA. It was established that most of the functionality demanded by the users could be supported by the larger devices in the less expensive SPARTAN-II family. Figure 2 shows the CAVIAR PCI-AER board.

A Windows driver and an API that implement bus mastering, and a Matlab interface, are currently available. A Linux version of the driver is still under development.

IV. PCI-AER INTERFACE: HARDWARE DESIGN

The final goal is to transmit an AER sequence to an AER based system (for example a convolution chip) to perform video processing. An adequate sequence of events can be generated by software for testing an AER based system. This sequence of events needs to be sent to the AER based system. For this purpose an interface between the computer and the AER bus is necessary. Figure 3 shows the VHDL architecture of the present hardware interface. This is a PCI interface developed under the European project CAVIAR. The interface, called CAVIAR PCI-AER, has two operation modes that can work in parallel:

Figure 3. Hardware Interface Architecture.

A. From PCI to AER.

The AER stream is stored in the computer memory and then it is sent to the AER system through the PCI bus and the OFIFO. This stream is saved in memory using 32 bits for each address event. The sixteen least significant bits represent the address of the pixel that is emitting the event. The sixteen most significant bits represent a time difference from the previous event in clock cycles. The clock cycle is 30 ns, but can be scaled up 16 times. Special words can be used in the OFIFO to make the state machine wait the maximum time, coded with 16 bits, and then read a new word from the OFIFO without any event transmission. The OUT-AER state machine keeps continuously reading 32-bit words from the OFIFO if it is enabled. For each word the state machine will wait for the configured number of clock cycles before transmitting the address through the AER output bus. If the acknowledge is delayed, the timer of the OUT-AER state machine will subtract this time from the wait state of the next event. If the result of the subtraction is negative, no wait will be done for the next event and the remaining value will be carried over and subtracted from the wait of the following event. With this treatment the delay between events is not relative to the previous one, and a delay in the ACK reception will not cause a distortion in the time distribution of all the events along the time period.
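The 32-bit word format described in Section IV.A (address in the sixteen least significant bits, time difference in clock cycles in the sixteen most significant bits) can be sketched in software as follows. This is an illustrative sketch, not the board's driver API: the function names are hypothetical, the `scale` parameter is an assumption about how the clock scaling mentioned in the text would be applied, and the encoding of the special wait-only words is not specified in the text, so it is not modelled here.

```python
CLOCK_NS = 30            # clock period stated in the text
FIELD_MAX = 0xFFFF       # both fields are 16 bits wide


def pack_event(address, delta_cycles):
    """Build one 32-bit stream word: delta in the MSBs, address in the LSBs."""
    if not 0 <= address <= FIELD_MAX:
        raise ValueError("address must fit in 16 bits")
    if not 0 <= delta_cycles <= FIELD_MAX:
        raise ValueError("delta_cycles must fit in 16 bits")
    return (delta_cycles << 16) | address


def unpack_event(word):
    """Split a 32-bit stream word into (address, delta_cycles)."""
    return word & FIELD_MAX, (word >> 16) & FIELD_MAX


def delta_to_ns(delta_cycles, scale=1):
    """Convert a delta to nanoseconds; `scale` is an assumed clock divider."""
    return delta_cycles * CLOCK_NS * scale


word = pack_event(address=0x1234, delta_cycles=100)
address, delta = unpack_event(word)
```

With the unscaled 30 ns clock, the largest delta that fits in 16 bits corresponds to 65535 * 30 ns, roughly 1.97 ms, which is why longer silences need additional wait-only words.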
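The delay recovery performed by the OUT-AER state machine (Section IV.A) can be modelled in a few lines. This is a behavioural sketch, not the VHDL implementation: time is counted in clock cycles, the nominal handshake is assumed to take zero cycles, `ack_delays` holds the extra cycles each acknowledge takes, and the name `out_aer_send_times` is hypothetical.

```python
def out_aer_send_times(deltas, ack_delays):
    """Return the cycle at which each event is put on the output bus.

    deltas     -- programmed wait before each event, in clock cycles
    ack_delays -- extra cycles spent waiting for each acknowledge (assumed model)
    """
    now = 0      # current time in clock cycles
    debt = 0     # acknowledge delay not yet recovered
    send_times = []
    for delta, ack_delay in zip(deltas, ack_delays):
        wait = delta - debt
        if wait < 0:
            # Not enough wait to absorb the debt: send at once and
            # carry the remainder over to the following event.
            debt = -wait
            wait = 0
        else:
            debt = 0
        now += wait
        send_times.append(now)
        # A late acknowledge stalls the machine and adds to the debt.
        now += ack_delay
        debt += ack_delay
    return send_times


small_delay = out_aer_send_times([10, 10, 10, 10], [0, 4, 0, 0])
large_delay = out_aer_send_times([10, 10, 10, 10, 10], [0, 25, 0, 0, 0])
```

In the first case the 4-cycle acknowledge delay is absorbed by the next wait, so every event keeps its ideal time (10, 20, 30, 40). In the second case the 25-cycle delay exceeds the following waits, so two events are sent back to back, and the stream is back on its ideal schedule at cycle 50.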
B. From AER to PCI

The AER sequence arrives at the CAVIAR PCI-AER interface through the input AER port. The AER-IN state machine stores the incoming event (16 least significant bits) in the IFIFO together with temporal information. This temporal information (16 most significant bits) is the number of clock cycles since the last event.

The connection to the PCI bus is made by a VHDL bridge [9] that handles the Plug & Play protocol of the PCI bus, decodes the accesses to the base address made by the operating system, and supports bus mastering and interrupts.

V. EXPERIMENT

The output AER bus has been connected to the input AER bus of the same board. The experiment consists of transmitting a sequence of events associated with an image using different methods for synthetic AER generation: Scan, Uniform, Random and Exhaustive [4]. A Test Image Set (TIS) synthesized by all the methods has been transmitted and received. The TIS is composed of 9 Gaussian histogram images that imply a growing load of events on the AER bus. Figure 4 shows the average inter-spike time difference between the expected (10 ns per event) and the transmitted/received by the interface (40 ns per event). In the worst case, the difference is 5 ms per event. The delay grows with the event load because of the saturation of the state machine's local time recovery algorithm. The IFIFO is almost saturated for the 9000 event load, so there are still some pauses between events that allow the state machine to insert some wait states. This situation affects the error because of the reduced expected time.

Figure 4. Average IS time difference for TIS and synthetic methods. (Plot title: "Error introduced by PCI-AER interface on ISIs"; legend: Exhaustive, Scan, Uniform BF, Uniform F, Uniform VTA.)

VI. CONCLUSIONS

The AER format is a neuro-inspired means of communication between neuro-inspired systems. Much effort has been devoted to real-time vision processing. This paper has presented a hardware interface with time delay recovery for transferring events from a PC to an AER system.

The hardware has been tested with several event loads produced by several methods of synthetic AER generation using the TIS.

A hardware interface that allows communication between a PC and an AER based system, through the PCI bus, is detailed, and it has been tested with a supported bandwidth from 1 Mevent/second (worst case) to 16.6 Mevent/second (best case), using PCI bus mastering capabilities.

ACKNOWLEDGMENT

This work was in part supported by EU grant IST-2001-34124 (CAVIAR), and Spanish grant TIC-2003-08164-C03-02 (SAMANTA).

REFERENCES

[1] M. Sivilotti, "Wiring Considerations in analog VLSI Systems with Application to Field-Programmable Networks", Ph.D. Thesis, California Institute of Technology, Pasadena CA, 1991.
[2] Teresa Serrano-Gotarredona, Andreas G. Andreou, Bernabe Linares-Barranco. "AER Image Filtering Architecture for Vision-Processing Systems". IEEE Transactions on Circuits and Systems. Fundamental Theory and Applications, Vol. 46, No. 9, September 1999.
[3] A. Cohen, R. Douglas, C. Koch, T. Sejnowski, S. Shamma, T. Horiuchi, and G. Indiveri, "Report to the National Science Foundation: Workshop on Neuromorphic Engineering", Telluride, Colorado, USA, June-July 2001. [www.ini.unizh.ch/telluride]
[4] A. Linares-Barranco, G. Jimenez-Moreno, A. Civit-Balcells, and B. Linares-Barranco. "On Algorithmic Rate-Coded AER Generation". Accepted for publication in IEEE Transactions on Neural Networks.
[5] Kwabena A. Boahen. "Communicating Neuronal Ensembles between Neuromorphic Chips". Neuromorphic Systems. Kluwer Academic Publishers, Boston 1998.
[6] Misha Mahowald. "VLSI Analogs of Neuronal Visual Processing: A Synthesis of Form and Function". Ph.D. Thesis. California Institute of Technology, Pasadena, California, 1992.
[7] Kwabena A. Boahen. "Retinomorphic vision systems II: Communication channel design". Proceedings of the IEEE ISCAS, volume supplement, pp. 14-17. May 1996.
[8] Mortara, Eric A. Vittoz, Philippe Venier. "A communication Scheme for Analog VLSI Perceptive Systems". IEEE Journal of Solid-State Circuits, vol. 30, No. 6, pp. 660-669, June 1995.
[9] R. Paz. "Análisis del bus PCI. Desarrollo de puentes basados en FPGA para placas PCI". Trabajo de investigación para obtención de suficiencia investigadora. Sevilla, Junio 2003.
[10] R. Paz, F. Gomez-Rodriguez, M. A. Rodriguez, A. Linares-Barranco, G. Jimenez, A. Civit. "Test Infrastructure for Address-Event-Representation Communications". International Work-Conference on Artificial Neural Networks. Vilanova i la Geltrú, SPAIN. June 2005.
