Abstract-This paper presents a general purpose address-event (AER) processor based on a SPARC-compatible LEON3 core with a custom data interface for asynchronous sensor data. The main focus in the design of the sensor interface was on precisely maintaining the inherent timing information of AER sensor data while providing robust peak-rate handling, DMA functionality and a novel event-rate dependent system control mechanism. Hardware-accelerated event pre-processing includes pre-FIFO high-resolution time-stamping, address masking for ROI and event-rate dependent IRQ generation without loading the processor core. The System-on-Chip has been implemented in a 0.18µm CMOS process and achieves peak AER input event rates of 33M AE/s and sustained event rates of 5.125M AE/s at 10ns time-stamp resolution. The core processes AEs at >1M AE/s sustained rate. We discuss design considerations and implementation details and show measurement results from the fabricated chip.
I. INTRODUCTION
Address-Event-Representation (AER) was proposed in 1991 [1] for transferring the state of an array of artificial neurons from one VLSI chip to another. It uses mixed analog and digital principles and exploits pulse frequency modulation for coding information. Subsequently AER has been adapted as a protocol for transmitting asynchronous data from neuromorphic, spike-based sensors like silicon retinas or cochleas to processing units like multi-neuron ensembles or convolution chips [2].
In a typical AER system, the emitter chip contains an array of cells where each cell (e.g. the pixel of a vision sensor) operates autonomously and in an event-driven manner. Each time a pixel generates a pulse ("event"), it communicates with the periphery and its address is placed on an asynchronous digital bus, the AER bus. AER encodes spike-based data as digital addresses, but in continuous time. The exact time of occurrence of each event inherently encodes essential information.
Recent developments in embedded sensory systems and the emergence of new areas of applications for neuromorphic systems fuel the interest in interfacing spike-based sensors to standard synchronous computational units. For most applications it is crucial to maintain the event-inherent precise timing information of AER sensor data, hence the time-stamping of incoming address-events at high temporal resolution is essential. Several AER-computer interfaces have been built and are in use today [3] [4] [5] [6]. However, the transfer of the timing information into the synchronous domain at high temporal resolution (sub-microsecond), combined with local processing capability, remains an open issue. AER applications, e.g. those involving fast moving objects in a vision sensor's field of view, demand high time-stamping (TS) resolution. In state-of-the-art AE interface devices, computing power is needed to execute event acquisition and time-stamping, which reduces the core time available for data processing and limits the TS resolution (e.g. to 1ms [6]). For applications requiring precision timing, like event-based optical measurement [9] or asynchronous PWM imaging [10], such a TS resolution is not adequate.
1 This work was supported under the EU FP7 framework (eMorph, 231467).
In response to these demands, a general purpose SPARC-compatible processor based on a LEON3 core [7] has been designed around a central 32-bit AMBA bus architecture [8]. It provides a 20-bit parallel asynchronous AER sensor interface that implements 10ns resolution time-stamping, hardware-accelerated event pre-processing and DMA functionality. The LEON3 processor is a proven, well-established processor architecture with a comprehensive GPL software tool chain and an extensive library for standard peripherals.
The presented device was fabricated on a standard 0.18µm CMOS process (UMC L180). This paper presents the implemented processor architecture and sensor interface concept along with design considerations and implementation details, and shows measurement results from the fabricated chip with respect to input event rates and core processing power.
II. PROCESSOR ARCHITECTURE
A block diagram of the processor system architecture is shown in Figure 1 (with inset chip photograph cutouts). The system is based on a central 32-bit AMBA AHB bus [8] that connects all modules requiring medium to high data transfer rates. The bus protocol supports multiple bus masters and is centrally controlled (block "AHB Controller"). Bus arbitration is organized according to the "Round Robin" principle, which grants equal rights to all bus masters. The block "AHB/APB Bridge" connects modules with lower bandwidth needs via a reduced-complexity 32-bit bus (AMBA APB).
The system can be coarsely divided into two main blocks: the arithmetic and logic unit (ALU) with local memory system, and the bus system (AMBA AHB, APB) with attached periphery modules. The two blocks differ in particular in their clock frequencies: the ALU is driven by a double-frequency clock. The primary bus master of the system is the computing core "Leon3" [7], a SPARC V8 compatible 32-bit processor. The customized kernel configuration contains a 16-bit "multiply-accumulate" (MAC16) unit and a hardware divider. The processor memory system is composed of a four-way associative buffer for instructions (I-cache) of 32kB (8kB per set) with "least recently used" (LRU) replacement policy and local data memory (AHB RAM) of 64kB for fast access to high-priority data. The peripheral modules "SPI", "UART", "general purpose timer" (GP timer), "general purpose input/output" (GP I/O), "JTAG debug link", "Ethernet MAC", "debug support unit" (DSU), "interrupt controller" and "memory controller" are connected to the bus system. The memory controller supports SRAM, SDRAM, ROM and "memory mapped" I/O over common data and address lines. Support for software debugging is provided by the DSU and two debug links ("JTAG Debug link" and "Ethernet Debug link"). Both debug links are implemented as AHB bus masters and are able to control and read out the DSU via AHB. In addition, they have access to all peripheral modules. The block "chip control unit" (CCU) controls clock generation, ports and the "design-for-test" (DFT) provisions "memory built-in self test" (BIST), "boundary scan" and "full scan". The clock frequency can be adjusted in 10MHz steps without software intervention. A clock port for the supply of external components is provided. The main system parameters are listed in TABLE I.
III. SENSOR INTERFACE (SIF)
A. Basics
The goal of this AE interface design was to realize asynchronous data acquisition, synchronization and timestamping along with advanced, hardware-accelerated preprocessing features like address masking and event-rate controlled IRQ generation without loading any processing tasks onto the actual processor core.
The main functional blocks of the proposed sensor interface (SIF) include data transfer and receipt acknowledgement, data rate measurement, data filtering, time-stamp assignment and input data buffer management. The asynchronous AER bus is directly connected to the address-event interface (AE IF), whereby the bus width is hardware limited to 20 bits. The range of sensor data can be sub-divided into several definable regions-of-interest (ROIs). The transfer of the address-event data from the SIF into the input data buffer implements a direct memory access (DMA) scheme and does not require any interaction from the processing device. A further substantial improvement over previous designs is a data rate-dependent system control with smart input data buffer management. This functionality is implemented based on interrupt requests (IRQs) which signal instantaneous data-rate over- or under-run with respect to adjustable thresholds. Fig. 2 shows a block diagram of the SIF and its connectivity to other system components. The SIF contains the functional blocks "ROI Filter" (including the HW bus connection AE IF), "Data Rate Measurement", "pTAE generation", a FIFO buffer and "SIF Control", a parameterization unit controlling the functional blocks of the SIF and the data flow between them. Finally, a "DMA Transfer" unit controls bus and memory access of the sensor interface. Directly attached to the SIF is the "pTAE Data Buffer" memory that temporarily holds the formatted sensor data for processing. Two data transfer paths are marked in the schematic. "I" illustrates the data transfer from the SIF to the pTAE data buffer, a dedicated memory space. Concurrently, the address of the memory space concerned is signaled to the post-processing unit. "II" shows a pTAE transfer from the buffer to the post-processor. TABLE II contains the main specifications of the implemented SIF.
B. SIF Implementation Details

C. SIF functional blocks
1) ROI Filter
This block performs a selection of the incoming AEs according to pre-programmed rules. For this purpose, each AE is divided into three numeric sections and compared to the individually programmable upper and lower thresholds of four configurable event filters. If the corresponding AE section value lies outside these threshold-defined regions, the AE is 
2) Data-Rate Measurement
In the "Data Rate Measurement" block, the number of events during a configurable measurement interval T_MI is constantly monitored and evaluated. The interval T_MI is derived from the system clock frequency f_clk and is related to the time-stamp period. With a system clock frequency f_clk of 100MHz, the minimum time-stamp resolution is 10ns.
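The rate evaluation described above can be illustrated with a minimal software sketch. It is not the hardware implementation: the class name `RateMonitor`, the expression of the measurement interval in 10ns time-stamp ticks and the two threshold parameters are assumptions made for illustration only.

```python
from dataclasses import dataclass

TS_PERIOD_NS = 10  # time-stamp period at a 100 MHz system clock (from the text)


@dataclass
class RateMonitor:
    """Hypothetical model: count events per interval, flag threshold crossings."""
    interval_ticks: int   # measurement interval in time-stamp ticks (assumed unit)
    low_threshold: int    # fewer events than this in one interval -> "under"
    high_threshold: int   # more events than this in one interval -> "over"

    def classify(self, event_ticks):
        """Map each interval index to 'over', 'under' or 'ok'."""
        if not event_ticks:
            return {}
        counts = {}
        for tick in event_ticks:
            index = tick // self.interval_ticks
            counts[index] = counts.get(index, 0) + 1
        result = {}
        # Walk all intervals up to the last event so that empty ones are seen too.
        for index in range(max(event_ticks) // self.interval_ticks + 1):
            n = counts.get(index, 0)
            if n > self.high_threshold:
                result[index] = "over"    # would raise a data-rate overrun IRQ
            elif n < self.low_threshold:
                result[index] = "under"   # would raise a data-rate underrun IRQ
            else:
                result[index] = "ok"
        return result


# 100 ticks of 10 ns give a 1 us measurement interval in this example.
monitor = RateMonitor(interval_ticks=100, low_threshold=1, high_threshold=3)
flags = monitor.classify([0, 10, 20, 30, 250])
```

In this model the over and under flags stand in for the two rate-dependent interrupt conditions; the hardware evaluates them without involving the processor core.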
3) pTAE Generation
The block "pTAE Generation" performs the actual timestamping operation and a format transformation to 32-bit processor format. Essentially, a 24-bit time stamp TS, provided by the "Time-stamp Generator" block, is assigned to each AE and the resulting data item, "Timed Address-Event" (TAE), is converted to 32-bit granularity. The SIF can also handle already time-stamped TAEs of different formats and converts them to the 32-bit pTAE-format for processing.
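As a rough illustration of the time-stamping step, the sketch below packs a 20-bit address and a 24-bit time stamp into 32-bit words. The actual pTAE bit layout is not given here; the two-word layout, the function names and the wrap handling are assumptions. The only derived figure is the counter wrap period, which follows from 24 bits at one tick per 10ns.

```python
ADDR_BITS = 20          # AER bus width stated in the text
TS_BITS = 24            # time-stamp width stated in the text
TS_PERIOD_NS = 10       # time-stamp resolution at 100 MHz

ADDR_MASK = (1 << ADDR_BITS) - 1
TS_MASK = (1 << TS_BITS) - 1

# A 24-bit counter ticking every 10 ns wraps after 2**24 ticks (about 168 ms).
WRAP_PERIOD_NS = (1 << TS_BITS) * TS_PERIOD_NS


def pack_ptae(address, timestamp):
    """Assumed layout: word 0 holds the address, word 1 the time stamp."""
    return (address & ADDR_MASK, timestamp & TS_MASK)


def unpack_ptae(words):
    """Inverse of pack_ptae for the assumed two-word layout."""
    return (words[0] & ADDR_MASK, words[1] & TS_MASK)


def ts_delta_ticks(earlier, later):
    """Elapsed ticks between two time stamps, tolerating a single counter wrap."""
    return (later - earlier) & TS_MASK


words = pack_ptae(0xABCDE, 0x123456)
elapsed_ns = ts_delta_ticks(0xFFFFF0, 0x000010) * TS_PERIOD_NS
```

Because the time stamp wraps, software that compares event times has to use a modular difference such as `ts_delta_ticks`, and intervals longer than one wrap period cannot be recovered from the time stamps alone.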
4) FIFO and DMA Transfer
The FIFO buffer memory holds 512 TAEs and is conceived to balance event rate peaks. The instantaneous FIFO fill level is stored in a register which is accessible by the SIF control block. The FIFO can be cleared. The direct memory access block "DMA Transfer" transfers the pTAE data according to the control information of "SIF Control" into the input data memory "pTAE Data Buffer". This storage area can be addressed by the processor core.
5) Bus Interfaces
The SIF comprises two AHB bus interfaces, one "slave" interface for communication with the core and one "master" interface for transferring TAE data via DMA (direct memory access) into a defined storage area (input data buffer). The slave interface is predominantly used for configuration of the peripheral units. The AHB bus in this system is implemented with a data width of 32 bits. As an extension to the standard AHB signals, two additional interrupt request lines are provided.
D. SIF Control
The block "SIF control" steers and parameterizes all SIF blocks and serves as central control unit for data flow in the system. Additionally it generates system control signals (IRQs) for the processor core. Data-IRQ lines signal data flow conditions in the SIF blocks like data-rate overruns and underruns, block transfer and DMA related signals and software triggers. The error-IRQs include FIFO and DMA buffer signals. The functional unit "DMA control" contains a state machine, which monitors the maskable data IRQs, the FIFO level, start and end addresses of the DMA buffers and the DMA mode. The unit supports four DMA modes that use one common DMA channel. The DMA modes "ring buffer", "double buffer", "multiple buffer" and "single buffer" essentially differ by the input data buffer management and the HW-SW handshake.
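The "ring buffer" mode can be pictured as a write pointer that moves between the start and end addresses of the DMA buffer and wraps around. The sketch below models only that address sequence. The class name, the exclusive end address and the 4-byte step are assumptions; the HW-SW handshake and the other three modes are not modeled.

```python
class RingBufferPointer:
    """Hypothetical model of a DMA write pointer in ring-buffer mode."""

    def __init__(self, start, end, step=4):
        # 'end' is treated as exclusive; 'step' is one 32-bit word in bytes.
        if end <= start or (end - start) % step != 0:
            raise ValueError("buffer must hold a whole number of words")
        self.start = start
        self.end = end
        self.step = step
        self.addr = start
        self.wraps = 0  # number of times the pointer has returned to 'start'

    def advance(self):
        """Return the address for the current write, then move to the next slot."""
        current = self.addr
        self.addr += self.step
        if self.addr >= self.end:
            self.addr = self.start
            self.wraps += 1
        return current


pointer = RingBufferPointer(0x1000, 0x1010)
addresses = [pointer.advance() for _ in range(5)]
```

In such a scheme old entries are overwritten once the pointer wraps, so the consumer has to keep up with the producer or be notified before data are lost.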
IV. RESULTS
A. Input AE peak rate
Input AE peak rate and sustained DMA transfer TAE rates have been measured on the fabricated prototype chip. Deterministic address event sequences were generated by a programmable data generator with rates up to 33M events/sec and sent to the SIF input. After recording, the received data were read out from the "pTAE Data Buffer" and checked for completeness and integrity. In order to assess the input peak event rate, bursts of address events of different lengths were applied at intra-burst event rates between 2.5 and 33M events/sec. Fig. 3 shows the percentage of lost events as a function of the event rate for bursts of between 512 and 2048 events. For bursts of 512 events or shorter (FIFO depth), the full 33M events/sec peak rate of the SIF data processing and transfer is reached. The DMA memory space available for recording at peak rates is 64MB corresponding to 8M TAEs.
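The dependence of event loss on burst length can be reasoned about with a simple fluid model of the FIFO. This is a sketch, not a reproduction of the measured curves in Fig. 3. It assumes that the FIFO is empty at the start of a burst and that it is drained at a constant rate equal to the reported sustained rate of 5.125M AE/s; the function name and parameters are hypothetical.

```python
FIFO_DEPTH = 512            # FIFO depth in TAEs (from the text)
SUSTAINED_RATE = 5.125e6    # assumed constant drain rate in events per second


def lost_fraction(burst_len, burst_rate,
                  drain_rate=SUSTAINED_RATE, depth=FIFO_DEPTH):
    """Fraction of a burst lost in a fluid FIFO model with constant drain."""
    if burst_len <= 0 or burst_rate <= drain_rate:
        return 0.0  # the drain keeps up, so the FIFO never fills
    # Events that accumulate during the burst because input outpaces the drain.
    backlog = burst_len * (1.0 - drain_rate / burst_rate)
    # Anything beyond the FIFO depth cannot be stored and is lost.
    lost = max(0.0, backlog - depth)
    return lost / burst_len


short_burst = lost_fraction(512, 33e6)    # burst no longer than the FIFO depth
slow_burst = lost_fraction(2048, 2.5e6)   # input rate below the drain rate
long_burst = lost_fraction(2048, 33e6)    # long burst at the peak rate
```

Under these assumptions a burst no longer than the FIFO depth is never truncated, which is in line with the statement above that the full peak rate is reached for bursts of 512 events or fewer.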
B. AE core processing rate
Maximum core processing rates for processing kernels of different complexities were measured on address-events from a variable-rate programmable data generator. Incoming events are first copied to a register and classified. Subsequently a set of processor operations is carried out on each event. TABLE III contains the results as maximum sustained AE rate at the input (SIF) and processor internal byte rates as a function of kernel complexity for exemplary processing kernels.
V. CONCLUSIONS
A general purpose address-event (AER) processor based on a SPARC-compatible LEON3 core with a custom data interface for asynchronous sensor data is presented. The core achieves sustained/constant load AE processing rates of >1M AE/s for limited-complexity processing kernels. The data interface provides hardware-accelerated event pre-processing including high-resolution time-stamping, robust peak-rate handling, ROI/RONI filtering and flexible DMA functionality, and achieves peak AER event rates of up to 33M AE/s and sustained event rates of 5.125M AE/s at 10ns time-stamp resolution.
