Near-chip Dynamic Vision Filtering for
Low-Bandwidth Pedestrian Detection
Anthony Bisulco*, Fernando Cladera Ojeda*, Volkan Isler, Daniel D. Lee
Samsung AI Center NY
New York, NY, USA
saic-ny@samsung.com
Abstract—This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs). We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm. Our system is composed of (i) a near-chip event filter that compresses and denoises the event stream from the DVS, and (ii) a Binary Neural Network (BNN) detection module that runs on a low-computation edge computing device (in our case, an STM32F4 microcontroller).
We present the system architecture and provide an end-to-end implementation for pedestrian detection in an office environment. Our implementation reduces transmission size by up to 99.6% compared to transmitting the raw event stream. The average packet size in our system is only 1397 bits, while 307.2 kb are required to send an uncompressed DVS time window. Our detector performs one detection every 450 ms, with an overall testing F1 score of 83%. The low bandwidth and low energy consumption of our system make it well suited for IoT applications.
Index Terms—dynamic vision sensors, binary neural networks,
pedestrian detection, FPGA
I. INTRODUCTION
Dynamic Vision Sensor (DVS) technologies hold the potential to revolutionize imaging systems by enabling asynchronous, event-based image acquisition. DVS pixels generate and transmit events only when the light intensity at the pixel changes. Compared to Conventional Image-based Sensors (CIS), this approach has many advantages, such as: (i) higher dynamic range, (ii) higher sampling rates, (iii) lower bandwidth requirements between the sensor and the processing unit, and (iv) lower power consumption. These characteristics make DVSs attractive sensors for energy-constrained scenarios such as Internet of Things (IoT) applications.
In this paper, we focus on the application of DVS-based systems to pedestrian detection. A common solution to this problem involves streaming data from a CIS to a processing module that runs the detection algorithm. Since the raw data from the imaging sensor can be overwhelming, the images are usually compressed before transmission. This approach (i) requires either a large bandwidth or a low frame rate to stream the data in a bandwidth-constrained environment, and (ii) raises inherent privacy concerns, as streamed images may be accessed by malicious third parties. Inference at the edge [1], where data acquisition and processing are performed on-device, has been proposed as a solution to these problems. Unfortunately, the energy required for inference at the edge when using CIS limits its applicability. Near-chip feature extraction and data compression have the potential to provide a middle-ground solution.
Fig. 1: Near-chip DVS filter architecture stages. The required bandwidth decreases across the successive stages of the filter, and the event stream becomes sparser. Privacy is enhanced by lossy subsampling, as illustrated by the windowed events in the figure.
*Both authors contributed equally to this work.
Towards this goal, we propose a near-chip filtering architecture for pedestrian detection (Fig. 1). Our solution requires low bandwidth for transmitting the intermediate representations between the sensor and the processing platform. Moreover, it enhances privacy through lossy subsampling, which makes it impossible to recover the original event representation. A single compressed packet issued from our near-chip filter has a total length of 1397 bits on average, and may be streamed through low-bitrate channels to a centralized networking node (Fig. 2).
Contributions: We make two main contributions: 1) a low-complexity hardware implementation of an event filter for DVS, targeted at a pedestrian detection system, which reduces the bandwidth required to transmit the events by up to 99.6%; and 2) an efficient detection algorithm that uses our intermediate representation and runs on a 32-bit microcontroller architecture.
arXiv:2004.01689v1 [cs.CV] 3 Apr 2020
II. RELATED WORK
We start with an overview of related work. The use of DVS for detection and pattern recognition has recently received attention [2]. Typical target tasks include the recognition of digits and simple shapes such as card suits [3], face detection [4], and pedestrian detection [5]–[7]. Most of these approaches are implemented on Graphics Processing Unit (GPU) or microprocessor-based architectures and are not specifically targeted at IoT applications, which typically require low energy consumption due to their strict energy budgets.
The asynchronous and low-bandwidth nature of event cameras makes them potentially ground-breaking for IoT. However, DVSs are inherently noisy, which makes their application challenging. Recent work addresses the filtering of DVS noise [8], [9]. A description of filtering techniques with their respective hardware implementations is presented in [10]. However, these filters are targeted at high-bandwidth applications and are not specifically designed for bandwidth reduction. Hence, they are not necessarily suitable for IoT scenarios.
Several end-to-end IoT detection architectures have been proposed as well. In [11], Rusci et al. showcased the advantages of event-based sensors for always-on applications by coupling an event-based image sensor with a PULPv3 processor. While this work shows significant reductions in energy consumption, the event stream is sent to an embedded platform without any further preprocessing. Thus, the processing platform must be located near the DVS because of the bandwidth required to transmit the events. In [12], the authors present an FPGA-suitable architecture for DVS detection using principal component analysis (PCA). While this architecture performs well on classification, it is not particularly targeted at low-power platforms, as it uses a high-end Field-Programmable Gate Array (FPGA) family. Other end-to-end architectures, such as TrueNorth [13], make use of specific neuromorphic hardware to process the event stream. In [13], a gesture recognition task is analyzed, obtaining an accuracy of 96.5% on 11 different gestures while consuming 200 mW. Compared to this work, our system has the advantages of (i) lower energy consumption, and (ii) the capability of saving the low-bandwidth features for further downstream applications.
III. METHOD
The method presented in this paper consists of two main modules, each composed of intermediate submodules. In this section, we describe the algorithms and present implementation details. Specifically, we address:
• The filtering module: a network-aware component that runs near chip. It denoises the DVS stream and compresses the data for further downstream processing. The input of this module is the raw event stream issued by the sensor, and the outputs are discrete Huffman-coded packets that are transmitted to the detection module.
• The detection module: it receives the coded event representation packet from the filtering module, decodes it, and performs pedestrian detection.
[Fig. 2 diagram. Labels: DVS, Event Stream, Near Chip Filter, Low Bandwidth Stream, Low-complexity Edge Compute, Detection, Database; Conventional Image Sensor, High Bandwidth Stream, Database Compute.]
Fig. 2: IoT system with DVS for near-device classification. The bandwidth required to transmit the low-bandwidth stream is significantly lower than that of a compressed video stream such as H.265. For instance, a 10 FPS 640×480 H.265 stream may require at least 100 kbps, compared to 2.23 kbps for our approach at a similar frame rate.
The combination of these two modules reduces the filtered
event bandwidth while maintaining high detection accuracy.
A. Filtering Module Description and Implementation
The filtering module consists of four main submodules: Event Parsing, Coincidence Detection, Aggregation-Downsampling, and Huffman Encoding. The architecture was implemented using the Chisel 3 [14] hardware description language. In this section, we describe these submodules as well as the sensor used in our experiments.
1) Sensor: We used a sensor similar to the one described in [15], operated at a resolution of 480 × 320 pixels. The event rate of our sensor was 50M events/s. The DVS was connected directly to the FPGA, which processed the events in the group address-event representation (G-AER) packet format [16].
2) Event Parser:
The event parser submodule translates the G-AER representation of the sensor into an (x, y, p) representation, where x and y are the row and column addresses of the pixel in the sensor array, and p is the polarity, encoded with two bits. While G-AER allows for a significant bandwidth reduction between the sensor and the FPGA, the (x, y, p) representation is easier to work with inside our architecture for memory addressing purposes.
Implementation: The Event Parsing submodule was implemented as an input First-In, First-Out (FIFO) queue capable of storing 256 G-AER events, followed by a look-up table (LUT)-based decoder. The FIFO allows us to handle rapid bursts of events from the sensor.
[Fig. 3 diagram. Bit packet fields: Preamble (32 bits) | Payload (variable) | Checksum (32 bits).]
Fig. 3: DVS near-chip filter implementation end-to-end diagram, displaying its submodules (Event Parser, Coincidence Detection, Aggregation-Downsampling, Huffman Coding). For simplicity, in this work we used a UART interface to communicate with the detection module, but this submodule may be replaced with other communication interfaces. The output packet format is also shown, with the length of each field.
3) Coincidence Detection:
DVS pixel arrays are susceptible to background activity noise, which appears as impulse noise when DVS events are observed in a finite time window. Noisy pixels are typically isolated compared to signal pixels, and thus may be removed by observing a sequence of pixel activations over space or time. Our filter works by detecting pairs of adjacent active pixels in the vertical and horizontal spatial directions.
The coincidence detection serves a dual purpose in our architecture: first, it collects events in a predefined time window of length τ; then, it performs a logical AND operation between adjacent pixels. This filter is inspired by the Object Motion Detector (OMD) filter described in [10], but it has two fundamental differences: (i) we use simple bitwise operations between the pixels instead of a complex integrate-and-fire model, and (ii) a coincidence is detected only if both adjacent pixels are active with the same polarity. In our architecture, τ = 3 ms.
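The coincidence rule can be illustrated with a short software model. This is a minimal sketch, not the Chisel implementation: it assumes that one τ window has already been binned into a 2-D grid of 2-bit polarity codes (0 = no event, 1 = ON, 2 = OFF, an assumed encoding), and that the output bit of each coincident pair is written at the upper or left pixel of the pair, which is also an assumption. The function name `coincidence_filter` is hypothetical.

```python
from typing import List, Tuple

# Polarity codes for one tau window (assumed 2-bit encoding):
# 0 = no event, 1 = ON event, 2 = OFF event.
Grid = List[List[int]]


def coincidence_filter(frame: Grid) -> Tuple[Grid, Grid]:
    """Return the (vertical, horizontal) binary channels for one tau window.

    A pixel is marked in a channel when it and its neighbour below
    (vertical) or to its right (horizontal) are both active with the
    same polarity, i.e. a logical AND between adjacent pixels.
    """
    rows, cols = len(frame), len(frame[0])
    vertical = [[0] * cols for _ in range(rows)]
    horizontal = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            p = frame[r][c]
            if p == 0:
                continue  # inactive pixel: cannot start a coincidence
            if r + 1 < rows and frame[r + 1][c] == p:
                vertical[r][c] = 1
            if c + 1 < cols and frame[r][c + 1] == p:
                horizontal[r][c] = 1
    return vertical, horizontal


# Isolated events and opposite-polarity neighbours are dropped;
# the vertical ON pair in column 0 survives in the vertical channel.
window = [
    [1, 0, 0, 2],
    [1, 0, 0, 0],
    [0, 0, 2, 1],
]
v, h = coincidence_filter(window)
```

In this example the OFF event at the top right and the OFF/ON pair in the bottom row produce no output, which is the impulse-noise removal described above.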
Implementation: The coincidence detection is implemented as two discrete memories (M0, M1), each of size 480 × 320 × 2 bits. In phase 1, starting at t = n · τ, the memory array M0 starts in a cleared state and collects events until t = (n + 1) · τ, when the window period has elapsed. In phase 2, from t = (n + 1) · τ until t = (n + 2) · τ, M0 is read and the coincidences are evaluated by observing adjacent active vertical and horizontal pixels. At the same time, M1 collects the events corresponding to this new time window. The output of this submodule is composed of two channels, corresponding to the filter applied in the vertical and horizontal directions. Only active pixels are sent to the aggregation submodule.
On the FPGA, all the memory blocks were implemented with dual-port BRAM slices. During readout, a line buffer of 480 pixels stores the intermediate pixels read. The coincidence detection submodule also propagates a signal indicating the start and end of a time window to the aggregation submodule.
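The two-memory scheme is a ping-pong (double) buffer. The sketch below models it in Python under simplifying assumptions: events arrive in time order as hypothetical `(timestamp_ms, row, col, polarity_code)` tuples, a window is closed lazily when the first event of a later window arrives (the hardware closes it on a timer), and the 480-pixel line buffer used during readout is not modeled. The class `PingPongBuffer` is illustrative only.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
Event = Tuple[float, int, int, int]  # (timestamp_ms, row, col, polarity_code)


class PingPongBuffer:
    """Two frame memories (M0, M1) that swap roles every tau milliseconds.

    While one memory collects the events of window n, the other holds the
    completed window n - 1, which is handed out for coincidence read-out.
    """

    def __init__(self, rows: int, cols: int, tau_ms: float = 3.0) -> None:
        self.rows, self.cols, self.tau = rows, cols, tau_ms
        self.mem = [self._blank(), self._blank()]  # M0, M1
        self.write_idx = 0  # memory currently collecting events
        self.window = 0     # index n of the window being collected

    def _blank(self) -> Frame:
        return [[0] * self.cols for _ in range(self.rows)]

    def push(self, event: Event) -> Optional[Frame]:
        """Store one event; return the completed frame if a window just closed."""
        t, r, c, p = event
        completed = None
        n = int(t // self.tau)
        if n > self.window:
            # Window boundary: the filled memory becomes the read-out memory,
            # and the other memory is cleared and starts collecting.
            completed = self.mem[self.write_idx]
            self.write_idx ^= 1
            self.mem[self.write_idx] = self._blank()
            self.window = n
        self.mem[self.write_idx][r][c] = p
        return completed
```

Each frame returned by `push` can be passed to a coincidence filter while the other memory keeps collecting events, which is the overlap between phase 1 and phase 2 described above.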
4) Aggregation and Subsampling:
With a static DVS, when events are binned in a time window, the edge thickness depends on both the velocity of the object and the length of the time window. The function of the aggregation submodule is to increase the edge thickness to a normalized size before performing inference. To this end, the aggregation submodule performs successive logical OR operations across the temporal dimension until the number of events in the aggregated frame is above a threshold. If the threshold is not reached within a 5τ time window, the frame buffer is cleared and no events are propagated.
After the aggregation operation, an 8 × 8 max-pooling operation is applied to the aggregated time window. The max-pooling operation aims to reduce the scale dependency of the object in the scene, and it reduces the dimensionality of the data. The subsampling submodule operates asynchronously, only when the aggregation submodule reports new data.
Implementation: The aggregation submodule is duplicated in order to process each channel coming from the coincidence detection submodule independently. Each pixel streamed into aggregation is stored in the aggregated frame block memory (480 × 320). At the start of every τ window, a window counter is incremented; this counter implements the temporal integration limit of 5τ. An event counter also tracks the number of pixels in the max-pooled and aggregated window. At the end of every τ-sized window, the event counter is compared against the event threshold (1000 events); if it is above the threshold, the aggregated frame is sent to subsampling.
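The aggregation rule can be sketched as a generator over the binary frames of one channel. Assumptions: the event counter counts active pixels of the full-resolution aggregated frame (the text counts pixels in the max-pooled and aggregated window, so the exact granularity may differ), "above the threshold" is read as a strict comparison, and aggregation restarts from an empty frame after each emitted frame. The name `aggregate` is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional

Frame = List[List[int]]  # one binary channel (0/1) for one tau window


def aggregate(windows: Iterable[Frame], threshold: int = 1000,
              max_windows: int = 5) -> Iterator[Frame]:
    """Logical OR of successive tau windows until enough pixels are active.

    Emits the aggregated frame once its active-pixel count is above
    `threshold`; if that does not happen within `max_windows` windows
    (5 tau), the buffer is cleared and nothing is emitted.
    """
    acc: Optional[Frame] = None
    count = 0  # active pixels in the aggregated frame
    used = 0   # number of tau windows OR-ed into acc so far
    for w in windows:
        if acc is None:
            acc = [[0] * len(w[0]) for _ in range(len(w))]
        for r, row in enumerate(w):
            for c, bit in enumerate(row):
                if bit and not acc[r][c]:  # OR, counting newly set pixels
                    acc[r][c] = 1
                    count += 1
        used += 1
        if count > threshold:
            yield acc                      # hand over to subsampling
            acc, count, used = None, 0, 0  # assumed: restart afterwards
        elif used == max_windows:
            acc, count, used = None, 0, 0  # 5-tau limit hit: drop the frame
```

With `threshold=2` and `max_windows=3`, three sparse windows whose union has three active pixels are emitted as one frame, while three nearly empty windows are silently discarded.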
The subsampling submodule is implemented using a block memory layout. Normally, an image is stored in memory using a column-based layout, where pixels are stored sequentially according to their column index. The problem with column indexing for max-pooling is that each operation must access several different memory blocks. Instead, we use a block memory layout: each memory block stores the pixels of the same 8 × 8 max-pooling area. Hence, max-pooling can be performed in a single clock cycle with a single memory read and a comparison against 0.
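The benefit of the block memory layout can be seen in a small model in which each 8 × 8 pooling area of a binary channel is packed into one 64-bit word, so that max-pooling reduces to one word read and one comparison with zero. The 64-bit word and the row-major ordering of blocks are assumptions made for illustration; the actual BRAM word width and addressing in the FPGA design are not specified. `BlockFrame` is a hypothetical name.

```python
from typing import List

K = 8  # max-pooling window size (8 x 8)


class BlockFrame:
    """Binary frame stored so that each 8x8 pooling area is one 64-bit word."""

    def __init__(self, rows: int, cols: int) -> None:
        if rows % K or cols % K:
            raise ValueError("frame size must be a multiple of the pooling size")
        self.rows, self.cols = rows, cols
        self.blocks_per_row = cols // K
        self.words = [0] * ((rows // K) * self.blocks_per_row)

    def set_pixel(self, r: int, c: int) -> None:
        addr = (r // K) * self.blocks_per_row + (c // K)  # which word (block)
        bit = (r % K) * K + (c % K)                       # position inside it
        self.words[addr] |= 1 << bit

    def max_pool(self) -> List[List[int]]:
        """Binary max-pooling: one read and one compare-with-zero per output."""
        return [[int(self.words[i * self.blocks_per_row + j] != 0)
                 for j in range(self.blocks_per_row)]
                for i in range(self.rows // K)]
```

With a 480 × 320 frame, `BlockFrame(480, 320).max_pool()` yields the 60 × 40 grid that forms one channel of the packet payload.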
5) Huffman encoder and Filter Interface:
After aggregation and max-pooling, the output of the filter is a discrete packet of 2 × 60 × 40 bits, corresponding to the pixel readouts of the downsampled aggregated image for the vertical and horizontal channels. To further reduce the data bandwidth, we perform Huffman encoding using a precomputed 256-word dictionary. On average, this results in a 3.6× reduction of the payload size.
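The encoder itself is a per-symbol table lookup. The sketch below assumes that the 4800-bit payload is split into 600 bytes, so that the 256-word dictionary has one codeword per byte value, and that the dictionary is built offline with the standard Huffman construction from representative training payloads (with add-one smoothing so that every byte value has a codeword). How the authors actually precomputed their dictionary is not stated; `build_codebook`, `encode`, and `decode` are hypothetical names.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Iterable


def build_codebook(training_bytes: Iterable[int]) -> Dict[int, str]:
    """Build a Huffman code over all 256 byte values (done once, offline)."""
    freq = Counter(training_bytes)
    tie = count()  # unique tie-breaker so the heap never compares dicts
    # Add-one smoothing so that every byte value receives a codeword.
    heap = [(freq[s] + 1, next(tie), {s: ""}) for s in range(256)]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in a.items()}
        merged.update({s: "1" + code for s, code in b.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(payload: bytes, book: Dict[int, str]) -> str:
    """One dictionary lookup per byte; returns the code as a bit string."""
    return "".join(book[b] for b in payload)


def decode(bits: str, book: Dict[int, str]) -> bytes:
    """Greedy prefix decoding (valid because Huffman codes are prefix-free)."""
    inverse = {code: s for s, code in book.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return bytes(out)
```

When the training data is dominated by zero bytes, as sparse filter outputs would be, the zero byte receives a one-bit codeword in this sketch, which is where most of the size reduction comes from.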
Implementation: The Huffman encoder is implemented by storing the codeword dictionary in BRAM and performing a lookup over the output of the aggregation submodule. The data is preceded by a 32-bit preamble header, and a low-complexity Fletcher-32 checksum [17] is appended at the end of the packet (Fig. 3).
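The framing can be sketched as follows. Fletcher-32 here follows its common definition (two running sums modulo 65535 over 16-bit words); the little-endian word order, zero padding of odd-length payloads, big-endian packing of the header and trailer, computing the checksum over the payload only, padding the Huffman bit string to whole bytes, and the preamble value `PREAMBLE` are all assumptions, since the text specifies only the field lengths. `frame_packet` and `parse_packet` are hypothetical names; the receiver-side check corresponds to the checksum verification performed on the microcontroller before inference.

```python
import struct
from typing import Optional

PREAMBLE = 0xA5A5A5A5  # hypothetical 32-bit sync word (actual value not given)


def fletcher32(data: bytes) -> int:
    """Fletcher-32 over little-endian 16-bit words; odd lengths are zero-padded."""
    if len(data) % 2:
        data = data + b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def frame_packet(payload: bytes) -> bytes:
    """Preamble (32 bits) | payload (variable) | Fletcher-32 checksum (32 bits)."""
    return (struct.pack(">I", PREAMBLE) + payload
            + struct.pack(">I", fletcher32(payload)))


def parse_packet(packet: bytes) -> Optional[bytes]:
    """Receiver side: return the payload if preamble and checksum match, else None."""
    if len(packet) < 8 or struct.unpack(">I", packet[:4])[0] != PREAMBLE:
        return None
    payload = packet[4:-4]
    (checksum,) = struct.unpack(">I", packet[-4:])
    return payload if checksum == fletcher32(payload) else None
```

Flipping a single payload bit changes the checksum, so `parse_packet` rejects the corrupted packet instead of passing it to the network.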
For testing purposes, we streamed the event representation over a UART serial interface between the FPGA and the detection module. Nonetheless, other communication interfaces may be used by changing only the last submodule of the near-chip filter. For instance, the same filtering scheme could be used with inter-chip communication protocols such as I2C or SPI, as well as with other field-bus protocols.
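As a consistency check of the figures in this section (our own arithmetic, using only numbers stated in this paper), the average packet size follows from the payload dimensions, the average 3.6× Huffman gain, and the 64 bits of framing. The uncompressed window is taken as one 2-bit polarity value per pixel, matching the size of the 480 × 320 × 2-bit coincidence memories and the 307.2 kb quoted in the abstract.

```latex
\begin{aligned}
B_{\text{payload}} &= 2 \times 60 \times 40 = 4800~\text{bits} \\
B_{\text{Huffman}} &\approx \frac{4800}{3.6} \approx 1333~\text{bits} \\
B_{\text{packet}} &\approx 32 + 1333 + 32 \approx 1397~\text{bits} \\
B_{\text{raw window}} &= 480 \times 320 \times 2 = 307{,}200~\text{bits} \\
\frac{B_{\text{packet}}}{B_{\text{raw window}}} &\approx \frac{1397}{307{,}200} \approx 0.45\%
\end{aligned}
```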
B. Detection Module Architecture and Implementation (BNN)
The detection module performs binary classification for pedestrian detection from the sparse output of the filter. It is a Convolutional Neural Network (CNN)-based architecture with binary weights, as described in [18].
The network architecture, presented in Fig. 4, is composed of two hundred 10 × 10 convolutional filters with binary (±1) weights. As the output of the filter is encoded using a binary {0, 1} representation, the convolution operation is implemented as binary AND operations between the input and the positive and negative parts of each filter, each followed by a population count operation; the difference between the two counts gives the convolution value.
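The identity behind the AND/popcount formulation is Σᵢ xᵢwᵢ = popcount(x ∧ w⁺) − popcount(x ∧ w⁻), where w⁺ and w⁻ are bit masks of the +1 and −1 weights. In the minimal Python sketch below, arbitrary-precision integers stand in for the 32-bit registers of the Cortex-M4 (a flattened 10 × 10 window would span four such words on the device); `to_mask` and `binary_dot` are hypothetical helpers.

```python
from typing import List, Sequence


def to_mask(bits: Sequence[int]) -> int:
    """Pack a flat sequence of truthy/falsy values into an integer bit mask."""
    mask = 0
    for i, b in enumerate(bits):
        if b:
            mask |= 1 << i
    return mask


def binary_dot(x_bits: List[int], weights: List[int]) -> int:
    """sum(x_i * w_i) for x in {0, 1} and w in {-1, +1} via AND + popcount."""
    x = to_mask(x_bits)
    pos = to_mask([w == 1 for w in weights])   # positive part of the filter
    neg = to_mask([w == -1 for w in weights])  # negative part of the filter
    return bin(x & pos).count("1") - bin(x & neg).count("1")
```

For `x = [1, 0, 1, 1]` and `w = [1, 1, -1, 1]`, two active inputs meet +1 weights and one meets a −1 weight, so the result is 1.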
To accelerate our calculations, we used the Digital Signal Processing (DSP) single instruction, multiple data (SIMD) instructions, as well as the floating-point unit (FPU), of the Cortex-M4 core. This processor does not have a dedicated population count (popcnt) instruction, which is required for the neural network inference process. Therefore, we implemented this operation as a LUT stored in flash. This approach increases the required storage space in exchange for faster execution.
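A table-based population count splits each 32-bit word into bytes and sums their precomputed bit counts. The 256-entry byte table below is an assumption made for illustration; the table width actually used on the STM32F4 is not stated. `POPCOUNT8` and `popcount32` are hypothetical names.

```python
# 256-entry table: number of set bits in every possible byte value.
POPCOUNT8 = bytes(bin(i).count("1") for i in range(256))


def popcount32(word: int) -> int:
    """Population count of a 32-bit word using four byte-wide table lookups."""
    return (POPCOUNT8[word & 0xFF]
            + POPCOUNT8[(word >> 8) & 0xFF]
            + POPCOUNT8[(word >> 16) & 0xFF]
            + POPCOUNT8[(word >> 24) & 0xFF])
```

A wider table needs fewer lookups per word but more flash, which is the storage-for-speed trade-off described above.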
The resulting convolution value is then scaled by a positive factor α, followed by a ReLU nonlinearity and whole-frame max-pooling. The detection value is obtained after a linear readout and thresholding.
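The complete inference path can be summarized in a short reference model. This sketch assumes a single binary input channel, square kernels, "valid" convolution without padding, one shared scale factor α, and a single linear readout with a bias; the real network operates on the two-channel 60 × 40 packet with two hundred 10 × 10 filters, and its exact layer shapes and parameters are not given here. The dot products are computed directly, which yields the same values as the AND/popcount formulation. `conv_valid` and `detect` are hypothetical names.

```python
from typing import List

Image = List[List[int]]   # binary {0, 1} input, e.g. one 60 x 40 channel
Kernel = List[List[int]]  # binary {-1, +1} weights, e.g. 10 x 10


def conv_valid(x: Image, w: Kernel) -> List[List[int]]:
    """'Valid' cross-correlation of a binary image with a square +/-1 kernel."""
    k = len(w)
    rows, cols = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[r + i][c + j] * w[i][j] for i in range(k) for j in range(k))
             for c in range(cols)] for r in range(rows)]


def detect(x: Image, kernels: List[Kernel], alpha: float,
           readout: List[float], bias: float, threshold: float = 0.0) -> bool:
    """Binary conv -> scale by alpha -> ReLU -> whole-frame max -> readout -> threshold."""
    features = []
    for w in kernels:
        response = conv_valid(x, w)
        # Scale, ReLU, then whole-frame max-pooling: one scalar per filter.
        features.append(max(max(0.0, alpha * v) for row in response for v in row))
    score = sum(f * a for f, a in zip(features, readout)) + bias
    return score > threshold
```

Because α is positive, applying the ReLU after scaling gives the same result as applying it before.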
Fig. 4: Binary architecture of the detector module running on the STM32F4 microcontroller. The memory and storage requirements are indicated for each submodule of the architecture.
IV. RESULTS
A. Filtering Module Hardware Implementation Results
The filter was synthesized on a Spartan-6 FPGA. The maximum clock frequency achieved by our design was 51.18 MHz. Our solution uses few resources on this low-end FPGA platform: register utilization is 1.10% (598 slices), LUT utilization 14.85% (4053 slices), and DSP utilization 1.72% (1 slice). BRAM utilization is higher, at 88.79% (103 slices), mainly due to the intermediate representations required to acquire events during a time window in the coincidence detection and aggregation submodules. This utilization includes the FIFO buffer for the DVS packets in the event parser submodule, as well as the Chisel 3 DecoupledIO interfaces used to transmit information asynchronously between the submodules.
For reference, PCA-RECT [12] reports a utilization of 2065 registers, 18238 LUTs, 48 BRAMs, and 4 DSPs on a Zynq-7020 SoC. Other detection architectures, such as NullHop [19], report an even higher resource utilization. Our architecture outperforms PCA-RECT on all utilization parameters except the number of BRAMs. This is not surprising, as offloading the computational load of the detection algorithm to the computing node helps keep the slice count low. Our approach has low utilization, yet it keeps the bandwidth low without requiring a full detector implementation near-chip.
We also synthesized RetinaFilter, the implementation of the background activity filter for DVS by Linares-Barranco et al. [10], using the same target architecture and configuration parameters as for our filter. This architecture requires 216 registers, 350 LUTs, 16 BRAMs, and no DSPs. It has lower slice requirements than our filter, but it does not offer any bandwidth reduction beyond denoising the image. Thus, this implementation may not be directly applied in an IoT environment for object detection.
To assess the power consumption of the near-chip architecture, we used Xilinx XPower Analyzer. Our module requires a total of 91.1 mW at 50 MHz. For reference, the static power consumption of [12] is 3 W and its dynamic power consumption is 0.37 W.
B. Detection Module Implementation Results
The detection module was implemented on an STM32F429 microcontroller. The output of the filtering module is fed into the microcontroller through a UART port operating at 115200 bps. Packets are copied into memory using Direct Memory Access (DMA), and the network starts processing them as soon as a full packet is detected and its checksum is verified. The microcontroller is kept in a sleep state when no packets are sent through the UART, which corresponds to the case when there is not enough activity for the DVS event filter to output events. This helps reduce the overall energy consumption of the detection module.
Our network achieved an average of one inference every 450 ms with the microcontroller core running at 180 MHz. It required 26.76 KB of flash memory: 18.76 KB for the network parameters and 8 KB for precomputed look-up tables used to accelerate calculations (such as the population count described above). Finally, our network requires 3.25 KB of RAM.
C. Filtering Module Performance
Throughout this work, various approaches were tested in order to achieve high compression with little reduction in testing accuracy. The complete pipeline of our filtering module consists of Coincidence Detection (CO), Aggregation (AG), Max Pooling (MP), and Huffman Coding. Each of these submodules reduces the bandwidth of the event stream, and, as the ablations below show, coincidence detection and aggregation also increase the detection accuracy. To explain the design choices made, we present various ablations of our method (Fig. 5), using the F1 score as the comparison metric. The F1 score is the harmonic mean of precision and recall.
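For reference, with true positives (TP), false positives (FP), and false negatives (FN) counted on the test set, and pedestrians taken as the positive class (an assumption about the labeling convention), the standard definitions are:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```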
To perform our measurements, we used a DVS dataset of 273 clips of humans, each with a duration of 2.5 s, and 548 clips of non-humans, each with a duration of 0.75 s. The dataset was split into a training set (80%) and a testing set (20%). This dataset resulted in 92380 time windows of 3 ms of person and object movement. The raw event stream bandwidth was 22.25 Mb/s on average.
First, we trained our binary neural network using our full pipeline and obtained an 83% testing F1 score. The measured bandwidth after filtering was 74.58 kbps. The first ablation removed the coincidence detection submodule. This resulted in a lower testing F1 score and a higher bandwidth compared to the full pipeline, which shows the denoising effect of coincidence detection: noisy DVS events increase bandwidth, and noisier images are harder to classify.
The second ablation removed the aggregation submodule. This resulted in a lower testing F1 score and a higher filter output bandwidth. The higher bandwidth is due to the additional frames produced without temporal filtering. The lower testing F1 score without aggregation is due to less defined features, which are harder to learn for a shallow BNN.
The third ablation changed the max-pooling size. The default value used in our pipeline was 8 × 8. When this value was increased, bandwidth decreased and the testing F1 score decreased as well, due to the loss of features caused by the large dimensionality reduction. When the max-pooling size was decreased, bandwidth increased, yet performance improved only slightly (about 1%). Since this improvement was small, we accepted this trade-off in favor of a lower-bandwidth model.
[Fig. 5 plot: testing F1 score vs. bandwidth. Labeled points: RAW, CO, AG, MP-8, COAG, COMP, AGMP, COAGMP-4, COAGMP-8, COAGMP-16, and "Ours: Full Arch"; an arrow marks the "Better" direction.]
Fig. 5: Testing F1 score and bandwidth trade-off between different compression pipelines. The labels in this chart refer to: Coincidence Detection (CO), Aggregation (AG), and Max Pooling (MP-#), with the max-pooling ratio indicated after the dash. Our full architecture is COAGMP-8 + Huffman Coding. The bandwidths were calculated using 92380 time windows of 3 ms. The F1 score was calculated on the test set of the DVS person detection dataset.
Our filter is capable of processing data as fast as the coincidence detection window (3 ms), resulting in the bandwidth reported above (74.58 kbps). The bandwidth may be further reduced by temporally downsampling the detection rate through a refractory period mechanism. For instance, if the filter produces an output every 100 ms, the final required bandwidth is 2.23 kbps on average (as some time windows are not propagated due to low event counts), and at most 13.97 kbps. This enables the use of our architecture in IoT applications using low-rate wireless standards such as 802.15.4 [20], LoRa, and NB-IoT [21].
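The upper bound quoted above follows from the average packet size, and the average rate indicates how often packets are actually emitted (our own arithmetic, assuming average-size packets):

```latex
\begin{aligned}
R_{\max} &= 1397~\text{bits} \times \frac{1}{100~\text{ms}} = 1397~\text{bits} \times 10~\text{s}^{-1} = 13.97~\text{kbps} \\
\frac{2.23~\text{kbps}}{1397~\text{bits}} &\approx 1.6~\text{packets/s}
\end{aligned}
```

That is, roughly 16% of the 100 ms slots produce a packet on average.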
V. CONCLUSION AND FURTHER WORK
This paper introduces a novel end-to-end system for pedes-
trian detection in IoT environments. Our system consists of
two modules: a near-chip filtering module and a detection
module.
The near-chip filtering enabled a bandwidth reduction of up to 99.6%, enabling the use of DVS over low-bandwidth communication links. Our architecture uses few resources on a low-end FPGA; the main bottleneck in our design was the number of BRAMs. We estimated that our module has a power consumption of 91.1 mW.
We showed that, despite a significant reduction in size, this representation is still useful for learning. Additionally, we showed that a centralized detection module can process this representation and detect pedestrians in the scene. The computational complexity of the detection algorithm is low because (i) we use a shallow network with a small number of feature detectors, and (ii) the binary network representation reduces the execution time on a low-end microcontroller.
Future work includes implementing applications fully on the edge, which would require integrating the filtering algorithm with the detection algorithm. One would then benefit from additional bandwidth savings by sending only the detection result over the wire rather than a representation. Because our filter is optimized for a single detection task, we expect to obtain better results than [12].
Another direction for future work is an on-chip Application Specific Integrated Circuit (ASIC) implementation combining our filtering algorithm with the DVS. This would produce additional power and bandwidth savings by removing the need to go off-sensor for filtering.
Finally, an interesting extension of this work would be to perform classification based on multiple DVS streams from different cameras. As the output of the filter is lightweight, multiple sensors could feed a single classification performed on the low-complexity edge compute.
REFERENCES
[1] Carole-Jean Wu, David Brooks, Kevin Chen, Douglas Chen, Sy Choudhury, Marat Dukhan, Kim Hazelwood, Eldad Isaac, Yangqing Jia, Bill Jia, et al., “Machine learning at Facebook: Understanding inference at the edge,” in 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2019, pp. 331–344.
[2] Guillermo Gallego, Tobi Delbruck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew Davison, Joerg Conradt, Kostas Daniilidis, et al., “Event-based vision: A survey,” arXiv:1904.08405, 2019.
[3] Xavier Lagorce, Garrick Orchard, Francesco Galluppi, Bertram E. Shi, and Ryad B. Benosman, “HOTS: a hierarchy of event-based time-surfaces for pattern recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 7, pp. 1346–1359, 2016.
[4] Souptik Barua, Yoshitaka Miyatani, and Ashok Veeraraghavan, “Direct face detection and video reconstruction from event cameras,” in 2016 IEEE WACV. IEEE, 2016, pp. 1–9.
[5] Rohan Ghosh, Abhishek Mishra, Garrick Orchard, and Nitish V. Thakor, “Real-time object recognition and orientation estimation using an event-based camera and CNN,” in 2014 IEEE BioCAS, 2014, pp. 544–547.
[6] Jia Li, Feng Shi, Wei-Heng Liu, Dongqing Zou, Qiang Wang, Paul K. J. Park, and Hyunsurk Ryu, “Adaptive temporal pooling for object detection using dynamic vision sensor,” in BMVC, 2017.
[7] Z. Jiang, P. Xia, K. Huang, W. Stechele, G. Chen, Z. Bing, and A. Knoll, “Mixed frame-/event-driven fast pedestrian detection,” in 2019 International Conference on Robotics and Automation (ICRA), May 2019, pp. 8332–8338.
[8] A. Khodamoradi and R. Kastner, “O(n)-space spatiotemporal filter for reducing noise in neuromorphic vision sensors,” IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2017.
[9] A. Linares-Barranco, F. Gómez-Rodríguez, V. Villanueva, L. Longinotti, and T. Delbrück, “A USB3.0 FPGA event-based filtering and tracking framework for dynamic vision sensors,” in 2015 IEEE International Symposium on Circuits and Systems (ISCAS), May 2015, pp. 2417–2420.
[10] A. Linares-Barranco, F. Perez-Peña, D. P. Moeys, F. Gomez-Rodriguez, G. Jimenez-Moreno, S. Liu, and T. Delbruck, “Low latency event-based filtering and feature extraction for dynamic vision sensors in real-time FPGA applications,” IEEE Access, vol. 7, pp. 134926–134942, 2019.
[11] Manuele Rusci, Davide Rossi, Eric Flamand, Massimo Gottardi, Elisabetta Farella, and Luca Benini, “Always-ON visual node with a hardware-software event-based binarized neural network inference engine,” in Proceedings of the 15th ACM International Conference on Computing Frontiers. ACM, 2018, pp. 314–319.
[12] Bharath Ramesh, Andrés Ussa, Luca Della Vedova, Hong Yang, and Garrick Orchard, “PCA-RECT: an energy-efficient object detection approach for event cameras,” CoRR, vol. abs/1904.12665, 2019.
[13] A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. D. Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendoza, J. Kusnitz, M. Debole, S. Esser, T. Delbruck, M. Flickner, and D. Modha, “A low power, fully event-based gesture recognition system,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 7388–7397.
[14] Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Avižienis, John Wawrzynek, and Krste Asanović, “Chisel: constructing hardware in a Scala embedded language,” in DAC Design Automation Conference 2012. IEEE, 2012, pp. 1212–1221.
[15] Bongki Son, Yunjae Suh, Sungho Kim, Heejae Jung, Jun-Seok Kim, Changwoo Shin, Keunju Park, Kyoobin Lee, Jinman Park, Jooyeon Woo, et al., “4.1 A 640×480 dynamic vision sensor with a 9µm pixel and 300Meps address-event representation,” in 2017 IEEE ISSCC, 2017, pp. 66–67.
[16] Yoel Yaffe, Nathan Levy, Evgeny Soloveichik, Sebastien Derhy, Ayal Keisar, Elad Rozin, Liron Artsi, Jun-Seok Kim, Keunju Park, Bongki Son, Yunjae Suh, Heejae Jung, Changwoo Shin, Jooyeon Woo, Yohan Roh, Hyunku Lee, and Hyunsurk (Eric) Ryu, “Dynamic vision sensor: The road to market,” http://rpg.ifi.uzh.ch/docs/ICRA17workshop/Samsung.pdf.
[17] J. Fletcher, “An arithmetic checksum for serial transmissions,” IEEE Transactions on Communications, vol. 30, no. 1, pp. 247–252, January 1982.
[18] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi, “XNOR-Net: ImageNet classification using binary convolutional neural networks,” arXiv:1603.05279, 2016.
[19] Alejandro Linares-Barranco, Antonio Rios-Navarro, Ricardo Tapiador-Morales, and Tobi Delbruck, “Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification,” arXiv:1905.07419, 2019.
[20] C/LM - LAN/MAN Standards Committee, “802.15.4-2015 - IEEE Standard for Low-Rate Wireless Networks,” https://standards.ieee.org/content/ieee-standards/en/standard/802_15_4-2015.html, 2015.
[21] Rashmi Sharan Sinha, Yiqiao Wei, and Seung-Hoon Hwang, “A survey on LPWA technology: LoRa and NB-IoT,” ICT Express, vol. 3, no. 1, pp. 14–21, 2017.
