5,747 research outputs found
MorphIC: A 65-nm 738k-Synapse/mm Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning
Recent trends in the field of neural network accelerators investigate weight
quantization as a means to increase the resource- and power-efficiency of
hardware devices. As full on-chip weight storage is necessary to avoid the high
energy cost of off-chip memory accesses, memory reduction requirements for
weight storage pushed toward the use of binary weights, which were demonstrated
to have a limited accuracy reduction on many applications when
quantization-aware training techniques are used. In parallel, spiking neural
network (SNN) architectures are explored to further reduce power when
processing sparse event-based data streams, while on-chip spike-based online
learning appears as a key feature for applications constrained in power and
resources during the training phase. However, designing power- and
area-efficient spiking neural networks still requires the development of
specific techniques in order to leverage on-chip online learning on binary
weights without compromising the synapse density. In this work, we demonstrate
MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a
stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning
rule and a hierarchical routing fabric for large-scale chip interconnection.
The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF)
neurons and more than two million plastic synapses for an active silicon area
of 2.86mm in 65nm CMOS, achieving a high density of 738k synapses/mm.
MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy
tradeoff on the MNIST classification task compared to previously-proposed SNNs,
while having no penalty in the energy-accuracy tradeoff.Comment: This document is the paper as accepted for publication in the IEEE
Transactions on Biomedical Circuits and Systems journal (2019), the
fully-edited paper is available at
https://ieeexplore.ieee.org/document/876400
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6fJ per operation at 0.4V, and at the same time is flexible and performant
enough to execute state-of-the-art BNN topologies such as ResNet-34 in less
than 2.2mJ per frame at 8.9 fps.Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation
at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design
of Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issu
Criticality Aware Soft Error Mitigation in the Configuration Memory of SRAM based FPGA
Efficient low complexity error correcting code(ECC) is considered as an
effective technique for mitigation of multi-bit upset (MBU) in the
configuration memory(CM)of static random access memory (SRAM) based Field
Programmable Gate Array (FPGA) devices. Traditional multi-bit ECCs have large
overhead and complex decoding circuit to correct adjacent multibit error. In
this work, we propose a simple multi-bit ECC which uses Secure Hash Algorithm
for error detection and parity based two dimensional Erasure Product Code for
error correction. Present error mitigation techniques perform error correction
in the CM without considering the criticality or the execution period of the
tasks allocated in different portion of CM. In most of the cases, error
correction is not done in the right instant, which sometimes either suspends
normal system operation or wastes hardware resources for less critical tasks.
In this paper,we advocate for a dynamic priority-based hardware scheduling
algorithm which chooses the tasks for error correction based on their area,
execution period and criticality. The proposed method has been validated in
terms of overhead due to redundant bits, error correction time and system
reliabilityComment: 6 pages, 8 figures, conferenc
SecuCode: Intrinsic PUF Entangled Secure Wireless Code Dissemination for Computational RFID Devices
The simplicity of deployment and perpetual operation of energy harvesting
devices provides a compelling proposition for a new class of edge devices for
the Internet of Things. In particular, Computational Radio Frequency
Identification (CRFID) devices are an emerging class of battery-free,
computational, sensing enhanced devices that harvest all of their energy for
operation. Despite wireless connectivity and powering, secure wireless firmware
updates remains an open challenge for CRFID devices due to: intermittent
powering, limited computational capabilities, and the absence of a supervisory
operating system. We present, for the first time, a secure wireless code
dissemination (SecuCode) mechanism for CRFIDs by entangling a device intrinsic
hardware security primitive Static Random Access Memory Physical Unclonable
Function (SRAM PUF) to a firmware update protocol. The design of SecuCode: i)
overcomes the resource-constrained and intermittently powered nature of the
CRFID devices; ii) is fully compatible with existing communication protocols
employed by CRFID devices in particular, ISO-18000-6C protocol; and ii) is
built upon a standard and industry compliant firmware compilation and update
method realized by extending a recent framework for firmware updates provided
by Texas Instruments. We build an end-to-end SecuCode implementation and
conduct extensive experiments to demonstrate standards compliance, evaluate
performance and security.Comment: Accepted to the IEEE Transactions on Dependable and Secure Computin
Statistical analysis and comparison of 2T and 3T1D e-DRAM minimum energy operation
Bio-medical wearable devices restricted to their small-capacity embedded-battery require energy-efficiency of the highest order. However, minimum-energy point (MEP) at sub-threshold voltages is unattainable with SRAM memory, which fails to hold below 0.3V because of its vanishing noise margins. This paper examines the minimum-energy operation point of 2T and 3T1D e-DRAM gain cells at the 32-nm technology node with different design points: up-sizing transistors, using high- V th transistors, read/write wordline assists; as well as operating conditions (i.e., temperature). First, the e-DRAM cells are evaluated without considering any process variations. Then, a full-factorial statistical analysis of e-DRAM cells is performed in the presence of threshold voltage variations and the effect of upsizing on mean MEP is reported. Finally, it is shown that the product of the read and write lengths provides a knob to tradeoff energy-efficiency for reliable MEP energy operation.Peer ReviewedPostprint (author's final draft
An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Near-sensor data analytics is a promising direction for IoT endpoints, as it
minimizes energy spent on communication and reduces network load - but it also
poses security concerns, as valuable data is stored or sent over the network at
various stages of the analytics pipeline. Using encryption to protect sensitive
data at the boundary of the on-chip analytics engine is a way to address data
security issues. To cope with the combined workload of analytics and encryption
in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
tightly-coupled multi-core cluster augmented with specialized blocks for
compute-intensive data processing and encryption functions, supporting software
programmability for regular computing tasks. The Fulmine SoC, fabricated in
65nm technology, consumes less than 20mW on average at 0.8V achieving an
efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
25MIPS/mW in software. As a strong argument for real-life flexible application
of our platform, we show experimental results for three secure analytics use
cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE
Transactions on Circuits and Systems - I: Regular Paper
- …