12,065 research outputs found
Power Aware Learning for Class AB Analogue VLSI Neural Network
Recent research into Artificial Neural Networks (ANN) has highlighted the potential of using compact analogue ANN hardware cores in embedded mobile devices, where power consumption of ANN hardware is a very significant implementation issue. This paper proposes a learning mechanism suitable for low-power class AB type analogue ANN that not only tunes the network to obtain minimum error, but also adaptively learns to reduce power consumption. Our experiments show substantial reductions in the power budget (30% to 50%) for a variety of example networks as a result of our power-aware learning
The Rolf of Test Chips in Coordinating Logic and Circuit Design and Layout Aids for VLSI
This paper emphasizes the need for multipurpose test chips and comprehensive procedures for use in supplying accurate input data to both logic and circuit simulators and chip layout aids. It is shown that the location of test structures within test chips is critical in obtaining representative data, because geometrical distortions introduced during the photomasking process can lead to
significant intrachip parameter variations. In order to transfer test chip designs quickly, accurately, and economically, a commonly accepted portable chip layout notation and commonly accepted parametric tester language are needed. In order to measure test chips more accurately and more rapidly, parametric testers with improved architecture need to be developed in conjunction with
innovative test structures with on-chip signal conditioning
Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS
Many modern video processing pipelines rely on edge-aware (EA) filtering
methods. However, recent high-quality methods are challenging to run in
real-time on embedded hardware due to their computational load. To this end, we
propose an area-efficient and real-time capable hardware implementation of a
high quality EA method. In particular, we focus on the recently proposed
permeability filter (PF) that delivers promising quality and performance in the
domains of HDR tone mapping, disparity and optical flow estimation. We present
an efficient hardware accelerator that implements a tiled variant of the PF
with low on-chip memory requirements and a significantly reduced external
memory bandwidth (6.4x w.r.t. the non-tiled PF). The design has been taped out
in 65 nm CMOS technology, is able to filter 720p grayscale video at 24.8 Hz and
achieves a high compute density of 6.7 GFLOPS/mm2 (12x higher than embedded
GPUs when scaled to the same technology node). The low area and bandwidth
requirements make the accelerator highly suitable for integration into SoCs
where silicon area budget is constrained and external memory is typically a
heavily contended resource
An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Near-sensor data analytics is a promising direction for IoT endpoints, as it
minimizes energy spent on communication and reduces network load - but it also
poses security concerns, as valuable data is stored or sent over the network at
various stages of the analytics pipeline. Using encryption to protect sensitive
data at the boundary of the on-chip analytics engine is a way to address data
security issues. To cope with the combined workload of analytics and encryption
in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
tightly-coupled multi-core cluster augmented with specialized blocks for
compute-intensive data processing and encryption functions, supporting software
programmability for regular computing tasks. The Fulmine SoC, fabricated in
65nm technology, consumes less than 20mW on average at 0.8V achieving an
efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
25MIPS/mW in software. As a strong argument for real-life flexible application
of our platform, we show experimental results for three secure analytics use
cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE
Transactions on Circuits and Systems - I: Regular Paper
- …