Exploring Computation-Communication Tradeoffs in Camera Systems
Cameras are the de facto sensor. The growing demand for real-time and
low-power computer vision, coupled with trends towards high-efficiency
heterogeneous systems, has given rise to a wide range of image processing
acceleration techniques at the camera node and in the cloud. In this paper, we
characterize two novel camera systems that use acceleration techniques to push
the extremes of energy and performance scaling, and explore the
computation-communication tradeoffs in their design. The first case study
targets a camera system designed to detect and authenticate individual faces,
running solely on energy harvested from RFID readers. We design a
multi-accelerator SoC operating in the sub-mW range, and evaluate it
with real-world workloads to show performance and energy efficiency
improvements over a general purpose microprocessor. The second camera system
supports a 16-camera rig processing over 32 Gb/s of data to produce real-time
3D-360 degree virtual reality video. We design a multi-FPGA processing pipeline
that outperforms CPU and GPU configurations by up to 10x in computation time,
producing panoramic stereo video directly from the camera rig at 30 frames per
second. We find that an early data reduction step, applied before either
complex processing or offloading, is the most critical optimization for
in-camera systems.
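A minimal sketch, assuming a single-channel frame and illustrative pooling and change-detection parameters, of the early data reduction idea the abstract highlights: shrink each frame in-camera so that any later processing or offload cost scales with the reduced data volume. This is not the paper's pipeline; the helper names capture/offload are hypothetical stand-ins for the sensor readout and the link to the accelerator or cloud.

    import numpy as np

    def reduce_frame(frame, prev_small, scale=4, diff_thresh=8.0):
        # Spatial reduction: crop to a multiple of `scale`, then average-pool.
        h = (frame.shape[0] // scale) * scale
        w = (frame.shape[1] // scale) * scale
        small = frame[:h, :w].reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))
        # Temporal reduction: a nearly static scene yields nothing to send.
        if prev_small is not None and np.abs(small - prev_small).mean() < diff_thresh:
            return None
        return small

    def camera_loop(capture, offload):
        # `capture` yields frames; `offload` stands in for the link to the
        # accelerator or cloud. Only reduced, changed frames are transmitted.
        prev_small = None
        for frame in capture():
            small = reduce_frame(frame.astype(np.float32), prev_small)
            if small is not None:
                offload(small)
                prev_small = small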
TraNNsformer: Neural network transformation for memristive crossbar based neuromorphic system design
Implementation of neuromorphic systems using Memristive Crossbar Arrays (MCA)
built on post-Complementary Metal-Oxide-Semiconductor (CMOS) technology has
emerged as a promising solution for low-power acceleration of
neural networks. However, the recent trend to design Deep Neural Networks
(DNNs) for achieving human-like cognitive abilities poses significant
challenges to the scalable design of neuromorphic systems (due to the
increase in computation/storage demands). Network pruning [7] is a powerful
technique to remove redundant connections for designing optimally connected
(maximally sparse) DNNs. However, such pruning techniques induce irregular
connections that do not map well onto the crossbar structure, and consequently
produce DNNs with highly inefficient hardware realizations (in terms of area
and energy). In this work, we propose TraNNsformer - an integrated training
framework that transforms DNNs to enable their efficient realization on
MCA-based systems. TraNNsformer first prunes the connectivity matrix while
forming clusters with the remaining connections. Subsequently, it retrains the
network to fine-tune the connections and reinforce the clusters. This is done
iteratively to transform the original connectivity into an optimally pruned and
maximally clustered mapping. Without accuracy loss, TraNNsformer reduces the
area (energy) consumption by 28% - 55% (49% - 67%) with respect to the original
network. Compared to network pruning, TraNNsformer achieves 28% - 49% (15% -
29%) area (energy) savings. Furthermore, TraNNsformer is a technology-aware
framework that allows mapping a given DNN to any MCA size permissible by the
memristive technology for reliable operations.
Comment: 8 pages, 9 figures. Published in the 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
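The iterative prune-cluster-retrain loop described above can be sketched roughly as follows. This is an illustrative numpy version, not the TraNNsformer code; the tile size, sparsity, keep fraction, and the stubbed-out retraining step are assumptions.

    import numpy as np

    def prune(w, sparsity=0.3):
        # Zero out the smallest-magnitude `sparsity` fraction of weights.
        thresh = np.quantile(np.abs(w), sparsity)
        return np.where(np.abs(w) >= thresh, w, 0.0)

    def cluster_to_tiles(w, tile=64, keep_frac=0.5):
        # Keep only the densest tile x tile blocks so the surviving
        # connections map onto whole crossbars. Assumes the matrix
        # dimensions are divisible by `tile` for brevity.
        rows, cols = w.shape
        density = np.zeros((rows // tile, cols // tile))
        for i in range(rows // tile):
            for j in range(cols // tile):
                block = w[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
                density[i, j] = np.count_nonzero(block)
        cutoff = np.quantile(density, 1.0 - keep_frac)
        mask = np.zeros_like(w)
        for i in range(rows // tile):
            for j in range(cols // tile):
                if density[i, j] >= cutoff:
                    mask[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = 1.0
        return w * mask

    def transform(w, iters=5, tile=64):
        for _ in range(iters):
            w = prune(w, sparsity=0.3)
            w = cluster_to_tiles(w, tile=tile, keep_frac=0.5)
            # Placeholder for retraining: in practice the network is
            # fine-tuned here (masked gradient descent) to recover
            # accuracy while reinforcing the surviving clusters.
        return w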
