1,307 research outputs found
Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing
Deep networks are now able to achieve human-level performance on a broad
spectrum of recognition tasks. Independently, neuromorphic computing has now
demonstrated unprecedented energy-efficiency through a new chip architecture
based on spiking neurons, low precision synapses, and a scalable communication
network. Here, we demonstrate that neuromorphic computing, despite its novel
architectural primitives, can implement deep convolutional networks that i)
approach state-of-the-art classification accuracy across 8 standard datasets,
encompassing vision and speech, ii) perform inference while preserving the
hardware's underlying energy-efficiency and high throughput, running on the
aforementioned datasets at between 1200 and 2600 frames per second and using
between 25 and 275 mW (effectively > 6000 frames / sec / W) and iii) can be
specified and trained using backpropagation with the same ease-of-use as
contemporary deep learning. For the first time, the algorithmic power of deep
learning can be merged with the efficiency of neuromorphic processors, bringing
the promise of embedded, intelligent, brain-inspired computing one step closer.
Comment: 7 pages, 6 figures
Hierarchical Representations for Efficient Architecture Search
We explore efficient neural architecture search methods and show that a
simple yet powerful evolutionary algorithm can discover new architectures with
excellent performance. Our approach combines a novel hierarchical genetic
representation scheme that imitates the modularized design pattern commonly
adopted by human experts, and an expressive search space that supports complex
topologies. Our algorithm efficiently discovers architectures that outperform a
large number of manually designed models for image classification, obtaining
top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which
is competitive with the best existing neural architecture search approaches. We
also present results using random search, achieving 0.3% less top-1 accuracy on
CIFAR-10 and 0.1% less on ImageNet whilst reducing the search time from 36
hours down to 1 hour.
Comment: Accepted as a conference paper at ICLR 2018
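The evolutionary search loop described above can be sketched in miniature (a toy illustration with an invented operation set and a stand-in fitness function, not the paper's hierarchical representation, search space, or validation-accuracy objective):

```python
import random

# Toy sketch of evolutionary architecture search: an "architecture" is a flat
# list of operation ids, and a hypothetical fitness replaces validation accuracy.
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def fitness(arch):
    # Stand-in objective (invented): reward conv3x3 operations.
    return sum(op == "conv3x3" for op in arch)

def mutate(arch, rng):
    # Point mutation: resample one position.
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(OPS)
    return child

def evolve(generations=200, length=6, seed=0):
    rng = random.Random(seed)
    best = [rng.choice(OPS) for _ in range(length)]
    for _ in range(generations):
        child = mutate(best, rng)
        if fitness(child) >= fitness(best):  # greedy (1+1) evolution
            best = child
    return best

best_arch = evolve()
```

The real algorithm evolves hierarchical motifs rather than flat op lists, but the mutate-evaluate-select loop has this shape.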
FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review
Due to recent advances in digital technologies and the availability of credible
data, deep learning, an area of artificial intelligence, has emerged and has
demonstrated its effectiveness in solving complex learning problems that were
previously intractable. In particular, convolutional neural networks (CNNs) have
demonstrated their effectiveness in image detection and recognition
applications. However, they require intensive computation and memory
bandwidth that general-purpose CPUs cannot supply at the desired performance levels.
Consequently, hardware accelerators that use application specific integrated
circuits (ASICs), field programmable gate arrays (FPGAs), and graphic
processing units (GPUs) have been employed to improve the throughput of CNNs.
More precisely, FPGAs have been recently adopted for accelerating the
implementation of deep learning networks due to their ability to maximize
parallelism as well as due to their energy efficiency. In this paper, we review
recent existing techniques for accelerating deep learning networks on FPGAs. We
highlight the key features employed by the various techniques for improving the
acceleration performance. In addition, we provide recommendations for enhancing
the utilization of FPGAs for CNN acceleration. The techniques investigated in
this paper represent the recent trends in FPGA-based accelerators of deep
learning networks. Thus, this review is expected to guide future advances in
efficient hardware accelerators and to be useful for deep learning researchers.
Comment: This article has been accepted for publication in IEEE Access
(December 2018)
Optimization Methods for Convolutional Sparse Coding
Sparse and convolutional constraints form a natural prior for many
optimization problems that arise from physical processes. Detecting motifs in
speech and musical passages, super-resolving images, compressing videos, and
reconstructing harmonic motions can all leverage redundancies introduced by
convolution. Solving problems involving sparse and convolutional constraints
remains a difficult computational problem, however. In this paper we present an
overview of convolutional sparse coding in a consistent framework. The
objective involves iteratively optimizing a convolutional least-squares term
for the basis functions, followed by an L1-regularized least squares term for
the sparse coefficients. We discuss a range of optimization methods for solving
the convolutional sparse coding objective, and the properties that make each
method suitable for different applications. In particular, we concentrate on
computational complexity, speed to ε convergence, memory usage, and
the effect of implied boundary conditions. We present a broad suite of examples
covering different signal and application domains to illustrate the general
applicability of convolutional sparse coding, and the efficacy of the available
optimization methods.
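The alternating objective described above is commonly written as a single problem; in one standard notation (ours, not taken verbatim from the paper, with $s$ the signal, $d_k$ the basis/dictionary filters, $x_k$ the sparse coefficient maps, $*$ convolution, and $\lambda$ the sparsity weight):

```latex
\min_{\{d_k\},\{x_k\}}\;
  \frac{1}{2}\Big\lVert s - \sum_{k} d_k * x_k \Big\rVert_2^2
  \;+\; \lambda \sum_{k} \lVert x_k \rVert_1
  \qquad \text{s.t. } \lVert d_k \rVert_2 \le 1 \ \ \forall k
```

Alternating minimization fixes one set of variables while solving for the other: a constrained least-squares problem in the $d_k$ and an L1-regularized least-squares problem in the $x_k$, matching the two iterative steps the overview describes.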
Primitive Fitting Using Deep Boundary Aware Geometric Segmentation
Identifying and fitting geometric primitives (e.g., planes, spheres, cylinders,
cones) in a noisy point cloud is a challenging yet beneficial task for fields
such as robotics and reverse engineering. As a multi-model multi-instance
fitting problem, it has been tackled with different approaches, including
RANSAC, which, however, often fits inferior models in practice with noisy inputs
of cluttered scenes. Inspired by the corresponding human recognition process,
and benefiting from the recent advancements in image semantic segmentation
using deep neural networks, we propose BAGSFit as a new framework addressing
this problem. First, a fully convolutional neural network segments the input
point cloud point-wise into multiple classes separated by jointly detected
instance boundaries, without any geometric fitting. These segments can then
serve as primitive hypotheses, each with a probability estimate of its
associated primitive class. Finally, all hypotheses are sent through geometric
verification, which corrects any misclassification by fitting primitives to
each segment. We trained the network on simulated range images and tested it
on both simulated and real-world point clouds. Quantitative and qualitative
experiments demonstrate the superiority of BAGSFit.
Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing
Recent advances in neural networks (NNs) exhibit unprecedented success at
transforming large, unstructured data streams into compact higher-level
semantic information for tasks such as handwriting recognition, image
classification, and speech recognition. Ideally, systems would employ
near-sensor computation to execute these tasks at sensor endpoints to maximize
data reduction and minimize data movement. However, near-sensor computing
presents its own set of challenges such as operating power constraints, energy
budgets, and communication bandwidth capacities. In this paper, we propose a
stochastic-binary hybrid design that splits the computation between the
stochastic and binary domains for near-sensor NN applications. In addition, our
design uses a new stochastic adder and multiplier that are significantly more
accurate than existing adders and multipliers. We also show that retraining the
binary portion of the NN computation can compensate for precision losses
introduced by shorter stochastic bit-streams, allowing faster run times at
minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary
design can achieve 9.8x energy efficiency savings, and application-level
accuracies within 0.05% compared to conventional all-binary designs.
Comment: 6 pages, 3 figures, Design, Automation and Test in Europe (DATE) 2017
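As background on the stochastic domain mentioned above (a standard stochastic-computing construction, not the paper's proposed adder or multiplier): a value p in [0, 1] is encoded as a bit-stream whose fraction of 1s equals p, and multiplying two independent streams reduces to a bitwise AND.

```python
import random

def to_stream(p, n, rng):
    # Encode p in [0, 1] as n bits, each 1 with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_mul(a, b, n=10000, seed=42):
    rng = random.Random(seed)
    sa = to_stream(a, n, rng)
    sb = to_stream(b, n, rng)
    # AND of two independent streams has 1-probability a * b.
    ones = sum(x & y for x, y in zip(sa, sb))
    return ones / n  # estimate of a * b
```

Shorter streams are cheaper but noisier, which is exactly the precision loss the abstract says retraining the binary portion can compensate for.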
Attention Augmented Convolutional Networks
Convolutional networks have been the paradigm of choice in many computer
vision applications. The convolution operation, however, has a significant
weakness in that it operates only on a local neighborhood, thus missing global
information. Self-attention, on the other hand, has emerged as a recent advance
to capture long range interactions, but has mostly been applied to sequence
modeling and generative modeling tasks. In this paper, we consider the use of
self-attention for discriminative visual tasks as an alternative to
convolutions. We introduce a novel two-dimensional relative self-attention
mechanism that proves competitive in replacing convolutions as a stand-alone
computational primitive for image classification. We find in control
experiments that the best results are obtained when combining both convolutions
and self-attention. We therefore propose to augment convolutional operators
with this self-attention mechanism by concatenating convolutional feature maps
with a set of feature maps produced via self-attention. Extensive experiments
show that Attention Augmentation leads to consistent improvements in image
classification on ImageNet and object detection on COCO across many different
models and scales, including ResNets and a state-of-the-art mobile constrained
network, while keeping the number of parameters similar. In particular, our
method achieves a top-1 accuracy improvement on ImageNet classification
over a ResNet50 baseline and outperforms other attention mechanisms for images
such as Squeeze-and-Excitation. It also achieves an improvement of 1.4 mAP in
COCO Object Detection on top of a RetinaNet baseline.
Comment: ICCV 2019
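The augmentation described above can be sketched minimally (assumed shapes and identity query/key/value projections, our simplification rather than the paper's exact formulation): compute single-head self-attention over all spatial positions, then concatenate the attention output channels with the convolutional feature channels at each position.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(feats):
    # feats: one feature vector per spatial position; queries = keys =
    # values = feats (an assumption that keeps the sketch short).
    d = len(feats[0])
    out = []
    for q in feats:
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                           for k in feats])
        out.append([sum(w * v[j] for w, v in zip(weights, feats))
                    for j in range(d)])
    return out

def augment(conv_feats, feats):
    # Channel-wise concatenation of conv and attention maps per position.
    return [c + a for c, a in zip(conv_feats, self_attention(feats))]
```

Each output position thus carries both local (convolutional) and global (attentional) channels, which is the paper's core design choice.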
Binarized Convolutional Neural Networks for Efficient Inference on GPUs
Convolutional neural networks have recently achieved significant
breakthroughs in various image classification tasks. However, they are
computationally expensive, which can make their implementation on embedded
and low-power devices infeasible. In this paper, convolutional neural
network binarization is implemented on GPU-based platforms for real-time
inference on resource constrained devices. In binarized networks, all weights
and intermediate computations between layers are quantized to +1 and -1,
allowing multiplications and additions to be replaced with bit-wise operations
between 32-bit words. This representation completely eliminates the need for
floating point multiplications and additions and decreases both the
computational load and the memory footprint compared to a full-precision
network implemented in floating point, making it well-suited for
resource-constrained environments. We compare the performance of our
implementation with an equivalent floating point implementation on one desktop
and two embedded GPU platforms. Our implementation achieves a maximum speed-up
of 7.4x with only a 4.4% loss in accuracy compared to a reference
implementation.
Comment: IEEE EUSIPCO 201
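The bit-wise replacement described above can be illustrated concretely (our sketch of the general XNOR/popcount idea, not the paper's GPU kernels): with weights and activations in {+1, -1} packed one per bit into machine words, a dot product of n elements becomes `n - 2 * popcount(a XOR b)`, since equal bits contribute +1 and differing bits contribute -1.

```python
def pack(values):
    # values: list of +1/-1, up to the word width; +1 -> bit 1, -1 -> bit 0.
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a, b):
    # Dot product of two +1/-1 vectors via XOR and popcount.
    assert len(a) == len(b)
    mismatches = bin(pack(a) ^ pack(b)).count("1")
    return len(a) - 2 * mismatches
```

One XOR plus one popcount thereby replaces up to 32 floating-point multiply-adds per word, which is where the memory and compute savings come from.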
ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator
Binary Neural Networks enable smart IoT devices, as they significantly reduce
the required memory footprint and computational complexity while retaining a
high network performance and flexibility. This paper presents ChewBaccaNN, a
0.7 mm²-sized binary convolutional neural network (CNN) accelerator designed
in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data
buffering, latch-based memories, and voltage scaling, a throughput of 241 GOPS
is achieved while consuming just 1.1 mW at 0.4V/154MHz during inference of
binary CNNs with up to 7x7 kernels, leading to a peak core energy efficiency of
223 TOPS/W. ChewBaccaNN's flexibility allows it to run a much wider range of
binary CNNs than other accelerators, drastically improving the accuracy-energy
trade-off beyond what can be captured by the TOPS/W metric. In fact, it can
perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 mJ, thus
exceeding the accuracy while at the same time lowering the energy cost by 2.8x
compared to even the most efficient and much larger analog processing-in-memory
devices, while keeping the flexibility of running larger CNNs for higher
accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class
ILSVRC dataset and improves the energy efficiency by 4.4x over accelerators of
similar flexibility. Furthermore, it can perform inference on a binarized
ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with
only 3.0 mJ/frame -- at an accuracy drop of merely 1.8% from the full-precision
ResNet-18.
Comment: Accepted at IEEE ISCAS 2021, Daegu, South Korea, 23-26 May 2021
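A back-of-the-envelope check of the figures quoted above (the numbers are from the abstract; the rounding is ours): 241 GOPS at 1.1 mW works out to roughly 219 TOPS/W, consistent with the quoted 223 TOPS/W peak core efficiency.

```python
# Sustained efficiency implied by the abstract's throughput and power figures.
throughput_ops_per_s = 241e9   # 241 GOPS
power_watts = 1.1e-3           # 1.1 mW
tops_per_watt = throughput_ops_per_s / power_watts / 1e12  # ~219 TOPS/W
```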
Privado: Practical and Secure DNN Inference with Enclaves
Cloud providers are extending support for trusted hardware primitives such as
Intel SGX. Simultaneously, the field of deep learning is seeing enormous
innovation as well as an increase in adoption. In this paper, we ask a timely
question: "Can third-party cloud services use Intel SGX enclaves to provide
practical, yet secure DNN Inference-as-a-service?" We first demonstrate that
DNN models executing inside enclaves are vulnerable to access pattern based
attacks. We show that by simply observing access patterns, an attacker can
classify encrypted inputs with 97% and 71% attack accuracy for MNIST and
CIFAR10 datasets on models trained to achieve 99% and 79% original accuracy
respectively. This motivates the need for PRIVADO, a system we have designed
for secure, easy-to-use, and performance efficient inference-as-a-service.
PRIVADO is input-oblivious: it transforms any deep learning framework that is
written in C/C++ to be free of input-dependent access patterns thus eliminating
the leakage. PRIVADO is fully-automated and has a low TCB: with zero developer
effort, given an ONNX description of a model, it generates compact and
enclave-compatible code which can be deployed on an SGX cloud platform. PRIVADO
incurs low performance overhead: we use PRIVADO with Torch framework and show
its overhead to be 17.18% on average on 11 different contemporary neural
networks.
Comment: 13 pages, 5 figures
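The input-oblivious property described above can be sketched on a toy lookup (our illustration of the general idea, not PRIVADO's actual code transformation): a plain `table[i]` leaks `i` through the memory access pattern, so every entry is touched and the result is selected arithmetically, making the sequence of accessed addresses independent of the secret index.

```python
def oblivious_lookup(table, secret_index):
    # Scan the whole table; select via arithmetic rather than indexing.
    # (Real constant-time code would also avoid the Python comparison
    # branch, e.g. with a masked/cmov-style select.)
    result = 0
    for i, value in enumerate(table):
        match = 1 if i == secret_index else 0
        result = result * (1 - match) + value * match
    return result
```

The cost is a full scan per access, which is the kind of overhead the 17.18% average figure above reflects at the level of whole networks.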