11,031 research outputs found
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and are
now fundamental tools in many intelligent systems. At the same time, their
computational complexity and resource consumption continue to increase, posing
a significant challenge to the deployment of such networks, especially in
real-time applications or on resource-limited devices. Network acceleration
has therefore become a hot topic within the deep learning community. On the
hardware side, a number of FPGA/ASIC-based accelerators have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design, and hardware accelerators. Finally, we introduce and discuss a few
possible future directions.

Comment: 14 pages, 3 figures
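As a toy illustration of one of the surveyed techniques, magnitude-based network pruning zeroes out the smallest weights in a layer. The sketch below is a generic illustration of the idea, not any specific method from the survey:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return weights * (np.abs(weights) > threshold)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)          # half the weights become 0
```

In practice pruning is usually followed by fine-tuning to recover accuracy, and the resulting sparsity is exploited by sparse kernels or dedicated hardware.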
Event-triggered Natural Hazard Monitoring with Convolutional Neural Networks on the Edge
In natural hazard warning systems, fast decision making is vital to avoid
catastrophes. Decision making at the edge of a wireless sensor network promises
fast response times but is limited by the availability of energy, data transfer
speed, processing and memory constraints. In this work we present a realization
of a wireless sensor network for hazard monitoring based on an array of
event-triggered single-channel micro-seismic sensors with advanced signal
processing and characterization capabilities based on a novel co-detection
technique. On the one hand, we leverage an ultra-low-power threshold-triggering
circuit paired with on-demand digital signal acquisition, extracting relevant
information efficiently at exactly the times when it matters most and,
consequently, not wasting precious resources when nothing can be observed. On
the other hand, we use machine-learning-based classification implemented on
low-power, off-the-shelf microcontrollers to avoid false-positive warnings and
to actively identify humans in hazard zones. The sensors' response time and
memory requirement are substantially improved by quantizing and pipelining the
inference of a convolutional neural network. In this way, convolutional neural
networks that would not run unmodified on a memory constrained device can be
executed in real time and at scale on low-power embedded devices. A field
study with our system has been running on the rockfall scarp of the Matterhorn
Hörnligrat at 3500 m a.s.l. since August 2018.
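The co-detection idea can be pictured as follows: an event is confirmed only when several nodes trigger within a short time window, which suppresses false positives from single-sensor noise. This is a simplified sketch of the concept, not the paper's actual scheme:

```python
def co_detect(trigger_times_per_node, window, min_nodes):
    """Confirm an event when at least `min_nodes` distinct nodes trigger
    within `window` seconds of each other (simplified co-detection sketch)."""
    stamped = sorted((t, node)
                     for node, times in enumerate(trigger_times_per_node)
                     for t in times)
    events = []
    for j, (t0, _) in enumerate(stamped):
        nodes_in_window = {node for t, node in stamped[j:] if t - t0 <= window}
        if len(nodes_in_window) >= min_nodes:
            events.append(t0)
    return events

# Nodes 0 and 1 trigger almost together near t = 10 s; node 2 is isolated noise.
events = co_detect([[10.0, 50.0], [12.0], [100.0]], window=5.0, min_nodes=2)
```

Only the co-occurring triggers at t ≈ 10 s survive; the isolated triggers at 50 s and 100 s are discarded without waking the expensive classification stage.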
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
Convolutional Neural Networks have rapidly become the most successful machine
learning algorithm, enabling ubiquitous machine vision and intelligent
decisions even on embedded computing systems. While the underlying arithmetic
is structurally simple, compute and memory requirements are challenging. One of
the promising opportunities is leveraging reduced-precision representations for
inputs, activations and model parameters. The resulting scalability in
performance, power efficiency and storage footprint provides interesting design
compromises in exchange for a small reduction in accuracy. FPGAs are ideal for
implementing low-precision inference engines that leverage custom precisions to
achieve the required numerical accuracy for a given application. In this
article, we describe the second generation of the FINN framework, an end-to-end
tool which enables design space exploration and automates the creation of fully
customized inference engines on FPGAs. Given a neural network description, the
tool optimizes for given platforms, design targets and a specific precision. We
introduce formalizations of resource cost functions and performance
predictions, and elaborate on the optimization algorithms. Finally, we evaluate
a selection of reduced precision neural networks ranging from CIFAR-10
classifiers to YOLO-based object detection on a range of platforms including
PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s
on AWS F1 and 5 TOp/s on embedded devices.

Comment: to be published in ACM TRETS Special Edition on Deep Learning
A deep learning based solution for construction equipment detection: from development to deployment
This paper aims at providing researchers and engineering professionals with a
practical and comprehensive deep learning based solution to detect construction
equipment from the very first step of its development to the final step of
deployment, with a particular focus on deployment. The first phase of solution
development involved data preparation, model selection, model training, and
model evaluation. The second phase of the study comprises model optimization,
application-specific embedded system selection, and
economic analysis. Several embedded systems were proposed and compared. The
review of the results confirms the superior real-time performance of the
solutions, with accuracy consistently above 90%. The current study validates the
practicality of deep learning based object detection solutions for construction
scenarios. Moreover, the detailed knowledge presented in this study can be
employed for several purposes, such as safety monitoring, productivity
assessments, and managerial decisions.

Comment: 17 pages, 16 figures, 6 tables
NetScore: Towards Universal Metrics for Large-scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage
Much of the focus in the design of deep neural networks has been on improving
accuracy, leading to more powerful yet highly complex network architectures
that are difficult to deploy in practical scenarios, particularly on edge
devices such as mobile and other consumer devices given their high
computational and memory requirements. As a result, there has been a recent
interest in the design of quantitative metrics for evaluating deep neural
networks that account for more than just model accuracy as the sole indicator
of network performance. In this study, we continue the conversation towards
universal metrics for evaluating the performance of deep neural networks for
practical on-device edge usage. In particular, we propose a new balanced metric
called NetScore, which is designed specifically to provide a quantitative
assessment of the balance between accuracy, computational complexity, and
network architecture complexity of a deep neural network, which is important
for on-device edge operation. In one of the largest comparative analyses of
deep neural networks in the literature, the NetScore metric, the
top-1 accuracy metric, and the popular information density metric were compared
across a diverse set of 60 different deep convolutional neural networks for
image classification on the ImageNet Large Scale Visual Recognition Challenge
(ILSVRC 2012) dataset. The evaluation results across these three metrics for
this diverse set of networks are presented in this study to act as a reference
guide for practitioners in the field. The proposed NetScore metric, like the
other tested metrics, is by no means perfect, but the hope is to push the
conversation towards better universal metrics for evaluating deep neural
networks in practical on-device edge scenarios and to help guide practitioners
in model design for such scenarios.

Comment: 9 pages
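For reference, our reading of the paper is that NetScore takes the form Ω(N) = 20·log10(a(N)^α / (p(N)^β · m(N)^γ)), with top-1 accuracy a in percent, parameter count p and multiply-accumulate count m in millions, and default exponents α = 2, β = γ = 0.5. The units and exponents below are assumptions; verify them against the paper before relying on exact values:

```python
import math

def netscore(top1_acc_pct, params_millions, macs_millions,
             alpha=2.0, beta=0.5, gamma=0.5):
    """Omega(N) = 20 * log10(a^alpha / (p^beta * m^gamma)); units and
    default exponents are assumptions from our reading of the paper."""
    return 20.0 * math.log10(
        top1_acc_pct ** alpha
        / (params_millions ** beta * macs_millions ** gamma))

# Hypothetical network: 70% top-1 accuracy, 5M parameters, 500M MACs.
score = netscore(70.0, 5.0, 500.0)
```

With these exponents, accuracy is rewarded quadratically while parameter and compute costs are penalized with square-root weight, reflecting the metric's emphasis on keeping accuracy high without ignoring efficiency.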
Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition
Generative Adversarial Networks (GAN) have attracted much research attention
recently, leading to impressive results for natural image generation. However,
to date little success has been observed in using GAN-generated images to improve
classification tasks. Here we attempt to explore, in the context of car license
plate recognition, whether it is possible to generate synthetic training data
using GAN to improve recognition accuracy. With a carefully-designed pipeline,
we show that the answer is affirmative. First, a large-scale image set is
generated using the GAN's generator, without manual annotation. Then, these
images are fed to a deep convolutional neural network (DCNN) followed by a
bidirectional recurrent neural network (BRNN) with long short-term memory
(LSTM), which performs the feature learning and sequence labelling. Finally,
the pre-trained model is fine-tuned on real images. Our experimental results on
a few data sets demonstrate the effectiveness of using GAN images: an
improvement of 7.5% over a strong baseline when moderate-sized real data are
available. We show that the proposed framework achieves competitive recognition
accuracy on challenging test datasets. We also leverage depthwise separable
convolution to construct a lightweight convolutional RNN, which is about half
the size and 2x faster on CPU. Combining this framework with the proposed
pipeline, we make progress towards accurate recognition on mobile and embedded
devices.
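The parameter savings from depthwise separable convolution, the abstract's route to a roughly half-size model, can be illustrated by counting weights. The channel and kernel sizes below are arbitrary examples:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

standard = conv_params(128, 128, 3)        # one dense 3x3 layer
separable = separable_params(128, 128, 3)  # depthwise + pointwise pair
ratio = standard / separable
```

For 3x3 kernels with equal input and output channels, the separable form needs roughly k² = 9 times fewer weights, which is where the reported size and speed gains come from.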
Digital Passport: A Novel Technological Strategy for Intellectual Property Protection of Convolutional Neural Networks
In order to prevent deep neural networks from being infringed by unauthorized
parties, we propose a generic solution which embeds a designated digital
passport into a network, and subsequently, either paralyzes the network
functionalities for unauthorized usage or maintains its functionality in the
presence of a verified passport. Such a desired network behavior is
successfully demonstrated in a number of implementation schemes, which provide
reliable, preventive and timely protections against tens of thousands of
fake-passport deceptions. Extensive experiments also show that deep neural
network performance under unauthorized usage deteriorates significantly (e.g.
with 33% to 82% reductions in CIFAR10 classification accuracy), while
networks endorsed with valid passports remain intact.

Comment: This paper proposes a new and timely IPR solution that embeds digital
passports into CNN models to prevent unauthorized network usage (i.e.
infringement) by paralyzing the networks while maintaining their functionality
for verified users.
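One way to picture the passport mechanism, as a hypothetical sketch rather than the paper's actual scheme: derive a layer's scale factors from the passport itself, so that an invalid passport yields wrong scales and degraded outputs, while the valid passport restores normal behavior:

```python
import numpy as np

rng = np.random.default_rng(0)

def passport_gate(features, passport, weight):
    # Per-channel scale derived from the passport (hypothetical scheme):
    # a wrong passport produces wrong scales and paralyzes the layer.
    gamma = np.tanh((passport * weight).mean(axis=(1, 2)))
    return features * gamma[None, :]

weight = rng.normal(size=(4, 8, 8))           # layer-specific secret weights
valid_passport = rng.normal(size=(4, 8, 8))   # distributed to licensed users
features = rng.normal(size=(2, 4))            # a batch of 2 feature vectors

out_valid = passport_gate(features, valid_passport, weight)
out_fake = passport_gate(features, rng.normal(size=(4, 8, 8)), weight)
```

Because the scales are a function of the passport, they never need to be stored explicitly in the model, which is the intuition behind passport-based ownership verification.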
Compact Generalized Non-local Network
The non-local module is designed for capturing long-range spatio-temporal
dependencies in images and videos. Although having shown excellent performance,
it lacks the mechanism to model the interactions between positions across
channels, which are of vital importance in recognizing fine-grained objects and
actions. To address this limitation, we generalize the non-local module and
take the correlations between the positions of any two channels into account.
This extension uses a compact Taylor-expansion representation of multiple
kernel functions, which gives the generalized non-local module a fast,
low-complexity computation flow. Moreover, we implement our
generalized non-local method within channel groups to ease the optimization.
Experimental results illustrate the clear-cut improvements and practical
applicability of the generalized non-local module on both fine-grained object
recognition and video classification. Code is available at:
https://github.com/KaiyuYue/cgnl-network.pytorch

Comment: Technical report; to appear at NIPS 2018; code is available at
https://github.com/KaiyuYue/cgnl-network.pytorch
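The core idea, attention over all (channel, position) pairs jointly rather than over spatial positions alone, can be sketched with a plain dot-product kernel. The paper's compact Taylor-expanded kernels and channel grouping are omitted here:

```python
import numpy as np

def generalized_nonlocal(x):
    """x: (channels, positions). Treat every (channel, position) entry as a
    token, attend over all token pairs, and add the result back (residual),
    so correlations across channels are captured, not just across positions."""
    c, n = x.shape
    v = x.reshape(-1, 1)            # (c*n, 1) channel-position tokens
    attn = (v @ v.T) / v.shape[0]   # pairwise dot-product correlations
    return x + (attn @ v).reshape(c, n)

y = generalized_nonlocal(np.ones((2, 4)))
```

The quadratic cost in c*n tokens is exactly what the paper's compact kernel representation is designed to avoid.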
Learning a Repression Network for Precise Vehicle Search
The growing explosion in the use of surveillance cameras in public security
highlights the importance of vehicle search from large-scale image databases.
Precise vehicle search, which aims to retrieve all instances of a given query
vehicle image, is a challenging task, as different vehicles look very similar
to each other when they share the same visual attributes. To address this
problem, we propose the Repression Network (RepNet), a novel multi-task
learning framework, to learn discriminative features for each vehicle image
at both coarse-grained and detailed levels simultaneously. Moreover, benefiting
from the satisfactory accuracy of attribute classification, a bucket search
method is proposed to reduce the retrieval time while still maintaining
competitive performance. We conduct extensive experiments on the revised
VehicleID dataset. Experimental results show that our RepNet achieves
state-of-the-art performance and that the bucket search method reduces
retrieval time by a factor of about 24.
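The bucket search idea can be sketched as: first filter the gallery to vehicles whose predicted attributes match the query's, then rank only that bucket by feature distance. This is a simplified illustration with made-up attributes and features, not RepNet's actual features:

```python
import numpy as np

def bucket_search(query_feat, query_attrs, gallery_feats, gallery_attrs):
    # Keep only gallery entries in the query's attribute bucket ...
    bucket = [i for i, a in enumerate(gallery_attrs) if a == query_attrs]
    # ... then rank the bucket by Euclidean feature distance.
    dists = [np.linalg.norm(gallery_feats[i] - query_feat) for i in bucket]
    return [bucket[j] for j in np.argsort(dists)]

gallery_feats = np.array([[0.0, 0.0], [5.0, 5.0], [1.0, 0.0]])
gallery_attrs = [("red", "suv"), ("blue", "sedan"), ("red", "suv")]
ranking = bucket_search(np.array([1.0, 0.1]), ("red", "suv"),
                        gallery_feats, gallery_attrs)
```

Because distances are computed only inside the matching bucket, retrieval time shrinks roughly in proportion to the bucket's share of the gallery.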
Distributed Machine Learning in Materials that Couple Sensing, Actuation, Computation and Communication
This paper reviews machine learning applications and approaches to detection,
classification and control of intelligent materials and structures with
embedded distributed computation elements. The purpose of this survey is to
identify desired tasks to be performed in each type of material or structure
(e.g., damage detection in composites), identify and compare common approaches
to learning such tasks, and investigate models and training paradigms used.
Machine learning approaches and common temporal features used in the domains of
structural health monitoring, morphable aircraft, wearable computing and
robotic skins are explored. As the ultimate goal of this research is to
incorporate the approaches described in this survey into a robotic material
paradigm, the potential for adapting the computational models used in these
applications, and corresponding training algorithms, to an amorphous network of
computing nodes is considered. Distributed versions of support vector machines,
graphical models and mixture models developed in the field of wireless sensor
networks are reviewed. Potential areas of investigation, including possible
architectures for incorporating machine learning into robotic nodes, training
approaches, and the possibility of using deep learning approaches for automatic
feature extraction, are discussed.