3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks
Human activity understanding with 3D/depth sensors has received increasing
attention in multimedia processing and interactions. This work aims at
developing a novel deep model for automatic activity recognition from RGB-D
videos. We represent each human activity as an ensemble of cubic-like video
segments, and learn to discover the temporal structures for a category of
activities, i.e. how the activities are to be decomposed in terms of
classification. Our model can be regarded as a structured deep architecture, as
it extends the convolutional neural networks (CNNs) by incorporating structure
alternatives. Specifically, we build the network consisting of 3D convolutions
and max-pooling operators over the video segments, and introduce latent
variables in each convolutional layer that manipulate the activation of neurons.
Our model thus advances existing approaches in two aspects: (i) it acts
directly on the raw inputs (grayscale-depth data) to conduct recognition
instead of relying on hand-crafted features, and (ii) the model structure can
be dynamically adjusted to account for the temporal variations of human
activities, i.e. the network configuration is allowed to be partially activated
during inference. For model training, we propose an EM-type optimization method
that iteratively (i) discovers the latent structure by determining the
decomposed actions for each training example, and (ii) learns the network
parameters by using the back-propagation algorithm. Our approach is validated
in challenging scenarios, and outperforms state-of-the-art methods. A large
human activity database of RGB-D videos is presented in addition.
Comment: This manuscript has 10 pages with 9 figures, and a preliminary
version was published in ACM MM'14 conference
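The two operators named in the abstract above, 3D convolution and max-pooling over a video segment, can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' network: the function names `conv3d` and `max_pool3d`, the single-channel `[T][H][W]` layout, stride 1, no padding, and the cubic pooling window are all hypothetical choices, and the latent variables of the paper are not modelled.

```python
def conv3d(volume, kernel):
    """Valid 3D cross-correlation of a [T][H][W] volume with stride 1 (assumed layout)."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the spatio-temporal window anchored at (t, y, x).
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def max_pool3d(volume, size=2):
    """Non-overlapping max pooling with a cubic window; trailing remainders are dropped."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(
                    volume[t + dt][y + dy][x + dx]
                    for dt in range(size)
                    for dy in range(size)
                    for dx in range(size)
                )
                for x in range(0, W - size + 1, size)
            ]
            for y in range(0, H - size + 1, size)
        ]
        for t in range(0, T - size + 1, size)
    ]
```

Applying `max_pool3d(conv3d(segment, kernel))` to a small video segment gives one pooled feature volume; a real model would stack several such layers with many channels and learned kernels.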
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
We show that DNN accelerator micro-architectures and their program mappings
represent specific choices of loop order and hardware parallelism for computing
the seven nested loops of DNNs, which enables us to create a formal taxonomy of
all existing dense DNN accelerators. Surprisingly, the loop transformations
needed to create these hardware variants can be precisely and concisely
represented by Halide's scheduling language. By modifying the Halide compiler
to generate hardware, we create a system that can fairly compare these prior
accelerators. As long as proper loop blocking schemes are used, and the
hardware can support mapping replicated loops, many different hardware
dataflows yield similar energy efficiency with good performance. This is
because the loop blocking can ensure that most data references stay on-chip
with good locality and the processing units have high resource utilization. How
resources are allocated, especially in the memory system, has a large impact on
energy and performance. By optimizing hardware resource allocation while
keeping throughput constant, we achieve up to 4.2X energy improvement for
Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long
Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.
Comment: Published as a conference paper at ASPLOS 202
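The "seven nested loops" mentioned in the abstract above can be made concrete with a small sketch. It assumes the common reading of those loops for a convolutional layer (batch, output channel, input channel, output row, output column, kernel row, kernel column), with stride 1 and no padding. The function names, the list-of-lists layout, and the `block_k` parameter are hypothetical and are not taken from the paper or from Halide. The second function blocks the output-channel loop and moves the inner part innermost, one example of the kind of loop transformation the abstract refers to.

```python
def _shapes(inputs, weights):
    """Return (B, C, K, P, Q, R, S) for inputs[B][C][H][W] and weights[K][C][R][S]."""
    B, C = len(inputs), len(inputs[0])
    H, W = len(inputs[0][0]), len(inputs[0][0][0])
    K = len(weights)
    R, S = len(weights[0][0]), len(weights[0][0][0])
    return B, C, K, H - R + 1, W - S + 1, R, S


def conv_seven_loops(inputs, weights):
    """Direct convolution written as seven nested loops."""
    B, C, K, P, Q, R, S = _shapes(inputs, weights)
    out = [[[[0.0] * Q for _ in range(P)] for _ in range(K)] for _ in range(B)]
    for b in range(B):                      # batch
        for k in range(K):                  # output channel
            for c in range(C):              # input channel
                for p in range(P):          # output row
                    for q in range(Q):      # output column
                        for r in range(R):  # kernel row
                            for s in range(S):  # kernel column
                                out[b][k][p][q] += (
                                    inputs[b][c][p + r][q + s] * weights[k][c][r][s]
                                )
    return out


def conv_blocked(inputs, weights, block_k=2):
    """Same computation with the output-channel loop split into blocks of block_k."""
    B, C, K, P, Q, R, S = _shapes(inputs, weights)
    out = [[[[0.0] * Q for _ in range(P)] for _ in range(K)] for _ in range(B)]
    for b in range(B):
        for k0 in range(0, K, block_k):     # outer loop over channel blocks
            for c in range(C):
                for p in range(P):
                    for q in range(Q):
                        for r in range(R):
                            for s in range(S):
                                x = inputs[b][c][p + r][q + s]  # reused across the block
                                for k in range(k0, min(k0 + block_k, K)):
                                    out[b][k][p][q] += x * weights[k][c][r][s]
    return out
```

Both functions produce the same output; only the loop order and the reuse of each input value differ, which is the property a blocking scheme exploits to keep data references local.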