20,838 research outputs found
On human motion prediction using recurrent neural networks
Human motion modelling is a classical problem at the intersection of graphics
and computer vision, with applications spanning human-computer interaction,
motion synthesis, and motion prediction for virtual and augmented reality.
Following the success of deep learning methods in several computer vision
tasks, recent work has focused on using deep recurrent neural networks (RNNs)
to model human motion, with the goal of learning time-dependent representations
that perform tasks such as short-term motion prediction and long-term human
motion synthesis. We examine recent work, with a focus on the evaluation
methodologies commonly used in the literature, and show that, surprisingly,
state-of-the-art performance can be achieved by a simple baseline that does not
attempt to model motion at all. We investigate this result, and analyze recent
RNN methods by looking at the architectures, loss functions, and training
procedures used in state-of-the-art approaches. We propose three changes to the
standard RNN models typically used for human motion, which result in a simple
and scalable RNN architecture that obtains state-of-the-art performance on
human motion prediction.Comment: Accepted at CVPR 1
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators
We show that DNN accelerator micro-architectures and their program mappings
represent specific choices of loop order and hardware parallelism for computing
the seven nested loops of DNNs, which enables us to create a formal taxonomy of
all existing dense DNN accelerators. Surprisingly, the loop transformations
needed to create these hardware variants can be precisely and concisely
represented by Halide's scheduling language. By modifying the Halide compiler
to generate hardware, we create a system that can fairly compare these prior
accelerators. As long as proper loop blocking schemes are used, and the
hardware can support mapping replicated loops, many different hardware
dataflows yield similar energy efficiency with good performance. This is
because the loop blocking can ensure that most data references stay on-chip
with good locality and the processing units have high resource utilization. How
resources are allocated, especially in the memory system, has a large impact on
energy and performance. By optimizing hardware resource allocation while
keeping throughput constant, we achieve up to 4.2X energy improvement for
Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long
Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202
- …