Search CORE

20,838 research outputs found

On human motion prediction using recurrent neural networks

Author: Black Michael J.
Martinez Julieta
Romero Javier
Publication venue
Publication date: 06/05/2017
Field of study

Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion, with the goal of learning time-dependent representations that perform tasks such as short-term motion prediction and long-term human motion synthesis. We examine recent work, with a focus on the evaluation methodologies commonly used in the literature, and show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not attempt to model motion at all. We investigate this result, and analyze recent RNN methods by looking at the architectures, loss functions, and training procedures used in state-of-the-art approaches. We propose three changes to the standard RNN models typically used for human motion, which result in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction.Comment: Accepted at CVPR 1

arXiv.org e-Print Archive

Crossref

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication venue
Publication date: 19/02/2018
Field of study

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Author: Bell Steven Emberton
Cao Kaidi
Gao Mingyu
Ha Heonjae
Horowitz Mark
Kozyrakis Christos
Liu Qiaoyi
Nayak Ankita
Pu Jing
Raina Priyanka
Setter Jeff Ou
Yang Xuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2020
Field of study

We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202

arXiv.org e-Print Archive

Crossref

Measurements, Models, Systems and Design

Author: Adamski M. Węgrzyn, M. Węgrzyn, A.
Barkalov A. Titarenko, L.
Benysek G. Jarnut, M. Rusiński, J.
Fedyczak Z. Szcześniak, P. Kaniewski, J.
Furmankiewicz L. Kozioł, M. Kłosiński, R.
Gałkowski K. Paszke, W. Sulikowski, B.
Gielerak R. Kuriata, E. Sawerwain, M. Pawłowski, K.
Kempski A. Smoleński, R. Kot, E.
Korbicz J. Witczak, M. Patan, K. Janczak, A. Mrugalski, M.
Korotyeyev I. Kasperek, R.
Michta E. Markowski, A.
Miczulski W. szulim, R.
Nikiel S. Steć, P.
Obuchowicz A. Pieczyński, A. Kowal, M. Prętki, P
Olencki A. Szmytkiewicz, J. Urbański, K.
Popławski A. Zając, W.
Rybski R. Kaczmarek J. Lal-Jadziak, J.
Uciński D. Patan, M. Kuczewski, B.
Publication venue: Wydawnictwa Komunikacji i Łączności, Warszawa
Publication date: 01/01/2007
Field of study

531 s.

Zielonogórska Biblioteka Cyfrowa (Digital Library of Zielona Gora)