Search CORE

61,233 research outputs found

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Author: Cosco Francesco
Dendaluce Jahnke Martin
Gomez-Garay Vicente
Novickis Rihards
Pérez Rastelli Joshué
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

Directory of Open Access Journals

Exascale Deep Learning for Climate Analytics

Author: Deslippe Jack
Fatica Massimiliano
Houston Michael
Kurth Thorsten
Luehr Nathan
Mahesh Ankur
Matheson Michael
Mudigonda Mayur
Phillips Everett
Prabhat
Romero Joshua
Treichler Sean
Publication venue
Publication date: 03/10/2018
Field of study

We extract pixel-level masks of extreme weather patterns using variants of Tiramisu and DeepLabv3+ neural networks. We describe improvements to the software frameworks, input pipeline, and the network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems. The Tiramisu network scales to 5300 P100 GPUs with a sustained throughput of 21.0 PF/s and parallel efficiency of 79.0%. DeepLabv3+ scales up to 27360 V100 GPUs with a sustained throughput of 325.8 PF/s and a parallel efficiency of 90.7% in single precision. By taking advantage of the FP16 Tensor Cores, a half-precision version of the DeepLabv3+ network achieves a peak and sustained throughput of 1.13 EF/s and 999.0 PF/s respectively.Comment: 12 pages, 5 tables, 4, figures, Super Computing Conference November 11-16, 2018, Dallas, TX, US

arXiv.org e-Print Archive

eScholarship - University of California

Spherical harmonic transform with GPUs

Author: Falcou Joel
Grigori Laura
Hupca Ioan O.
Stompor Radek
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/10/2010
Field of study

We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphic processing units (GPU). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transforms computation, retaining the efficient parallel framework of the original code. We detail optimization techniques used to enhance the performance of the CUDA-based code and contrast them with those implemented in the Fortran90 version. We also present performance comparisons of a single CPU plus GPU unit with the S2HAT code running on either a single or 4 processors. In particular we find that use of the latest generation of GPUs, such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms by as much as 18 times with respect to S2HAT executed on one core, and by as much as 5.5 with respect to S2HAT on 4 cores, with the overall performance being limited by the Fast Fourier transforms. The work presented here has been performed in the context of the Cosmic Microwave Background simulations and analysis. However, we expect that the developed software will be of more general interest and applicability

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Prochlo: Strong Privacy for Analytics in the Crowd

Author: Abadi M.
Abadi M.
Abadi M.
Avent B.
Bellare M.
Bulck J. V.
Buse R. P. L.
Chen R.
Corrigan-Gibbs H.
Dang H.
Denning D. E. R.
Dinh T. T. A.
Dwork
Lee S.
Maniatis P.
Ohrimenko O.
Ravindranath L.
Roy I.
Saltzer J. H.
Viega J.
Wang T.
Warner
Zheng W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/10/2017
Field of study

The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture---Encode, Shuffle, Analyze (ESA)---for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. (cont.; see the paper

arXiv.org e-Print Archive

University of Toronto Research Repository