Search CORE

279,483 research outputs found

Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs

Author: Benini Luca
de Prado Miguel
Denna Maurizio
Mundy Andrew
Pazos Nuria
Saeed Rabia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/12/2020
Field of study

The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN). Works have mainly focused on: i) efficient DNN architectures, ii) network optimisation techniques such as pruning and quantisation, iii) optimised algorithms to speed up the execution of the most computational intensive layers and, iv) dedicated hardware to accelerate the data flow and computation. However, there is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution. Thus, leading to suboptimal deployment in terms of latency, accuracy, and memory. In this work, we first detail and analyse the methods to improve the deployment of DNNs across the different levels of software optimisation. Building on this knowledge, we present an automated exploration framework to ease the deployment of DNNs. The framework relies on a Reinforcement Learning search that, combined with a deep learning inference framework, automatically explores the design space and learns an optimised solution that speeds up the performance and reduces the memory on embedded CPU platforms. Thus, we present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory with negligible loss in accuracy with respect to the BLAS floating-point implementation

arXiv.org e-Print Archive

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

Recommended from our members

Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures

Author: Schmidt Douglas C.
Suda Tatsuya
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Several communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several orders of magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu

eScholarship - University of California

Task-based adaptive multiresolution for time-space multi-scale reaction-diffusion systems on multi-core architectures

Author: Descombes Stéphane
Duarte Max
Dumont Thierry
Guillet Thomas
Louvet Violaine
Massot Marc
Publication venue: 'Cellule MathDoc/CEDRAM'
Publication date: 14/10/2016
Field of study

A new solver featuring time-space adaptation and error control has been recently introduced to tackle the numerical solution of stiff reaction-diffusion systems. Based on operator splitting, finite volume adaptive multiresolution and high order time integrators with specific stability properties for each operator, this strategy yields high computational efficiency for large multidimensional computations on standard architectures such as powerful workstations. However, the data structure of the original implementation, based on trees of pointers, provides limited opportunities for efficiency enhancements, while posing serious challenges in terms of parallel programming and load balancing. The present contribution proposes a new implementation of the whole set of numerical methods including Radau5 and ROCK4, relying on a fully different data structure together with the use of a specific library, TBB, for shared-memory, task-based parallelism with work-stealing. The performance of our implementation is assessed in a series of test-cases of increasing difficulty in two and three dimensions on multi-core and many-core architectures, demonstrating high scalability

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL-UJM

The SMAI journal of computational mathematics

Numérisation de Documents Anciens Mathématiques

Hal-Diderot

HAL-Polytechnique

HAL-Rennes 1

Full-wave 3D simulations using the broadband NSPWMLFMA

Author: Bogaert Ignace
Fostier Jan
Olyslager Femke
Peeters Joris
Publication venue
Publication date: 01/01/2008
Field of study

Ghent University Academic Bibliography

An open and parallel multiresolution framework using block-based adaptive grids

Author: A Brandt
A Harten
A Harten
C Bogey
D Rossinelli
F Bramkamp
I Gargantini
J Reiss
K Schneider
M Holmström
MJ Berger
MO Domingues
MO Domingues
O Roussel
R Deiterding
R Maulik
T Engels
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/01/2019
Field of study

A numerical approach for solving evolutionary partial differential equations in two and three space dimensions on block-based adaptive grids is presented. The numerical discretization is based on high-order, central finite-differences and explicit time integration. Grid refinement and coarsening are triggered by multiresolution analysis, i.e. thresholding of wavelet coefficients, which allow controlling the precision of the adaptive approximation of the solution with respect to uniform grid computations. The implementation of the scheme is fully parallel using MPI with a hybrid data structure. Load balancing relies on space filling curves techniques. Validation tests for 2D advection equations allow to assess the precision and performance of the developed code. Computations of the compressible Navier-Stokes equations for a temporally developing 2D mixing layer illustrate the properties of the code for nonlinear multi-scale problems. The code is open source

arXiv.org e-Print Archive

Crossref

HAL AMU