Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases
The impact of the maximum feasible batch size (chosen for the best runtime) on the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases is investigated. Numerous runs of a selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even for very small-scale usage of Google TPUv2 units (8 cores only) in comparison with the fairly powerful NVIDIA Tesla K80 GPU: up to 10x for the training stage (without taking overheads into account) and up to 2x for the prediction stage (both with and without overheads). The exact speedup values depend on the utilization level of the TPUv2 units and increase with the volume of data being processed, but for the datasets used in this work (MNIST and Fashion-MNIST, with 28x28 images) the speedup was observed for batch sizes >512 images in the training phase and >40,000 images in the prediction phase. Notably, these results were obtained without detriment to the prediction accuracy and loss, which were equal for the GPU and TPU runs up to the 3rd significant digit for the MNIST dataset and up to the 2nd significant digit for the Fashion-MNIST dataset.
Comment: 10 pages, 7 figures, 2 tables
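A minimal sketch of the kind of timing experiment described above: training a small DNN on MNIST at several batch sizes and recording the wall-clock time per epoch. The model architecture, batch sizes, and epoch count here are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch: time one training epoch on MNIST at several batch sizes.
import time
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

def build_model():
    # Small illustrative DNN, not the network used in the paper.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for batch_size in (64, 512, 4096):
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.perf_counter()
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f} s per epoch")
```

On a GPU or TPU backend, the same loop can be rerun unchanged, so the per-epoch times are directly comparable across devices and batch sizes.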
Analysing Edge Computing Devices for the Deployment of Embedded AI
In recent years, more and more devices have been connected to the network, generating an overwhelming amount of data; this rapidly growing field is known as the Internet of Things. To deal with these data close to their source, the concept of Edge Computing has arisen. Its main objective is to address the limitations of cloud processing and to satisfy the growing demand for applications and services that require low latency, greater efficiency and real-time response capabilities. It is also essential to underscore the intrinsic connection between artificial intelligence and edge computing within the context of our study: this relationship not only addresses the challenges posed by data proliferation but also drives a wave of innovation, shaping a new era of data processing capabilities at the network's edge. Edge devices can perform real-time data analysis and make autonomous decisions without relying on constant connectivity to the cloud. This article aims to analyse and compare Edge Computing devices when artificial intelligence algorithms are deployed on them. To this end, a detailed experiment involving various edge devices, models and metrics is conducted, and we also observe how artificial intelligence accelerators such as the Tensor Processing Unit (TPU) behave. The analysis is intended to guide the choice of the device that best suits a given set of AI requirements. In summary, the Jetson Nano provides the best performance when only the CPU is used, whereas the utilisation of a TPU drastically enhances the results.
This work was partially financed by the Basque Government through their Elkartek program (SONETO project, ref. KK-2023/00038).
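A rough sketch of how per-inference latency could be measured on such edge devices with a TFLite model, as one ingredient of a comparison like the one above. The model path, input dtype, and run count are placeholders, not taken from the article.

```python
# Hypothetical sketch: measure single-inference latency of a TFLite model.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Random input matching the model's expected shape (float32 assumed here).
dummy = np.random.random_sample(tuple(input_details[0]["shape"])).astype(np.float32)

# Warm-up run before timing.
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

times = []
for _ in range(100):
    start = time.perf_counter()
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()
    _ = interpreter.get_tensor(output_details[0]["index"])
    times.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(times) / len(times):.2f} ms")
```

On devices with a TPU or other accelerator, the corresponding delegate would be passed to the interpreter; the timing loop itself stays the same.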
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices
The ability to accurately predict deep neural network (DNN) inference
performance metrics, such as latency, power, and memory footprint, for an
arbitrary DNN on a target hardware platform is essential to the design of DNN-based
models. This ability is critical for the (manual or automatic) design,
optimization, and deployment of practical DNNs for a specific hardware
deployment platform. Unfortunately, these metrics are slow to evaluate using
simulators (where available) and typically require measurement on the target
hardware. This work describes PerfSAGE, a novel graph neural network (GNN) that
predicts inference latency, energy, and memory footprint for an arbitrary DNN
TFLite graph (TFL, 2017). In contrast, previously published performance
predictors can only predict latency and are restricted to pre-defined
construction rules or search spaces. This paper also describes the EdgeDLPerf
dataset of 134,912 DNNs randomly sampled from four task search spaces and
annotated with inference performance metrics from three edge hardware
platforms. Using this dataset, we train PerfSAGE and provide experimental
results that demonstrate state-of-the-art prediction accuracy with a Mean
Absolute Percentage Error of <5% across all targets and model search spaces.
These results: (1) Outperform previous state-of-the-art GNN-based predictors
(Dudziak et al., 2020), (2) Accurately predict performance on accelerators (a
shortfall of non-GNN-based predictors (Zhang et al., 2021)), and (3)
Demonstrate predictions on arbitrary input graphs without modifications to the
feature extractor.
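For reference, a minimal sketch of the Mean Absolute Percentage Error (MAPE) metric quoted above, assuming measured and predicted values are given as plain arrays (the example numbers are illustrative only).

```python
# Minimal sketch of MAPE over measured vs. predicted performance values.
import numpy as np

def mape(measured, predicted):
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Mean absolute percentage error, in percent.
    return 100.0 * np.mean(np.abs((measured - predicted) / measured))

# Example with illustrative latencies (ms): predictions within a few percent.
print(mape([10.0, 20.0, 40.0], [10.3, 19.5, 41.2]))  # ~2.8
```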
Benchmarking GPUs on SVBRDF Extractor Model
With the maturity of deep learning, its use is emerging in every field. At the same time, as more types of GPUs become available on the market, users face a difficult decision: how can they select a GPU that achieves optimal performance for a specific task? GPU architecture itself is well studied, but existing works that benchmark GPUs do not study tasks for networks with significantly larger inputs. In this work, we attempt to differentiate the performance of different GPUs on neural network models that operate on larger input images (256x256).
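As an illustration of this kind of benchmark, a hedged sketch timing forward passes of a convolutional network on 256x256 inputs; the network below is a simple stand-in, not the SVBRDF extractor model itself, and the batch size and iteration count are assumptions.

```python
# Hypothetical sketch: time forward passes of a CNN on 256x256 inputs.
import time
import tensorflow as tf

# Stand-in convolutional model; the real SVBRDF extractor is not reproduced here.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

batch = tf.random.uniform((8, 256, 256, 3))

_ = model(batch, training=False)  # warm-up / graph build

start = time.perf_counter()
for _ in range(20):
    _ = model(batch, training=False)
elapsed = time.perf_counter() - start
print(f"{elapsed / 20 * 1000:.1f} ms per batch of 8")
```

Running the same loop on different GPUs gives directly comparable per-batch times for large-input workloads.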
- …