Batch Size Influence on Performance of Graphic and Tensor Processing Units during Training and Inference Phases
The impact of the maximum feasible batch size (chosen for the best runtime) on the performance of graphics processing units (GPUs) and tensor processing units (TPUs) during the training and inference phases is investigated. Numerous runs of a selected deep neural network (DNN) were performed on the standard MNIST and Fashion-MNIST datasets. A significant speedup was obtained even for very small-scale usage of Google TPUv2 units (8 cores only) in comparison with the fairly powerful NVIDIA Tesla K80 GPU: up to 10x for the training stage (without taking overheads into account) and up to 2x for the prediction stage (both with and without overheads). The exact speedup values depend on the utilization level of the TPUv2 units and increase with the volume of data being processed, but for the datasets used in this work (MNIST and Fashion-MNIST, with 28x28 images) the speedup was observed for batch sizes >512 images in the training phase and >40,000 images in the prediction phase. Notably, these results were obtained without detriment to the prediction accuracy and loss, which were equal for the GPU and TPU runs up to the 3rd significant digit for the MNIST dataset and up to the 2nd significant digit for the Fashion-MNIST dataset.
Comment: 10 pages, 7 figures, 2 tables
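A minimal sketch of the kind of timing experiment described above: training a small DNN on MNIST at several batch sizes and recording the wall-clock time per epoch. The model architecture, batch sizes, and epoch count here are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch: time one training epoch on MNIST at several batch sizes.
import time
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

def build_model():
    # Small illustrative DNN, not the network used in the paper.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

for batch_size in (64, 512, 4096):
    model = build_model()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    start = time.perf_counter()
    model.fit(x_train, y_train, batch_size=batch_size, epochs=1, verbose=0)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f} s per epoch")
```

On a GPU or TPU backend, the same loop can be rerun unchanged, so the per-epoch times are directly comparable across devices and batch sizes.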
Analysing Edge Computing Devices for the Deployment of Embedded AI
In recent years, more and more devices have been connected to the network, generating an overwhelming amount of data; this rapidly growing field is known as the Internet of Things. To deal with these data close to their source, the concept of Edge Computing has arisen. Its main objective is to address the limitations of cloud processing and to satisfy the growing demand for applications and services that require low latency, greater efficiency and real-time response capabilities. It is also essential to underscore the intrinsic connection between artificial intelligence and edge computing within the context of our study: this relationship not only addresses the challenges posed by data proliferation but also drives a wave of innovation, shaping a new era of data processing capabilities at the network's edge. Edge devices can perform real-time data analysis and make autonomous decisions without relying on constant connectivity to the cloud. This article aims to analyse and compare Edge Computing devices when artificial intelligence algorithms are deployed on them. To this end, a detailed experiment involving various edge devices, models and metrics is conducted, and we also observe how artificial intelligence accelerators such as the Tensor Processing Unit (TPU) behave. The analysis is intended to guide the choice of the device that best suits a given set of AI requirements. In summary, the Jetson Nano provides the best performance when only the CPU is used, whereas the utilisation of a TPU drastically enhances the results.
This work was partially financed by the Basque Government through their Elkartek program (SONETO project, ref. KK-2023/00038).
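A rough sketch of how per-inference latency could be measured on such edge devices with a TFLite model, as one ingredient of a comparison like the one above. The model path, input dtype, and run count are placeholders, not taken from the article.

```python
# Hypothetical sketch: measure single-inference latency of a TFLite model.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Random input matching the model's expected shape (float32 assumed here).
dummy = np.random.random_sample(tuple(input_details[0]["shape"])).astype(np.float32)

# Warm-up run before timing.
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

times = []
for _ in range(100):
    start = time.perf_counter()
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()
    _ = interpreter.get_tensor(output_details[0]["index"])
    times.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(times) / len(times):.2f} ms")
```

On devices with a TPU or other accelerator, the corresponding delegate would be passed to the interpreter; the timing loop itself stays the same.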
PerfSAGE: Generalized Inference Performance Predictor for Arbitrary Deep Learning Models on Edge Devices
The ability to accurately predict deep neural network (DNN) inference
performance metrics, such as latency, power, and memory footprint, for an
arbitrary DNN on a target hardware platform is essential to the design of DNN-based
models. This ability is critical for the (manual or automatic) design,
optimization, and deployment of practical DNNs for a specific hardware
deployment platform. Unfortunately, these metrics are slow to evaluate using
simulators (where available) and typically require measurement on the target
hardware. This work describes PerfSAGE, a novel graph neural network (GNN) that
predicts inference latency, energy, and memory footprint for an arbitrary DNN
TFLite graph (TFL, 2017). In contrast, previously published performance
predictors can only predict latency and are restricted to pre-defined
construction rules or search spaces. This paper also describes the EdgeDLPerf
dataset of 134,912 DNNs randomly sampled from four task search spaces and
annotated with inference performance metrics from three edge hardware
platforms. Using this dataset, we train PerfSAGE and provide experimental
results that demonstrate state-of-the-art prediction accuracy with a Mean
Absolute Percentage Error of <5% across all targets and model search spaces.
These results: (1) Outperform previous state-of-the-art GNN-based predictors
(Dudziak et al., 2020), (2) Accurately predict performance on accelerators (a
shortfall of non-GNN-based predictors (Zhang et al., 2021)), and (3)
Demonstrate predictions on arbitrary input graphs without modifications to the
feature extractor.
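For reference, a minimal sketch of the Mean Absolute Percentage Error (MAPE) metric quoted above, assuming measured and predicted values are given as plain arrays (the example numbers are illustrative only).

```python
# Minimal sketch of MAPE over measured vs. predicted performance values.
import numpy as np

def mape(measured, predicted):
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Mean absolute percentage error, in percent.
    return 100.0 * np.mean(np.abs((measured - predicted) / measured))

# Example with illustrative latencies (ms): predictions within a few percent.
print(mape([10.0, 20.0, 40.0], [10.3, 19.5, 41.2]))  # ~2.8
```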
Benchmarking GPUs on SVBRDF Extractor Model
With the maturity of deep learning, its use is emerging in every field. At the same time, as more types of GPUs become available on the market, users face a difficult decision: how can they select a GPU that achieves optimal performance for a specific task? GPU architecture itself is well studied, but existing works that benchmark GPUs do not study tasks for networks with significantly larger inputs. In this work, we attempt to differentiate the performance of different GPUs on neural network models that operate on larger input images (256x256).
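As an illustration of this kind of benchmark, a hedged sketch timing forward passes of a convolutional network on 256x256 inputs; the network below is a simple stand-in, not the SVBRDF extractor model itself, and the batch size and iteration count are assumptions.

```python
# Hypothetical sketch: time forward passes of a CNN on 256x256 inputs.
import time
import tensorflow as tf

# Stand-in convolutional model; the real SVBRDF extractor is not reproduced here.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                           input_shape=(256, 256, 3)),
    tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

batch = tf.random.uniform((8, 256, 256, 3))

_ = model(batch, training=False)  # warm-up / graph build

start = time.perf_counter()
for _ in range(20):
    _ = model(batch, training=False)
elapsed = time.perf_counter() - start
print(f"{elapsed / 20 * 1000:.1f} ms per batch of 8")
```

Running the same loop on different GPUs gives directly comparable per-batch times for large-input workloads.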
- …