DLAS: An Exploration and Assessment of the Deep Learning Acceleration Stack
Deep Neural Networks (DNNs) are extremely computationally demanding, which
presents a large barrier to their deployment on resource-constrained devices.
Since such devices are where many emerging deep learning applications lie
(e.g., drones, vision-based medical technology), significant bodies of work
from both the machine learning and systems communities have attempted to
provide optimizations to accelerate DNNs. To help unify these two perspectives,
in this paper we combine machine learning and systems techniques within the
Deep Learning Acceleration Stack (DLAS), and demonstrate how these layers can
be tightly dependent on each other with an across-stack perturbation study. We
evaluate the impact on accuracy and inference time when varying different
parameters of DLAS across two datasets, seven popular DNN architectures, four
DNN compression techniques, three algorithmic primitives with sparse and dense
variants, untuned and auto-scheduled code generation, and four hardware
platforms. Our evaluation highlights how perturbations across DLAS parameters
can cause significant variation and across-stack interactions. The highest
level observation from our evaluation is that the model size, accuracy, and
inference time are not guaranteed to be correlated. Overall we make 13 key
observations, including that speedups provided by compression techniques are
very hardware dependent, and that compiler auto-tuning can significantly alter
which algorithm is best for a given configuration. With DLAS, we aim
to provide a reference framework to aid machine learning and systems
practitioners in reasoning about the context in which their respective DNN
acceleration solutions exist. With our evaluation strongly motivating the
need for co-design, we believe that DLAS can be a valuable concept for
exploring the next generation of co-designed accelerated deep learning
solutions.
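The across-stack perturbation study described above can be pictured as a Cartesian sweep over the stack's parameters. The following is a hypothetical sketch of such a harness; the model, compression, primitive, and hardware names, and the placeholder `benchmark` function, are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of an across-stack perturbation sweep in the spirit
# of DLAS. All names below are illustrative, not the paper's harness.
import itertools

MODELS = ["resnet18", "mobilenetv2"]
COMPRESSION = ["none", "pruned", "quantized"]
PRIMITIVES = ["dense", "sparse"]
HARDWARE = ["cpu", "gpu"]

def benchmark(model, compression, primitive, hw):
    """Placeholder: a real harness would compile and time each variant."""
    return {"accuracy": 0.0, "latency_ms": 0.0}

results = {}
for cfg in itertools.product(MODELS, COMPRESSION, PRIMITIVES, HARDWARE):
    results[cfg] = benchmark(*cfg)

# 2 models x 3 compression x 2 primitives x 2 devices = 24 configurations
print(len(results))
```

Even this toy grid shows why the evaluation is costly: each added stack layer multiplies the number of variants to build and measure.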
Exploring Effects of Computational Parameter Changes to Image Recognition Systems
Image recognition tasks typically use deep learning and require enormous
processing power, thus relying on hardware accelerators like GPUs and FPGAs for
fast, timely processing. Failure in real-time image recognition tasks can occur
due to incorrect mapping on hardware accelerators, which may lead to timing
uncertainty and incorrect behavior. Owing to the increased use of image
recognition tasks in safety-critical applications like autonomous driving and
medical imaging, it is imperative to assess their robustness to changes in the
computational environment as parameters like deep learning frameworks, compiler
optimizations for code generation, and hardware devices are not standardized,
and their impact on model performance and correctness varies. In this paper we conduct
robustness analysis of four popular image recognition models (MobileNetV2,
ResNet101V2, DenseNet121 and InceptionV3) with the ImageNet dataset, assessing
the impact of the following parameters in the model's computational
environment: (1) deep learning frameworks; (2) compiler optimizations; and (3)
hardware devices. We report sensitivity of model performance in terms of output
label and inference time for changes in each of these environment parameters.
We find that output label predictions for all four models are sensitive to the
choice of deep learning framework (by up to 57%) and insensitive to other
parameters. On the other hand, model inference time was affected by all
environment parameters with changes in hardware device having the most effect.
The extent of the effect was not uniform across models. Comment: 9 pages, 8 figures, 1 table.
Compiler-centric across-stack deep learning acceleration
Optimizing the deployment of Deep Neural Networks (DNNs) is hard. Despite deep learning approaches increasingly providing state-of-the-art solutions to a variety of difficult problems, such as computer vision and natural language processing, DNNs can be prohibitively expensive, for example, in terms of inference time or memory usage. Effective exploration of the design space requires a holistic approach, including a range of topics from machine learning, systems, and hardware. The rapid proliferation of deep learning applications has raised demand for efficient exploration and acceleration of deep learning based solutions. However, managing the range of optimization techniques, as well as how they interact with each other across the stack is a non-trivial task. A family of emerging specialized compilers for deep learning, tensor compilers, appear to be a strong candidate to help manage the complexity of across-stack optimization choices, and enable new approaches.
This thesis presents new techniques and explorations of the Deep Learning Acceleration Stack (DLAS), with the perspective that the tensor compiler will increasingly be the center of this stack. First, we motivate the challenges in exploring DLAS, by describing the experience of running a perturbation study varying parameters at every layer of the stack. The core of the study is implemented using a tensor compiler, which reduces the complexity of evaluating the wide range of variants, although it still requires a significant engineering effort to realize. Next, we develop a new algorithm for grouped convolution, a model optimization technique for which existing solutions provided poor inference time scaling. We implement and optimize our algorithm using a tensor compiler, outperforming existing approaches by 5.1× on average (arithmetic mean). Finally, we propose a technique, transfer-tuning, to reduce the search time required for automatic tensor compiler code optimization, cutting search time by 6.5× on average.
The techniques and contributions of this thesis across these interconnected domains demonstrate the exciting potential of tensor compilers to simplify and improve design space exploration for DNNs, and their deployment. The outcomes of this thesis enable new lines of research to enable machine learning developers to keep up with the rapidly evolving landscape of neural architectures and hardware
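Grouped convolution, mentioned in the thesis abstract above, splits the input channels into independent groups and convolves each group with its own filter slice. The sketch below is a minimal reference implementation of the standard operation for intuition only; it is not the thesis's optimized algorithm, and the shapes and loop structure are illustrative.

```python
# Naive reference implementation of grouped 2D convolution (valid padding,
# stride 1). For intuition only; not the thesis's optimized algorithm.
import numpy as np

def grouped_conv2d(x, w, groups):
    # x: (C_in, H, W); w: (C_out, C_in // groups, kH, kW)
    c_in, H, W = x.shape
    c_out, c_in_g, kH, kW = w.shape
    assert c_in % groups == 0 and c_out % groups == 0
    assert c_in_g == c_in // groups
    c_out_g = c_out // groups
    out = np.zeros((c_out, H - kH + 1, W - kW + 1))
    for g in range(groups):
        xs = x[g * c_in_g:(g + 1) * c_in_g]          # this group's inputs
        ws = w[g * c_out_g:(g + 1) * c_out_g]        # this group's filters
        for oc in range(c_out_g):
            for i in range(out.shape[1]):
                for j in range(out.shape[2]):
                    out[g * c_out_g + oc, i, j] = np.sum(
                        xs[:, i:i + kH, j:j + kW] * ws[oc])
    return out
```

With `groups=1` this reduces to ordinary convolution; larger `groups` cuts the multiply count per output by a factor of `groups`, which is the appeal of the technique and why poor inference-time scaling in existing implementations was worth fixing.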
ShadowNet: A Secure and Efficient System for On-device Model Inference
With the increased usage of AI accelerators on mobile and edge devices,
on-device machine learning (ML) is gaining popularity. Consequently, thousands
of proprietary ML models are being deployed on billions of untrusted devices.
This raises serious security concerns about model privacy. However, protecting
the model privacy without losing access to the AI accelerators is a challenging
problem. In this paper, we present a novel on-device model inference system,
ShadowNet. ShadowNet protects the model privacy with Trusted Execution
Environment (TEE) while securely outsourcing the heavy linear layers of the
model to the untrusted hardware accelerators. ShadowNet achieves this by
transforming the weights of the linear layers before outsourcing them and
restoring the results inside the TEE. The nonlinear layers are also kept secure
inside the TEE. The transformation of the weights and the restoration of the
results are designed in a way that can be implemented efficiently. We have
built a ShadowNet prototype based on TensorFlow Lite and applied it on four
popular CNNs, namely, MobileNets, ResNet-44, AlexNet and MiniVGG. Our
evaluation shows that ShadowNet achieves strong security guarantees with
reasonable performance, offering a practical solution for secure on-device
model inference. Comment: single column, 21 pages (29 pages including appendix), 12 figures.
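The weight transformation and restoration described in the abstract can be illustrated for a single linear layer. The sketch below shows the general idea of masking weights with a secret invertible matrix before outsourcing; it is an assumption-laden toy, not ShadowNet's actual scheme, and all shapes and names are invented for illustration.

```python
# Toy sketch of outsourcing a linear layer under a secret invertible
# transform. Illustrative only; NOT ShadowNet's actual construction.
import numpy as np

rng = np.random.default_rng(0)

W = rng.standard_normal((8, 4))   # private weights of a linear layer
A = rng.standard_normal((4, 4))   # secret matrix, kept inside the TEE
A_inv = np.linalg.inv(A)

W_pub = W @ A                     # transformed weights given to the accelerator

x = rng.standard_normal((1, 8))   # activation held inside the TEE
y_outsourced = x @ W_pub          # heavy matmul on untrusted hardware
y = y_outsourced @ A_inv          # cheap restoration inside the TEE

assert np.allclose(y, x @ W)     # result matches the original layer
```

The point of such a split is that the expensive computation (`x @ W_pub`) runs on the fast, untrusted accelerator, while the TEE only performs the small restoration multiply, keeping the true weights `W` private.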
Assessing Robustness of Image Recognition Models to Changes in the Computational Environment
Image recognition tasks typically use deep learning and require enormous processing power, thus relying on hardware accelerators like GPUs and TPUs for fast, timely processing. Failure in real-time image recognition tasks can occur due to incorrect mapping on hardware accelerators, which may lead to timing uncertainty and incorrect behavior. In addition, the increasing demand for optimal performance has led to progress towards the optimization of different neural network operations, such as operator fusion. Owing to the increased use of image recognition tasks in safety-critical applications like autonomous driving and medical imaging, it is imperative to assess the performance and impact of such optimizations, and explore their effectiveness.
In this paper we conduct robustness analysis of four popular image recognition models with the ImageNet dataset, assessing the impact of the compiler optimizations applied, utilizing different deep learning frameworks and executing on hardware devices of varying capabilities. Our results indicate output label discrepancies of up to 37% across deep learning framework conversions, and up to 81.8% unexpected performance degradation upon application of compiler optimizations.
DeltaNN: Assessing the Impact of Computational Environment Parameters on the Performance of Image Recognition Models
Image recognition tasks typically use deep learning and require enormous
processing power, thus relying on hardware accelerators like GPUs and TPUs for
fast, timely processing. Failure in real-time image recognition tasks can occur
due to sub-optimal mapping on hardware accelerators during model deployment,
which may lead to timing uncertainty and erroneous behavior. Mapping on
hardware accelerators is done using multiple software components like deep
learning frameworks, compilers, and device libraries, which we refer to as the
computational environment. Owing to the increased use of image recognition
tasks in safety-critical applications like autonomous driving and medical
imaging, it is imperative to assess their robustness to changes in the
computational environment, as the impact of parameters like deep learning
frameworks, compiler optimizations, and hardware devices on model performance
and correctness is not yet well understood.
In this paper we present a differential testing framework, DeltaNN, that
allows us to assess the impact of different computational environment
parameters on the performance of image recognition models during deployment,
post training. DeltaNN generates different implementations of a given image
recognition model for variations in environment parameters, namely, deep
learning frameworks, compiler optimizations and hardware devices and analyzes
differences in model performance as a result. Using DeltaNN, we conduct an
empirical study of robustness analysis of three popular image recognition
models using the ImageNet dataset. We report the impact in terms of
misclassifications and inference time differences across different settings. In
total, we observed up to 72% output label differences across deep learning
frameworks, and up to 81% unexpected performance degradation in terms of
inference time, when applying compiler optimizations. Comment: 11 pages, 10 figures, 2 tables.
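The core of differential testing as described in the abstract is running the same model through different deployment paths and comparing outputs. The sketch below shows that comparison loop in miniature; the two backend functions are stand-ins with hard-coded outputs, not DeltaNN's real framework integrations.

```python
# Illustrative sketch of differential testing across two deployment
# backends. The backends are stand-ins, not DeltaNN's implementation.
import numpy as np

def backend_a(image):
    """Stand-in for, e.g., the model run via one deep learning framework."""
    return np.array([0.10, 0.70, 0.20])   # class probabilities

def backend_b(image):
    """Stand-in for, e.g., the same model after compilation for a device."""
    return np.array([0.10, 0.69, 0.21])

def label_mismatch_rate(images):
    """Fraction of inputs whose top-1 label differs between backends."""
    mismatches = sum(
        int(np.argmax(backend_a(im)) != np.argmax(backend_b(im)))
        for im in images)
    return mismatches / len(images)

print(label_mismatch_rate([None] * 10))  # stand-ins agree on top-1 here
```

A real harness would feed actual images and also record per-backend inference times, which is how label differences and performance degradations like those reported above are surfaced.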
LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing
LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC. Peer reviewed. Postprint (author's final draft).
Machine Learning for Microcontroller-Class Hardware -- A Review
The advancements in machine learning opened a new opportunity to bring
intelligence to the low-end Internet-of-Things nodes such as microcontrollers.
Conventional machine learning deployments have high memory and compute
footprints, hindering direct deployment on ultra-resource-constrained
microcontrollers. This paper highlights the unique requirements of enabling
onboard machine learning for microcontroller class devices. Researchers use a
specialized model development workflow for resource-limited applications to
ensure the compute and latency budget is within the device limits while still
maintaining the desired performance. We characterize a closed-loop widely
applicable workflow of machine learning model development for microcontroller
class devices and show that several classes of applications adopt a specific
instance of it. We present both qualitative and numerical insights into
different stages of model development by showcasing several use cases. Finally,
we identify the open research challenges and unsolved questions demanding
careful consideration moving forward. Comment: Accepted for publication at IEEE Sensors Journal.