FFT-Based Deep Learning Deployment in Embedded Systems
Deep learning has demonstrated its power in many application domains,
especially in image and speech recognition. As the backbone of deep learning,
deep neural networks (DNNs) consist of multiple layers of various types with
hundreds to thousands of neurons. Embedded platforms are now becoming essential
for deep learning deployment due to their portability, versatility, and energy
efficiency. The large model size of DNNs, while providing excellent accuracy,
also burdens the embedded platforms with intensive computation and storage.
Researchers have investigated reducing DNN model size with negligible
accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN
training and inference model suitable for embedded platforms with reduced
asymptotic complexity of both computation and storage, making our approach
distinguished from existing approaches. We develop the training and inference
algorithms based on FFT as the computing kernel and deploy the FFT-based
inference model on embedded platforms, achieving extraordinary processing speed.
Comment: Design, Automation, and Test in Europe (DATE). For source code, please
contact Mahdi Nazemi at [email protected]
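The asymptotic saving the abstract appeals to is the convolution theorem: convolution in the time domain is pointwise multiplication in the frequency domain, reducing cost from O(n^2) to O(n log n). A minimal self-contained sketch of that idea (not the paper's actual training or inference kernels):

```python
import cmath

def fft(x, inverse=False):
    # Radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]

def fft_conv(a, b):
    # Linear convolution: zero-pad to the next power of two, multiply
    # pointwise in the frequency domain, transform back. O(n log n)
    # versus O(n^2) for the direct method below.
    m = len(a) + len(b) - 1
    n = 1
    while n < m:
        n *= 2
    fa = fft(a + [0.0] * (n - len(a)))
    fb = fft(b + [0.0] * (n - len(b)))
    res = ifft([u * v for u, v in zip(fa, fb)])
    return [res[k].real for k in range(m)]

def direct_conv(a, b):
    # Naive O(n^2) reference implementation for comparison.
    out = [0.0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out
```

The same identity underlies FFT-based convolutional layers: filter weights can be stored and applied in the frequency domain, which is where the storage and computation savings come from.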
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency
Sequence-to-sequence language models can be used to produce abstractive
summaries which are coherent, relevant, and concise. Still, model sizes can
make deployment in latency-sensitive or web-scale implementations difficult.
This paper studies the relationship between model size, structured pruning,
inference efficiency, and summarization accuracy on widely used summarization
datasets. We show that model accuracy is tied to the encoder size while
inference efficiency is connected to the decoder. Using asymmetric pruning can
lead to nearly 3x improvement in inference latency with ~1 point loss in
ROUGE-2. Moreover, we find both the average degradation and the role of
asymmetry to be consistent across model sizes and variations in datasets.
Comment: SustaiNLP 2023 @ ACL 2023, 9 pages, 6 figures, 33 tables
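The asymmetry the abstract describes follows from autoregressive decoding: the encoder runs once per input, while the decoder runs once per generated token, so decoder depth dominates generation latency. A toy latency model and layer-level structured pruning sketch (the layer counts, per-layer timing, and keep-every-k rule are illustrative assumptions, not the paper's method):

```python
def seq2seq_latency(enc_layers, dec_layers, per_layer_ms, out_tokens):
    # Encoder cost is paid once; decoder cost is paid per output token.
    return enc_layers * per_layer_ms + dec_layers * per_layer_ms * out_tokens

def prune_decoder(layers, keep_every=2):
    # Structured pruning: drop whole decoder layers, keeping every
    # `keep_every`-th layer plus the final one.
    kept = layers[::keep_every]
    if layers and layers[-1] not in kept:
        kept.append(layers[-1])
    return kept
```

With 12 encoder and 12 decoder layers at 2 ms each and 50 output tokens, pruning the decoder to 4 layers cuts end-to-end latency by roughly 3x while leaving the accuracy-critical encoder untouched, mirroring the trade-off the paper measures.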
IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency
Efficiently optimizing multi-model inference pipelines for fast, accurate,
and cost-effective inference is a crucial challenge in ML production systems,
given their tight end-to-end latency requirements. To simplify the exploration
of the vast and intricate trade-off space of accuracy and cost in inference
pipelines, providers frequently opt to consider only one of them. However, the
challenge lies in reconciling accuracy and cost trade-offs. To address this
challenge and efficiently manage model variants in inference pipelines, we
present IPA, an online deep-learning Inference Pipeline
Adaptation system that efficiently leverages model variants for each deep
learning task. Model variants are different versions of pre-trained models for
the same deep learning task with variations in resource requirements, latency,
and accuracy. IPA dynamically configures batch size, replication, and model
variants to optimize accuracy, minimize costs, and meet user-defined latency
SLAs using Integer Programming. It supports multi-objective settings for
achieving different trade-offs between accuracy and cost objectives while
remaining adaptable to varying workloads and dynamic traffic patterns.
Extensive experiments on a Kubernetes implementation with five real-world
inference pipelines demonstrate that IPA improves normalized accuracy by up to
35% with a minimal cost increase of less than 5%.
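The core decision IPA makes can be stated compactly: pick a model variant, batch size, and replica count per task to maximize accuracy subject to a latency SLA and cost constraints. IPA solves this with Integer Programming; the sketch below substitutes exhaustive search over a small grid, and all variant names, costs, latencies, and accuracies are invented for illustration:

```python
from itertools import product

# Hypothetical model variants for one task:
# (name, cost per replica, latency at batch size 1 in ms, normalized accuracy)
VARIANTS = [
    ("resnet18", 1.0, 20.0, 0.70),
    ("resnet50", 2.0, 45.0, 0.80),
    ("resnet152", 4.0, 90.0, 0.88),
]

def best_config(latency_slo_ms, cost_budget, batch_sizes=(1, 2, 4, 8)):
    # Maximize accuracy subject to the latency SLO and cost budget,
    # breaking ties toward cheaper configurations. Exhaustive search is
    # a stand-in for the paper's Integer Programming formulation.
    best = None
    for (name, cost, lat1, acc), bs, reps in product(VARIANTS, batch_sizes, (1, 2, 3)):
        latency = lat1 * bs            # crude model: latency scales with batch size
        total_cost = cost * reps
        if latency <= latency_slo_ms and total_cost <= cost_budget:
            key = (acc, -total_cost)
            if best is None or key > best[0]:
                best = (key, (name, bs, reps))
    return best[1] if best else None
```

A real formulation would also model throughput gains from replication and batching and couple the stages of the pipeline; this only illustrates the shape of the trade-off space.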
Inference in supervised spectral classifiers for on-board hyperspectral imaging: An overview
Machine learning techniques are widely used for pixel-wise classification of hyperspectral images. These methods can achieve high accuracy, but most of them are computationally intensive models. This poses a problem for their implementation in low-power and embedded systems intended for on-board processing, in which energy consumption and model size are as important as accuracy. With a focus on embedded and on-board systems (in which only the inference step is performed after an off-line training process), in this paper we provide a comprehensive overview of the inference properties of the most relevant techniques for hyperspectral image classification. For this purpose, we compare the size of the trained models and the operations required during the inference step (which are directly related to the hardware and energy requirements). Our goal is to search for appropriate trade-offs between on-board implementation (such as model size and energy consumption) and classification accuracy.
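The two quantities the survey compares, trained model size and inference operations, are easy to make concrete for a per-pixel classifier. A toy calculator for a small MLP spectral classifier (the layer sizes, 200 spectral bands to 16 classes, and 4-byte weights are assumptions for illustration):

```python
def mlp_inference_footprint(layer_sizes, bytes_per_weight=4):
    # Parameter count determines model size; multiply-accumulate (MAC)
    # count approximates per-pixel inference cost, both key constraints
    # for low-power on-board hardware.
    params = 0
    macs = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        params += n_in * n_out + n_out   # weights + biases
        macs += n_in * n_out             # one MAC per weight
    return {
        "params": params,
        "model_bytes": params * bytes_per_weight,
        "macs": macs,
    }
```

Quantizing weights (smaller `bytes_per_weight`) shrinks storage without changing the MAC count, which is one of the trade-off axes such an overview has to keep separate.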
LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation
Deep Learning (DL) models are becoming larger, because the increase in model
size might offer significant accuracy gain. To enable the training of large
deep networks, data parallelism and model parallelism are two well-known
approaches for parallel training. However, data parallelism does not help
reduce memory footprint per device. In this work, we introduce Large deep 3D
ConvNets with Automated Model Parallelism (LAMP) and investigate the impact of
both the input size and the size of deep 3D ConvNets on segmentation accuracy. Through
automated model parallelism, it is feasible to train large deep 3D ConvNets
with a large input patch, even the whole image. Extensive experiments
demonstrate that, facilitated by the automated model parallelism, the
segmentation accuracy can be improved through increasing model size and input
context size, and a large input yields significant inference speedup compared
with a sliding window of small patches. Code is available at
https://monai.io/research/lamp-automated-model-parallelism.
Comment: MICCAI 2020 Early Accepted paper.
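LAMP automates the partitioning of 3D ConvNets; a much-simplified sketch of the underlying idea is to split a sequential layer list across devices so that per-device memory is balanced (the memory figures and greedy rule below are illustrative, not LAMP's algorithm):

```python
def partition_layers(layer_mem_gb, n_devices):
    # Greedily assign consecutive layers to each device until it reaches
    # its share of the total memory, keeping at least one layer for each
    # remaining device. Returns a list of layer-index shards, one per device.
    total = sum(layer_mem_gb)
    target = total / n_devices
    parts, current, acc = [], [], 0.0
    for i, mem in enumerate(layer_mem_gb):
        current.append(i)
        acc += mem
        remaining = len(layer_mem_gb) - i - 1
        if (acc >= target
                and len(parts) < n_devices - 1
                and remaining >= n_devices - 1 - len(parts)):
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts
```

Because each device holds only its shard of the layers, the per-device memory footprint drops roughly in proportion to the device count, which is what makes whole-image input patches feasible.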
LCNN: Lookup-based Convolutional Neural Network
Porting state-of-the-art deep learning algorithms to resource-constrained
compute platforms (e.g. VR, AR, wearables) is extremely challenging. We propose
a fast, compact, and accurate model for convolutional neural networks that
enables efficient learning and inference. We introduce LCNN, a lookup-based
convolutional neural network that encodes convolutions by a few lookups into a
dictionary that is trained to cover the space of weights in CNNs. Training LCNN
involves jointly learning a dictionary and a small set of linear combinations.
The size of the dictionary naturally traces a spectrum of trade-offs between
efficiency and accuracy. Our experimental results on ImageNet challenge show
that LCNN can offer 3.2x speedup while achieving 55.1% top-1 accuracy using
AlexNet architecture. Our fastest LCNN offers 37.6x speedup over AlexNet while
maintaining 44.3% top-1 accuracy. LCNN not only offers dramatic speedups at
inference, but it also enables efficient training. In this paper, we show the
benefits of LCNN in few-shot learning and few-iteration learning, two crucial
aspects of on-device training of deep learning models.
Comment: CVPR 1
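The speedup mechanism in a lookup-based layer: if every filter is a sparse linear combination of shared dictionary atoms, then each atom's dot product with the input can be computed once and reused by all filters via a few lookups. A minimal sketch with invented dictionary, codes, and sizes (the real LCNN applies this inside convolutions and learns the dictionary jointly):

```python
def lookup_dot(dictionary, x, codes):
    # Each code is a sparse list of (atom_index, coefficient) pairs
    # describing one filter as a linear combination of dictionary atoms.
    # Precompute each atom's dot product with the input once...
    atom_dots = [sum(a * xi for a, xi in zip(atom, x)) for atom in dictionary]
    # ...then every filter needs only a few lookups and multiplies,
    # instead of a full dense dot product over the input.
    return [sum(coef * atom_dots[idx] for idx, coef in code) for code in codes]
```

With D atoms shared by F filters and S nonzero coefficients per filter, the per-input cost is O(D·n + F·S) lookups and multiplies rather than O(F·n) dense multiplies, which is where the dictionary-size/accuracy trade-off the abstract mentions comes from.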
- …