12 research outputs found
URNet : User-Resizable Residual Networks with Conditional Gating Module
Convolutional Neural Networks are widely used to process spatial scenes, but
their computational cost is fixed and depends on the structure of the network
used. There are methods that reduce this cost by compressing the network or by
varying its computational path dynamically according to the input image.
However, since a user cannot control the size of the learned model, it is
difficult to respond dynamically when the volume of service requests suddenly
increases. We propose User-Resizable Residual Networks (URNet), which allows
users to adjust the scale of the network as needed during evaluation. URNet
includes a Conditional Gating Module (CGM) that determines whether each
residual block is used, according to the input image and the desired scale.
The CGM is trained in a supervised manner using a newly proposed scale loss
and its corresponding training method.
training methods. URNet can control the amount of computation according to
user's demand without degrading the accuracy significantly. It can also be used
as a general compression method by fixing the scale size during training. In
the experiments on ImageNet, URNet based on ResNet-101 maintains the accuracy
of the baseline even when resizing it to approximately 80% of the original
network, and demonstrates only about 1% accuracy degradation when using about
65% of the computation.Comment: 12 page
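A rough PyTorch sketch of the gating idea is given below, assuming the gate is computed from globally pooled block-input features concatenated with the requested scale; the hidden size, the sigmoid gate, and the GatedResidualBlock wrapper are illustrative assumptions rather than the paper's exact design.

```python
# Hypothetical sketch of a conditional gating module for a user-resizable ResNet.
import torch
import torch.nn as nn

class ConditionalGatingModule(nn.Module):
    """Scores whether to execute a residual block, conditioned on the
    block-input feature map and the user-requested scale in [0, 1]."""
    def __init__(self, channels, hidden=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels + 1, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, scale):
        # x: (B, C, H, W) block input; scale: (B, 1) desired network scale
        pooled = x.mean(dim=(2, 3))                      # global average pooling -> (B, C)
        logit = self.fc(torch.cat([pooled, scale], dim=1))
        return torch.sigmoid(logit)                      # soft gate in (0, 1)

class GatedResidualBlock(nn.Module):
    """Residual block whose residual branch is scaled by the gate."""
    def __init__(self, block, channels):
        super().__init__()
        self.block = block                               # any residual function F(x)
        self.gate = ConditionalGatingModule(channels)

    def forward(self, x, scale):
        g = self.gate(x, scale).view(-1, 1, 1, 1)
        return x + g * self.block(x)                     # gate controls block usage
```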
Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
Few-shot keyword spotting (FS-KWS) models usually require large-scale
annotated datasets to generalize to unseen target keywords. However, existing
KWS datasets are limited in scale, and gathering keyword-like labeled data is
a costly undertaking. To mitigate this issue, we propose a framework that uses
easily collectible, unlabeled read speech data as an auxiliary source.
Self-supervised learning has been widely adopted for learning representations
from unlabeled data; however, it is known to suit large models with sufficient
capacity and is not practical for training a small-footprint FS-KWS model.
Instead, we automatically annotate and filter the data to construct a
keyword-like dataset, LibriWord, enabling supervision on the auxiliary data.
We then adopt multi-task learning, which helps the model enhance its
representation power using the out-of-domain auxiliary data. Our method
notably improves performance over competitive methods on the FS-KWS benchmark.
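The multi-task setup can be pictured with the following sketch, assuming a shared small-footprint encoder with separate classification heads for the target keywords and the auxiliary LibriWord labels; the head dimensions and the auxiliary loss weight are placeholders, not the paper's configuration.

```python
# Illustrative multi-task objective: target FS-KWS head + auxiliary LibriWord head.
import torch
import torch.nn as nn

class MultiTaskKWS(nn.Module):
    def __init__(self, encoder, emb_dim, n_target_kw, n_aux_kw, aux_weight=0.5):
        super().__init__()
        self.encoder = encoder                       # shared small-footprint encoder
        self.target_head = nn.Linear(emb_dim, n_target_kw)
        self.aux_head = nn.Linear(emb_dim, n_aux_kw)
        self.aux_weight = aux_weight
        self.ce = nn.CrossEntropyLoss()

    def forward(self, x_target, y_target, x_aux, y_aux):
        # Both tasks share the encoder; only the heads are task-specific.
        loss_target = self.ce(self.target_head(self.encoder(x_target)), y_target)
        loss_aux = self.ce(self.aux_head(self.encoder(x_aux)), y_aux)
        return loss_target + self.aux_weight * loss_aux
```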
PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on them has become a critical issue. However, DNNs require high computational resources that are rarely available on edge devices. To handle this, we propose a novel model compression method for devices with limited computational resources, called PQK, which consists of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of the unimportant weights pruned in the pruning process to build a teacher network for training a better student network, without pre-training the teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to make a lightweight and power-efficient model. In phase 2, we make a teacher network by adding the unimportant weights unused in phase 1 back to the pruned network. Using this teacher network, we train the pruned network as a student network. In doing so, we do not need a pre-trained teacher network for the KD framework because the teacher and the student networks coexist within the same network (see Fig. 1). We apply our method to recognition models and verify the effectiveness of PQK on keyword spotting (KWS) and image recognition.
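A hypothetical sketch of the phase-2 idea follows: the teacher is assembled by restoring the weights that pruning removed, so no separately pre-trained teacher is needed. The mask bookkeeping and the distillation loss form are assumptions for illustration.

```python
# Sketch: build a teacher from the pruned-out weights, then distill into the student.
import copy
import torch
import torch.nn.functional as F

def build_teacher_from_pruned(student, full_weights, masks):
    """student: pruned (masked) network; full_weights: parameters before pruning;
    masks: dict of 0/1 tensors marking the kept (important) weights."""
    teacher = copy.deepcopy(student)
    with torch.no_grad():
        for name, p in teacher.named_parameters():
            if name in masks:
                # Teacher uses kept weights plus the unimportant weights pruned in phase 1.
                p.copy_(full_weights[name])
    return teacher

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Standard soft-target distillation loss combined with the hard-label loss.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```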
Prototype-based Personalized Pruning
Nowadays, as edge devices such as smartphones become prevalent, there are increasing demands for personalized services. However, traditional personalization methods are not suitable for edge devices because retraining or fine-tuning is needed with limited personal data. Also, a full model might be too heavy for edge devices with limited resources. Unfortunately, model compression methods that can handle the model complexity issue also require a retraining phase. These multiple training phases generally incur a huge computational cost during on-device learning, which can be a burden to edge devices. In this work, we propose a dynamic personalization method called prototype-based personalized pruning (PPP). PPP considers both personalization and model efficiency. After training a network, PPP can easily prune the network using a prototype representing the characteristics of personal data, and it performs well without retraining or fine-tuning. We verify the usefulness of PPP on a couple of tasks in computer vision and keyword spotting.
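As an illustration only, the sketch below builds a per-channel prototype from a user's enrollment batch and keeps the most responsive channels; the actual pruning criterion and granularity in PPP may differ, and all names here are hypothetical.

```python
# Illustrative prototype-driven channel selection without retraining.
import torch

@torch.no_grad()
def prune_with_prototype(feature_extractor, personal_batch, keep_ratio=0.5):
    """personal_batch: a handful of the user's samples, shape (N, ...)."""
    feats = feature_extractor(personal_batch)           # assumed (N, C, H, W) activations
    prototype = feats.mean(dim=(0, 2, 3))                # per-channel prototype, shape (C,)
    k = int(keep_ratio * prototype.numel())
    keep_idx = torch.topk(prototype.abs(), k).indices    # keep most responsive channels
    mask = torch.zeros_like(prototype)
    mask[keep_idx] = 1.0
    return mask                                          # applied to channel outputs at inference
```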
Broadcasted Residual Learning for Efficient Keyword Spotting
Keyword spotting is an important research field because it plays a key role
in device wake-up and user interaction on smart devices. However, it is
challenging to minimize errors while operating efficiently in devices with
limited resources such as mobile phones. We present a broadcasted residual
learning method to achieve high accuracy with small model size and
computational load. Our method configures most of the residual functions as 1D
temporal convolutions while still allowing 2D convolution, using a broadcasted
residual connection that expands the temporal output to the frequency-temporal
dimension. This residual mapping enables the network to
effectively represent useful audio features with much less computation than
conventional convolutional neural networks. We also propose a novel network
architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted
residual learning and describe how to scale up the model according to the
target device's resources. BC-ResNets achieve state-of-the-art 98.0% and 98.7%
top-1 accuracy on Google speech command datasets v1 and v2, respectively, and
consistently outperform previous approaches, using fewer computations and
parameters.
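A minimal sketch of a broadcasted residual block is shown below: a 2D depthwise convolution over frequency, an average over the frequency axis, a 1D depthwise temporal convolution, and a broadcast of the temporal output back over frequency. Kernel sizes, normalization, and activation choices are assumptions, not the exact BC-ResNet configuration.

```python
# Sketch of broadcasted residual learning on a (batch, channel, frequency, time) input.
import torch
import torch.nn as nn

class BroadcastedResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 2D depthwise convolution over the frequency axis
        self.freq_dw = nn.Conv2d(channels, channels, kernel_size=(3, 1),
                                 padding=(1, 0), groups=channels)
        # 1D depthwise convolution over the temporal axis (after averaging frequency)
        self.temp_dw = nn.Conv2d(channels, channels, kernel_size=(1, 3),
                                 padding=(0, 1), groups=channels)
        self.pw = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # x: (B, C, F, T) frequency-temporal feature map
        f2 = self.freq_dw(x)                          # 2D frequency feature
        f1 = f2.mean(dim=2, keepdim=True)             # collapse frequency -> (B, C, 1, T)
        f1 = self.pw(self.act(self.temp_dw(f1)))      # 1D temporal feature
        # Broadcast the temporal output back over the frequency dimension.
        return x + f2 + f1.expand_as(x)
```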
Variational On-the-Fly Personalization
With the development of deep learning (DL) technologies, the demand for DL-based services on personal devices, such as mobile phones, is also increasing rapidly. In this paper, we propose a novel personalization method, Variational On-the-Fly Personalization. Compared to conventional personalization methods that require additional fine-tuning with personal data, the proposed method only requires forwarding a handful of personal data on the fly. Assuming that even a single personal data sample can convey the characteristics of a target person, we develop a variational hyper-personalizer to capture the weight distribution of layers that fits the target person. In the testing phase, the hyper-personalizer estimates the model's weights on the fly, conditioned on the person's characteristics, by forwarding only a small amount of (even a single item of) personal enrollment data. Hence, the proposed method can perform personalization without any training software platform or additional cost on the edge device. In experiments, we show our approach can effectively generate reliable personalized models by forwarding (not back-propagating) a handful of samples.
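The following sketch illustrates the hyper-personalizer idea under simplifying assumptions: a small network maps aggregated enrollment embeddings to the mean and log-variance of a target layer's weights, so personalization needs only forward passes. The shapes and the reparameterization detail are illustrative, not the paper's exact formulation.

```python
# Illustrative variational hyper-personalizer: predicts a weight distribution
# for one target layer from a handful of enrollment embeddings.
import torch
import torch.nn as nn

class VariationalHyperPersonalizer(nn.Module):
    def __init__(self, emb_dim, target_weight_numel, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(inplace=True))
        self.mu = nn.Linear(hidden, target_weight_numel)
        self.logvar = nn.Linear(hidden, target_weight_numel)

    def forward(self, enroll_emb, sample=True):
        # enroll_emb: (N, emb_dim) embeddings of the user's enrollment data
        h = self.body(enroll_emb).mean(dim=0)          # aggregate over enrollment samples
        mu, logvar = self.mu(h), self.logvar(h)
        if sample:
            eps = torch.randn_like(mu)
            return mu + eps * torch.exp(0.5 * logvar)  # reparameterized weight sample
        return mu                                      # deterministic weights at test time
```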