Slimmable Networks for Contrastive Self-supervised Learning
Self-supervised learning has made great progress in large-model pre-training but struggles when training small models. Previous solutions to this problem mainly rely on knowledge distillation and involve a two-stage learning procedure: first train a large teacher model, then distill it to improve the generalization ability of small ones. In this work, we present a new one-stage solution for obtaining pre-trained small models without extra teachers: slimmable networks for contrastive self-supervised learning (\emph{SlimCLR}). A slimmable network contains a full network and several weight-sharing sub-networks. We can pre-train once and obtain various networks, including small ones with low computation costs. However, in the self-supervised case, interference between the weight-sharing networks leads to severe performance degradation. One piece of evidence of this interference is \emph{gradient imbalance}: a small proportion of parameters produces dominant gradients during backpropagation, so the main parameters may not be fully optimized. Divergence in the gradient directions of the various networks may also cause interference between them. To overcome these problems, we make the main parameters produce dominant gradients and provide consistent guidance for the sub-networks via three techniques: slow-start training of sub-networks, online distillation, and loss re-weighting according to model size. Besides, a switchable linear probe layer is applied during linear evaluation to avoid interference between the weight-sharing linear layers. We instantiate SlimCLR with typical contrastive learning frameworks and achieve better performance than previous methods with fewer parameters and FLOPs.

Comment: preprint, work in progress
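To make the weight-sharing idea concrete, here is a minimal PyTorch sketch of a slimmable layer in which every sub-network uses a prefix of the full network's parameters. The class name, the active_width switch, and the width multipliers are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Linear):
    """Linear layer whose active output width is switchable at runtime:
    each sub-network uses the first `active_width * out_features` rows of
    the full weight matrix, so all widths share parameters."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias)
        self.active_width = 1.0  # fraction of output features in use

    def forward(self, x):
        out_dim = int(self.out_features * self.active_width)
        return F.linear(x, self.weight[:out_dim], self.bias[:out_dim])

layer = SlimmableLinear(128, 64)
x = torch.randn(4, 128)
for w in (1.0, 0.5, 0.25):       # the full network and two sub-networks
    layer.active_width = w       # switch width without reloading weights
    print(w, layer(x).shape)     # output dimension shrinks with the width
```

In a full model every layer is switched consistently, and, as the abstract notes, components such as the linear probe need per-width (switchable) handling to avoid interference between the shared weights.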
Slimmable Generative Adversarial Networks
Generative adversarial networks (GANs) have achieved remarkable progress in
recent years, but the continuously growing scale of models makes them
challenging to deploy widely in practical applications. In particular, for
real-time generation tasks, different devices require generators of different
sizes due to varying computing power. In this paper, we introduce slimmable
GANs (SlimGANs), which can flexibly switch the width of the generator to
accommodate various quality-efficiency trade-offs at runtime. Specifically, we
leverage multiple discriminators that share partial parameters to train the
slimmable generator. To facilitate \textit{consistency} between generators
of different widths, we present a stepwise inplace distillation technique that
encourages narrow generators to learn from wide ones. As for class-conditional
generation, we propose a sliceable conditional batch normalization that
incorporates the label information into different widths. Our methods are
validated, both quantitatively and qualitatively, by extensive experiments and
a detailed ablation study.

Comment: Accepted to AAAI 2021
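A sliceable conditional batch normalization can be pictured as class-conditional scale and shift embeddings defined at full width and sliced to the active width, so every generator width shares the same label conditioning. The sketch below is our illustration under that reading; the names, initialization, and statistics handling are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceableConditionalBN(nn.Module):
    """Class-conditional BN whose affine parameters are sliced to the
    active width, so all generator widths share the label embeddings."""

    def __init__(self, num_features, num_classes):
        super().__init__()
        self.gamma = nn.Embedding(num_classes, num_features)  # per-class scale
        self.beta = nn.Embedding(num_classes, num_features)   # per-class shift
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, y):
        c = x.shape[1]  # active channel count of the sliced feature map
        # Batch statistics only; a faithful version would keep separate
        # running statistics per width, as in switchable batch norm.
        h = F.batch_norm(x, None, None, training=True)
        g = self.gamma(y)[:, :c, None, None]  # slice embeddings to width c
        b = self.beta(y)[:, :c, None, None]
        return g * h + b

cbn = SliceableConditionalBN(num_features=64, num_classes=10)
y = torch.randint(0, 10, (8,))
for c in (64, 32):  # full- and half-width feature maps
    print(cbn(torch.randn(8, c, 4, 4), y).shape)
```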
SteppingNet: A Stepping Neural Network with Incremental Accuracy Enhancement
Deep neural networks (DNNs) have been applied successfully in many fields over the past decades. However, the growing number of multiply-and-accumulate (MAC) operations in DNNs hinders their application on resource-constrained and resource-varying platforms, e.g., mobile phones and autonomous vehicles. On such platforms, neural networks need to deliver acceptable results quickly, and it should be possible to enhance the accuracy of the results dynamically according to the computational resources currently available in the system. To address
these challenges, we propose a design framework called SteppingNet. SteppingNet
constructs a series of subnets whose accuracy is incrementally enhanced as more
MAC operations become available. Therefore, this design allows a trade-off
between accuracy and latency. In addition, the larger subnets in SteppingNet are built upon the smaller ones, so that the results of the latter can be reused directly in the former without recomputation. This property allows SteppingNet
to decide on-the-fly whether to enhance the inference accuracy by executing
further MAC operations. Experimental results demonstrate that SteppingNet provides effective incremental accuracy improvement and consistently outperforms state-of-the-art work in inference accuracy under the same computational-resource budget.

Comment: accepted by DATE 2023 (Design, Automation and Test in Europe)
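The reuse property can be illustrated with a layer whose output is accumulated over slices of the weight matrix: executing more steps adds MAC operations to a stored partial result instead of recomputing it. The class name and slicing scheme below are a hypothetical sketch of this idea, not SteppingNet's actual construction.

```python
import torch
import torch.nn as nn

class SteppingLinear(nn.Module):
    """Toy layer whose output is a running sum over input slices, so a
    larger subnet extends a smaller one's result without recomputation."""

    def __init__(self, in_features, out_features, num_steps=4):
        super().__init__()
        assert in_features % num_steps == 0
        self.step = in_features // num_steps
        self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward_incremental(self, x, up_to, accum=None, start=0):
        """Spend MACs only on slices [start, up_to); `accum` carries the
        partial result of the slices already executed."""
        out = self.bias.expand(x.shape[0], -1).clone() if accum is None else accum
        for s in range(start, up_to):
            lo, hi = s * self.step, (s + 1) * self.step
            out = out + x[:, lo:hi] @ self.weight[:, lo:hi].T
        return out

layer = SteppingLinear(128, 10, num_steps=4)
x = torch.randn(2, 128)
coarse = layer.forward_incremental(x, up_to=2)                  # fast answer
refined = layer.forward_incremental(x, up_to=4, accum=coarse, start=2)
full = layer.forward_incremental(x, up_to=4)
print(torch.allclose(refined, full))  # True: earlier MACs were not redone
```

The on-the-fly decision described in the abstract then amounts to checking, after each step, whether the remaining compute budget allows executing the next slice.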
Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems
Executing large deep neural networks (DNNs) at mobile edge devices consumes considerable critical resources, such as energy, and imposes demands on hardware capabilities. In edge-computing approaches, the execution of the model is offloaded to a compute-capable device positioned at the edge of 5G infrastructures. The main issue of the latter
class of approaches is the need to transport information-rich signals over
wireless links with limited and time-varying capacity. The recent split
computing paradigm attempts to resolve this impasse by distributing the
execution of DNN models across the layers of the system to reduce the amount
of data to be transmitted while imposing minimal computing load on mobile
devices. In this context, we propose a novel split computing approach based on
slimmable ensemble encoders. The key advantage of our design is the ability to
adapt the computational load and the transmitted data size in real time with minimal overhead and delay. This contrasts with existing approaches, where the same
adaptation requires costly context switching and model loading. Moreover, our
model outperforms existing solutions in terms of compression efficacy and
execution time, especially in the context of weak mobile devices. We present a
comprehensive comparison with the most advanced split computing solutions, as
well as an experimental evaluation on GPU-less devices.
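As a rough illustration of why width switching avoids costly context switching and model reloading, a slimmable encoder can select its active width per input from the measured link capacity, shrinking both the on-device computation and the tensor to be transmitted. The layer shape and bandwidth thresholds below are invented for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableEncoder(nn.Module):
    """Split-computing encoder whose width is switched at runtime: narrower
    widths use a prefix of the shared kernels, so no model reloading occurs."""

    def __init__(self, in_ch=3, max_ch=64):
        super().__init__()
        self.max_ch = max_ch
        self.conv = nn.Conv2d(in_ch, max_ch, 3, stride=2, padding=1)

    def forward(self, x, width):
        c = int(self.max_ch * width)
        h = F.conv2d(x, self.conv.weight[:c], self.conv.bias[:c],
                     stride=2, padding=1)
        return F.relu(h)

def pick_width(bandwidth_mbps):
    """Toy policy: transmit more channels only when the link can afford it."""
    if bandwidth_mbps < 5:
        return 0.25
    return 0.5 if bandwidth_mbps < 20 else 1.0

enc = SlimmableEncoder()
x = torch.randn(1, 3, 32, 32)
for bw in (2.0, 10.0, 50.0):  # measured uplink capacity in Mbps
    z = enc(x, pick_width(bw))
    print(bw, "Mbps ->", z.numel(), "feature values to transmit")
```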