Channel selection for test-time adaptation under distribution shift
To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data
distribution during inference. Test-time batch normalization is a simple and popular
method that achieved compelling performance on domain shift benchmarks by
recalculating batch normalization statistics on test batches. However, in many
practical applications this technique is vulnerable to label distribution shifts. We
propose to tackle this challenge by only selectively adapting channels in a deep
network, minimizing drastic adaptation that is sensitive to label shifts. We find that
adapted models significantly improve performance compared to the baseline
models and counteract unknown label shifts.
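As a concrete illustration of the channel-selective idea, the following sketch recomputes batch-normalization statistics on a test batch but overwrites only a chosen subset of channels, leaving the remaining channels at their source-domain statistics. The PyTorch setting, the boolean channel mask, and the in-place update are illustrative assumptions; the abstract does not specify the authors' selection criterion.

```python
import torch
import torch.nn as nn

def selectively_adapt_bn(bn: nn.BatchNorm2d, x: torch.Tensor,
                         channel_mask: torch.Tensor) -> None:
    """Recompute BN statistics on a test batch, updating only selected channels.

    channel_mask: boolean tensor of shape (C,); True marks channels to adapt.
    The selection criterion itself is left open here (hypothetical).
    """
    with torch.no_grad():
        # Per-channel statistics of the test batch of shape (N, C, H, W).
        batch_mean = x.mean(dim=(0, 2, 3))
        batch_var = x.var(dim=(0, 2, 3), unbiased=False)

        # Adapt only the selected channels; the rest keep source statistics,
        # limiting the drastic adaptation that is sensitive to label shift.
        bn.running_mean[channel_mask] = batch_mean[channel_mask]
        bn.running_var[channel_mask] = batch_var[channel_mask]

# Hypothetical usage on a single layer:
# mask = torch.zeros(bn.num_features, dtype=torch.bool)
# mask[:16] = True
# selectively_adapt_bn(bn, test_batch_features, mask)
```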
QuickNets: Saving Training and Preventing Overconfidence in Early-Exit Neural Architectures
Deep neural networks have long training and processing times. Early exits
added to a neural network allow it to make early predictions from
intermediate activations, which is valuable in time-sensitive applications.
However, early exits increase the training time of the network. We
introduce QuickNets: a novel cascaded training algorithm for faster training of
neural networks. QuickNets are trained in a layer-wise manner such that each
successive layer is only trained on samples that could not be correctly
classified by the previous layers. We demonstrate that QuickNets can
dynamically distribute learning and have a reduced training cost and inference
cost compared to standard Backpropagation. Additionally, we introduce
commitment layers that significantly improve the early exits by identifying
over-confident predictions and demonstrate their success.
Comment: 9 pages, 4 figures
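The cascaded training idea lends itself to a short sketch: each stage and its exit head handle only the samples that earlier exits could not resolve. The stage/head decomposition, the confidence threshold, and the routing rule below are assumptions for illustration and omit the per-stage optimization loop and the paper's commitment layers.

```python
import torch

def cascade_route(stages, exit_heads, x, y, threshold=0.9):
    """Hypothetical QuickNet-style routing: after a stage has been trained
    (training loop omitted), samples that its exit classifies confidently and
    correctly are dropped, and only the remaining samples reach later stages."""
    feats, labels = x, y
    for stage, head in zip(stages, exit_heads):
        feats = stage(feats)                          # forward through this stage
        logits = head(feats)
        probs = torch.softmax(logits, dim=1)
        confident = probs.max(dim=1).values >= threshold
        correct = logits.argmax(dim=1) == labels
        keep = ~(confident & correct)                 # unresolved samples continue
        feats, labels = feats[keep].detach(), labels[keep]
    return feats, labels
```

Detaching between stages mirrors the layer-wise training, so later stages never back-propagate into earlier ones.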
When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural Networks
Neural growth is the process of growing a small neural network to a large
network and has been utilized to accelerate the training of deep neural
networks. One crucial aspect of neural growth is determining the optimal growth
timing. However, few studies investigate this systematically. Our study reveals
that neural growth inherently exhibits a regularization effect, whose intensity
is influenced by the chosen policy for growth timing. While this regularization
effect may mitigate the overfitting risk of the model, it may lead to a notable
accuracy drop when the model underfits. Yet, current approaches have not
addressed this issue because they do not account for the regularization
effect of neural growth. Motivated by these findings, we propose an
under/overfitting risk-aware growth timing policy, which automatically adjusts
the growth timing according to the estimated level of under- or overfitting risk to
address both risks. Comprehensive experiments conducted using CIFAR-10/100 and
ImageNet datasets show that the proposed policy achieves accuracy improvements
of up to 1.3% in models prone to underfitting, while attaining accuracy
comparable to existing methods in models suffering from overfitting.
Comment: Accepted by AAAI'2
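Because the abstract only states that growth timing is adjusted by estimated under/overfitting risk, the sketch below uses the training loss and the train/validation gap as stand-in risk signals. The thresholds, the risk proxies, and the direction of the timing shift (growing earlier under underfitting risk, later under overfitting risk, on the assumption that later growth strengthens the regularization effect) are all illustrative and not taken from the paper.

```python
def adjust_growth_epoch(train_loss, val_loss, base_epoch,
                        underfit_threshold=1.0, overfit_gap=0.1, shift=5):
    """Hypothetical fitting-risk-aware growth-timing rule (not the authors'
    estimator): shift the planned growth epoch according to simple proxies
    for under- and overfitting risk."""
    if train_loss > underfit_threshold:            # underfitting risk
        return max(0, base_epoch - shift)          # assumed: grow earlier
    if (val_loss - train_loss) > overfit_gap:      # overfitting risk
        return base_epoch + shift                  # assumed: grow later
    return base_epoch
```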
BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling
Like masked language modeling (MLM) in natural language processing, masked
image modeling (MIM) aims to extract valuable insights from image patches to
enhance the feature extraction capabilities of the underlying deep neural
network (DNN). Compared with other training paradigms such as supervised
learning and unsupervised contrastive learning, MIM pretraining typically
demands significant computational resources in order to handle large training
batches (e.g., 4096). The significant memory and
computation requirements pose a considerable challenge to its broad adoption.
To mitigate this, we introduce a novel learning framework,
termed Block-Wise Masked Image Modeling (BIM). This framework involves
decomposing the MIM tasks into several sub-tasks with independent computation
patterns, resulting in block-wise back-propagation operations instead of the
traditional end-to-end approach. Our proposed BIM maintains superior
performance compared to conventional MIM while greatly reducing peak memory
consumption. Moreover, BIM naturally enables the concurrent training of
numerous DNN backbones of varying depths. This leads to the creation of
multiple trained DNN backbones, each tailored to different hardware platforms
with distinct computing capabilities. This approach significantly reduces
computational costs in comparison with training each DNN backbone individually.
Our framework offers a promising solution for resource-constrained MIM
training.
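A minimal sketch of the block-wise decomposition, assuming PyTorch-style modules: each encoder block gets its own lightweight reconstruction head and a local loss, and activations are detached between blocks so back-propagation never spans the full network, bounding peak memory by roughly one block. The head design and the mean-squared reconstruction loss are assumptions, not the authors' exact components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockwiseMIM(nn.Module):
    """Illustrative block-wise MIM: per-block reconstruction sub-tasks with
    local back-propagation only (hypothetical module layout)."""

    def __init__(self, blocks, heads):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)   # encoder blocks
        self.heads = nn.ModuleList(heads)     # one reconstruction head per block

    def forward(self, masked_tokens, target_patches):
        losses = []
        h = masked_tokens
        for block, head in zip(self.blocks, self.heads):
            h = block(h)
            # Local reconstruction loss for this block's sub-task.
            losses.append(F.mse_loss(head(h), target_patches))
            # Detach so the next block's gradients never flow back here,
            # keeping back-propagation within a single block.
            h = h.detach()
        return losses

# Each prefix of blocks, together with its head, also yields a usable backbone
# of a different depth for different hardware budgets (per the abstract).
```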