Channel selection for test-time adaptation under distribution shift
To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data
distribution during inference. Test-time batch normalization is a simple and popular
method that achieved compelling performance on domain shift benchmarks by
recalculating batch normalization statistics on test batches. However, in many
practical applications this technique is vulnerable to label distribution shifts. We
propose to tackle this challenge by only selectively adapting channels in a deep
network, minimizing drastic adaptation that is sensitive to label shifts. We find that
adapted models significantly improve performance compared to the baseline
models and counteract unknown label shifts.
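As a concrete illustration of the channel-selective idea, the following sketch recomputes batch-normalization statistics on a test batch but overwrites only a chosen subset of channels, leaving the remaining channels at their source-domain statistics. The PyTorch setting, the boolean channel mask, and the in-place update are illustrative assumptions; the abstract does not specify the authors' selection criterion.

```python
import torch
import torch.nn as nn

def selectively_adapt_bn(bn: nn.BatchNorm2d, x: torch.Tensor,
                         channel_mask: torch.Tensor) -> None:
    """Recompute BN statistics on a test batch, updating only selected channels.

    channel_mask: boolean tensor of shape (C,); True marks channels to adapt.
    The selection criterion itself is left open here (hypothetical).
    """
    with torch.no_grad():
        # Per-channel statistics of the test batch of shape (N, C, H, W).
        batch_mean = x.mean(dim=(0, 2, 3))
        batch_var = x.var(dim=(0, 2, 3), unbiased=False)

        # Adapt only the selected channels; the rest keep source statistics,
        # limiting the drastic adaptation that is sensitive to label shift.
        bn.running_mean[channel_mask] = batch_mean[channel_mask]
        bn.running_var[channel_mask] = batch_var[channel_mask]

# Hypothetical usage on a single layer:
# mask = torch.zeros(bn.num_features, dtype=torch.bool)
# mask[:16] = True
# selectively_adapt_bn(bn, test_batch_features, mask)
```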
QuickNets: Saving Training and Preventing Overconfidence in Early-Exit Neural Architectures
Deep neural networks have long training and processing times. Early exits
added to a neural network allow it to make early predictions from
intermediate activations, which is valuable in time-sensitive applications.
However, early exits increase the training time of the network. We
introduce QuickNets: a novel cascaded training algorithm for faster training of
neural networks. QuickNets are trained in a layer-wise manner such that each
successive layer is only trained on samples that could not be correctly
classified by the previous layers. We demonstrate that QuickNets can
dynamically distribute learning and have a reduced training cost and inference
cost compared to standard Backpropagation. Additionally, we introduce
commitment layers that significantly improve the early exits by identifying
over-confident predictions and demonstrate their success.
Comment: 9 pages, 4 figures
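The cascaded training idea lends itself to a short sketch: each stage and its exit head handle only the samples that earlier exits could not resolve. The stage/head decomposition, the confidence threshold, and the routing rule below are assumptions for illustration and omit the per-stage optimization loop and the paper's commitment layers.

```python
import torch

def cascade_route(stages, exit_heads, x, y, threshold=0.9):
    """Hypothetical QuickNet-style routing: after a stage has been trained
    (training loop omitted), samples that its exit classifies confidently and
    correctly are dropped, and only the remaining samples reach later stages."""
    feats, labels = x, y
    for stage, head in zip(stages, exit_heads):
        feats = stage(feats)                          # forward through this stage
        logits = head(feats)
        probs = torch.softmax(logits, dim=1)
        confident = probs.max(dim=1).values >= threshold
        correct = logits.argmax(dim=1) == labels
        keep = ~(confident & correct)                 # unresolved samples continue
        feats, labels = feats[keep].detach(), labels[keep]
    return feats, labels
```

Detaching between stages mirrors the layer-wise training, so later stages never back-propagate into earlier ones.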
When To Grow? A Fitting Risk-Aware Policy for Layer Growing in Deep Neural Networks
Neural growth is the process of growing a small neural network to a large
network and has been utilized to accelerate the training of deep neural
networks. One crucial aspect of neural growth is determining the optimal growth
timing. However, few studies investigate this systematically. Our study reveals
that neural growth inherently exhibits a regularization effect, whose intensity
is influenced by the chosen policy for growth timing. While this regularization
effect may mitigate the overfitting risk of the model, it may lead to a notable
accuracy drop when the model underfits. Yet, current approaches have not
addressed this issue because they do not account for the regularization
effect of neural growth. Motivated by these findings, we propose an
under/overfitting risk-aware growth timing policy, which automatically adjusts
the growth timing according to the estimated level of under- or overfitting risk to
address both risks. Comprehensive experiments conducted using CIFAR-10/100 and
ImageNet datasets show that the proposed policy achieves accuracy improvements
of up to 1.3% in models prone to underfitting, while attaining accuracy
comparable to existing methods in models suffering from overfitting.
Comment: Accepted by AAAI'2
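Because the abstract only states that growth timing is adjusted by estimated under/overfitting risk, the sketch below uses the training loss and the train/validation gap as stand-in risk signals. The thresholds, the risk proxies, and the direction of the timing shift (growing earlier under underfitting risk, later under overfitting risk, on the assumption that later growth strengthens the regularization effect) are all illustrative and not taken from the paper.

```python
def adjust_growth_epoch(train_loss, val_loss, base_epoch,
                        underfit_threshold=1.0, overfit_gap=0.1, shift=5):
    """Hypothetical fitting-risk-aware growth-timing rule (not the authors'
    estimator): shift the planned growth epoch according to simple proxies
    for under- and overfitting risk."""
    if train_loss > underfit_threshold:            # underfitting risk
        return max(0, base_epoch - shift)          # assumed: grow earlier
    if (val_loss - train_loss) > overfit_gap:      # overfitting risk
        return base_epoch + shift                  # assumed: grow later
    return base_epoch
```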
BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling
Like masked language modeling (MLM) in natural language processing, masked
image modeling (MIM) aims to extract valuable insights from image patches to
enhance the feature extraction capabilities of the underlying deep neural
network (DNN). Compared with other training paradigms such as supervised
learning and unsupervised contrastive learning, MIM pretraining typically
demands significant computational resources in order to handle large training
batches (e.g., 4096). The significant memory and
computation requirements pose a considerable challenge to its broad adoption.
To mitigate this, we introduce a novel learning framework,
termed Block-Wise Masked Image Modeling (BIM). This framework involves
decomposing the MIM tasks into several sub-tasks with independent computation
patterns, resulting in block-wise back-propagation operations instead of the
traditional end-to-end approach. Our proposed BIM maintains superior
performance compared to conventional MIM while greatly reducing peak memory
consumption. Moreover, BIM naturally enables the concurrent training of
numerous DNN backbones of varying depths. This leads to the creation of
multiple trained DNN backbones, each tailored to different hardware platforms
with distinct computing capabilities. This approach significantly reduces
computational costs in comparison with training each DNN backbone individually.
Our framework offers a promising solution for resource-constrained MIM
training.
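A minimal sketch of the block-wise decomposition, assuming PyTorch-style modules: each encoder block gets its own lightweight reconstruction head and a local loss, and activations are detached between blocks so back-propagation never spans the full network, bounding peak memory by roughly one block. The head design and the mean-squared reconstruction loss are assumptions, not the authors' exact components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockwiseMIM(nn.Module):
    """Illustrative block-wise MIM: per-block reconstruction sub-tasks with
    local back-propagation only (hypothetical module layout)."""

    def __init__(self, blocks, heads):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)   # encoder blocks
        self.heads = nn.ModuleList(heads)     # one reconstruction head per block

    def forward(self, masked_tokens, target_patches):
        losses = []
        h = masked_tokens
        for block, head in zip(self.blocks, self.heads):
            h = block(h)
            # Local reconstruction loss for this block's sub-task.
            losses.append(F.mse_loss(head(h), target_patches))
            # Detach so the next block's gradients never flow back here,
            # keeping back-propagation within a single block.
            h = h.detach()
        return losses

# Each prefix of blocks, together with its head, also yields a usable backbone
# of a different depth for different hardware budgets (per the abstract).
```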