Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training
Recently, sparse training has emerged as a promising paradigm for efficient
deep learning on edge devices. The current research mainly devotes efforts to
reducing training costs by further increasing model sparsity. However,
increasing sparsity is not always ideal since it will inevitably introduce
severe accuracy degradation at an extremely high sparsity level. This paper
intends to explore other possible directions to effectively and efficiently
reduce sparse training costs while preserving accuracy. To this end, we
investigate two techniques, namely, layer freezing and data sieving. First, the
layer freezing approach has shown its success in dense model training and
fine-tuning, yet it has never been adopted in the sparse training domain.
Nevertheless, the unique characteristics of sparse training may hinder the
incorporation of layer freezing techniques. Therefore, we analyze the
feasibility and potentiality of using the layer freezing technique in sparse
training and find it has the potential to save considerable training costs.
Second, we propose a data sieving method for dataset-efficient training, which
further reduces training costs by ensuring only a partial dataset is used
throughout the entire training process. We show that both techniques can be
well incorporated into the sparse training algorithm to form a generic
framework, which we dub SpFDE. Our extensive experiments demonstrate that SpFDE
can significantly reduce training costs while preserving accuracy from three
dimensions: weight sparsity, layer freezing, and data sieving.
Comment: Published in the 36th Conference on Neural Information Processing Systems
(NeurIPS 2022).
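The abstract describes the framework at a high level only, so the sketch below is a minimal illustration of the two ideas in a plain PyTorch training loop: data sieving is approximated by keeping a random 70% subset of CIFAR-10 for the whole run, and layer freezing by a simple epoch-based schedule over ResNet-18 layer groups. The model, subset ratio, and freezing schedule are assumptions for illustration, not SpFDE's actual configuration, and the weight-sparsity component is omitted.

```python
# Minimal sketch of layer freezing and data sieving in a training loop.
# The model, sieving ratio, and freezing schedule are illustrative assumptions,
# not SpFDE's actual settings; the sparsity mask itself is omitted.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader, Subset

model = torchvision.models.resnet18(num_classes=10)
dataset = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

# Data sieving: keep only a fixed fraction of the dataset for the entire run.
sieve_ratio = 0.7  # assumed value
keep = torch.randperm(len(dataset))[: int(sieve_ratio * len(dataset))]
loader = DataLoader(Subset(dataset, keep.tolist()), batch_size=128, shuffle=True)

# Layer freezing: progressively freeze early layer groups as training proceeds.
freeze_schedule = {10: "layer1", 20: "layer2"}  # epoch -> module to freeze (assumed)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):
    if epoch in freeze_schedule:
        for p in getattr(model, freeze_schedule[epoch]).parameters():
            p.requires_grad_(False)  # frozen layers no longer receive gradients
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```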
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off
Over-parameterization of deep neural networks (DNNs) has shown high
prediction accuracy for many applications. Although effective, the large number
of parameters hinders its popularity on resource-limited devices and has an
outsize environmental impact. Sparse training (using a fixed number of nonzero
weights in each iteration) could significantly mitigate the training costs by
reducing the model size. However, existing sparse training methods mainly use
either random-based or greedy-based drop-and-grow strategies, which tend to get
trapped in local minima and yield low accuracy. In this work, we treat dynamic
sparse training as a sparse connectivity search problem and design an
exploitation-and-exploration acquisition function to escape from local optima
and saddle points. We further provide theoretical guarantees for the proposed
method and clarify its convergence properties.
Experimental results show that sparse models (up to 98% sparsity) obtained by
our proposed method outperform the SOTA sparse training methods on a wide
variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10, and
ResNet-50 / CIFAR-100, our method even achieves higher accuracy than the dense models.
On ResNet-50 / ImageNet, the proposed method yields up to an 8.2% accuracy
improvement over SOTA sparse training methods.
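As a rough illustration of the drop-and-grow mechanism discussed above, the sketch below drops the weakest active connections in a layer and grows inactive ones using a hypothetical acquisition-style score that mixes gradient magnitude (exploitation) with random noise (exploration). The score, the mixing weight, and the function name are assumptions for illustration, not the paper's actual acquisition function.

```python
# One drop-and-grow update for a single layer in dynamic sparse training.
# The acquisition score below (gradient magnitude + scaled noise) is an
# illustrative stand-in for the paper's exploitation/exploration criterion.
import torch

def drop_and_grow(weight, mask, grad, update_frac=0.1, explore_weight=0.5):
    """Drop the smallest-magnitude active weights, then grow inactive ones by score."""
    n_update = int(update_frac * int(mask.sum()))

    # Drop: remove the weakest currently-active connections.
    active_mag = torch.where(mask.bool(), weight.abs(),
                             torch.full_like(weight, float("inf")))
    drop_idx = torch.topk(active_mag.flatten(), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0

    # Grow: score inactive connections; larger explore_weight favours random picks.
    score = grad.abs() + explore_weight * torch.rand_like(grad)
    score = torch.where(mask.bool(), torch.full_like(score, -float("inf")), score)
    grow_idx = torch.topk(score.flatten(), n_update, largest=True).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.data.view(-1)[grow_idx] = 0.0  # newly grown connections start at zero

    weight.data *= mask  # keep the weight tensor consistent with the mask
    return mask
```

A typical dynamic sparse training loop would call such an update every few hundred steps per layer, re-using the gradients from a regular backward pass, so the number of nonzero weights stays fixed while the connectivity pattern keeps moving.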
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training
Random pruning is arguably the most naive way to attain sparsity in neural
networks, but it has been deemed uncompetitive compared with either post-training pruning or
sparse training. In this paper, we focus on sparse training and highlight a
perhaps counter-intuitive finding, that random pruning at initialization can be
quite powerful for the sparse training of modern neural networks. Without any
delicate pruning criteria or carefully pursued sparsity structures, we
empirically demonstrate that sparsely training a randomly pruned network from
scratch can match the performance of its dense equivalent. There are two key
factors that contribute to this revival: (i) network size matters: as the
original dense networks grow wider and deeper, the performance of training a
randomly pruned sparse network quickly approaches that of its dense
equivalent, even at high sparsity ratios; (ii) appropriate layer-wise sparsity
ratios can be pre-chosen for sparse training, which proves to be another
important performance booster. Simple as it looks, a randomly pruned subnetwork
of Wide ResNet-50 can be sparsely trained to outperform a dense Wide
ResNet-50 on ImageNet. We also observe that such randomly pruned networks
outperform their dense counterparts in other favorable aspects, such as
out-of-distribution detection, uncertainty estimation, and adversarial
robustness. Overall, our results strongly suggest there is larger-than-expected
room for sparse training at scale, and the benefits of sparsity might be more
universal beyond carefully designed pruning. Our source code can be found at
https://github.com/VITA-Group/Random_Pruning.
Comment: Published as a conference paper at ICLR 2022. Code is available at
https://github.com/VITA-Group/Random_Pruning.
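A minimal sketch of what random pruning at initialization with pre-chosen layer-wise sparsity ratios can look like is given below; the layer names and ratios are placeholders for illustration, not the specific configurations (e.g., ERK-style ratios) evaluated in the paper.

```python
# Random pruning at initialization: each targeted layer gets a fixed,
# pre-chosen sparsity and a random binary mask that is kept for the whole run.
# Layer names and sparsity values here are illustrative placeholders.
import torch
import torch.nn as nn

def random_prune_at_init(model, layer_sparsity):
    """layer_sparsity maps parameter-name substrings to a sparsity in [0, 1)."""
    masks = {}
    for name, param in model.named_parameters():
        sparsity = next((s for key, s in layer_sparsity.items() if key in name), None)
        if sparsity is None:
            continue  # e.g., leave biases dense
        mask = (torch.rand_like(param) >= sparsity).float()  # keep ~(1 - sparsity)
        param.data *= mask
        masks[name] = mask
    return masks

def apply_masks(model, masks):
    """Re-apply the masks (e.g., after each optimizer step) so the network stays static-sparse."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 10))
masks = random_prune_at_init(model, {"0.weight": 0.9, "2.weight": 0.5})
```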
Sparse Training Theory for Scalable and Efficient Agents
A fundamental task for artificial intelligence is learning. Deep Neural
Networks have proven able to cope with all major learning paradigms, i.e.,
supervised, unsupervised, and reinforcement learning. Nevertheless, traditional
deep learning approaches rely on cloud computing facilities and do not
scale well to autonomous agents with low computational resources. Even in the
cloud, they suffer from computational and memory limitations, and they cannot
adequately model large physical worlds for agents that would require
networks with billions of neurons. These issues have been addressed in the last few
years by the emerging topic of sparse training, which trains sparse networks
from scratch. This paper discusses the state of the art in sparse training, its
challenges and limitations, while introducing a couple of new theoretical
research directions that have the potential to alleviate sparse training
limitations and push deep learning scalability well beyond its current
boundaries. Finally, the impact of these theoretical advancements in complex
multi-agent settings is discussed from a real-world perspective, using a
smart grid case study.
Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training
In this paper, we introduce a new perspective on training deep neural
networks capable of state-of-the-art performance without the need for the
expensive over-parameterization by proposing the concept of In-Time
Over-Parameterization (ITOP) in sparse training. By starting from a random
sparse network and continuously exploring sparse connectivities during
training, we can perform an Over-Parameterization in the space-time manifold,
closing the gap in the expressibility between sparse training and dense
training. We further use ITOP to understand the underlying mechanism of Dynamic
Sparse Training (DST) and indicate that the benefits of DST come from its
ability to consider across time all possible parameters when searching for the
optimal sparse connectivity. As long as there are sufficient parameters that
have been reliably explored during training, DST can outperform the dense
neural network by a large margin. We present a series of experiments to support
our conjecture and achieve the state-of-the-art sparse training performance
with ResNet-50 on ImageNet. More impressively, our method achieves dominant
performance over the overparameterization-based sparse methods at extreme
sparsity levels. When trained on CIFAR-100, our method can match the
performance of the dense model even at an extreme sparsity (98%). Code can be
found at https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization.
Comment: 16 pages; 10 figures; Published in Proceedings of the 38th
International Conference on Machine Learning. Code can be found at
https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization.
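The central quantity in the abstract is how much of the full parameter space a dynamic sparse training run manages to explore over time. The sketch below tracks a simple proxy for that rate, treating a weight as explored once the mask has activated it at least once; this "ever activated" criterion and the class name are assumptions for illustration, not ITOP's exact definition of reliable exploration.

```python
# Track what fraction of all weights a dynamic sparse training run has
# activated at least once; used here as a rough proxy for in-time
# over-parameterization. The criterion is an illustrative simplification.
import torch

class ExplorationTracker:
    def __init__(self, param_shapes):
        # One boolean "ever active" map per sparse parameter tensor.
        self.ever_active = {name: torch.zeros(shape, dtype=torch.bool)
                            for name, shape in param_shapes.items()}

    def update(self, masks):
        # Call after every mask update (e.g., each drop-and-grow step).
        for name, mask in masks.items():
            self.ever_active[name] |= mask.bool()

    def exploration_rate(self):
        explored = sum(m.sum().item() for m in self.ever_active.values())
        total = sum(m.numel() for m in self.ever_active.values())
        return explored / total

# A rate approaching 1.0 means that, over the course of training, nearly all
# weights have been considered at some point, even though the model is sparse
# at every individual step.
```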
JaxPruner: A concise library for sparsity research
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX-based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine, and FedJAX, and we provide baseline experiments on popular benchmarks.
Comment: JaxPruner is hosted at http://github.com/google-research/jaxpruner.
Cross likelihood ratio based speaker clustering using eigenvoice models
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker based on sparse training data, as well as their improved ability to capture differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
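For reference, the Cross Likelihood Ratio criterion this abstract builds on is commonly defined in the diarization literature as below; this definition is included as background and is not restated in the abstract itself.

```latex
% CLR between clusters i and j, with frame sets X_i, X_j (of sizes N_i, N_j),
% speaker models \lambda_i, \lambda_j, and a universal background model
% \lambda_{UBM}; the pair with the highest CLR is merged while the value
% exceeds a threshold.
\mathrm{CLR}(i, j) =
  \frac{1}{N_i} \log \frac{p(X_i \mid \lambda_j)}{p(X_i \mid \lambda_{UBM})}
+ \frac{1}{N_j} \log \frac{p(X_j \mid \lambda_i)}{p(X_j \mid \lambda_{UBM})}
```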