Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similar to their biological counterparts, sparse networks generalize just as well as, and sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial on sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation and the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparing different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
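To make the kind of sparsification the survey discusses concrete, below is a minimal sketch of one common pruning approach, magnitude pruning, assuming a PyTorch model; the function name, the skip rule for 1-D parameters, and the 90% sparsity target are illustrative assumptions, not the paper's specific method.

```python
# Minimal, illustrative magnitude-pruning sketch (assumed example, not the
# survey's own algorithm): zero out the smallest-magnitude weights per tensor
# and return binary masks that can be re-applied during further training.
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float = 0.9) -> dict:
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:          # skip biases and normalization parameters
            continue
        k = int(param.numel() * sparsity)
        if k == 0:
            continue
        # Threshold = k-th smallest absolute value in this tensor
        threshold = param.detach().abs().flatten().kthvalue(k).values
        mask = (param.detach().abs() > threshold).float()
        param.data.mul_(mask)        # prune in place
        masks[name] = mask           # keep mask to enforce sparsity later
    return masks
```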
ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity
When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e., once every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and for the totality of the energy consumption at the client side. In this work, we present the first study of the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting.

Comment: Published as a conference paper at ICLR 2022
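As a rough illustration of sparse on-device training in an FL client, as described in this abstract, the sketch below runs local SGD while keeping a fixed set of pruned weights at zero. The masking scheme, function names, and hyperparameters are assumptions for the example, not ZeroFL's exact algorithm.

```python
# Hedged sketch of a sparse local-training step on an FL client (assumed
# example, not ZeroFL's published method): gradients and weights of pruned
# positions are masked so they stay inactive throughout local training.
import torch

def local_train_sparse(model, loader, masks, epochs=1, lr=0.01):
    """Run local SGD while keeping masked weights at zero (e.g., 95% sparsity)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            # Zero the gradients of pruned weights so they remain inactive;
            # this is what allows sparse kernels to skip them on-device.
            for name, p in model.named_parameters():
                if name in masks and p.grad is not None:
                    p.grad.mul_(masks[name])
            opt.step()
            # Re-apply masks to guard against drift (e.g., from weight decay)
            for name, p in model.named_parameters():
                if name in masks:
                    p.data.mul_(masks[name])
    # Return the updated local weights for aggregation at the server
    return {name: p.detach().clone() for name, p in model.named_parameters()}
```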