The Dormant Neuron Phenomenon in Deep Reinforcement Learning
In this work we identify the dormant neuron phenomenon in deep reinforcement
learning, where an agent's network suffers from an increasing number of
inactive neurons, thereby affecting network expressivity. We demonstrate the
presence of this phenomenon across a variety of algorithms and environments,
and highlight its effect on learning. To address this issue, we propose a
simple and effective method (ReDo) that Recycles Dormant neurons throughout
training. Our experiments demonstrate that ReDo maintains the expressive power
of networks by reducing the number of dormant neurons and results in improved
performance. Comment: Oral at ICML 2023.
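The dormancy criterion and the recycling step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: a neuron is flagged as dormant when its mean absolute activation, normalized by the layer average, falls below a threshold tau, and recycling re-initializes its incoming weights while zeroing its outgoing ones so the rest of the network is undisturbed. The He-style initializer and the tau value are illustrative assumptions.

```python
import numpy as np

def dormant_mask(activations, tau=0.025):
    """Flag tau-dormant neurons: mean |activation| per neuron,
    normalized by the layer-wide average, is at most tau.
    `activations` has shape (batch, neurons)."""
    score = np.abs(activations).mean(axis=0)   # per-neuron activity
    score = score / (score.mean() + 1e-9)      # normalize by layer mean
    return score <= tau

def recycle(w_in, b_in, w_out, dormant, rng):
    """Recycle dormant neurons: re-initialize their incoming weights
    (He-style normal init, an assumption here) and zero their outgoing
    weights, leaving all other neurons untouched."""
    fan_in = w_in.shape[1]
    w_in[dormant] = rng.normal(0.0, np.sqrt(2.0 / fan_in),
                               size=(int(dormant.sum()), fan_in))
    b_in[dormant] = 0.0
    w_out[:, dormant] = 0.0
    return w_in, b_in, w_out
```

Zeroing the outgoing weights is what makes the operation safe to apply mid-training: the recycled neuron initially contributes nothing downstream, so the network's function is unchanged at the moment of the reset.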
Dynamic Sparse Training with Structured Sparsity
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in
sparse neural network training, matching the generalization of dense models
while enabling sparse training and inference. Although the resulting models are
highly sparse and theoretically less computationally expensive, achieving
speedups with unstructured sparsity on real-world hardware is challenging. In
this work, we propose a sparse-to-sparse DST method, Structured RigL (SRigL),
to learn a variant of fine-grained structured N:M sparsity by imposing a
constant fan-in constraint. Using our empirical analysis of existing DST
methods at high sparsity, we additionally employ a neuron ablation method which
enables SRigL to achieve state-of-the-art sparse-to-sparse structured DST
performance on a variety of Neural Network (NN) architectures. Using a 90%
sparse linear layer, we demonstrate a real-world acceleration of 3.4x/2.5x on
CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size
of 256 when compared to equivalent dense/unstructured (CSR) sparse layers,
respectively. Comment: ICLR 2024, 29 pages, 22 figures.
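The constant fan-in constraint at the core of SRigL can be sketched as a mask-construction step: every output neuron keeps exactly the same number of its largest-magnitude incoming weights, so the layer's nonzeros form a regular, hardware-friendly structure. This is only the structural constraint; the full method's dynamic grow/prune schedule and neuron ablation are not shown, and magnitude-based selection here is an illustrative assumption.

```python
import numpy as np

def constant_fan_in_mask(weights, sparsity):
    """Build a constant fan-in mask for a (n_out, n_in) weight matrix:
    each output neuron keeps exactly k = round(n_in * (1 - sparsity))
    of its largest-magnitude incoming weights."""
    n_out, n_in = weights.shape
    k = max(1, int(round(n_in * (1.0 - sparsity))))  # kept per neuron
    mask = np.zeros_like(weights, dtype=bool)
    top = np.argsort(-np.abs(weights), axis=1)[:, :k]
    np.put_along_axis(mask, top, True, axis=1)
    return mask
```

Because every row has the same number of nonzeros, the surviving weights can be packed into a dense (n_out, k) array plus an index array, which is what makes the real-hardware speedups quoted above achievable where general unstructured (CSR) sparsity struggles.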
Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
Sparsity has become one of the promising methods to compress and accelerate
Deep Neural Networks (DNNs). Among different categories of sparsity, structured
sparsity has gained more attention due to its efficient execution on modern
accelerators. Particularly, N:M sparsity is attractive because there are
already hardware accelerator architectures that can leverage certain forms of
N:M structured sparsity to yield higher compute-efficiency. In this work, we
focus on N:M sparsity and extensively study and evaluate various training
recipes for N:M sparsity in terms of the trade-off between model accuracy and
compute cost (FLOPs). Building upon this study, we propose two new decay-based
pruning methods, namely "pruning mask decay" and "sparse structure decay". Our
evaluations indicate that these proposed methods consistently deliver
state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on
a Transformer-based model for a translation task. The increase in the accuracy
of the sparse model using the new training recipes comes at the cost of
marginal increase in the total training compute (FLOPs). Comment: 11 pages, 2 figures, and 9 tables. Published at the ICML Workshop on
Sparsity in Neural Networks Advancing Understanding and Practice, 2022. First
two authors contributed equally.
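The two ingredients above, a magnitude-based N:M mask and a decaying rather than hard application of it, can be sketched as follows. The linear decay schedule is an assumption for illustration; the abstract does not specify the schedules the recipes use.

```python
import numpy as np

def nm_mask(weights, n=2, m=4):
    """Magnitude-based N:M mask: in every group of m consecutive
    weights, keep the n largest-magnitude entries. Assumes the flat
    weight count is divisible by m."""
    flat = weights.reshape(-1, m)
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)

def decayed_weights(weights, mask, step, decay_steps):
    """Pruning-mask decay (sketch): instead of zeroing pruned weights
    at once, scale them by a factor annealing from 1 to 0 over
    `decay_steps`, so the network adapts to the sparsity gradually."""
    factor = max(0.0, 1.0 - step / decay_steps)
    return np.where(mask, weights, weights * factor)
```

At step 0 the layer is effectively dense; by `decay_steps` the pruned weights have been driven to zero and the layer matches a hard-masked N:M layer, which is the trade-off the paper studies: a small amount of extra training compute in exchange for accuracy closer to unstructured sparsity.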
One Step from the Locomotion to the Stepping Pattern
The locomotion pattern is characterized by a translation displacement occurring mostly along the forward frontal body direction, whereas local repositioning with large re-orientations, i.e. stepping, may induce translations along both the frontal and the lateral body directions (holonomy). We consider here a stepping pattern with initial and final null speeds, within a radius of 40% of the body height and with re-orientation up to 180°. We propose a robust step detection method for this context and identify a consistent intra-subject behavior in terms of the choice of starting foot and the number of steps.
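The abstract does not describe the detection method itself, but a step detector over foot-motion data can be sketched along these lines: segments where a foot's speed stays above a threshold for a minimum duration are counted as individual steps. The function name, the speed threshold, and the minimum duration are all hypothetical choices for illustration, not the paper's method or parameters.

```python
import numpy as np

def detect_steps(foot_speed, fs, thresh=0.1, min_dur=0.1):
    """Hypothetical step detector: return (start, end) sample indices
    of segments where `foot_speed` (m/s, sampled at `fs` Hz) exceeds
    `thresh` for at least `min_dur` seconds."""
    moving = foot_speed > thresh
    steps, start = [], None
    # Append a False so a segment running to the end is still closed.
    for i, m in enumerate(np.append(moving, False)):
        if m and start is None:
            start = i                       # segment opens
        elif not m and start is not None:
            if (i - start) / fs >= min_dur: # long enough to be a step
                steps.append((start, i))
            start = None
    return steps
```

The minimum-duration check is what provides robustness against brief sensor noise crossing the threshold, which matters for the short, low-speed steps of the stepping pattern considered here.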