The Dormant Neuron Phenomenon in Deep Reinforcement Learning
In this work we identify the dormant neuron phenomenon in deep reinforcement
learning, where an agent's network suffers from an increasing number of
inactive neurons, thereby affecting network expressivity. We demonstrate the
presence of this phenomenon across a variety of algorithms and environments,
and highlight its effect on learning. To address this issue, we propose a
simple and effective method (ReDo) that Recycles Dormant neurons throughout
training. Our experiments demonstrate that ReDo maintains the expressive power
of networks by reducing the number of dormant neurons and results in improved
performance. Comment: Oral at ICML 2023.
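The dormancy criterion and the recycling step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: a neuron is flagged as dormant when its mean absolute activation, normalized by the layer average, falls below a threshold tau, and recycling re-initializes its incoming weights while zeroing its outgoing ones so the rest of the network is undisturbed. The He-style initializer and the tau value are illustrative assumptions.

```python
import numpy as np

def dormant_mask(activations, tau=0.025):
    """Flag tau-dormant neurons: mean |activation| per neuron,
    normalized by the layer-wide average, is at most tau.
    `activations` has shape (batch, neurons)."""
    score = np.abs(activations).mean(axis=0)   # per-neuron activity
    score = score / (score.mean() + 1e-9)      # normalize by layer mean
    return score <= tau

def recycle(w_in, b_in, w_out, dormant, rng):
    """Recycle dormant neurons: re-initialize their incoming weights
    (He-style normal init, an assumption here) and zero their outgoing
    weights, leaving all other neurons untouched."""
    fan_in = w_in.shape[1]
    w_in[dormant] = rng.normal(0.0, np.sqrt(2.0 / fan_in),
                               size=(int(dormant.sum()), fan_in))
    b_in[dormant] = 0.0
    w_out[:, dormant] = 0.0
    return w_in, b_in, w_out
```

Zeroing the outgoing weights is what makes the operation safe to apply mid-training: the recycled neuron initially contributes nothing downstream, so the network's function is unchanged at the moment of the reset.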
Dynamic Sparse Training with Structured Sparsity
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in
sparse neural network training, matching the generalization of dense models
while enabling sparse training and inference. Although the resulting models are
highly sparse and theoretically less computationally expensive, achieving
speedups with unstructured sparsity on real-world hardware is challenging. In
this work, we propose a sparse-to-sparse DST method, Structured RigL (SRigL),
to learn a variant of fine-grained structured N:M sparsity by imposing a
constant fan-in constraint. Using our empirical analysis of existing DST
methods at high sparsity, we additionally employ a neuron ablation method which
enables SRigL to achieve state-of-the-art sparse-to-sparse structured DST
performance on a variety of Neural Network (NN) architectures. Using a 90%
sparse linear layer, we demonstrate a real-world acceleration of 3.4x/2.5x on
CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size
of 256 when compared to equivalent dense/unstructured (CSR) sparse layers,
respectively. Comment: ICLR 2024, 29 pages, 22 figures.
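The constant fan-in constraint at the core of SRigL can be sketched as a mask-construction step: every output neuron keeps exactly the same number of its largest-magnitude incoming weights, so the layer's nonzeros form a regular, hardware-friendly structure. This is only the structural constraint; the full method's dynamic grow/prune schedule and neuron ablation are not shown, and magnitude-based selection here is an illustrative assumption.

```python
import numpy as np

def constant_fan_in_mask(weights, sparsity):
    """Build a constant fan-in mask for a (n_out, n_in) weight matrix:
    each output neuron keeps exactly k = round(n_in * (1 - sparsity))
    of its largest-magnitude incoming weights."""
    n_out, n_in = weights.shape
    k = max(1, int(round(n_in * (1.0 - sparsity))))  # kept per neuron
    mask = np.zeros_like(weights, dtype=bool)
    top = np.argsort(-np.abs(weights), axis=1)[:, :k]
    np.put_along_axis(mask, top, True, axis=1)
    return mask
```

Because every row has the same number of nonzeros, the surviving weights can be packed into a dense (n_out, k) array plus an index array, which is what makes the real-hardware speedups quoted above achievable where general unstructured (CSR) sparsity struggles.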
Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
Sparsity has become one of the promising methods to compress and accelerate
Deep Neural Networks (DNNs). Among different categories of sparsity, structured
sparsity has gained more attention due to its efficient execution on modern
accelerators. Particularly, N:M sparsity is attractive because there are
already hardware accelerator architectures that can leverage certain forms of
N:M structured sparsity to yield higher compute-efficiency. In this work, we
focus on N:M sparsity and extensively study and evaluate various training
recipes for N:M sparsity in terms of the trade-off between model accuracy and
compute cost (FLOPs). Building upon this study, we propose two new decay-based
pruning methods, namely "pruning mask decay" and "sparse structure decay". Our
evaluations indicate that these proposed methods consistently deliver
state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on
a Transformer-based model for a translation task. The increase in the accuracy
of the sparse model using the new training recipes comes at the cost of
marginal increase in the total training compute (FLOPs). Comment: 11 pages, 2 figures, and 9 tables. Published at the ICML Workshop on
Sparsity in Neural Networks Advancing Understanding and Practice, 2022. First
two authors contributed equally.
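The two ingredients above, a magnitude-based N:M mask and a decaying rather than hard application of it, can be sketched as follows. The linear decay schedule is an assumption for illustration; the abstract does not specify the schedules the recipes use.

```python
import numpy as np

def nm_mask(weights, n=2, m=4):
    """Magnitude-based N:M mask: in every group of m consecutive
    weights, keep the n largest-magnitude entries. Assumes the flat
    weight count is divisible by m."""
    flat = weights.reshape(-1, m)
    keep = np.argsort(-np.abs(flat), axis=1)[:, :n]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)

def decayed_weights(weights, mask, step, decay_steps):
    """Pruning-mask decay (sketch): instead of zeroing pruned weights
    at once, scale them by a factor annealing from 1 to 0 over
    `decay_steps`, so the network adapts to the sparsity gradually."""
    factor = max(0.0, 1.0 - step / decay_steps)
    return np.where(mask, weights, weights * factor)
```

At step 0 the layer is effectively dense; by `decay_steps` the pruned weights have been driven to zero and the layer matches a hard-masked N:M layer, which is the trade-off the paper studies: a small amount of extra training compute in exchange for accuracy closer to unstructured sparsity.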
One Step from the Locomotion to the Stepping Pattern
The locomotion pattern is characterized by a translation displacement occurring mostly along the forward frontal body direction, whereas local repositioning with large re-orientations, i.e. stepping, may induce translations along both the frontal and the lateral body directions (holonomy). We consider here a stepping pattern with initial and final null speeds, within a radius of 40% of the body height and with re-orientation up to 180°. We propose a robust step detection method for this context and identify a consistent intra-subject behavior in terms of the choice of starting foot and the number of steps.
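The abstract does not describe the detection method itself, but a step detector over foot-motion data can be sketched along these lines: segments where a foot's speed stays above a threshold for a minimum duration are counted as individual steps. The function name, the speed threshold, and the minimum duration are all hypothetical choices for illustration, not the paper's method or parameters.

```python
import numpy as np

def detect_steps(foot_speed, fs, thresh=0.1, min_dur=0.1):
    """Hypothetical step detector: return (start, end) sample indices
    of segments where `foot_speed` (m/s, sampled at `fs` Hz) exceeds
    `thresh` for at least `min_dur` seconds."""
    moving = foot_speed > thresh
    steps, start = [], None
    # Append a False so a segment running to the end is still closed.
    for i, m in enumerate(np.append(moving, False)):
        if m and start is None:
            start = i                       # segment opens
        elif not m and start is not None:
            if (i - start) / fs >= min_dur: # long enough to be a step
                steps.append((start, i))
            start = None
    return steps
```

The minimum-duration check is what provides robustness against brief sensor noise crossing the threshold, which matters for the short, low-speed steps of the stepping pattern considered here.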