
    Soft Threshold Weight Reparameterization for Learnable Sparsity

    Sparsity in Deep Neural Networks (DNNs) is studied extensively, with the focus on maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets, which have sub-optimal layer-wise parameter allocation resulting in (a) lower prediction accuracy or (b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds, thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K) and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts accuracy over existing results by up to 10% in the ultra-sparse (99%) regime and can also be used to induce low rank (a form of structured sparsity) in RNNs. In short, STR is a simple mechanism that learns effective sparsity budgets which contrast with popular heuristics. Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR. (Comment: 19 pages, 10 figures; published at the International Conference on Machine Learning (ICML) 2020.)
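
    As an illustration of the soft-threshold reparameterization described above, the sketch below applies the operator to a weight matrix with a learnable scalar threshold parameter. The choice of sigmoid as the thresholding function g and the toy values are assumptions for illustration, not the paper's exact configuration.

        import numpy as np

        def soft_threshold(w, s):
            """Soft-threshold reparameterization of a weight tensor (sketch).

            w : raw weights
            s : learnable per-layer parameter; sigmoid(s) acts as the pruning
                threshold, so the sparsity level is learned rather than fixed.
            """
            threshold = 1.0 / (1.0 + np.exp(-s))                 # g(s) = sigmoid(s), assumed
            return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

        # Toy usage: weights below the learned threshold become exactly zero.
        w = np.random.randn(64, 64) * 0.1
        w_sparse = soft_threshold(w, s=-2.0)
        print("layer sparsity:", float(np.mean(w_sparse == 0.0)))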

    A Classification Supervised Auto-Encoder Based on Predefined Evenly-Distributed Class Centroids

    Classic variational autoencoders (VAEs), built on standard function approximators, are used to learn complex data distributions; in particular, VAEs have shown promise on many complex tasks. In this paper, a new autoencoder model, the classification supervised autoencoder (CSAE) based on predefined evenly-distributed class centroids (PEDCC), is proposed. Our method uses the PEDCC of latent variables to train the network so as to maximize the inter-class distance and minimize the intra-class distance. Instead of learning the mean/variance of the latent variable distribution and applying the reparameterization trick as in a VAE, the latent variables of CSAE are used directly for classification and as the input of the decoder. In addition, a new loss function is proposed that incorporates the classification loss. Based on the basic structure of a universal autoencoder, our method simultaneously achieves good encoding, decoding, classification, and model generalization performance. These theoretical advantages are reflected in the experimental results. (Comment: 16 pages, 12 figures, 4 tables.)
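
    The sketch below illustrates the PEDCC idea: class centroids are fixed in advance and spread out on the unit hypersphere, and latent vectors are classified by their nearest centroid. The simple pairwise-repulsion construction is an assumption for illustration; the paper's exact centroid generation procedure may differ.

        import numpy as np

        def pedcc_centroids(num_classes, dim, steps=500, lr=0.01, seed=0):
            """Predefine roughly evenly-distributed class centroids on the unit sphere."""
            rng = np.random.default_rng(seed)
            c = rng.standard_normal((num_classes, dim))
            c /= np.linalg.norm(c, axis=1, keepdims=True)
            for _ in range(steps):
                diff = c[:, None, :] - c[None, :, :]                 # pairwise differences
                dist = np.linalg.norm(diff, axis=-1, keepdims=True) + 1e-8
                c += lr * (diff / dist**3).sum(axis=1)               # push centroids apart
                c /= np.linalg.norm(c, axis=1, keepdims=True)        # stay on the sphere
            return c

        def classify(latent, centroids):
            """Assign each latent vector to its most aligned fixed centroid."""
            return np.argmax(latent @ centroids.T, axis=1)

        centroids = pedcc_centroids(num_classes=10, dim=32)
        print(classify(np.random.randn(4, 32), centroids))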

    Dynamic Channel Pruning: Feature Boosting and Suppression

    Making deep convolutional neural networks more accurate typically comes at the cost of increased computational and memory resources. In this paper, we reduce this cost by exploiting the fact that the importance of features computed by convolutional layers is highly input-dependent, and propose feature boosting and suppression (FBS), a new method to predictively amplify salient convolutional channels and skip unimportant ones at run time. FBS introduces small auxiliary connections to existing convolutional layers. In contrast to channel pruning methods, which permanently remove channels, it preserves the full network structure and accelerates convolution by dynamically skipping unimportant input and output channels. FBS-augmented networks are trained with conventional stochastic gradient descent, making the method readily applicable to many state-of-the-art CNNs. We compare FBS to a range of existing channel pruning and dynamic execution schemes and demonstrate large improvements on ImageNet classification. Experiments show that FBS can provide 5x and 2x savings in compute on VGG-16 and ResNet-18, respectively, both with less than 0.6% top-5 accuracy loss. (Comment: 14 pages, 5 figures, 4 tables; published as a conference paper at ICLR 2019.)
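
    A bare-bones version of the run-time gating described above is sketched below: a tiny auxiliary predictor maps a pooled input descriptor to per-channel saliencies, and only the top-scoring output channels of the following convolution are kept. The linear-plus-ReLU predictor and the keep ratio are illustrative assumptions.

        import numpy as np

        def fbs_gate(x, w_aux, keep_ratio=0.5):
            """Predict which output channels to boost and which to suppress.

            x     : input feature map, shape (batch, channels_in, H, W)
            w_aux : auxiliary weights, shape (channels_in, channels_out)
            Returns a gate of shape (batch, channels_out) that scales the kept
            channels and zeroes the rest, so suppressed channels can be skipped.
            """
            pooled = np.abs(x).mean(axis=(2, 3))                 # cheap per-channel descriptor
            saliency = np.maximum(pooled @ w_aux, 0.0)           # predicted channel importance
            k = max(1, int(keep_ratio * saliency.shape[1]))
            top = np.argpartition(-saliency, k - 1, axis=1)[:, :k]
            gate = np.zeros_like(saliency)
            np.put_along_axis(gate, top, np.take_along_axis(saliency, top, axis=1), axis=1)
            return gate

        x = np.random.randn(2, 16, 8, 8)
        w_aux = np.random.randn(16, 32) * 0.1
        gate = fbs_gate(x, w_aux)                                # at most k channels nonzero per example
        print(gate.shape, (gate > 0).sum(axis=1))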

    Dynamic Routing Networks

    The deployment of deep neural networks in real-world applications is mostly restricted by their high inference costs. Extensive efforts have been made to improve accuracy with expert-designed or algorithm-searched architectures. However, the incremental improvement is typically achieved with increasingly expensive models that only a small portion of input instances really need. Inference with a static architecture that processes all input instances via the same transformation therefore incurs unnecessary computational cost, so customizing the model capacity in an instance-aware manner is needed for higher inference efficiency. In this paper, we propose Dynamic Routing Networks (DRNets), which support efficient instance-aware inference by routing each input instance to only the necessary transformation branches, selected from a candidate set of branches for each connection between transformation nodes. The branch selection is dynamically determined via the corresponding branch importance weights, which are first generated by lightweight hypernetworks (RouterNets) and then recalibrated with Gumbel-Softmax before the selection. Extensive experiments show that DRNets can substantially reduce parameter count and FLOPs during inference with prediction performance comparable to state-of-the-art architectures. (Comment: 10 pages, 3 figures, 3 tables.)
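
    The sketch below shows the routing step in miniature: a lightweight router maps an instance descriptor to branch importance weights, the weights are recalibrated with Gumbel-Softmax, and only branches above a cut-off are executed. The descriptor, the single-layer router, and the cut-off rule are simplifying assumptions, not the paper's exact RouterNet design.

        import numpy as np

        def gumbel_softmax(logits, tau=1.0, rng=None):
            """Draw a near-one-hot relaxation over candidate branches."""
            rng = rng or np.random.default_rng()
            g = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
            y = np.exp((logits + g) / tau - ((logits + g) / tau).max())
            return y / y.sum()

        def route(x, branches, router_w, tau=1.0, cutoff=0.2):
            """Run only the branches selected for this particular input instance."""
            descriptor = x.mean(axis=0)                          # crude instance descriptor
            weights = gumbel_softmax(descriptor @ router_w, tau) # recalibrated importance
            out = np.zeros_like(x)
            for w, branch in zip(weights, branches):
                if w > cutoff:                                   # skip unselected branches
                    out += w * branch(x)
            return out

        branches = [lambda v: 2.0 * v, lambda v: v + 1.0, np.tanh]
        x = np.random.randn(4, 8)
        router_w = np.random.randn(8, len(branches))
        print(route(x, branches, router_w).shape)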

    Fisher-Bures Adversary Graph Convolutional Networks

    In a graph convolutional network, we assume that the graph G is generated with respect to some observation noise. During learning, we make small random perturbations ΔG of the graph and try to improve generalization. Based on quantum information geometry, ΔG can be characterized by the eigendecomposition of the graph Laplacian matrix. We try to minimize the loss with respect to the perturbed graph G + ΔG while making ΔG effective in terms of the Fisher information of the neural network. Our proposed model can consistently improve graph convolutional networks on semi-supervised node classification tasks with reasonable computational overhead. We present three different geometries on the manifold of graphs: the intrinsic geometry measures the information-theoretic dynamics of a graph; the extrinsic geometry characterizes how such dynamics can externally affect a graph neural network; the embedding geometry is for measuring node embeddings. These new analytical tools are useful in developing a good understanding of graph neural networks and fostering new techniques. (Comment: published at UAI 2019.)
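
    The following sketch shows what a spectrally parameterized perturbation ΔG can look like: the Laplacian is eigendecomposed and its spectrum is jittered, giving a structured perturbation of the graph rather than arbitrary edge noise. The isotropic Gaussian noise on the eigenvalues is an assumption for illustration; the paper shapes the perturbation using Fisher-information quantities.

        import numpy as np

        def perturb_laplacian(adj, eps=0.05, rng=None):
            """Perturb a graph in the eigenbasis of its Laplacian L = U diag(lam) U^T."""
            rng = rng or np.random.default_rng()
            lap = np.diag(adj.sum(axis=1)) - adj                 # combinatorial Laplacian
            lam, u = np.linalg.eigh(lap)
            lam_noisy = lam + eps * rng.standard_normal(lam.shape)
            return u @ np.diag(lam_noisy) @ u.T                  # perturbed Laplacian

        adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
        print(perturb_laplacian(adj))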

    Routing Networks and the Challenges of Modular and Compositional Computation

    Compositionality is a key strategy for addressing combinatorial complexity and the curse of dimensionality. Recent work has shown that compositional solutions can be learned and offer substantial gains across a variety of domains, including multi-task learning, language modeling, visual question answering, machine comprehension, and others. However, such models present unique challenges during training, when both the module parameters and their composition must be learned jointly. In this paper, we identify several of these issues and analyze their underlying causes. Our discussion focuses on routing networks, a general approach to this problem, and examines empirically the interplay of these challenges and a variety of design decisions. In particular, we consider the effect of how the algorithm decides on module composition, how the algorithm updates the modules, and whether the algorithm uses regularization.
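
    To make the joint-training difficulty concrete, the toy routing step below picks exactly one module per input with an argmax, so the selection is discrete and gives no gradient to the router even though the chosen module's parameters do receive gradients. The linear router and the module set are hypothetical placeholders, not components from the paper.

        import numpy as np

        def hard_route(x, router_w, modules):
            """One hard routing decision: the router selects a single module for x."""
            choice = int(np.argmax(x @ router_w))                # discrete, non-differentiable
            return modules[choice](x), choice

        modules = [lambda v: 2.0 * v, np.tanh, lambda v: v - v.mean()]
        x = np.random.randn(8)
        router_w = np.random.randn(8, len(modules))
        y, picked = hard_route(x, router_w, modules)
        print("module", picked, "output shape", y.shape)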

    Balanced Sparsity for Efficient DNN Inference on GPU

    In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation, but this often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, balanced sparsity, to achieve high model accuracy efficiently on commercial hardware. Our approach adapts to the high-parallelism properties of GPUs, showing strong potential for sparsity in the wide deployment of deep learning services. Experimental results show that balanced sparsity achieves up to a 3.1x practical speedup for model inference on GPUs while retaining the same high model accuracy as fine-grained sparsity.
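
    The sketch below captures the core constraint: each row of a weight matrix is split into equal-sized blocks and every block keeps the same number of largest-magnitude entries, so parallel GPU thread groups get identical workloads. The block size and per-block budget are illustrative choices.

        import numpy as np

        def balanced_prune(weight, block_size=8, keep=2):
            """Keep exactly `keep` largest-magnitude weights in every block of each row."""
            rows, cols = weight.shape
            assert cols % block_size == 0
            w = weight.reshape(rows, cols // block_size, block_size)
            top = np.argsort(-np.abs(w), axis=-1)[..., :keep]    # indices to retain per block
            mask = np.zeros_like(w)
            np.put_along_axis(mask, top, 1.0, axis=-1)
            return (w * mask).reshape(rows, cols)

        w = np.random.randn(4, 16)
        pruned = balanced_prune(w)
        print((pruned != 0).sum(axis=1))                         # every row keeps 2 blocks x 2 weights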

    The State of Sparsity in Deep Neural Networks

    We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller datasets perform inconsistently, and that simple magnitude pruning approaches achieve comparable or better results. Additionally, we replicate the experiments performed by Frankle & Carbin (2018) and Liu et al. (2018) at scale and show that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization. Together, these results highlight the need for large-scale benchmarks in the field of model compression. We open-source our code, top-performing model checkpoints, and the results of all hyperparameter configurations to establish rigorous baselines for future work on compression and sparsification.
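
    For reference, the simple magnitude-pruning baseline that the study finds so competitive amounts to little more than the following sketch; the global threshold and one-shot application are simplifications, since in practice the target sparsity is usually ramped up gradually during training.

        import numpy as np

        def magnitude_prune(weights, sparsity):
            """Zero out the `sparsity` fraction of parameters with the smallest magnitude."""
            flat = np.concatenate([np.abs(w).ravel() for w in weights])
            threshold = np.quantile(flat, sparsity)              # global magnitude cutoff
            return [np.where(np.abs(w) >= threshold, w, 0.0) for w in weights]

        layers = [np.random.randn(64, 64), np.random.randn(128, 64)]
        pruned = magnitude_prune(layers, sparsity=0.9)
        print([round(float((p == 0).mean()), 2) for p in pruned])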

    ReSet: Learning Recurrent Dynamic Routing in ResNet-like Neural Networks

    Neural networks are powerful machine learning tools that show outstanding performance in computer vision, natural language processing, and artificial intelligence. In particular, the recently proposed ResNet architecture and its modifications produce state-of-the-art results in image classification problems. ResNet and most previously proposed architectures have a fixed structure and apply the same transformation to all input images. In this work, we develop a ResNet-based model that dynamically selects Computational Units (CUs) for each input object from a learned set of transformations. Dynamic selection allows the network to learn a sequence of useful transformations and apply only the required units to predict the image label. We compare our model to the ResNet-38 architecture and achieve better results than the original ResNet on the CIFAR-10.1 test set. While examining the produced paths, we discovered that the network learned different routes for images from different classes and similar routes for similar images. (Comment: published in Proceedings of The 10th Asian Conference on Machine Learning, http://proceedings.mlr.press/v95/kemaev18a.htm)
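
    A stripped-down version of the per-input routing described above is sketched below: at every step a selector picks one Computational Unit from a shared set and applies it as a residual transformation, so different inputs can follow different paths. The argmax selector and the toy units stand in for the paper's learned recurrent controller.

        import numpy as np

        def reset_forward(x, units, selector_w, num_steps=4):
            """Apply a dynamically selected residual unit at each of num_steps steps."""
            path = []
            for _ in range(num_steps):
                choice = int(np.argmax(x @ selector_w))          # unit chosen for this input
                x = x + units[choice](x)                         # ResNet-style residual update
                path.append(choice)
            return x, path

        units = [np.tanh, lambda v: 0.1 * v, lambda v: np.maximum(v, 0.0) - 0.5]
        x = np.random.randn(16)
        selector_w = np.random.randn(16, len(units))
        out, path = reset_forward(x, units, selector_w)
        print("route taken:", path)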

    Improving Variational Inference with Inverse Autoregressive Flow

    The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.
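
    One IAF transformation can be written compactly as in the sketch below: an autoregressive map (here a single masked linear layer) produces a shift m and gate sigma whose i-th outputs depend only on earlier dimensions of z, so the Jacobian is triangular and the log-determinant is just the sum of log sigma. The single masked layer and the gate bias are simplifications of the deeper autoregressive networks used in the paper.

        import numpy as np

        def iaf_step(z, w_m, w_s, s_bias=1.0):
            """One inverse autoregressive flow step on a batch of latents z."""
            d = z.shape[-1]
            mask = np.triu(np.ones((d, d)), k=1)                 # output i sees only dims < i
            m = z @ (w_m * mask)                                 # autoregressive shift
            sigma = 1.0 / (1.0 + np.exp(-(z @ (w_s * mask) + s_bias)))
            z_new = sigma * z + (1.0 - sigma) * m                # gated, invertible update
            log_det = np.sum(np.log(sigma), axis=-1)             # triangular Jacobian
            return z_new, log_det

        z = np.random.randn(5, 8)
        w_m, w_s = np.random.randn(8, 8) * 0.1, np.random.randn(8, 8) * 0.1
        z_new, log_det = iaf_step(z, w_m, w_s)
        print(z_new.shape, log_det.shape)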