39 research outputs found

    Utilizing Class Information for Deep Network Representation Shaping

    Full text link
    Statistical characteristics of deep network representations, such as sparsity and correlation, are known to be relevant to the performance and interpretability of deep learning. When a statistical characteristic is desired, often an adequate regularizer can be designed and applied during the training phase. Typically, such a regularizer aims to manipulate a statistical characteristic over all classes together. For classification tasks, however, it might be advantageous to enforce the desired characteristic per class such that different classes can be better distinguished. Motivated by the idea, we design two class-wise regularizers that explicitly utilize class information: class-wise Covariance Regularizer (cw-CR) and class-wise Variance Regularizer (cw-VR). cw-CR targets to reduce the covariance of representations calculated from the same class samples for encouraging feature independence. cw-VR is similar, but variance instead of covariance is targeted to improve feature compactness. For the sake of completeness, their counterparts without using class information, Covariance Regularizer (CR) and Variance Regularizer (VR), are considered together. The four regularizers are conceptually simple and computationally very efficient, and the visualization shows that the regularizers indeed perform distinct representation shaping. In terms of classification performance, significant improvements over the baseline and L1/L2 weight regularization methods were found for 21 out of 22 tasks over popular benchmark datasets. In particular, cw-VR achieved the best performance for 13 tasks including ResNet-32/110.Comment: Published in AAAI 201

    Receptive fields optimization in deep learning for enhanced interpretability, diversity, and resource efficiency.

    Get PDF
    In both supervised and unsupervised learning settings, deep neural networks (DNNs) are known to perform hierarchical and discriminative representation of data. They are capable of automatically extracting excellent hierarchy of features from raw data without the need for manual feature engineering. Over the past few years, the general trend has been that DNNs have grown deeper and larger, amounting to huge number of final parameters and highly nonlinear cascade of features, thus improving the flexibility and accuracy of resulting models. In order to account for the scale, diversity and the difficulty of data DNNs learn from, the architectural complexity and the excessive number of weights are often deliberately built in into their design. This flexibility and performance usually come with high computational and memory demands both during training and inference. In addition, insight into the mappings DNN models perform and human ability to understand them still remain very limited. This dissertation addresses some of these limitations by balancing three conflicting objectives: computational/ memory demands, interpretability, and accuracy. This dissertation first introduces some unsupervised feature learning methods in a broader context of dictionary learning. It also sets the tone for deep autoencoder learning and constraints for data representations in light of removing some of the aforementioned bottlenecks such as the feature interpretability of deep learning models with nonnegativity constraints on receptive fields. In addition, the two main classes of solution to the drawbacks associated with overparameterization/ over-complete representation in deep learning models are also presented. Subsequently, two novel methods, one for each solution class, are presented to address the problems resulting from over-complete representation exhibited by most deep learning models. The first method is developed to achieve inference-cost-efficient models via elimination of redundant features with negligible deterioration of prediction accuracy. This is important especially for deploying deep learning models into resource-limited portable devices. The second method aims at diversifying the features of DNNs in the learning phase to improve their performance without undermining their size and capacity. Lastly, feature diversification is considered to stabilize adversarial learning and extensive experimental outcomes show that these methods have the potential of advancing the current state-of-the-art on different learning tasks and benchmark datasets

    Regularization and Compression of Deep Neural Networks

    Get PDF
    Deep neural networks (DNN) are the state-of-the-art machine learning models outperforming traditional machine learning methods in a number of domains from vision and speech to natural language understanding and autonomous control. With large amounts of data becoming available, the task performance of DNNs in these domains predictably scales with the size of the DNNs. However, in data-scarce scenarios, large DNNs overfit to the training dataset resulting in inferior performance. Additionally, in scenarios where enormous amounts of data is available, large DNNs incur large inference latencies and memory costs. Thus, while imperative for achieving state-of-the-art performances, large DNNs require large amounts of data for training and large computational resources during inference. These two problems could be mitigated by sparsely training large DNNs. Imposing sparsity constraints during training limits the capacity of the model to overfit to the training set while still being able to obtain good generalization. Sparse DNNs have most of their weights close to zero after training. Therefore, most of the weights could be removed resulting in smaller inference costs. To effectively train sparse DNNs, this thesis proposes two new sparse stochastic regularization techniques called Bridgeout and Sparseout. Furthermore, Bridgeout is used to prune convolutional neural networks for low-cost inference. Bridgeout randomly perturbs the weights of a parametric model such as a DNN. It is theoretically shown that Bridgeout constrains the weights of linear models to a sparse subspace. Empirically, Bridgeout has been shown to perform better, on image classification tasks, than state-of-the-art DNNs when the data is limited. Sparseout is an activations counter-part of Bridgeout, operating on the outputs of the neurons instead of the weights of the neurons. Theoretically, Sparseout has been shown to be a general case of the commonly used Dropout regularization method. Empirical evidence suggests that Sparseout is capable of controlling the level of activations sparsity in neural networks. This flexibility allows Sparseout to perform better than Dropout on image classification and language modelling tasks. Furthermore, using Sparseout, it is found that activation sparsity is beneficial to recurrent neural networks for language modeling but densification of activations favors convolutional neural networks for image classification. To address the problem of high computational cost during inference, this thesis evaluates Bridgeout for pruning convolutional neural networks (CNN). It is shown that recent CNN architectures such as VGG, ResNet and Wide-ResNet trained with Bridgeout are more robust to one-shot filter pruning compared to non-sparse stochastic regularization
    corecore