
    Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

    Although recent works on knowledge distillation (KD) have achieved further improvements by elaborately modeling the decision boundary as posterior knowledge, their performance still depends on the hypothesis that the target network has a powerful capacity (representation ability). In this paper, we propose a knowledge representing (KR) framework that focuses on modeling the parameter distribution as prior knowledge. First, we suggest a knowledge aggregation scheme to answer how the prior knowledge of the teacher network should be represented. By aggregating the parameter distribution of the teacher network into a more abstract level, the scheme alleviates residual accumulation in the deeper layers. Second, to address the critical issue of which prior knowledge matters most for distillation, we design a sparse recoding penalty that constrains the student network to learn with penalized gradients. With the proposed penalty, the student network effectively avoids over-regularization during knowledge distillation and converges faster. Quantitative experiments show that the proposed framework achieves state-of-the-art performance even when the target network does not have the expected capacity. Moreover, the framework is flexible enough to be combined with other KD methods based on posterior knowledge.
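
    As a rough illustration of how such a penalty might enter a distillation objective, the sketch below combines a standard soft-target KD loss with a plain L1 sparsity term on the student's intermediate features. The function name kd_step, the hyperparameters T, alpha and beta, the assumption that the student returns (features, logits), and the L1 term itself are illustrative stand-ins, not the paper's sparse recoding penalty.

    import torch
    import torch.nn.functional as F

    def kd_step(student, teacher, x, y, T=4.0, alpha=0.5, beta=1e-4):
        # One distillation step: supervised loss + soft-target KD loss + a
        # sparsity penalty on the student's intermediate features. The plain
        # L1 term is only a stand-in for the paper's sparse recoding penalty.
        with torch.no_grad():
            t_logits = teacher(x)                      # teacher outputs, treated as fixed knowledge
        s_feats, s_logits = student(x)                 # assumes the student returns (features, logits)

        ce = F.cross_entropy(s_logits, y)              # ordinary supervised loss
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * (T * T) # soft-target matching at temperature T
        sparse = beta * s_feats.abs().mean()           # encourages a sparse student representation
        return ce + alpha * kd + sparse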

    Stochastically Rank-Regularized Tensor Regression Networks

    Over-parametrization of deep neural networks has recently been shown to be key to their successful training. However, it also renders them prone to overfitting and makes them expensive to store and train. Tensor regression networks significantly reduce the number of effective parameters in deep neural networks while retaining accuracy and ease of training. They replace the flattening and fully-connected layers with a tensor regression layer, where the regression weights are expressed through the factors of a low-rank tensor decomposition. In this paper, to further improve tensor regression networks, we propose a novel stochastic rank-regularization. It consists of a novel randomized tensor sketching method for approximating the weights of tensor regression layers. We theoretically and empirically establish the link between our proposed stochastic rank-regularization and dropout on low-rank tensor regression. Extensive experimental results on both synthetic data and real-world datasets (i.e., CIFAR-100 and the UK Biobank brain MRI dataset) show that the proposed approach i) improves performance in both classification and regression tasks, ii) decreases overfitting, iii) leads to more stable training, and iv) improves robustness to adversarial attacks and random noise.
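
    A minimal sketch of the dropout-on-rank connection, reduced to a matrix (sum of rank-one terms) regression weight rather than a full tensor regression layer: rank components are randomly dropped during training and rescaled so that the expected weight is unchanged. The class name, initialization scale, and drop probability p are illustrative assumptions, not the paper's randomized tensor sketching procedure.

    import torch
    import torch.nn as nn

    class RankDropoutRegression(nn.Module):
        # Low-rank regression weight W = U diag(mask) V^T with dropout applied
        # over the rank components; the matrix case is a simplification of a
        # tensor regression layer, and all names and values here are illustrative.
        def __init__(self, in_features, out_features, rank, p=0.2):
            super().__init__()
            self.U = nn.Parameter(torch.randn(in_features, rank) * 0.01)   # input-side factor
            self.V = nn.Parameter(torch.randn(out_features, rank) * 0.01)  # output-side factor
            self.bias = nn.Parameter(torch.zeros(out_features))
            self.p = p                                                      # per-component drop probability

        def forward(self, x):
            rank = self.U.shape[1]
            if self.training:
                # Bernoulli mask over rank components, rescaled so the expected
                # weight is unchanged (as in standard dropout).
                mask = (torch.rand(rank, device=x.device) > self.p).float() / (1.0 - self.p)
            else:
                mask = torch.ones(rank, device=x.device)
            return (x @ self.U) * mask @ self.V.t() + self.bias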

    Robust Deep Networks with Randomized Tensor Regression Layers

    In this paper, we propose a novel randomized tensor decomposition for tensor regression. It allows the weights of tensor regression layers to be stochastically approximated by randomly sampling in the low-rank subspace. We theoretically and empirically establish the link between our proposed stochastic rank-regularization and dropout on low-rank tensor regression. This acts as an additional stochastic regularization on the regression weight, which, combined with the deterministic regularization imposed by the low-rank constraint, improves both the performance and robustness of neural networks augmented with it. In particular, it makes the model more robust to adversarial attacks and random noise, without requiring any adversarial training. We perform a thorough study of our method on synthetic data, object classification on the CIFAR-100 and ImageNet datasets, and large-scale brain-age prediction on the UK Biobank brain MRI dataset. We demonstrate superior performance in all cases, as well as significant improvements in robustness to adversarial attacks and random noise.
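
    The sampling step itself can be sketched in isolation. The snippet below draws a Bernoulli subset of the rank-one components of a low-rank weight W = U V^T on each call, which is one simple reading of "randomly sampling in the low-rank subspace"; the function name, keep probability, and rescaling are assumptions for illustration, not the paper's decomposition.

    import torch

    def sample_low_rank_weight(U, V, keep=0.8):
        # Draw a random subset of rank-one components of W = U V^T; rescaling
        # by 1/keep leaves the expected weight unchanged, mirroring dropout.
        rank = U.shape[1]
        idx = torch.rand(rank) < keep              # Bernoulli selection of components
        if not idx.any():                          # guard: retain at least one component
            idx[torch.randint(rank, (1,))] = True
        return (U[:, idx] / keep) @ V[:, idx].t()

    # Example: a rank-16 factorization of a 512 x 10 regression weight.
    U, V = torch.randn(512, 16), torch.randn(10, 16)
    W_stochastic = sample_low_rank_weight(U, V)    # resampled at every training step
    W_full = U @ V.t()                             # deterministic low-rank weight at test time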