51 research outputs found
Predicting Parameters in Deep Learning
We demonstrate that there is significant redundancy in the parameterization
of several deep learning models. Given only a few weight values for each
feature it is possible to accurately predict the remaining values. Moreover, we
show that not only can the parameter values be predicted, but many of them need
not be learned at all. We train several different architectures by learning
only a small number of weights and predicting the rest. In the best case we are
able to predict more than 95% of the weights of a network without any drop in
accuracy
Building high-level features using large scale unsupervised learning
We consider the problem of building high-level, class-specific feature
detectors from only unlabeled data. For example, is it possible to learn a face
detector using only unlabeled images? To answer this, we train a 9-layered
locally connected sparse autoencoder with pooling and local contrast
normalization on a large dataset of images (the model has 1 billion
connections, the dataset has 10 million 200x200 pixel images downloaded from
the Internet). We train this network using model parallelism and asynchronous
SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to
what appears to be a widely-held intuition, our experimental results reveal
that it is possible to train a face detector without having to label images as
containing a face or not. Control experiments show that this feature detector
is robust not only to translation but also to scaling and out-of-plane
rotation. We also find that the same network is sensitive to other high-level
concepts such as cat faces and human bodies. Starting with these learned
features, we trained our network to obtain 15.8% accuracy in recognizing 20,000
object categories from ImageNet, a leap of 70% relative improvement over the
previous state-of-the-art
Score Function Features for Discriminative Learning
Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features are based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of an overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning
- …