307 research outputs found
Visualizing and Understanding Convolutional Networks
Large Convolutional Network models have recently demonstrated impressive
classification performance on the ImageNet benchmark. However there is no clear
understanding of why they perform so well, or how they might be improved. In
this paper we address both issues. We introduce a novel visualization technique
that gives insight into the function of intermediate feature layers and the
operation of the classifier. We also perform an ablation study to discover the
performance contribution from different model layers. This enables us to find
model architectures that outperform Krizhevsky \etal on the ImageNet
classification benchmark. We show our ImageNet model generalizes well to other
datasets: when the softmax classifier is retrained, it convincingly beats the
current state-of-the-art results on Caltech-101 and Caltech-256 datasets
Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks
We introduce a simple semi-supervised learning approach for images based on
in-painting using an adversarial loss. Images with random patches removed are
presented to a generator whose task is to fill in the hole, based on the
surrounding pixels. The in-painted images are then presented to a discriminator
network that judges if they are real (unaltered training images) or not. This
task acts as a regularizer for standard supervised training of the
discriminator. Using our approach we are able to directly train large VGG-style
networks in a semi-supervised fashion. We evaluate on STL-10 and PASCAL
datasets, where our approach obtains performance comparable or superior to
existing methods
Deep Poselets for Human Detection
We address the problem of detecting people in natural scenes using a part
approach based on poselets. We propose a bootstrapping method that allows us to
collect millions of weakly labeled examples for each poselet type. We use these
examples to train a Convolutional Neural Net to discriminate different poselet
types and separate them from the background class. We then use the trained CNN
as a way to represent poselet patches with a Pose Discriminative Feature (PDF)
vector -- a compact 256-dimensional feature vector that is effective at
discriminating pose from appearance. We train the poselet model on top of PDF
features and combine them with object-level CNNs for detection and bounding box
prediction. The resulting model leads to state-of-the-art performance for human
detection on the PASCAL datasets
One-shot learning of object categories
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned from by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully
- …