28 research outputs found
Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models
Deep learning has shown state-of-art classification performance on datasets
such as ImageNet, which contain a single object in each image. However,
multi-object classification is far more challenging. We present a unified
framework which leverages the strengths of multiple machine learning methods,
viz deep learning, probabilistic models and kernel methods to obtain
state-of-art performance on Microsoft COCO, consisting of non-iconic images. We
incorporate contextual information in natural images through a conditional
latent tree probabilistic model (CLTM), where the object co-occurrences are
conditioned on the extracted fc7 features from pre-trained Imagenet CNN as
input. We learn the CLTM tree structure using conditional pairwise
probabilities for object co-occurrences, estimated through kernel methods, and
we learn its node and edge potentials by training a new 3-layer neural network,
which takes fc7 features as input. Object classification is carried out via
inference on the learnt conditional tree model, and we obtain significant gain
in precision-recall and F-measures on MS-COCO, especially for difficult object
categories. Moreover, the latent variables in the CLTM capture scene
information: the images with top activations for a latent node have common
themes such as being a grasslands or a food scene, and on on. In addition, we
show that a simple k-means clustering of the inferred latent nodes alone
significantly improves scene classification performance on the MIT-Indoor
dataset, without the need for any retraining, and without using scene labels
during training. Thus, we present a unified framework for multi-object
classification and unsupervised scene understanding
HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition
In image classification, visual separability between different object
categories is highly uneven, and some categories are more difficult to
distinguish than others. Such difficult categories demand more dedicated
classifiers. However, existing deep convolutional neural networks (CNN) are
trained as flat N-way classifiers, and few efforts have been made to leverage
the hierarchical structure of categories. In this paper, we introduce
hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category
hierarchy. An HD-CNN separates easy classes using a coarse category classifier
while distinguishing difficult classes using fine category classifiers. During
HD-CNN training, component-wise pretraining is followed by global finetuning
with a multinomial logistic loss regularized by a coarse category consistency
term. In addition, conditional executions of fine category classifiers and
layer parameter compression make HD-CNNs scalable for large-scale visual
recognition. We achieve state-of-the-art results on both CIFAR100 and
large-scale ImageNet 1000-class benchmark datasets. In our experiments, we
build up three different HD-CNNs and they lower the top-1 error of the standard
CNNs by 2.65%, 3.1% and 1.1%, respectively.Comment: Add new results on ImageNet using VGG-16-layer building block ne