31,848 research outputs found
Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models
Deep learning has shown state-of-art classification performance on datasets
such as ImageNet, which contain a single object in each image. However,
multi-object classification is far more challenging. We present a unified
framework which leverages the strengths of multiple machine learning methods,
viz deep learning, probabilistic models and kernel methods to obtain
state-of-art performance on Microsoft COCO, consisting of non-iconic images. We
incorporate contextual information in natural images through a conditional
latent tree probabilistic model (CLTM), where the object co-occurrences are
conditioned on the extracted fc7 features from pre-trained Imagenet CNN as
input. We learn the CLTM tree structure using conditional pairwise
probabilities for object co-occurrences, estimated through kernel methods, and
we learn its node and edge potentials by training a new 3-layer neural network,
which takes fc7 features as input. Object classification is carried out via
inference on the learnt conditional tree model, and we obtain significant gain
in precision-recall and F-measures on MS-COCO, especially for difficult object
categories. Moreover, the latent variables in the CLTM capture scene
information: the images with top activations for a latent node have common
themes such as being a grasslands or a food scene, and on on. In addition, we
show that a simple k-means clustering of the inferred latent nodes alone
significantly improves scene classification performance on the MIT-Indoor
dataset, without the need for any retraining, and without using scene labels
during training. Thus, we present a unified framework for multi-object
classification and unsupervised scene understanding
Thoracic Disease Identification and Localization with Limited Supervision
Accurate identification and localization of abnormalities from radiology
images play an integral part in clinical diagnosis and treatment planning.
Building a highly accurate prediction model for these tasks usually requires a
large number of images manually annotated with labels and finding sites of
abnormalities. In reality, however, such annotated data are expensive to
acquire, especially the ones with location annotations. We need methods that
can work well with only a small amount of location annotations. To address this
challenge, we present a unified approach that simultaneously performs disease
identification and localization through the same underlying model for all
images. We demonstrate that our approach can effectively leverage both class
information as well as limited location annotation, and significantly
outperforms the comparative reference baseline in both classification and
localization tasks.Comment: Conference on Computer Vision and Pattern Recognition 2018 (CVPR
2018). V1: CVPR submission; V2: +supplementary; V3: CVPR camera-ready; V4:
correction, update reference baseline results according to their latest post;
V5: minor correction; V6: Identification results using NIH data splits and
various image model
- …