34,457 research outputs found
Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models
Deep learning has shown state-of-art classification performance on datasets
such as ImageNet, which contain a single object in each image. However,
multi-object classification is far more challenging. We present a unified
framework which leverages the strengths of multiple machine learning methods,
viz deep learning, probabilistic models and kernel methods to obtain
state-of-art performance on Microsoft COCO, consisting of non-iconic images. We
incorporate contextual information in natural images through a conditional
latent tree probabilistic model (CLTM), where the object co-occurrences are
conditioned on the extracted fc7 features from pre-trained Imagenet CNN as
input. We learn the CLTM tree structure using conditional pairwise
probabilities for object co-occurrences, estimated through kernel methods, and
we learn its node and edge potentials by training a new 3-layer neural network,
which takes fc7 features as input. Object classification is carried out via
inference on the learnt conditional tree model, and we obtain significant gain
in precision-recall and F-measures on MS-COCO, especially for difficult object
categories. Moreover, the latent variables in the CLTM capture scene
information: the images with top activations for a latent node have common
themes such as being a grasslands or a food scene, and on on. In addition, we
show that a simple k-means clustering of the inferred latent nodes alone
significantly improves scene classification performance on the MIT-Indoor
dataset, without the need for any retraining, and without using scene labels
during training. Thus, we present a unified framework for multi-object
classification and unsupervised scene understanding
DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning
This paper presents a novel iterative deep learning framework and apply it
for document enhancement and binarization. Unlike the traditional methods which
predict the binary label of each pixel on the input image, we train the neural
network to learn the degradations in document images and produce the uniform
images of the degraded input images, which allows the network to refine the
output iteratively. Two different iterative methods have been studied in this
paper: recurrent refinement (RR) which uses the same trained neural network in
each iteration for document enhancement and stacked refinement (SR) which uses
a stack of different neural networks for iterative output refinement. Given the
learned uniform and enhanced image, the binarization map can be easy to obtain
by a global or local threshold. The experimental results on several public
benchmark data sets show that our proposed methods provide a new clean version
of the degraded image which is suitable for visualization and promising results
of binarization using the global Otsu's threshold based on the enhanced images
learned iteratively by the neural network.Comment: Accepted by Pattern Recognitio
- …