19,197 research outputs found
A Discriminative Representation of Convolutional Features for Indoor Scene Recognition
Indoor scene recognition is a multi-faceted and challenging problem due to
the diverse intra-class variations and the confusing inter-class similarities.
This paper presents a novel approach which exploits rich mid-level
convolutional features to categorize indoor scenes. Traditionally used
convolutional features preserve the global spatial structure, which is a
desirable property for general object recognition. However, we argue that this
structuredness is not much helpful when we have large variations in scene
layouts, e.g., in indoor scenes. We propose to transform the structured
convolutional activations to another highly discriminative feature space. The
representation in the transformed space not only incorporates the
discriminative aspects of the target dataset, but it also encodes the features
in terms of the general object categories that are present in indoor scenes. To
this end, we introduce a new large-scale dataset of 1300 object categories
which are commonly present in indoor scenes. Our proposed approach achieves a
significant performance boost over previous state of the art approaches on five
major scene classification datasets
Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
Recent work on scene classification still makes use of generic CNN features
in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline
built upon deep CNN features to harvest discriminative visual objects and parts
for scene classification. We first use a region proposal technique to generate
a set of high-quality patches potentially containing objects, and apply a
pre-trained CNN to extract generic deep features from these patches. Then we
perform both unsupervised and weakly supervised learning to screen these
patches and discover discriminative ones representing category-specific objects
and parts. We further apply discriminative clustering enhanced with local CNN
fine-tuning to aggregate similar objects and parts into groups, called meta
objects. A scene image representation is constructed by pooling the feature
response maps of all the learned meta objects at multiple spatial scales. We
have confirmed that the scene image representation obtained using this new
pipeline is capable of delivering state-of-the-art performance on two popular
scene benchmark datasets, MIT Indoor 67~\cite{MITIndoor67} and
Sun397~\cite{Sun397}Comment: To Appear in ICCV 201
Hybrid image representation methods for automatic image annotation: a survey
In most automatic image annotation systems, images are represented with low level features using either global
methods or local methods. In global methods, the entire image is used as a unit. Local methods divide images into blocks where fixed-size sub-image blocks are adopted as sub-units; or into regions by using segmented regions as sub-units in images. In contrast to typical automatic image annotation methods that use either global or local features exclusively, several recent methods have considered incorporating the two kinds of information, and believe that the combination of the two levels of features is
beneficial in annotating images. In this paper, we provide a
survey on automatic image annotation techniques according to
one aspect: feature extraction, and, in order to complement
existing surveys in literature, we focus on the emerging image annotation methods: hybrid methods that combine both global and local features for image representation
Im2Pano3D: Extrapolating 360 Structure and Semantics Beyond the Field of View
We present Im2Pano3D, a convolutional neural network that generates a dense
prediction of 3D structure and a probability distribution of semantic labels
for a full 360 panoramic view of an indoor scene when given only a partial
observation (<= 50%) in the form of an RGB-D image. To make this possible,
Im2Pano3D leverages strong contextual priors learned from large-scale synthetic
and real-world indoor scenes. To ease the prediction of 3D structure, we
propose to parameterize 3D surfaces with their plane equations and train the
model to predict these parameters directly. To provide meaningful training
supervision, we use multiple loss functions that consider both pixel level
accuracy and global context consistency. Experiments demon- strate that
Im2Pano3D is able to predict the semantics and 3D structure of the unobserved
scene with more than 56% pixel accuracy and less than 0.52m average distance
error, which is significantly better than alternative approaches.Comment: Video summary: https://youtu.be/Au3GmktK-S
Conceptual spatial representations for indoor mobile robots
We present an approach for creating conceptual representations of human-made indoor environments using mobile
robots. The concepts refer to spatial and functional properties of typical indoor environments. Following ďŹndings
in cognitive psychology, our model is composed of layers representing maps at diďŹerent levels of abstraction. The
complete system is integrated in a mobile robot endowed with laser and vision sensors for place and object recognition.
The system also incorporates a linguistic framework that actively supports the map acquisition process, and which
is used for situated dialogue. Finally, we discuss the capabilities of the integrated system
- âŚ