936 research outputs found
A Reverse Hierarchy Model for Predicting Eye Fixations
A number of psychological and physiological evidences suggest that early
visual attention works in a coarse-to-fine way, which lays a basis for the
reverse hierarchy theory (RHT). This theory states that attention propagates
from the top level of the visual hierarchy that processes gist and abstract
information of input, to the bottom level that processes local details.
Inspired by the theory, we develop a computational model for saliency detection
in images. First, the original image is downsampled to different scales to
constitute a pyramid. Then, saliency on each layer is obtained by image
super-resolution reconstruction from the layer above, which is defined as
unpredictability from this coarse-to-fine reconstruction. Finally, saliency on
each layer of the pyramid is fused into stochastic fixations through a
probabilistic model, where attention initiates from the top layer and
propagates downward through the pyramid. Extensive experiments on two standard
eye-tracking datasets show that the proposed method can achieve competitive
results with state-of-the-art models.Comment: CVPR 2014, 27th IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). CVPR 201
Data Driven Approaches for Image & Video Understanding: from Traditional to Zero-shot Supervised Learning
In the present age of advanced computer vision, the necessity of (user-annotated) data is a key factor in image & video understanding. Recent success of deep learning on large scale data has only acted as a catalyst. There are certain problems that exist in this regard: 1) scarcity of (annotated) data, 2) need of expensive manual annotation, 3) problem of change in domain, 4) knowledge base not exhaustive.
To make efficient learning systems, one has to be prepared to deal with such diverse set of problems.
In terms of data availability, extensive manual annotation can be beneficial in obtaining category specific knowledge. Even then, learning efficient representation for the related task is challenging and requires special attention. On the other hand, when labelled data is scarce, learning category specific representation itself becomes challenging. In this work, I investigate data driven approaches that cater to traditional supervised learning setup as well as an extreme case of data scarcity where no
data from test classes are available during training, known as zero-shot learning. First, I look into supervised learning setup with ample annotations and propose efficient dictionary learning technique for better learning of data representation for the task of action classification in images & videos. Then I propose robust mid-level feature representations for action videos that are equally effective in traditional supervised learning as well as zero-shot learning. Finally, I come up with novel approach that
cater to zero-shot learning specifically. Thorough discussions followed by experimental validations establish the worth of these novel techniques in solving computer vision related tasks under varying data-dependent scenarios
Patch-based semantic labelling of images.
PhDThe work presented in this thesis is focused at associating a semantics
to the content of an image, linking the content to high level
semantic categories. The process can take place at two levels: either
at image level, towards image categorisation, or at pixel level, in se-
mantic segmentation or semantic labelling. To this end, an analysis
framework is proposed, and the different steps of part (or patch) extraction,
description and probabilistic modelling are detailed. Parts of
different nature are used, and one of the contributions is a method to
complement information associated to them. Context for parts has to
be considered at different scales. Short range pixel dependences are accounted
by associating pixels to larger patches. A Conditional Random
Field, that is, a probabilistic discriminative graphical model, is used
to model medium range dependences between neighbouring patches.
Another contribution is an efficient method to consider rich neighbourhoods
without having loops in the inference graph. To this end, weak
neighbours are introduced, that is, neighbours whose label probability
distribution is pre-estimated rather than mutable during the inference.
Longer range dependences, that tend to make the inference problem
intractable, are addressed as well. A novel descriptor based on local
histograms of visual words has been proposed, meant to both complement
the feature descriptor of the patches and augment the context
awareness in the patch labelling process. Finally, an alternative approach
to consider multiple scales in a hierarchical framework based
on image pyramids is proposed. An image pyramid is a compositional
representation of the image based on hierarchical clustering. All the
presented contributions are extensively detailed throughout the thesis,
and experimental results performed on publicly available datasets are
reported to assess their validity. A critical comparison with the state
of the art in this research area is also presented, and the advantage in
adopting the proposed improvements are clearly highlighted
Salient Object Detection Techniques in Computer Vision-A Survey.
Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the field of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly categorized into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both conventional as well as deep learning-based categories have been reviewed in detail. Relevant saliency modeling trends with key issues, core techniques, and the scope for future research work have been discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases for some large-scale public datasets. Different metrics considered for assessment of the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards end
- …