Volumetric Super-Resolution of Multispectral Data
Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7
ETM+) provide low-spatial, high-spectral-resolution multispectral (MS) or
high-spatial, low-spectral-resolution panchromatic (PAN) images separately. In
order to reconstruct a high-spatial, high-spectral-resolution multispectral
image volume, either the information in the MS and PAN images is fused (i.e.
pansharpening) or super-resolution reconstruction (SRR) is applied using only MS
images captured on different dates. Existing methods do not exploit the
temporal information of MS images and the high spatial resolution of PAN images
together to improve
the resolution. In this paper, we propose a multiframe SRR algorithm using
pansharpened MS images, taking advantage of both temporal and spatial
information available in multispectral imagery, in order to exceed the spatial
resolution of the given PAN images. We first apply pansharpening to a set of
multispectral images and their corresponding PAN images captured on different
dates. Then, we use the pansharpened multispectral images as input to the
proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The
proposed SRR method is obtained by deriving the subband relations between
multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images
comparing our method to conventional techniques.
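The pipeline above first pansharpens each date's MS/PAN pair before the wavelet-based SRR step. As a rough sketch of that first stage only, here is a standard Brovey-style component substitution, not the paper's method; the array shapes and the `brovey_pansharpen` name are invented for illustration:

```python
import numpy as np

def brovey_pansharpen(ms, pan):
    """Brovey-style component-substitution pansharpening (illustrative).

    ms  : (bands, h, w) low-resolution multispectral cube
    pan : (H, W) high-resolution panchromatic image with H = s*h, W = s*w
    Returns a (bands, H, W) pansharpened cube.
    """
    s = pan.shape[0] // ms.shape[1]
    # Nearest-neighbour upsampling of every band to the PAN grid.
    up = ms.repeat(s, axis=1).repeat(s, axis=2)
    # Synthetic intensity: the band average stands in for the PAN response.
    intensity = up.mean(axis=0) + 1e-8  # epsilon avoids division by zero
    # Inject PAN detail by rescaling each band with the PAN/intensity ratio.
    return up * (pan / intensity)

rng = np.random.default_rng(0)
ms = rng.random((3, 4, 4))    # toy 4x4 MS cube with 3 bands
pan = rng.random((8, 8))      # toy 8x8 PAN image (2x the MS resolution)
sharp = brovey_pansharpen(ms, pan)
print(sharp.shape)  # (3, 8, 8)
```

In the paper, several such pansharpened volumes from different dates would then feed the multiframe SRR; that wavelet-domain step is not sketched here.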
Single Image Action Recognition by Predicting Space-Time Saliency
We propose a novel approach based on deep Convolutional Neural Networks (CNN)
to recognize human actions in still images by predicting the future motion, and
detecting the shape and location of the salient parts of the image. We make the
following major contributions to this important area of research: (i) We use
the predicted future motion in the static image (Walker et al., 2015) as a
means of compensating for the missing temporal information, while using the
saliency map to represent the spatial information in the form of location
and shape of what is predicted as significant. (ii) We cast action
classification in static images as a domain adaptation problem by transfer
learning. We first map the input static image to a new domain that we refer to
as the Predicted Optical Flow-Saliency Map domain (POF-SM), and then fine-tune
the layers of a deep CNN model trained on classifying the ImageNet dataset to
perform action classification in the POF-SM domain. (iii) We test our method
on the popular Willow dataset and, unlike existing methods, also on a more
realistic and challenging dataset of over 2M still images that we collected
and labeled by taking random frames from the UCF-101 video dataset. We call
our dataset the UCF Still Image dataset, or UCFSI-101 for short. Our results
outperform the state of the art.
View-Invariant Recognition of Action Style Self-Dissimilarity
Self-similarity was recently introduced as a measure of inter-class
congruence for classification of actions. Herein, we investigate the dual
problem of intra-class dissimilarity for classification of action styles. We
introduce self-dissimilarity matrices that discriminate between same actions
performed by different subjects regardless of viewing direction and camera
parameters. We investigate two frameworks using these invariant style
dissimilarity measures based on Principal Component Analysis (PCA) and Fisher
Discriminant Analysis (FDA). Extensive experiments performed on IXMAS dataset
indicate remarkably good discriminant characteristics for the proposed
invariant measures for gender recognition from video data.
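The invariant construction described in the abstract is not reproduced here, but the underlying object, a matrix of pairwise distances between per-frame pose descriptors, can be sketched in a few lines; the toy trajectory and the `self_dissimilarity` name are invented for illustration (such a matrix is unchanged by any rigid transform of the descriptor space):

```python
import numpy as np

def self_dissimilarity(traj):
    """Self-dissimilarity matrix of a pose trajectory.

    traj : (T, d) array, one d-dimensional pose descriptor per frame.
    Returns the (T, T) matrix D with D[i, j] = ||traj[i] - traj[j]||.
    """
    diff = traj[:, None, :] - traj[None, :, :]   # broadcast pairwise differences
    return np.linalg.norm(diff, axis=-1)

# Toy 3-frame, 2-D trajectory: frame 1 is 5 units from frames 0 and 2.
traj = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
D = self_dissimilarity(traj)
print(D[0, 1])  # 5.0
```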
Learning Semantics for Image Annotation
Image search and retrieval engines rely heavily on textual annotation in
order to match word queries to a set of candidate images. A system that can
automatically annotate images with meaningful text can be highly beneficial for
such engines. Current approaches to developing such systems try to
establish relationships between keywords and visual features of images. In this
paper, we make three main contributions to this area: (i) We transform this
problem from the low-level keyword space to the high-level semantics space that
we refer to as the "{\em image theme}", (ii) Instead of treating each possible
keyword independently, we use latent Dirichlet allocation to learn image themes
from the associated texts in a training phase. Images are then annotated with
image themes rather than keywords, using a modified continuous relevance model,
which takes into account the spatial coherence and the visual continuity among
images of common theme. (iii) To achieve more coherent annotations among images
of common theme, we have integrated ConceptNet in learning the semantics of
images, and hence augment image descriptions beyond annotations provided by
humans. Images are thus further annotated by a few most significant words of
the prominent image theme. Our extensive experiments show that a coherent
theme-based image annotation using high-level semantics results in improved
precision and recall as compared with equivalent classical keyword annotation
systems.
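A minimal sketch of the final annotation step described above, assuming LDA has already produced a theme-word matrix and a per-image theme posterior (both invented below; the paper's modified continuous relevance model and its ConceptNet integration are not reproduced):

```python
import numpy as np

# Hypothetical theme-word matrix from LDA (rows: themes, cols: vocabulary).
vocab = ["beach", "sand", "sea", "car", "road", "traffic"]
theme_word = np.array([
    [0.40, 0.30, 0.25, 0.02, 0.02, 0.01],   # a "coast" theme
    [0.02, 0.02, 0.01, 0.40, 0.30, 0.25],   # a "street" theme
])

def annotate(theme_posterior, k=3):
    """Annotate an image with the k most significant words of its
    prominent (most probable) theme."""
    theme = int(np.argmax(theme_posterior))
    top = np.argsort(theme_word[theme])[::-1][:k]
    return [vocab[i] for i in top]

print(annotate(np.array([0.8, 0.2])))  # ['beach', 'sand', 'sea']
```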
Image Annotation using Multi-Layer Sparse Coding
Automatic annotation of images with descriptive words is a challenging
problem with vast applications in the areas of image search and retrieval. This
problem can be viewed as a label-assignment problem by a classifier dealing
with a very large set of labels, i.e., the vocabulary set. We propose a novel
annotation method that employs two layers of sparse coding and performs
coarse-to-fine labeling. Themes extracted from the training data are treated as
coarse labels. Each theme is a set of training images that share a common
subject in their visual and textual contents. Our system extracts coarse labels
for training and test images without requiring any prior knowledge. Vocabulary
words are the fine labels to be associated with images. Most of the annotation
methods achieve low recall due to the large number of available fine labels,
i.e., vocabulary words. They also tend to achieve high precision only for
highly frequent words, while relatively rare words are more important for
search and retrieval purposes. Our system not only outperforms various
previously proposed annotation systems, but also achieves symmetric response in
terms of precision and recall. Our system scores and maintains high precision
for words with a wide range of frequencies. Such behavior is achieved by
intelligently reducing the number of available fine labels or words for each
image based on the coarse labels assigned to it.
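The coarse-to-fine idea above, restricting the fine vocabulary by the coarse labels before ranking, can be sketched as follows; the themes, scores, and `coarse_to_fine` helper are all invented, and the sparse-coding layers themselves are not shown:

```python
# Hypothetical theme -> vocabulary mapping extracted from training data.
theme_words = {
    "wildlife": ["bear", "grass", "river"],
    "urban":    ["car", "street", "sign"],
}

def coarse_to_fine(word_scores, coarse_labels):
    """Keep only fine labels licensed by the image's coarse labels,
    then rank the survivors by classifier score."""
    allowed = {w for t in coarse_labels for w in theme_words[t]}
    kept = {w: s for w, s in word_scores.items() if w in allowed}
    return sorted(kept, key=kept.get, reverse=True)

# A high-scoring but off-theme word ("car") is pruned away.
scores = {"bear": 0.9, "car": 0.8, "river": 0.4, "sign": 0.7}
print(coarse_to_fine(scores, ["wildlife"]))  # ['bear', 'river']
```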
Non-Linear Phase-Shifting of Haar Wavelets for Run-Time All-Frequency Lighting
This paper focuses on real-time all-frequency image-based rendering using an
innovative solution for run-time computation of light transport. The approach
is based on new results derived for non-linear phase shifting in the Haar
wavelet domain. Although image-based methods for real-time rendering of dynamic
glossy objects have been proposed, they do not truly scale to all possible
frequencies and high sampling rates without trading storage, glossiness, or
computational time, while varying both lighting and viewpoint. This is due to
the fact that current approaches are limited to precomputed radiance transfer
(PRT), which is prohibitively expensive in terms of memory requirements and
real-time rendering when both varying light and viewpoint changes are required
together with high sampling rates for high frequency lighting of glossy
material. On the other hand, current methods cannot handle object rotation,
which is one of the paramount issues for all PRT methods using wavelets. This
latter problem arises because the precomputed data are defined in a global
coordinate system and encoded in the wavelet domain, while the object is
rotated in a local coordinate system. At the root of all the above problems is
the lack of efficient run-time solution to the nontrivial problem of rotating
wavelets (a non-linear phase-shift), which we solve in this paper.
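The core difficulty the abstract names, that a shift (rotation) of the signal is not a simple re-indexing of its Haar coefficients, is easy to demonstrate numerically. The paper's closed-form non-linear phase-shift solution is not reproduced; `haar_1d` below is only a textbook orthonormal Haar transform:

```python
import numpy as np

def haar_1d(x):
    """Full orthonormal 1D Haar wavelet decomposition."""
    x = np.asarray(x, dtype=float).copy()
    out = np.empty_like(x)
    n = len(x)
    while n > 1:
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)   # scaling (average) coefficients
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)   # wavelet (detail) coefficients
        x[:n // 2] = a
        out[n // 2:n] = d
        n //= 2
    out[0] = x[0]                                 # final DC coefficient
    return out

sig = np.array([4.0, 6.0, 10.0, 12.0])
shifted = np.roll(sig, 1)   # circular shift of the *signal*
# The two coefficient sets are not permutations of each other, so a shift
# cannot be realized by merely re-indexing Haar coefficients.
print(haar_1d(sig), haar_1d(shifted))
```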
Video Object Segmentation using Supervoxel-Based Gerrymandering
Pixels operate locally. Superpixels have some potential to collect
information across many pixels; supervoxels have more potential by implicitly
operating across time. In this paper, we explore this well-established notion,
thoroughly analyzing how supervoxels can be used in place of, and in conjunction
with, other means of aggregating information across space-time. Focusing on the
problem of strictly unsupervised video object segmentation, we devise a method
called supervoxel gerrymandering that links masks of foregroundness and
backgroundness via local and non-local consensus measures. We pose and answer a
series of critical questions about the ability of supervoxels to adequately
sway local voting; the questions regard type and scale of supervoxels as well
as local versus non-local consensus, and the questions are posed in a general
way so as to impact the broader knowledge of the use of supervoxels in video
understanding. We work with the DAVIS dataset and find that our analysis yields
an unsupervised method that outperforms all other known unsupervised methods
and even many supervised ones.
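The local-consensus half of the voting idea can be sketched as a majority vote inside each supervoxel. This is a deliberately minimal stand-in with invented data; the paper's gerrymandering also uses non-local consensus and real supervoxel segmentations:

```python
import numpy as np

def supervoxel_vote(pixel_fg, labels):
    """Local-consensus sketch: a supervoxel votes foreground iff the mean
    foregroundness of its member pixels exceeds 0.5, and every member
    pixel then inherits that decision.

    pixel_fg : (N,) per-pixel foregroundness in [0, 1]
    labels   : (N,) supervoxel id per pixel
    """
    out = np.zeros_like(pixel_fg, dtype=bool)
    for sv in np.unique(labels):
        members = labels == sv
        out[members] = pixel_fg[members].mean() > 0.5
    return out

fg = np.array([0.9, 0.2, 0.8, 0.1, 0.4, 0.3])
sv = np.array([0,   0,   0,   1,   1,   1  ])
print(supervoxel_vote(fg, sv))  # [ True  True  True False False False]
```

Note how the second pixel (foregroundness 0.2) is swayed to foreground by its supervoxel's consensus.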
Vision-based Human Gender Recognition: A Survey
Gender is an important demographic attribute of people. This paper provides a
survey of human gender recognition in computer vision. A review of approaches
exploiting information from face and whole body (either from a still image or
gait sequence) is presented. We highlight the challenges faced and survey the
representative methods of these approaches. Based on the results, good
performance has been achieved on datasets captured under controlled
environments, but much work remains to improve the robustness of gender
recognition in real-life environments.
An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions
In this paper, we show that different body parts do not play equally
important roles in recognizing a human action in video data. We investigate to
what extent a body part plays a role in recognition of different actions and
hence propose a generic method of assigning weights to different body points.
The approach is inspired by the strong evidence in the applied perception
community that humans perform recognition in a foveated manner, that is they
recognize events or objects by only focusing on visually significant aspects.
An important contribution of our method is that the computation of the weights
assigned to body parts is invariant to viewing directions and camera parameters
in the input data. We have performed extensive experiments to validate the
proposed approach and demonstrate its significance. In particular, results show
that considerable improvement in performance is gained by taking into account
the relative importance of different body parts as defined by our approach.
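Once per-part weights have been computed, the idea amounts to a weighted combination of per-part evidence. A toy sketch with invented weights and scores follows; the paper's view-invariant weight computation is the actual contribution and is not shown:

```python
# Hypothetical action-specific weights over body parts (e.g. hands might
# dominate for "waving"); the paper derives such weights invariantly.
part_weights = {"head": 0.1, "torso": 0.1, "hands": 0.5, "legs": 0.3}

def weighted_action_score(part_scores):
    """Combine per-body-part evidence using the part weights."""
    return sum(part_weights[p] * s for p, s in part_scores.items())

score = weighted_action_score({"head": 0.2, "torso": 0.3, "hands": 0.9, "legs": 0.1})
print(round(score, 2))  # 0.53
```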
Appearance Descriptors for Person Re-identification: a Comprehensive Review
In video-surveillance, person re-identification is the task of recognising
whether an individual has already been observed over a network of cameras.
Typically, this is achieved by exploiting the clothing appearance, as classical
biometric traits like the face are impractical in real-world video surveillance
scenarios. Clothing appearance is represented by means of low-level
\textit{local} and/or \textit{global} features of the image, usually extracted
according to some part-based body model to treat different body parts (e.g.
torso and legs) independently. This paper provides a comprehensive review of
current approaches to build appearance descriptors for person
re-identification. The most relevant techniques are described in detail, and
categorised according to the body models and features used. The aim of this
work is to provide a structured body of knowledge and a starting point for
researchers willing to conduct novel investigations on this challenging topic.