1,063 research outputs found
Volumetric Super-Resolution of Multispectral Data
Most multispectral remote sensors (e.g. QuickBird, IKONOS, and Landsat 7
ETM+) provide low-spatial high-spectral resolution multispectral (MS) or
high-spatial low-spectral resolution panchromatic (PAN) images, separately. In
order to reconstruct a high-spatial/high-spectral resolution multispectral
image volume, either the information in the MS and PAN images is fused (i.e.,
pansharpening) or super-resolution reconstruction (SRR) is used with only MS
images captured on different dates. Existing methods do not utilize the
temporal information of MS images and the high spatial resolution of PAN
images together to improve resolution. In this paper, we propose a multiframe
SRR algorithm using
pansharpened MS images, taking advantage of both temporal and spatial
information available in multispectral imagery, in order to exceed the spatial
resolution of the given PAN images. We first apply pansharpening to a set of
multispectral images and their corresponding PAN images captured on different
dates. Then, we use the pansharpened multispectral images as input to the
proposed wavelet-based multiframe SRR method to yield full volumetric SRR. The
proposed SRR method is obtained by deriving the subband relations between
multitemporal MS volumes. We demonstrate the results on Landsat 7 ETM+ images
comparing our method to conventional techniques.
Comment: arXiv admin note: text overlap with arXiv:1705.0125
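As a rough illustration of the pansharpening stage described above, the sketch below performs simple substitution-based wavelet pansharpening of a single MS band: the MS approximation subband (spectral content) is kept and the PAN detail subbands (spatial content) are injected. This is not the paper's derived multitemporal subband relations; the function name, wavelet, and decomposition level are illustrative choices, and `ms_band` is assumed to be already upsampled and co-registered to the PAN grid.

```python
# A minimal sketch, assuming 2-D float arrays and PyWavelets; not the
# paper's exact method.
import pywt

def wavelet_pansharpen(ms_band, pan, wavelet="db2", level=2):
    """Keep the MS approximation subband, inject PAN detail subbands."""
    ms_coeffs = pywt.wavedec2(ms_band, wavelet, level=level)
    pan_coeffs = pywt.wavedec2(pan, wavelet, level=level)
    fused = [ms_coeffs[0]] + pan_coeffs[1:]  # MS spectral base + PAN details
    return pywt.waverec2(fused, wavelet)
```

Applied per band and per acquisition date, this would yield the pansharpened inputs that the multiframe SRR step then fuses.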
Restoration of Atmospheric Turbulence-distorted Images via RPCA and Quasiconformal Maps
We address the problem of restoring a high-quality image from an observed
image sequence strongly distorted by atmospheric turbulence. A novel algorithm
is proposed in this paper to reduce geometric distortion as well as
space-and-time-varying blur due to strong turbulence. By considering a suitable
energy functional, our algorithm first obtains a sharp reference image and a
subsampled image sequence containing sharp and mildly distorted image frames
with respect to the reference image. The subsampled image sequence is then
stabilized by applying Robust Principal Component Analysis (RPCA) to the
deformation fields between image frames and warping the image frames by a
quasiconformal map associated with the low-rank part of the deformation matrix.
After the image frames are registered to the reference image, their low-rank
part is deblurred via blind deconvolution, and the deblurred frames are then
fused with the enhanced sparse part. Experiments have been carried out on both
synthetic and real turbulence-distorted video. Results demonstrate that our
method is effective in alleviating distortions and blur, restoring image
details and enhancing visual quality.
Comment: 21 pages, 24 figures
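The RPCA step above separates the matrix of deformation fields into a low-rank part (the stable geometry used for warping) and a sparse part. A minimal numpy sketch of RPCA via Principal Component Pursuit, solved with a basic inexact augmented Lagrangian scheme, might look as follows; here each column of `D` would be a flattened deformation field, and the quasiconformal warping built on the low-rank part is beyond the sketch.

```python
# A minimal PCP sketch, assuming numpy; parameter heuristics follow common
# choices in the RPCA literature, not necessarily this paper's settings.
import numpy as np

def soft(X, tau):
    """Elementwise soft-thresholding (prox of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft(s, tau)) @ Vt

def rpca(D, lam=None, tol=1e-7, max_iter=500):
    """Split D into low-rank L and sparse S via Principal Component Pursuit."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(D, 2)            # common heuristic step size
    Y = D / max(np.linalg.norm(D, 2), np.abs(D).max() / lam)
    L, S = np.zeros_like(D), np.zeros_like(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)       # update low-rank part
        S = soft(D - L + Y / mu, lam / mu)      # update sparse part
        R = D - L - S
        Y += mu * R                             # dual ascent on the residual
        if np.linalg.norm(R) <= tol * np.linalg.norm(D):
            break
    return L, S
```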
Distortion-driven Turbulence Effect Removal using Variational Model
It remains a challenge to simultaneously remove geometric distortion and
space-time-varying blur in frames captured through a turbulent atmospheric
medium. To remove, or at least reduce, these effects, we propose a new scheme to
recover a latent image from observed frames by integrating a new variational
model and distortion-driven spatial-temporal kernel regression. The proposed
scheme first constructs a high-quality reference image from the observed frames
using low-rank decomposition. Then, to generate an improved registered
sequence, the reference image is iteratively optimized using a variational
model containing a new spatial-temporal regularization. The proposed fast
algorithm efficiently solves this model without the use of partial differential
equations (PDEs). Next, to reduce blur variation, distortion-driven
spatial-temporal kernel regression is carried out to fuse the registered
sequence into one image by introducing the concept of the near-stationary
patch. Applying a blind deconvolution algorithm to the fused image produces the
final output. Extensive experimental testing shows, both qualitatively and
quantitatively, that the proposed method can effectively alleviate distortion
and blur and recover details of the original scene compared to state-of-the-art
methods.
Comment: 28 pages, 15 figures
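The fusion step can be pictured as a kernel-weighted temporal average in which pixels from less-distorted frames count more. The sketch below is a heavily simplified stand-in for the paper's distortion-driven spatial-temporal kernel regression: it keeps only the temporal kernel, omits the near-stationary patch machinery, and assumes a per-pixel `distortion` map (e.g., deformation-field magnitude) is available.

```python
# An illustrative fusion sketch, assuming numpy; not the paper's full
# kernel regression.
import numpy as np

def kernel_fuse(frames, distortion, h=0.5):
    """Temporal kernel-weighted fusion: `frames` and `distortion` are both
    (T, H, W) arrays; less-distorted pixels get larger Gaussian weights."""
    w = np.exp(-(distortion ** 2) / (2.0 * h ** 2))
    return (w * frames).sum(axis=0) / (w.sum(axis=0) + 1e-12)
```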
Variational models for joint subsampling and reconstruction of turbulence-degraded images
Turbulence-degraded image frames are distorted by both turbulent deformations
and space-time-varying blurs. To suppress these effects, we propose a
multi-frame reconstruction scheme to recover a latent image from the observed
image sequence. Recent approaches are commonly based on registering each frame
to a reference image, by which geometric turbulent deformations can be
estimated and a sharp image can be restored. A major challenge is that a fine
reference image is usually unavailable, as every turbulence-degraded frame is
distorted. A high-quality reference image is crucial for the accurate
estimation of geometric deformations and fusion of frames. Moreover, it is
unlikely that all frames from the image sequence are useful, so frame
selection is necessary and highly beneficial. In this work, we propose a
variational model for joint subsampling of frames and extraction of a clear
image. A fine image and a suitable choice of subsample are simultaneously
obtained by iteratively minimizing an energy functional. The energy consists of a
fidelity term measuring the discrepancy between the extracted image and the
subsampled frames, as well as regularization terms on the extracted image and
the subsample. Different choices of fidelity and regularization terms are
explored. By carefully selecting suitable frames and extracting the image, the
quality of the reconstructed image can be significantly improved. Extensive
experiments have been carried out, which demonstrate the efficacy of our
proposed model. In addition, the extracted subsamples and images can be fed
into existing algorithms to produce improved results.
Comment: arXiv admin note: text overlap with arXiv:1704.0314
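One concrete instance of such a joint energy (illustrative notation, not necessarily the paper's) uses an extracted image I, soft selection weights w_k for frames f_k, and regularizers R and Q on the image and the subsample:

```latex
\min_{I,\,w}\; E(I,w)
  \;=\; \sum_{k=1}^{T} w_k \,\lVert I - f_k \rVert_2^2
  \;+\; \lambda\, R(I) \;+\; \mu\, Q(w),
\qquad \text{e.g.}\ R(I) = \lVert \nabla I \rVert_1 .
```

"Iteratively minimizing the energy" then amounts to alternating minimization: solve for I with w fixed, then update the selection weights w with I fixed.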
Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset
Recent research on problem formulations based on decomposition into low-rank
plus sparse matrices shows that this is a suitable framework for separating
moving objects from the background. The most representative formulation is Robust
Principal Component Analysis (RPCA) solved via Principal Component Pursuit
(PCP), which decomposes a data matrix into a low-rank matrix plus a sparse matrix.
However, similar robust implicit or explicit decompositions can be made in the
following problem formulations: Robust Non-negative Matrix Factorization
(RNMF), Robust Matrix Completion (RMC), Robust Subspace Recovery (RSR), Robust
Subspace Tracking (RST) and Robust Low-Rank Minimization (RLRM). The main goal
of these similar problem formulations is to obtain explicitly or implicitly a
decomposition into a low-rank matrix plus additive matrices. In this context,
this work aims to initiate a rigorous and comprehensive review of the similar
problem formulations in robust subspace learning and tracking based on
decomposition into low-rank plus additive matrices for testing and ranking
existing algorithms for background/foreground separation. For this, we first
provide a preliminary review of the recent developments in the different
problem formulations, which allows us to define a unified view that we call
Decomposition into Low-rank plus Additive Matrices (DLAM). Then, we carefully
examine each method within each robust subspace learning/tracking framework,
detailing their decompositions, loss functions, optimization problems and
solvers. Furthermore, we investigate if incremental algorithms and real-time
implementations can be achieved for background/foreground separation. Finally,
experimental results on a large-scale dataset called Background Models
Challenge (BMC 2012) show the comparative performance of 32 different robust
subspace learning/tracking methods.
Comment: 121 pages, 5 figures, submitted to Computer Science Review. arXiv
admin note: text overlap with arXiv:1312.7167, arXiv:1109.6297,
arXiv:1207.3438, arXiv:1105.2126, arXiv:1404.7592, arXiv:1210.0805,
arXiv:1403.8067 by other authors. Computer Science Review, November 201
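Schematically, the DLAM view can be written as below; which additive terms appear (sparse foreground S, dense noise E) and which norms constrain them is what distinguishes RPCA, RNMF, RMC, RSR, RST and RLRM. The notation here is illustrative; the PCP special case on the right is the formulation typically solved by ALM-type schemes such as the numpy sketch shown earlier in this list.

```latex
A \;=\; L + S + E,
\qquad
\text{PCP:}\quad \min_{L,\,S}\ \lVert L \rVert_{*} + \lambda \lVert S \rVert_{1}
\quad \text{s.t.}\quad A = L + S .
```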
An Invariant Model of the Significance of Different Body Parts in Recognizing Different Actions
In this paper, we show that different body parts do not play equally
important roles in recognizing a human action in video data. We investigate to
what extent a body part plays a role in recognition of different actions and
hence propose a generic method of assigning weights to different body points.
The approach is inspired by the strong evidence in the applied perception
community that humans perform recognition in a foveated manner; that is, they
recognize events or objects by focusing only on visually significant aspects.
An important contribution of our method is that the computation of the weights
assigned to body parts is invariant to viewing directions and camera parameters
in the input data. We have performed extensive experiments to validate the
proposed approach and demonstrate its significance. In particular, results show
that considerable improvement in performance is gained by taking into account
the relative importance of different body parts as defined by our approach.
Comment: arXiv admin note: substantial text overlap with arXiv:1705.04641,
arXiv:1705.05741, arXiv:1705.0443
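As a toy illustration of data-driven per-part weights (not the paper's view-invariant formulation), one could score each body point by a Fisher-style ratio of between-class to within-class variance of its feature and normalize; the names `part_weights` and `features` are hypothetical.

```python
# An illustrative weighting sketch, assuming numpy arrays; not the paper's
# actual invariant model.
import numpy as np

def part_weights(features, labels):
    """Score each body part by between-class over within-class variance.
    features: (n_samples, n_parts) array; labels: (n_samples,) array."""
    overall = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        cls = features[labels == c]
        between += len(cls) * (cls.mean(axis=0) - overall) ** 2
        within += ((cls - cls.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)   # Fisher-style ratio per part
    return score / score.sum()           # weights sum to one
```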
Robust Online Matrix Factorization for Dynamic Background Subtraction
We propose an effective online background subtraction method, which can be
robustly applied to practical videos that have variations in both foreground
and background. Different from previous methods which often model the
foreground as Gaussian or Laplacian distributions, we model the foreground for
each frame with a specific mixture of Gaussians (MoG) distribution, which is
updated online frame by frame. In particular, our MoG model in each frame is
regularized by the foreground/background knowledge learned in previous frames.
This makes our online MoG model highly robust, stable and adaptive to practical
foreground and background variations. The proposed model can be formulated as a
concise probabilistic MAP model, which can be readily solved by the EM algorithm.
We further embed an affine transformation operator into the proposed model,
which can be automatically adjusted to fit a wide range of video background
transformations and make the method more robust to camera movements. By using
the sub-sampling technique, the proposed method can be accelerated to process
more than 250 frames per second on average, meeting the requirement of
real-time background subtraction for practical video processing tasks. The
superiority of the proposed method is substantiated by extensive experiments
implemented on synthetic and real videos, as compared with state-of-the-art
online and offline background subtraction methods.
Comment: 14 pages, 13 figures
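The paper's frame-specific foreground MoG with learned priors is more involved than can be shown here; as a simpler, related illustration, the sketch below implements the classic per-pixel online MoG update in the Stauffer-Grimson style, where each new intensity either refines the matched Gaussian component or replaces the weakest one.

```python
# A simplified stand-in in the Stauffer-Grimson style, assuming numpy; not
# the paper's regularized foreground MoG.
import numpy as np

class OnlinePixelMoG:
    """Per-pixel online mixture of Gaussians over grayscale intensities."""

    def __init__(self, k=3, lr=0.05):
        self.lr = lr
        self.mu = np.linspace(0.0, 255.0, k)   # component means
        self.var = np.full(k, 225.0)           # component variances
        self.pi = np.full(k, 1.0 / k)          # mixing weights

    def update(self, x):
        """Fold one new intensity observation x into the mixture."""
        d2 = (x - self.mu) ** 2
        match = int(np.argmin(d2 / self.var))
        if d2[match] < 6.25 * self.var[match]:   # within 2.5 sigma: adapt
            self.pi *= 1.0 - self.lr
            self.pi[match] += self.lr
            self.mu[match] += self.lr * (x - self.mu[match])
            self.var[match] += self.lr * (d2[match] - self.var[match])
        else:                                    # no match: replace weakest
            weakest = int(np.argmin(self.pi))
            self.mu[weakest], self.var[weakest] = x, 225.0
            match = weakest
        self.pi /= self.pi.sum()
        return match
```

A foreground decision would then hinge on whether the matched component is a trusted (high-weight, low-variance) background mode.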
Making a long story short: A Multi-Importance fast-forwarding egocentric videos with the emphasis on relevant objects
The emergence of low-cost, high-quality personal wearable cameras, combined
with the increasing storage capacity of video-sharing websites, has evoked a
growing interest in first-person videos, since most such videos are composed of
long-running unedited streams which are usually tedious and unpleasant to
watch. State-of-the-art semantic fast-forward methods currently face the
challenge of providing an adequate balance between smoothness in visual flow
and the emphasis on the relevant parts. In this work, we present the
Multi-Importance Fast-Forward (MIFF), a fully automatic methodology to
fast-forward egocentric videos while tackling these challenges. The dilemma of
defining the semantic information of a video is addressed by a learning
process based on the preferences of the user. Results show that the proposed
method retains more semantic content than state-of-the-art fast-forward
methods. Finally, we discuss the need for a particular video stabilization
technique for fast-forward egocentric videos.
Comment: Accepted for publication in the Journal of Visual Communication and
Image Representation (JVCI) 2018. Project website:
https://www.verlab.dcc.ufmg.br/semantic-hyperlaps
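Semantic fast-forwarding of this kind is often posed as a shortest-path problem over a frame-skip graph. The sketch below is an illustrative single-importance selector, not MIFF's multi-importance weighting: `semantic[i]` is an assumed per-frame relevance score in [0, 1], long jumps are penalized for smoothness, and landing on relevant frames is made cheap.

```python
# An illustrative frame-selection sketch; standard library only.
import heapq

def fast_forward(semantic, max_skip=8, alpha=1.0):
    """Pick a frame subsequence from 0..N-1 via a shortest path over a
    frame-skip DAG, trading jump length against semantic relevance."""
    n = len(semantic)
    dist, prev = {0: 0.0}, {}
    heap = [(0.0, 0)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == n - 1:
            break
        if d > dist.get(i, float("inf")):
            continue                          # stale heap entry
        for skip in range(1, max_skip + 1):
            j = i + skip
            if j >= n:
                break
            # Cheap to land on relevant frames, costly to make long jumps.
            cost = skip + alpha * (1.0 - semantic[j])
            if d + cost < dist.get(j, float("inf")):
                dist[j], prev[j] = d + cost, i
                heapq.heappush(heap, (d + cost, j))
    path, i = [], n - 1
    while i in prev:                          # walk back from the last frame
        path.append(i)
        i = prev[i]
    return [0] + path[::-1]
```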
A New Low-Rank Tensor Model for Video Completion
In this paper, we propose a new low-rank tensor model based on the circulant
algebra, namely, the twist tensor nuclear norm, or t-TNN for short. The twist
tensor is a 3-way tensor representation that laterally stores 2D data slices in
order. On one hand, t-TNN convexly relaxes the tensor multi-rank of the twist
tensor in the Fourier domain, which allows an efficient computation using FFT.
On the other hand, t-TNN is equal to the nuclear norm of the block circulant
matricization of the twist tensor in the original domain, which extends the
traditional matrix nuclear norm in a block circulant way. We test the t-TNN
model on a video completion application that aims to fill in missing values, and
the experimental results validate its effectiveness, especially when dealing with
video recorded by a non-stationary panning camera. The block circulant
matricization of the twist tensor can be transformed into a circulant block
representation with nuclear norm invariance. This representation, after
transformation, exploits the horizontal translation relationship between the
frames in a video, and endows the t-TNN model with a more powerful ability to
reconstruct panning videos than the existing state-of-the-art low-rank models.
Comment: 8 pages, 11 figures, 1 table
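The FFT-based computation mentioned above makes the proximal step of such tubal nuclear norms cheap: transform along the third mode, threshold singular values of every frontal slice, transform back. A minimal numpy sketch follows; for t-TNN specifically, the video tensor would first be "twisted" so frames become lateral slices (e.g., by permuting axes), which this sketch does not show.

```python
# An illustrative tensor-SVT sketch, assuming numpy; not the paper's full
# completion algorithm.
import numpy as np

def tensor_svt(X, tau):
    """Proximal step of a tubal tensor nuclear norm: FFT along the third
    mode, SVD thresholding on each frontal slice, inverse FFT.
    X: real 3-way array of shape (n1, n2, n3)."""
    Xf = np.fft.fft(X, axis=2)
    out = np.empty_like(Xf)
    for k in range(X.shape[2]):
        U, s, Vt = np.linalg.svd(Xf[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vt
    return np.real(np.fft.ifft(out, axis=2))
```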
Single Image Action Recognition by Predicting Space-Time Saliency
We propose a novel approach based on deep Convolutional Neural Networks (CNN)
to recognize human actions in still images by predicting the future motion, and
detecting the shape and location of the salient parts of the image. We make the
following major contributions to this important area of research: (i) We use
the predicted future motion in the static image (Walker et al., 2015) as a
means of compensating for the missing temporal information, while using the
saliency map to represent the spatial information in the form of the location
and shape of what is predicted as significant. (ii) We cast action
classification in static images as a domain adaptation problem by transfer
learning. We first map the input static image to a new domain that we refer to
as the Predicted Optical Flow-Saliency Map domain (POF-SM), and then fine-tune
the layers of a deep CNN model trained on classifying the ImageNet dataset to
perform action classification in the POF-SM domain. (iii) We tested our method
on the popular Willow dataset; unlike existing methods, we also tested it on a
more realistic and challenging dataset of over 2M still images that we
collected and labeled by taking random frames from the UCF-101 video dataset.
We call our dataset the UCF Still Image dataset or UCFSI-101 in short. Our
results outperform the state of the art.
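The transfer-learning step in (ii) follows the standard fine-tuning recipe: take an ImageNet-pretrained CNN, replace the classification head, and train on the new domain. The sketch below shows those generic mechanics in PyTorch; the ResNet-18 backbone, optimizer, and the hypothetical `pofsm_loader` (yielding POF-SM images and action labels) are illustrative assumptions, not the paper's exact architecture.

```python
# A generic fine-tuning skeleton, assuming torch and torchvision >= 0.13;
# not the paper's exact model.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

def build_model(num_actions):
    """ImageNet-pretrained backbone with a fresh action-classification head."""
    model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_actions)
    return model

def finetune(model, pofsm_loader, epochs=5, lr=1e-4):
    """Fine-tune on (image, action) batches from the new domain."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, actions in pofsm_loader:
            opt.zero_grad()
            loss_fn(model(images), actions).backward()
            opt.step()
    return model
```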
- …