Enhanced Quadratic Video Interpolation
With the prosperity of the digital video industry, video frame interpolation has
attracted continuous attention in the computer vision community and become a new
focus in industry. Many learning-based methods have been proposed and
achieved promising results. Among them, a recent algorithm named quadratic
video interpolation (QVI) achieves appealing performance. It exploits
higher-order motion information (e.g. acceleration) and successfully models the
estimation of interpolated flow. However, its produced intermediate frames
still contain some unsatisfactory ghosting, artifacts and inaccurate motion,
especially when large and complex motion occurs. In this work, we further
improve the performance of QVI from three facets and propose an enhanced
quadratic video interpolation (EQVI) model. In particular, we adopt a rectified
quadratic flow prediction (RQFP) formulation with the least squares method to
estimate the motion more accurately. Complementary to image pixel-level
blending, we introduce a residual contextual synthesis network (RCSN) to employ
contextual information in high-dimensional feature space, which could help the
model handle more complicated scenes and motion patterns. Moreover, to further
boost the performance, we devise a novel multi-scale fusion network (MS-Fusion)
which can be regarded as a learnable augmentation process. The proposed EQVI
model won the first place in the AIM2020 Video Temporal Super-Resolution
Challenge.
Comment: Winning solution of the AIM2020 VTSR Challenge (in conjunction with ECCV 2020)
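The rectified quadratic flow prediction idea can be illustrated with a toy least-squares fit. This is a hedged sketch, not the paper's implementation: the function name `quadratic_flow`, the choice of four sample times, and the constant-acceleration test signal are illustrative assumptions.

```python
import numpy as np

# Sketch: fit x(t) = x0 + v*t + 0.5*a*t^2 to per-pixel displacements
# observed (relative to frame 0) at several times, by least squares,
# then evaluate the interpolated flow at an intermediate time t.

def quadratic_flow(times, displacements, t):
    """Least-squares quadratic trajectory fit, evaluated at time t."""
    A = np.stack([np.ones_like(times), times, 0.5 * times**2], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, displacements, rcond=None)  # x0, v, a
    x0, v, a = coeffs
    return x0 + v * t + 0.5 * a * t**2

times = np.array([-1.0, 0.0, 1.0, 2.0])
disp = 2.0 * times + 0.5 * times**2   # constant-acceleration motion
print(quadratic_flow(times, disp, 0.5))  # ~ 1.125
```

With four observations and three unknowns the system is overdetermined, which is where a least-squares (rather than exact) formulation becomes natural.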
Wyner-Ziv side information generation using a higher order piecewise trajectory temporal interpolation algorithm
Distributed video coding (DVC) reverses the traditional coding paradigm of complex encoders allied with basic decoding, to one where the computational cost is largely incurred by the decoder. This enables low-cost, resource-poor sensors to be used at the transmitter in various applications including multi-sensor surveillance. A key constraint governing DVC performance is the quality of side information (SI), a coarse representation of the original video frames, which are not available at the decoder. Techniques to generate SI have generally been based on linear temporal interpolation, though these do not always produce satisfactory SI quality, especially in sequences exhibiting asymmetric (non-linear) motion. This paper presents a higher-order piecewise trajectory temporal interpolation (HOPTTI) algorithm for SI generation that quantitatively and perceptually affords better SI quality in comparison to existing temporal interpolation-based approaches.
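The advantage of higher-order trajectory interpolation over the linear kind can be seen in a small numerical sketch (illustrative only; `lagrange_eval` and the constant-acceleration toy trajectory are assumptions, not the paper's HOPTTI algorithm):

```python
def lagrange_eval(ts, xs, t):
    """Evaluate the interpolating polynomial through (ts, xs) at time t."""
    out = 0.0
    for i, (ti, xi) in enumerate(zip(ts, xs)):
        w = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                w *= (t - tj) / (ti - tj)
        out += w * xi
    return out

# Object position under constant acceleration: x(t) = t^2.
# The decoder holds frames at t = 0, 2, 4 and must predict SI at t = 1.
linear = 0.5 * (0.0 + 4.0)  # linear (symmetric-motion) estimate -> 2.0
quad = lagrange_eval([0.0, 2.0, 4.0], [0.0, 4.0, 16.0], 1.0)  # -> 1.0
```

The linear estimate is biased for this accelerating target, while a quadratic trajectory through three key frames recovers the true position exactly.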
Local Visual Microphones: Improved Sound Extraction from Silent Video
Sound waves cause small vibrations in nearby objects. A few techniques exist
in the literature that can extract sound from video. In this paper we study
local vibration patterns at different image locations. We show that different
locations in the image vibrate differently. We carefully aggregate local
vibrations and produce a sound quality that improves state-of-the-art. We show
that local vibrations could have a time delay because sound waves take time to
travel through the air. We use this phenomenon to estimate sound direction. We
also present a novel algorithm that speeds up sound extraction by two to three
orders of magnitude and reaches real-time performance on 20 kHz video.
Comment: Accepted to BMVC 201
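The time-delay phenomenon behind the direction estimate can be sketched with a cross-correlation lag search. This is a hedged toy version, assuming a synthetic delayed signal rather than the paper's per-location vibration signals:

```python
import numpy as np

# Sketch: a vibration signal observed at a farther image location lags
# the one at a nearer location; the lag appears as the peak of their
# cross-correlation and can feed a direction-of-arrival estimate.

def estimate_delay(a, b):
    """Return the integer lag (in samples) by which b lags a."""
    n = len(a)
    corr = np.correlate(b, a, mode="full")
    return int(np.argmax(corr)) - (n - 1)

rng = np.random.default_rng(0)
sig = rng.standard_normal(500)
delay = 7
near = sig
far = np.concatenate([np.zeros(delay), sig])[:len(sig)]  # delayed copy
print(estimate_delay(near, far))  # -> 7
```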
An Improved Observation Model for Super-Resolution under Affine Motion
Super-resolution (SR) techniques make use of subpixel shifts between frames
in an image sequence to yield higher-resolution images. We propose an original
observation model devoted to the case of non-isometric inter-frame motion as
required, for instance, in the context of airborne imaging sensors. First, we
describe how the main observation models used in the SR literature deal with
motion, and we explain why they are not suited to non-isometric motion. Then,
we propose an extension of the observation model by Elad and Feuer adapted to
affine motion. This model is based on a decomposition of affine transforms into
successive shear transforms, each one efficiently implemented by row-by-row or
column-by-column 1-D affine transforms.
We demonstrate on synthetic and real sequences that our observation model,
incorporated in an SR reconstruction technique, leads to better results in the
case of variable-scale motions and provides equivalent results in the case
of isometric motions.
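The shear-factorization idea can be checked numerically on one classical special case. This is a hedged sketch: it shows Paeth's three-shear decomposition of a rotation (an isometric affine map), not the paper's decomposition of general affine transforms; each shear touches only one axis and can therefore be applied as a row-by-row or column-by-column 1-D pass.

```python
import numpy as np

# Paeth's decomposition: a rotation by theta equals
# Shx(-tan(theta/2)) @ Shy(sin(theta)) @ Shx(-tan(theta/2)),
# i.e. three axis-aligned shears applied in succession.

def shear_x(a):
    return np.array([[1.0, a], [0.0, 1.0]])

def shear_y(b):
    return np.array([[1.0, 0.0], [b, 1.0]])

theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

alpha = -np.tan(theta / 2.0)
decomposed = shear_x(alpha) @ shear_y(np.sin(theta)) @ shear_x(alpha)
print(np.allclose(R, decomposed))  # -> True
```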
An Efficient Algorithm for Video Super-Resolution Based On a Sequential Model
In this work, we propose a novel procedure for video super-resolution, that
is, the recovery of a sequence of high-resolution images from its low-resolution
counterpart. Our approach is based on a "sequential" model (i.e., each
high-resolution frame is supposed to be a displaced version of the preceding
one) and considers the use of sparsity-enforcing priors. Both the recovery of
the high-resolution images and the estimation of the motion fields relating them are tackled. This
leads to a large-dimensional, non-convex and non-smooth problem. We propose an
algorithmic framework to address the latter. Our approach relies on fast
gradient evaluation methods and modern optimization techniques for
non-differentiable/non-convex problems. Unlike some other previous works, we
show that there exists a provably-convergent method with a complexity linear in
the problem dimensions. We assess the proposed optimization method on several
video benchmarks and emphasize its good performance with respect to the state
of the art.
Comment: 37 pages, SIAM Journal on Imaging Sciences, 201
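The optimization machinery the abstract refers to can be sketched on a toy problem. This is a hedged illustration, not the paper's algorithm: it runs accelerated proximal-gradient (FISTA) iterations on an l1-regularized least-squares problem, combining a fast gradient evaluation with a cheap proximal step for the non-differentiable sparsity prior; the measurement matrix and sparse signal are made up.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm (the sparsity-enforcing prior)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fista(A, y, lam, n_iter=500):
    """Accelerated proximal gradient for 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    z, t = x, 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + (t - 1.0) / t_new * (x_new - x)
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100))         # hypothetical measurement operator
x_true = np.zeros(100)
x_true[[3, 50, 97]] = [1.0, -2.0, 1.5]     # sparse ground truth
x_hat = fista(A, A @ x_true, lam=0.05)
print(np.nonzero(np.abs(x_hat) > 0.5)[0])  # should recover indices 3, 50, 97
```

The per-iteration cost is one matrix-vector product pair plus an elementwise shrinkage, i.e. linear in the problem dimensions, mirroring the complexity claim in the abstract.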
Depth Superresolution using Motion Adaptive Regularization
Spatial resolution of depth sensors is often significantly lower compared to
that of conventional optical cameras. Recent work has explored the idea of
improving the resolution of depth using higher resolution intensity as a side
information. In this paper, we demonstrate that further incorporating temporal
information in videos can significantly improve the results. In particular, we
propose a novel approach that improves depth resolution, exploiting the
space-time redundancy in the depth and intensity using motion-adaptive low-rank
regularization. Experiments confirm that the proposed approach substantially
improves the quality of the estimated high-resolution depth. Our approach can
be a first component in systems using vision techniques that rely on
high-resolution depth information.
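The low-rank prior at the heart of such regularizers is commonly enforced through singular value thresholding, the proximal operator of the nuclear norm. The sketch below is a hedged stand-in for the paper's motion-adaptive scheme: it applies SVT to a synthetic noisy low-rank matrix, standing in for a stack of motion-aligned depth patches.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
low_rank = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))  # rank 2
noisy = low_rank + 0.1 * rng.standard_normal((30, 30))

denoised = svt(noisy, tau=1.5)
err_before = np.linalg.norm(noisy - low_rank)
err_after = np.linalg.norm(denoised - low_rank)
print(np.linalg.matrix_rank(denoised, tol=1e-6))  # small noise singular values are zeroed
print(err_after < err_before)                     # shrinkage reduces the error
```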
Learning to Transform Time Series with a Few Examples
We describe a semi-supervised regression algorithm that learns to transform one time series into another time series given examples of the transformation. This algorithm is applied to tracking, where a time series of observations from sensors is transformed to a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. The algorithm searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and lends itself to a closed-form solution. It is closely related to nonlinear system identification and manifold learning techniques. We demonstrate our algorithm on the tasks of tracking RFID tags from signal strength measurements, recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences. For these tasks, this algorithm requires significantly fewer examples than fully-supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account.
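The closed-form flavor of such a learning procedure can be sketched with regularized least squares. This is a hedged toy, not the paper's objective (it omits the dynamics term over the unlabeled series): a memoryless map is fitted from a handful of input-output pairs, and the polynomial features, the ridge parameter `lam`, and the cubic test transformation are all illustrative assumptions.

```python
import numpy as np

def fit_memoryless_map(x_examples, y_examples, degree=3, lam=1e-6):
    """Closed-form ridge fit of a smooth memoryless map y = f(x)."""
    X = np.vander(x_examples, degree + 1)   # polynomial features
    w = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y_examples)
    return lambda x: np.vander(np.atleast_1d(x), degree + 1) @ w

# A few labeled examples of the transformation y = x^3 - x
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
f = fit_memoryless_map(xs, xs**3 - xs)

series = np.linspace(-1.5, 1.5, 7)      # unlabeled input time series
transformed = f(series)                 # mapped pointwise, sample by sample
print(np.allclose(transformed, series**3 - series, atol=1e-3))  # -> True
```

Because the map is memoryless, it is applied pointwise to the whole series; the paper's contribution is to constrain the fit further with assumed output dynamics, which the sketch leaves out.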