Online Video Deblurring via Dynamic Temporal Blending Network
State-of-the-art video deblurring methods are capable of removing non-uniform
blur caused by unwanted camera shake and/or object motion in dynamic scenes.
However, most existing methods are based on batch processing and thus need
access to all recorded frames, which makes them computationally demanding and
time-consuming and limits their practical use. In contrast, we propose
an online (sequential) video deblurring method based on a spatio-temporal
recurrent network that allows for real-time performance. In particular, we
introduce a novel architecture which extends the receptive field while keeping
the overall size of the network small to enable fast execution. In doing so,
our network is able to remove even large blur caused by strong camera shake
and/or fast moving objects. Furthermore, we propose a novel network layer that
enforces temporal consistency between consecutive frames by dynamic temporal
blending, which compares and adaptively (at test time) shares features obtained
at different time steps. We show the superiority of the proposed method in an
extensive experimental evaluation.
Comment: 10 pages
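As an illustration of the dynamic temporal blending idea, here is a minimal PyTorch sketch, not the authors' implementation: the module name `DynamicTemporalBlending`, the layer sizes, and the sigmoid-gated convex combination of current and previous features are assumptions about one plausible way to compare and adaptively share features across time steps.

```python
import torch
import torch.nn as nn

class DynamicTemporalBlending(nn.Module):
    """Hypothetical sketch of a dynamic temporal blending layer.

    Predicts, at test time, a per-pixel weight map from the current and
    previous feature maps, then blends the two to enforce temporal
    consistency. Names and layer sizes are illustrative assumptions.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Weight generator: compares features from consecutive time steps.
        self.compare = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, feat_t: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        # w in (0, 1): how much of the previous step's features to reuse.
        w = torch.sigmoid(self.compare(torch.cat([feat_t, feat_prev], dim=1)))
        return w * feat_prev + (1.0 - w) * feat_t

# Toy usage: blend features of two consecutive frames.
blend = DynamicTemporalBlending(channels=32)
f_t, f_prev = torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64)
out = blend(f_t, f_prev)
print(out.shape)  # torch.Size([1, 32, 64, 64])
```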
Video Object Detection with an Aligned Spatial-Temporal Memory
We introduce Spatial-Temporal Memory Networks for video object detection. At
its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent
computation unit to model long-term temporal appearance and motion dynamics.
The STMM's design enables full integration of pretrained backbone CNN weights,
which we find to be critical for accurate detection. Furthermore, in order to
tackle object motion in videos, we propose a novel MatchTrans module to align
the spatial-temporal memory from frame to frame. Our method produces
state-of-the-art results on the benchmark ImageNet VID dataset, and our
ablative studies clearly demonstrate the contribution of our different design
choices. We release our code and models at
http://fanyix.cs.ucdavis.edu/project/stmn/project.html
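A minimal sketch of the frame-to-frame alignment idea behind MatchTrans may help; this is an illustrative reconstruction, not the released code, and the function name, window radius, and dot-product affinity are assumptions. Each memory cell is replaced by an affinity-weighted average of previous-frame cells within a small search window.

```python
import torch
import torch.nn.functional as F

def align_memory(mem_prev, feat_prev, feat_cur, radius=2):
    """Hypothetical MatchTrans-style alignment sketch (not the released code).

    For each location in the current frame, compute affinities between the
    current feature and previous-frame features inside a small window, then
    transform the previous memory as an affinity-weighted average.
    """
    n, c, h, w = feat_cur.shape
    k = 2 * radius + 1
    # Unfold previous features/memory into k*k candidate displacements.
    feat_patches = F.unfold(feat_prev, k, padding=radius).view(n, c, k * k, h, w)
    mem_patches = F.unfold(mem_prev, k, padding=radius).view(
        n, mem_prev.shape[1], k * k, h, w)
    # Affinity of each candidate with the current feature, softmax-normalized.
    affinity = (feat_patches * feat_cur.unsqueeze(2)).sum(dim=1)  # (n, k*k, h, w)
    affinity = F.softmax(affinity, dim=1)
    # Weighted average of previous memory cells -> aligned memory.
    return (mem_patches * affinity.unsqueeze(1)).sum(dim=2)

# Toy usage with random tensors.
mem = torch.randn(1, 64, 32, 32)
f_prev, f_cur = torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)
print(align_memory(mem, f_prev, f_cur).shape)  # torch.Size([1, 64, 32, 32])
```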
Real Time Turbulent Video Perfecting by Image Stabilization and Super-Resolution
Image and video quality in Long Range Observation Systems (LOROS) suffers from
atmospheric turbulence that causes small neighbourhoods in image frames to
chaotically move in different directions and substantially hampers visual
analysis of such image and video sequences. The paper presents a real-time
algorithm for perfecting turbulence degraded videos by means of stabilization
and resolution enhancement. The latter is achieved by exploiting the turbulent
motion. The algorithm involves: generation of a reference frame; estimation,
for each incoming video frame, of a local image displacement map with respect
to the reference frame; segmentation of the displacement map into two classes,
stationary and moving objects; and resolution enhancement of the stationary
objects while preserving real motion. Experiments with synthetic and real-life
sequences have shown that the enhanced videos, generated in real time, exhibit
substantially better resolution and complete stabilization for stationary
objects while retaining real motion.
Comment: Submitted to the Seventh IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2007), August 2007, Palma de Mallorca, Spain
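A toy sketch of such a per-frame pipeline is given below; it is an illustrative approximation under strong assumptions, not the paper's algorithm. In particular, the running-average reference, the intensity-difference proxy for the displacement map, and the threshold-based segmentation are placeholders (a real system would use block matching or optical flow, and would additionally exploit the sub-pixel turbulent shifts for resolution enhancement).

```python
import numpy as np

def process_frame(frame, reference, alpha=0.05, motion_thresh=1.5):
    """Hypothetical sketch of a per-frame turbulence-correction pipeline.

    All names, thresholds, and the intensity-difference proxy for the
    displacement map are illustrative assumptions.
    """
    # 1) Maintain the reference frame as a temporal running average; the
    #    chaotic turbulent jitter averages out, so the reference is stable.
    reference = (1 - alpha) * reference + alpha * frame

    # 2) Proxy for the local displacement map w.r.t. the reference.
    displacement = np.abs(frame - reference)

    # 3) Segment into stationary pixels (small, chaotic turbulent shifts)
    #    and genuinely moving objects (large, coherent shifts).
    moving = displacement > motion_thresh

    # 4) Stabilize stationary pixels from the reference; keep real motion
    #    untouched so moving objects are preserved.
    out = np.where(moving, frame, reference)
    return out, reference

# Toy usage on synthetic frames.
ref = np.zeros((64, 64))
frame = np.random.randn(64, 64)
stabilized, ref = process_frame(frame, ref)
```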
GlobalFlowNet: Video Stabilization using Deep Distilled Global Motion Estimates
Videos shot by laymen using hand-held cameras contain undesirable shaky
motion. Estimating the global motion between successive frames, in a manner not
influenced by moving objects, is central to many video stabilization
techniques, but poses significant challenges. A large body of work uses 2D
affine transformations or homographies to model the global motion. However, in this
work, we introduce a more general representation scheme, which adapts any
existing optical flow network to ignore the moving objects and obtain a
spatially smooth approximation of the global motion between video frames. We
achieve this via a knowledge distillation approach, where we first introduce a
low-pass filter module into the optical flow network to constrain the predicted
optical flow to be spatially smooth. This becomes our student network, named
\textsc{GlobalFlowNet}. Then, using the original optical flow network as the
teacher network, we train the student network using a robust loss function.
Given a trained \textsc{GlobalFlowNet}, we stabilize videos using a two-stage
process. In the first stage, we correct the instability in affine parameters
using a quadratic programming approach constrained by a user-specified cropping
limit to control loss of field of view. In the second stage, we stabilize the
video further by smoothing global motion parameters, expressed using a small
number of discrete cosine transform coefficients. In extensive experiments on a
variety of different videos, our technique outperforms state-of-the-art
techniques in terms of subjective quality and different quantitative measures
of video stability. The source code is publicly available at
\href{https://github.com/GlobalFlowNet/GlobalFlowNet}{https://github.com/GlobalFlowNet/GlobalFlowNet}.
Comment: Accepted in WACV 2023
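To make the distillation setup concrete, here is a hypothetical PyTorch sketch of one training step; the Gaussian low-pass filter, the Charbonnier robust loss, and the stand-in networks are assumptions, since the abstract does not fix these choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def robust_loss(pred, target, eps=1e-3):
    """Charbonnier penalty: a standard robust loss that down-weights the
    large residuals the teacher produces on moving objects."""
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()

def lowpass(flow, kernel_size=15, sigma=5.0):
    """Assumed low-pass module: separable Gaussian smoothing of the flow.

    The paper constrains the student's predicted flow to be spatially
    smooth; the exact filter used here is an assumption of this sketch.
    """
    coords = torch.arange(kernel_size).float() - kernel_size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    c, pad = flow.shape[1], kernel_size // 2
    k_x = g.view(1, 1, 1, -1).repeat(c, 1, 1, 1)  # horizontal pass
    k_y = g.view(1, 1, -1, 1).repeat(c, 1, 1, 1)  # vertical pass
    flow = F.conv2d(flow, k_x, padding=(0, pad), groups=c)
    return F.conv2d(flow, k_y, padding=(pad, 0), groups=c)

# One hypothetical distillation step: the student's low-passed flow is fit
# to the frozen teacher's flow under the robust loss. The Conv2d stand-in
# and tensor shapes are placeholders, not the actual networks.
student = nn.Conv2d(6, 2, 3, padding=1)    # stand-in for the student flow net
teacher_flow = torch.randn(1, 2, 64, 64)   # frozen teacher's flow output
frames = torch.randn(1, 6, 64, 64)         # two RGB frames, channel-stacked
loss = robust_loss(lowpass(student(frames)), teacher_flow)
loss.backward()
```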
Fast Full-frame Video Stabilization with Iterative Optimization
Video stabilization refers to the problem of transforming a shaky video into
a visually pleasing one. The question of how to strike a good trade-off between
visual quality and computational speed has remained one of the open challenges
in video stabilization. Inspired by the analogy between wobbly frames and
jigsaw puzzles, we propose an iterative optimization-based learning approach
using synthetic datasets for video stabilization, which consists of two
interacting submodules: motion trajectory smoothing and full-frame outpainting.
First, we develop a two-level (coarse-to-fine) stabilizing algorithm based on
the probabilistic flow field. The confidence map associated with the estimated
optical flow is exploited to guide the search for shared regions through
backpropagation. Second, we take a divide-and-conquer approach and propose a
novel multiframe fusion strategy to render full-frame stabilized views. An
important new insight brought about by our iterative optimization approach is
that the target video can be interpreted as the fixed point of a nonlinear
mapping for video stabilization. We formulate video stabilization as a problem
of minimizing the amount of jerkiness in motion trajectories, which guarantees
convergence with the help of fixed-point theory. Extensive experimental results
are reported to demonstrate the superiority of the proposed approach in terms
of computational speed and visual quality. The code will be available on
GitHub.
Comment: Accepted by ICCV 2023
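The fixed-point view can be illustrated with a 1D toy: repeatedly apply a stabilization mapping F (smoothing plus a data-attachment term standing in for the cropping constraint) until the trajectory stops changing, i.e. F(x) ≈ x. This NumPy sketch is purely illustrative; the actual method operates on probabilistic flow fields and full-frame outpainting.

```python
import numpy as np

def stabilize_step(traj, original, smooth_w=0.8, data_w=0.2):
    """One application of a hypothetical stabilization mapping F.

    Smooths the camera trajectory (jerkiness reduction) while staying
    close to the original path (a stand-in for the cropping constraint).
    With smooth_w < 1 this is a contraction, so iteration converges.
    """
    # Average each point with its neighbours (reduces jerkiness).
    padded = np.pad(traj, 1, mode="edge")
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    return smooth_w * smoothed + data_w * original

# Iterate x_{k+1} = F(x_k) until we (approximately) reach the fixed point.
rng = np.random.default_rng(0)
original = np.cumsum(rng.normal(size=200))  # shaky 1D camera trajectory
traj = original.copy()
for k in range(500):
    new = stabilize_step(traj, original)
    if np.max(np.abs(new - traj)) < 1e-8:   # converged: F(x) ≈ x
        break
    traj = new
print(f"converged after {k + 1} iterations")
```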
Long-Term Visual Object Tracking Benchmark
We propose a new long video dataset (called Track Long and Prosper - TLP) and
benchmark for single object tracking. The dataset consists of 50 HD videos from
real-world scenarios, encompassing a duration of over 400 minutes (676K
frames), making it more than 20-fold larger in average duration per sequence
and more than 8-fold larger in total covered duration than existing generic
datasets for visual tracking. The proposed dataset paves the way to suitably
assess long-term tracking performance and to train better deep learning
architectures (avoiding/reducing augmentation, which may not reflect real-world
behaviour). We benchmark 17 state-of-the-art trackers on the dataset and rank
them according to tracking accuracy and runtime speed. We further present a
thorough qualitative and quantitative evaluation highlighting the importance of
the long-term aspect of tracking. Our most interesting observations are (a)
existing short-sequence benchmarks fail to bring out the inherent differences
between tracking algorithms, which widen when tracking on long sequences, and
(b) the accuracy of trackers drops abruptly on challenging long sequences,
suggesting the need for research efforts in the direction of long-term
tracking.
Comment: ACCV 2018 (Oral)