Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution
Convolutional neural networks have recently demonstrated high-quality
reconstruction for single-image super-resolution. In this paper, we propose the
Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively
reconstruct the sub-band residuals of high-resolution images. At each pyramid
level, our model takes coarse-resolution feature maps as input, predicts the
high-frequency residuals, and uses transposed convolutions for upsampling to
the finer level. Our method does not require the bicubic interpolation as the
pre-processing step and thus dramatically reduces the computational complexity.
We train the proposed LapSRN with deep supervision using a robust Charbonnier
loss function and achieve high-quality reconstruction. Furthermore, our network
generates multi-scale predictions in one feed-forward pass through the
progressive reconstruction, thereby facilitating resource-aware applications.
Extensive quantitative and qualitative evaluations on benchmark datasets show
that the proposed algorithm performs favorably against the state-of-the-art
methods in terms of speed and accuracy.
Comment: This work is accepted in CVPR 2017. The code and datasets are
available on http://vllab.ucmerced.edu/wlai24/LapSRN
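The abstract above names a Charbonnier loss applied with deep supervision. A minimal sketch of that penalty follows, in plain Python over flat lists of pixel values. The function names, the default eps of 1e-3, and the choice to sum the per-level losses are assumptions made for illustration; they are not taken from the released LapSRN code.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d^2 + eps^2) over paired values.

    eps is an assumed default; it keeps the penalty differentiable at d = 0
    and makes it behave like a smooth L1 loss for large residuals.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def deeply_supervised_loss(preds_per_level, targets_per_level, eps=1e-3):
    """Sum the Charbonnier loss over pyramid levels (hypothetical helper).

    Each entry is the flattened prediction or target at one scale,
    for example 2x, 4x, and 8x.
    """
    return sum(
        charbonnier_loss(p, t, eps)
        for p, t in zip(preds_per_level, targets_per_level)
    )
```

For small residuals the penalty is approximately quadratic and for large residuals approximately linear, which is why it is described as robust to outliers.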
An Asynchronous Linear Filter Architecture for Hybrid Event-Frame Cameras
Event cameras are ideally suited to capture High Dynamic Range (HDR) visual
information without blur but provide poor imaging capability for static or
slowly varying scenes. Conversely, conventional image sensors measure absolute
intensity of slowly changing scenes effectively but do poorly on HDR or quickly
changing scenes. In this paper, we present an asynchronous linear filter
architecture, fusing event and frame camera data, for HDR video reconstruction
and spatial convolution that exploits the advantages of both sensor modalities.
The key idea is the introduction of a state that directly encodes the
integrated or convolved image information and that is updated asynchronously as
each event or each frame arrives from the camera. The state can be read off
as often as, and whenever, required to feed into subsequent vision modules for
real-time robotic systems. Our experimental results are evaluated on both
publicly available datasets with challenging lighting conditions and fast
motions, and a new dataset with HDR reference that we provide. The
proposed AKF pipeline outperforms other state-of-the-art methods in both
absolute intensity error (69.4% reduction) and image similarity indexes
(average 35.5% improvement). We also demonstrate the integration of image
convolution with linear spatial kernels (Gaussian, Sobel, and Laplacian) as an
application of our architecture.
Comment: 17 pages, 10 figures, Accepted by IEEE Transactions on Pattern
Analysis and Machine Intelligence (TPAMI) in August 202
Confidence-aware Levenberg-Marquardt optimization for joint motion estimation and super-resolution
Motion estimation across low-resolution frames and the reconstruction of
high-resolution images are two coupled subproblems of multi-frame
super-resolution. This paper introduces a new joint optimization approach for
motion estimation and image reconstruction to address this interdependence. Our
method is formulated via non-linear least squares optimization and combines two
principles of robust super-resolution. First, to enhance the robustness of the
joint estimation, we propose a confidence-aware energy minimization framework
augmented with sparse regularization. Second, we develop a tailor-made
Levenberg-Marquardt iteration scheme to jointly estimate motion parameters and
the high-resolution image along with the corresponding model confidence
parameters. Our experiments on simulated and real images confirm that the
proposed approach outperforms decoupled motion estimation and image
reconstruction as well as related state-of-the-art joint estimation algorithms.
Comment: accepted for ICIP 201
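For readers unfamiliar with Levenberg-Marquardt iterations, the sketch below shows the generic damped Gauss-Newton step on a toy one-parameter curve fit. It is not the confidence-aware scheme of the paper above: the model y = exp(a * x), the function name, and the factor-of-ten damping schedule are all assumptions chosen to keep the example short.

```python
import math


def levenberg_marquardt_1d(xs, ys, a0=0.0, lam=1e-3, iters=100):
    """Fit y = exp(a * x) for a single parameter a (toy example)."""

    def cost(a):
        # Sum of squared residuals for the current parameter value.
        return sum((math.exp(a * x) - y) ** 2 for x, y in zip(xs, ys))

    a = a0
    current = cost(a)
    for _ in range(iters):
        residuals = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        jac = [x * math.exp(a * x) for x in xs]  # d(residual)/da
        jtj = sum(j * j for j in jac)
        jtr = sum(j * r for j, r in zip(jac, residuals))
        # Damped normal equation: (J^T J + lam) * step = -J^T r
        step = -jtr / (jtj + lam)
        trial = cost(a + step)
        if trial < current:
            # Accept the step and trust the Gauss-Newton model more.
            a, current = a + step, trial
            lam = max(lam / 10.0, 1e-12)
        else:
            # Reject the step and move towards gradient descent.
            lam *= 10.0
    return a
```

A joint motion and image estimation problem has many parameters, so the scalar division here would become a linear solve with the damped matrix J^T J + lam * I.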
End-to-end Flow Correlation Tracking with Spatial-temporal Attention
Discriminative correlation filters (DCF) with deep convolutional features
have achieved favorable performance in recent tracking benchmarks. However,
most existing DCF trackers only consider appearance features of the current
frame and hardly benefit from motion and inter-frame information. The lack of
temporal information degrades the tracking performance during challenges such
as partial occlusion and deformation. In this work, we focus on making use of
the rich flow information in consecutive frames to improve the feature
representation and the tracking accuracy. Firstly, individual components,
including optical flow estimation, feature extraction, aggregation and
correlation filter tracking, are formulated as special layers in the network.
To the best of our knowledge, this is the first work to jointly train flow and
tracking tasks in a deep learning framework. Then the historical feature maps
at predefined intervals are warped and aggregated with current ones under the
guidance of flow. For adaptive aggregation, we propose a novel spatial-temporal
attention mechanism. Extensive experiments are performed on four challenging
tracking datasets: OTB2013, OTB2015, VOT2015 and VOT2016, and the proposed
method achieves superior results on these benchmarks.
Comment: Accepted in CVPR 201
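The abstract above describes aggregating warped historical feature maps with the current one under an attention mechanism, without giving its form. The sketch below illustrates one plausible weighting only: a softmax over cosine similarities to the current feature vector. The similarity measure, the function names, and the assumption that the historical features have already been warped by optical flow are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def attention_weights(current, frames):
    """Softmax over similarities, so the weights are positive and sum to 1."""
    scores = [cosine(current, f) for f in frames]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(current, warped_history):
    """Weighted sum of the current feature and already-warped past features."""
    frames = warped_history + [current]
    weights = attention_weights(current, frames)
    return [
        sum(w * f[k] for w, f in zip(weights, frames))
        for k in range(len(current))
    ]
```

Frames that resemble the current feature receive larger weights, so a past frame degraded by occlusion would contribute less to the aggregate under this assumed scheme.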
Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method
The past decade has witnessed great strides in video recovery by specialist
technologies, like video inpainting, completion, and error concealment.
However, they typically simulate the missing content by manually designed error
masks, thus failing to fill in the realistic video loss in video communication
(e.g., telepresence, live streaming, and internet video) and multimedia
forensics. To address this, we introduce the bitstream-corrupted video (BSCV)
benchmark, the first benchmark dataset with more than 28,000 video clips, which
can be used for bitstream-corrupted video recovery in the real world. The BSCV
is a collection of 1) a proposed three-parameter corruption model for video
bitstream, 2) a large-scale dataset containing rich error patterns, multiple
corruption levels, and flexible dataset branches, and 3) a plug-and-play module
in a video recovery framework that serves as a benchmark. We evaluate
state-of-the-art video inpainting methods on the BSCV dataset, demonstrating
existing approaches' limitations and our framework's advantages in solving the
bitstream-corrupted video recovery problem. The benchmark and dataset are
released at https://github.com/LIUTIGHE/BSCV-Dataset.
Comment: Accepted by NeurIPS Dataset and Benchmark Track 202