Bandwidth Extension on Raw Audio via Generative Adversarial Networks
Neural network-based methods have recently demonstrated state-of-the-art
results on image synthesis and super-resolution tasks, in particular by using
variants of generative adversarial networks (GANs) with supervised feature
losses. Nevertheless, previous feature loss formulations rely on the
availability of large auxiliary classifier networks and labeled datasets that
enable such classifiers to be trained. Furthermore, there has been
comparatively little work to explore the applicability of GAN-based methods to
domains other than images and video. In this work we explore a GAN-based method
for audio processing, and develop a convolutional neural network architecture
to perform audio super-resolution. In addition to several new architectural
building blocks for audio processing, a key component of our approach is the
use of an autoencoder-based loss that enables training in the GAN framework,
with feature losses derived from unlabeled data. We explore the impact of our
architectural choices, and demonstrate significant improvements over previous
works in terms of both objective and perceptual quality.
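To make the autoencoder-based feature loss concrete, here is a minimal PyTorch sketch of the idea as the abstract describes it: a 1-D convolutional encoder, trained as half of an autoencoder on unlabeled audio and then frozen, defines a feature space, and distances between its activations on generated and reference audio serve as the feature loss during GAN training. The architecture and layer sizes are illustrative assumptions, not the authors' exact design.

    import torch
    import torch.nn as nn

    class AudioEncoder(nn.Module):
        """1-D convolutional encoder trained as half of an autoencoder on
        unlabeled audio; its activations define the feature space."""
        def __init__(self):
            super().__init__()
            self.layers = nn.ModuleList([
                nn.Sequential(nn.Conv1d(1, 32, 9, stride=2, padding=4), nn.LeakyReLU(0.2)),
                nn.Sequential(nn.Conv1d(32, 64, 9, stride=2, padding=4), nn.LeakyReLU(0.2)),
                nn.Sequential(nn.Conv1d(64, 128, 9, stride=2, padding=4), nn.LeakyReLU(0.2)),
            ])

        def forward(self, x):            # x: (B, 1, T) raw audio
            feats = []
            for layer in self.layers:
                x = layer(x)
                feats.append(x)
            return feats

    def feature_loss(encoder, fake_audio, real_audio):
        """L1 distance between encoder activations of generated and
        reference audio; the encoder stays frozen during GAN training."""
        with torch.no_grad():
            real_feats = encoder(real_audio)
        fake_feats = encoder(fake_audio)
        return sum(torch.mean(torch.abs(f - r))
                   for f, r in zip(fake_feats, real_feats))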
3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks
In video super-resolution, the spatio-temporal coherence between and among frames must be exploited appropriately for accurate prediction of the high-resolution frames. Although 2D convolutional neural networks (CNNs) are
powerful in modelling images, 3D-CNNs are more suitable for spatio-temporal
feature extraction as they can preserve temporal information. To this end, we
propose an effective 3D-CNN for video super-resolution, called the 3DSRnet that
does not require motion alignment as preprocessing. Our 3DSRnet maintains the
temporal depth of spatio-temporal feature maps to maximally capture the
temporally nonlinear characteristics between low and high resolution frames,
and adopts residual learning in conjunction with the sub-pixel outputs. It
outperforms the state-of-the-art method by an average of 0.45 and 0.36 dB in PSNR for scales 3 and 4, respectively, on the Vid4 benchmark. Our 3DSRnet is also the first to address the performance drop due to scene change, which is important in practice but has not been previously considered.

Comment: Extension of our paper accepted at ICIP 2019
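As a rough illustration of the building blocks named above, the PyTorch sketch below keeps the temporal depth through the 3D convolutions, produces the upscaled frame via sub-pixel (pixel-shuffle) outputs, and learns a residual on top of a bicubically interpolated center frame. It is a minimal sketch under assumed layer sizes, not the 3DSRnet architecture itself.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Tiny3DSR(nn.Module):
        def __init__(self, scale=4, n_frames=5):
            super().__init__()
            self.scale = scale
            # 3D convs pad in time, so the temporal depth (n_frames) is preserved
            self.body = nn.Sequential(
                nn.Conv3d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv3d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            )
            # collapse the temporal axis, then predict sub-pixel channels
            self.fuse = nn.Conv2d(32 * n_frames, scale * scale, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)

        def forward(self, frames):                 # frames: (B, 1, T, H, W)
            b, _, t, h, w = frames.shape
            feats = self.body(frames)              # (B, 32, T, H, W)
            feats = feats.reshape(b, -1, h, w)     # stack time into channels
            residual = self.shuffle(self.fuse(feats))   # (B, 1, sH, sW)
            center = frames[:, :, t // 2]          # center LR frame
            base = F.interpolate(center, scale_factor=self.scale,
                                 mode='bicubic', align_corners=False)
            return base + residual                 # residual learning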
Super-Resolution via Deep Learning
The recent phenomenal interest in convolutional neural networks (CNNs) has made it inevitable for the super-resolution (SR) community to explore their potential. The response has been immense: in the three years since the advent of the pioneering work, so many works have appeared as to warrant a comprehensive survey. This paper surveys the SR literature in the context of
deep learning. We focus on three important aspects of multimedia, namely image, video, and multi-dimensional data, especially depth maps. In each case, the relevant benchmarks are first introduced in the form of datasets and state-of-the-art SR methods, excluding deep learning. Next is a detailed analysis of the individual works, each including a short description of the method and a critique of the results, with special reference to the benchmarking done. This is followed by a minimal overall benchmarking in the form of comparisons on common datasets, relying on the results reported in the various works.
Fast Spatio-Temporal Residual Network for Video Super-Resolution
Recently, deep learning based video super-resolution (SR) methods have
achieved promising performance. To simultaneously exploit the spatial and
temporal information of videos, employing 3-dimensional (3D) convolutions is a
natural approach. However, directly utilizing 3D convolutions may lead to an excessively high computational complexity, which restricts the depth of video SR models and thus undermines the performance. In this paper, we present a novel fast spatio-temporal residual network (FSTRN) that adopts 3D convolutions for the video SR task in order to enhance the performance while maintaining a low computational load. Specifically, we propose a fast spatio-temporal residual block (FRB) that divides each 3D filter into the product of two 3D filters of considerably lower dimensions. Furthermore, we design a cross-space residual learning scheme that directly links the low-resolution space and the
high-resolution space, which can greatly relieve the computational burden on
the feature fusion and up-scaling parts. Extensive evaluations and comparisons
on benchmark datasets validate the strengths of the proposed approach and
demonstrate that the proposed network significantly outperforms the current
state-of-the-art methods.

Comment: To appear in CVPR 2019
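The FRB factorization can be sketched as follows: a full k x k x k convolution is replaced by a 1 x k x k (spatial) convolution followed by a k x 1 x 1 (temporal) one, cutting the per-output cost roughly from k^3 to k^2 + k multiplications. This is a minimal PyTorch sketch of that decomposition under assumed channel widths, not the released FSTRN code.

    import torch.nn as nn

    class FastResidualBlock(nn.Module):
        """Fast spatio-temporal residual block: a factored 3D convolution
        (spatial 1 x k x k, then temporal k x 1 x 1) inside a residual
        connection."""
        def __init__(self, channels=64, k=3):
            super().__init__()
            self.spatial = nn.Conv3d(channels, channels, (1, k, k),
                                     padding=(0, k // 2, k // 2))
            self.temporal = nn.Conv3d(channels, channels, (k, 1, 1),
                                      padding=(k // 2, 0, 0))
            self.act = nn.PReLU()

        def forward(self, x):            # x: (B, C, T, H, W)
            return x + self.temporal(self.act(self.spatial(self.act(x))))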
Is There Tradeoff between Spatial and Temporal in Video Super-Resolution?
Recent advances of deep learning lead to great success of image and video
super-resolution (SR) methods that are based on convolutional neural networks
(CNN). For video SR, advanced algorithms have been proposed to exploit the
temporal correlation between low-resolution (LR) video frames, and/or to
super-resolve a frame with multiple LR frames. These methods pursue higher
quality of super-resolved frames, where the quality is usually measured frame
by frame in e.g. PSNR. However, frame-wise quality may not reveal the
consistency between frames. If an algorithm is applied to each frame
independently (which is the case for most previous methods), the algorithm may
cause temporal inconsistency, which can be observed as flickering. It is a
natural requirement to improve both frame-wise fidelity and between-frame
consistency, which are termed spatial quality and temporal quality,
respectively. Then we may ask: is a method optimized for spatial quality also optimized for temporal quality? Can we optimize the two quality metrics jointly?
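One simple way to make the two notions measurable is sketched below in Python/NumPy: spatial quality as mean frame-wise PSNR, and temporal quality as the error between the frame-to-frame changes of the super-resolved video and those of the ground truth, so that flickering shows up as spurious frame-to-frame change. These particular formulas are illustrative assumptions; the paper may define its metrics differently.

    import numpy as np

    def psnr(a, b, peak=1.0):
        mse = np.mean((a - b) ** 2)
        return 10 * np.log10(peak ** 2 / mse)

    def spatial_quality(sr, hr):
        """Frame-wise fidelity: mean PSNR over frames (T, H, W[, C])."""
        return float(np.mean([psnr(s, h) for s, h in zip(sr, hr)]))

    def temporal_quality(sr, hr):
        """Consistency proxy: how well the SR video reproduces the
        ground-truth frame-to-frame changes; lower is better."""
        sr_diff = np.diff(sr, axis=0)
        hr_diff = np.diff(hr, axis=0)
        return float(np.mean((sr_diff - hr_diff) ** 2))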
iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks
Recently, learning-based models have enhanced the performance of single-image
super-resolution (SISR). However, applying SISR successively to each video
frame leads to a lack of temporal coherency. Convolutional neural networks
(CNNs) outperform traditional approaches in terms of image quality metrics such
as peak signal to noise ratio (PSNR) and structural similarity (SSIM). However,
generative adversarial networks (GANs) offer a competitive advantage by being
able to mitigate the issue of a lack of finer texture details, usually seen
with CNNs when super-resolving at large upscaling factors. We present
iSeeBetter, a novel GAN-based spatio-temporal approach to video
super-resolution (VSR) that renders temporally consistent super-resolution
videos. iSeeBetter extracts spatial and temporal information from the current
and neighboring frames using the concept of recurrent back-projection networks
as its generator. Furthermore, to improve the "naturality" of the
super-resolved image while eliminating artifacts seen with traditional
algorithms, we utilize the discriminator from super-resolution generative
adversarial network (SRGAN). Although mean squared error (MSE) as a primary
loss-minimization objective improves PSNR/SSIM, these metrics may not capture fine details in the image, resulting in a misrepresentation of perceptual quality.
To address this, we use a four-fold (MSE, perceptual, adversarial, and
total-variation (TV)) loss function. Our results demonstrate that iSeeBetter
offers superior VSR fidelity and surpasses state-of-the-art performance.

Comment: 11 pages, 6 figures, 4 tables. Project page: https://iseebetter.amanchadha.com
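The four-fold objective can be written down compactly. The PyTorch sketch below combines MSE, a perceptual loss over features from some pretrained network (feat_net), a non-saturating adversarial term against a discriminator (disc), and a total-variation regularizer; the helper names and loss weights are illustrative assumptions, not iSeeBetter's exact values.

    import torch
    import torch.nn.functional as F

    def total_variation(img):
        """Total-variation regularizer encouraging spatial smoothness."""
        dh = torch.mean(torch.abs(img[..., :, 1:] - img[..., :, :-1]))
        dv = torch.mean(torch.abs(img[..., 1:, :] - img[..., :-1, :]))
        return dh + dv

    def generator_loss(sr, hr, feat_net, disc, w=(1.0, 6e-3, 1e-3, 2e-8)):
        """Four-fold objective: MSE + perceptual + adversarial + TV.
        feat_net maps images to features (e.g. a VGG layer); disc outputs
        a probability in (0, 1). Weights w are assumed, not the paper's."""
        mse = F.mse_loss(sr, hr)
        perceptual = F.mse_loss(feat_net(sr), feat_net(hr))
        adversarial = -torch.mean(torch.log(disc(sr) + 1e-8))
        tv = total_variation(sr)
        return w[0] * mse + w[1] * perceptual + w[2] * adversarial + w[3] * tv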
Perceptual Video Super Resolution with Enhanced Temporal Consistency
With the advent of perceptual loss functions, new possibilities in
super-resolution have emerged, and we currently have models that successfully
generate near-photorealistic high-resolution images from their low-resolution
observations. Up to now, however, such approaches have been exclusively limited
to single-image super-resolution. The application of perceptual loss functions
on video processing still entails several challenges, mostly related to the
lack of temporal consistency of the generated images, i.e., flickering
artifacts. In this work, we present a novel adversarial recurrent network for
video upscaling that is able to produce realistic textures in a temporally
consistent way. The proposed architecture naturally leverages information from previous frames thanks to its recurrent design, i.e., the input to the
generator is composed of the low-resolution image and, additionally, the warped
output of the network at the previous step. Together with a video
discriminator, we also propose additional loss functions to further reinforce
temporal consistency in the generated sequences. The experimental validation of
our algorithm shows the effectiveness of our approach which obtains images with
high perceptual quality and improved temporal consistency.

Comment: Major revision and improvement of the manuscript: new network architecture, new loss function, and extended experiments
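A minimal sketch of the recurrent input construction described above: the previous high-resolution output is backward-warped with an optical-flow estimate, packed down to the low-resolution grid (space-to-depth), and concatenated with the current low-resolution frame before entering the generator. The flow is assumed given, and the helper names are hypothetical.

    import torch
    import torch.nn.functional as F

    def warp(prev_sr, flow):
        """Backward-warp the previous SR output with a dense flow field
        (B, 2, H, W), in pixels, using bilinear grid sampling."""
        b, _, h, w = prev_sr.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        grid = torch.stack((xs, ys), dim=0).float().to(prev_sr.device)
        coords = grid.unsqueeze(0) + flow
        # normalize coordinates to [-1, 1] as grid_sample expects
        coords_x = 2 * coords[:, 0] / (w - 1) - 1
        coords_y = 2 * coords[:, 1] / (h - 1) - 1
        grid_norm = torch.stack((coords_x, coords_y), dim=-1)   # (B, H, W, 2)
        return F.grid_sample(prev_sr, grid_norm, align_corners=True)

    def generator_input(lr_frame, prev_sr, flow, scale=4):
        """Concatenate the current LR frame with the warped previous
        output, packed to the LR grid via space-to-depth."""
        warped = warp(prev_sr, flow)
        packed = F.pixel_unshuffle(warped, scale)
        return torch.cat((lr_frame, packed), dim=1)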
NTIRE 2020 Challenge on Image and Video Deblurring
Motion blur is one of the most common degradation artifacts in dynamic scene
photography. This paper reviews the NTIRE 2020 Challenge on Image and Video
Deblurring. In this challenge, we present the evaluation results from 3
competition tracks as well as the proposed solutions. Track 1 aims to develop
single-image deblurring methods focusing on restoration quality. On Track 2,
the image deblurring methods are executed on a mobile platform to find the
balance of the running speed and the restoration accuracy. Track 3 targets
developing video deblurring methods that exploit the temporal relation between
input frames. In each competition, there were 163, 135, and 102 registered participants, respectively, and in the final testing phase, 9, 4, and 7 teams competed. The winning methods demonstrate state-of-the-art performance on image and video deblurring tasks.

Comment: To be published in the CVPR 2020 Workshop (New Trends in Image Restoration and Enhancement)
Adapting Image Super-Resolution State-of-the-arts and Learning Multi-model Ensemble for Video Super-Resolution
Recently, image super-resolution has been widely studied and achieved
significant progress by leveraging the power of deep convolutional neural
networks. However, there has been limited advancement in video super-resolution
(VSR) due to the complex temporal patterns in videos. In this paper, we
investigate how to adapt state-of-the-art methods of image super-resolution for
video super-resolution. The proposed adaptation method is straightforward: the information among successive frames is well exploited, while the overhead on
the original image super-resolution method is negligible. Furthermore, we
propose a learning-based method to ensemble the outputs from multiple
super-resolution models. Our methods show superior performance and rank second
in the NTIRE 2019 Video Super-Resolution Challenge Track 1.
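The learned ensemble idea can be sketched as a small fusion CNN that predicts per-pixel softmax weights over the candidate outputs of the individual SR models and blends them. This is a plausible minimal PyTorch construction, not the authors' actual ensemble model.

    import torch
    import torch.nn as nn

    class EnsembleFusion(nn.Module):
        """Learning-based ensemble: predict per-pixel weights over N
        candidate SR outputs and blend them. Fusion design is assumed."""
        def __init__(self, n_models, ch=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(n_models * ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, n_models, 3, padding=1),
            )

        def forward(self, candidates):   # candidates: (B, N, C, H, W)
            b, n, c, h, w = candidates.shape
            weights = self.net(candidates.reshape(b, n * c, h, w))
            weights = torch.softmax(weights, dim=1).unsqueeze(2)  # (B, N, 1, H, W)
            return (weights * candidates).sum(dim=1)              # (B, C, H, W)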
Real-time Deep Video Deinterlacing
Interlacing is a technique widely used in television broadcast and video recording to double the perceived frame rate without increasing the bandwidth. However, it introduces annoying visual artifacts, such as flickering and silhouette "serration," during playback. Existing state-of-the-art deinterlacing
methods either ignore the temporal information to provide real-time performance
but lower visual quality, or estimate the motion for better deinterlacing but
with a trade-off of higher computational cost. In this paper, we present the
first deep convolutional neural network (DCNN) based method to deinterlace with high visual quality and real-time performance. Unlike existing models for super-resolution problems, which rely on the translation-invariant
assumption, our proposed DCNN model utilizes the temporal information from both
the odd and even half frames to reconstruct only the missing scanlines, and
retains the given odd and even scanlines for producing the full deinterlaced
frames. By further introducing a layer-sharable architecture, our system can
achieve real-time performance on a single GPU. Experiments show that our method outperforms all existing methods in terms of reconstruction accuracy and computational performance.

Comment: 9 pages, 11 figures
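The core reconstruction step can be sketched as follows: a small CNN predicts only the missing scanlines from the current field and, echoing the use of temporal information, a neighboring field of opposite parity, while the transmitted scanlines are copied through unchanged. The network shape and interface below are illustrative assumptions, not the paper's model.

    import torch
    import torch.nn as nn

    class TinyDeinterlacer(nn.Module):
        """Predict only the missing scanlines, then interleave them with
        the scanlines that were actually transmitted."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1),
            )

        def forward(self, cur_field, other_field, cur_is_odd=True):
            # cur_field, other_field: (B, 1, H/2, W) half frames of
            # opposite parity; both inform the reconstruction
            missing = self.net(torch.cat((cur_field, other_field), dim=1))
            b, c, h2, w = cur_field.shape
            frame = torch.zeros(b, c, h2 * 2, w, device=cur_field.device)
            if cur_is_odd:
                frame[:, :, 0::2] = cur_field   # keep given scanlines
                frame[:, :, 1::2] = missing     # fill reconstructed ones
            else:
                frame[:, :, 0::2] = missing
                frame[:, :, 1::2] = cur_field
            return frame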