Confidence-aware Levenberg-Marquardt optimization for joint motion estimation and super-resolution
Motion estimation across low-resolution frames and the reconstruction of
high-resolution images are two coupled subproblems of multi-frame
super-resolution. This paper introduces a new joint optimization approach for
motion estimation and image reconstruction to address this interdependence. Our
method is formulated via non-linear least squares optimization and combines two
principles of robust super-resolution. First, to enhance the robustness of the
joint estimation, we propose a confidence-aware energy minimization framework
augmented with sparse regularization. Second, we develop a tailor-made
Levenberg-Marquardt iteration scheme to jointly estimate motion parameters and
the high-resolution image along with the corresponding model confidence
parameters. Our experiments on simulated and real images confirm that the
proposed approach outperforms decoupled motion estimation and image
reconstruction as well as related state-of-the-art joint estimation algorithms. Comment: accepted for ICIP 201
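For orientation, a generic Levenberg-Marquardt iteration for a small non-linear least squares problem can be sketched as follows. This is not the confidence-aware scheme of the paper above: the exponential model, the function names, and the damping schedule (halve on success, multiply by ten on failure) are assumptions chosen only to illustrate the damped Gauss-Newton step that the abstract builds on.

```python
import math

def residuals(params, ts, ys):
    # Hypothetical toy model y = a * exp(b * t); not the paper's imaging model.
    a, b = params
    return [a * math.exp(b * t) - y for t, y in zip(ts, ys)]

def jacobian(params, ts):
    # Partial derivatives of each residual with respect to a and b.
    a, b = params
    return [[math.exp(b * t), a * t * math.exp(b * t)] for t in ts]

def levenberg_marquardt(ts, ys, params, lam=1e-3, iters=100):
    params = list(params)
    cost = sum(r * r for r in residuals(params, ts, ys))
    for _ in range(iters):
        r = residuals(params, ts, ys)
        J = jacobian(params, ts)
        # Normal-equation terms: A = J^T J and g = J^T r.
        a00 = sum(row[0] * row[0] for row in J)
        a01 = sum(row[0] * row[1] for row in J)
        a11 = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        # Marquardt damping: scale the diagonal of A by (1 + lam).
        d00 = a00 * (1.0 + lam)
        d11 = a11 * (1.0 + lam)
        det = d00 * d11 - a01 * a01
        if det == 0:
            break
        # Solve the damped 2x2 system (A + lam * diag(A)) * step = -g.
        s0 = (-g0 * d11 + g1 * a01) / det
        s1 = (g0 * a01 - g1 * d00) / det
        if abs(s0) + abs(s1) < 1e-12:
            break
        trial = [params[0] + s0, params[1] + s1]
        trial_cost = sum(v * v for v in residuals(trial, ts, ys))
        if trial_cost < cost:
            # Step accepted: move closer to Gauss-Newton behaviour.
            params, cost = trial, trial_cost
            lam *= 0.5
        else:
            # Step rejected: move closer to gradient descent behaviour.
            lam *= 10.0
    return params

ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * t) for t in ts]
print(levenberg_marquardt(ts, ys, [1.0, 0.1]))
```

In a joint super-resolution setting the parameter vector would hold motion parameters and image pixels instead of two scalars, but the accept/reject logic around the damping factor stays the same in spirit.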
Multi-frame Image Super-resolution Reconstruction Using Multi-grained Cascade Forest
Super-resolution image reconstruction uses two classes of algorithms: one for single-frame image reconstruction and one for multi-frame image reconstruction. Single-frame image reconstruction generally applies degradation first, followed by reconstruction, which essentially creates a problem of insufficient characterization. Multi-frame images provide additional information for image reconstruction relative to single-frame images because of the slight differences between sequential frames. However, existing super-resolution algorithms for multi-frame images do not take advantage of this key factor, either because of loose structure and complexity or because the individual frames are restored poorly. This paper proposes a new SR reconstruction algorithm for images using Multi-grained Cascade Forest. Multi-frame image reconstruction is processed sequentially. First, the image registration algorithm uses a convolutional neural network to register low-resolution image sequences; the registered images are then reconstructed by the Multi-grained Cascade Forest reconstruction algorithm. Finally, the reconstructed images are fused. The optimal algorithm is selected for each step to get the most out of the details and tightly connect the internal logic of each sequential step. In the novel approach proposed in this paper, the depth of the cascade forest is procedurally generated for recovered images rather than being a constant. After training each layer, the recovered image is automatically evaluated, and new layers are constructed for training until an optimal restored image is obtained. Experiments show that this method improves the quality of image reconstruction while preserving the details of the image.
Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs
In this thesis, different aspects of video and light field reconstruction are considered, such as super-resolution, inpainting, segmentation and codecs. For this purpose, each of these strategies is analyzed based on a specific goal and a specific database. Accordingly, databases relevant to the film industry, sport videos, light fields and hyperspectral videos are used for the sake of improvement.
This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. Initially, a novel multi-frame reconstruction strategy is proposed for light field super-resolution, in which graph-based regularization is applied along with edge-preserving filtering to improve the spatio-angular quality of the light field. Second, a novel video reconstruction method is proposed, built on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is to improve the visual quality of the video frames and decrease the reconstruction error in comparison with former video reconstruction methods. In the next approach, Student-t mixture models and edge-preserving filtering are applied for the purpose of video super-resolution. The Student-t mixture model has a heavy tail, which makes it robust and suitable as a video frame patch prior and rich in terms of log likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, a Beta process is used in Bayesian dictionary learning, and a sparse coding is generated for the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leverages two different learned dictionaries: the first is trained for spatial super-resolution and the second for spectral restoration.
Furthermore, in another approach, a novel framework is proposed for automatically replacing advertisement content in soccer videos by using deep learning strategies. For this purpose, a UNET architecture (a convolutional neural network technique for image segmentation) is applied for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (considering the apparent loss in detection), the unwanted content is replaced by new content using a homography mapping procedure.
In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos by using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0, and the luma and chroma channels are merged after the luma is downsampled to match the chroma size. The inverse operation is performed in the decoder. The performance of these models is evaluated using the CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0.
The thread that ties these approaches together is the reconstruction of video and light field frames in the face of different problems, such as loss of information, blur in the frames, residual noise after reconstruction, unwanted content, excessive data size and high computational overhead. In three of the proposed approaches, we use the Plug-and-Play ADMM model for the first time for the reconstruction of videos and light fields, in order to address information retrieval in the frames and tackle noise and blur at the same time. In two of the proposed models, we apply sparse dictionary learning to reduce the data dimension and represent the data as an efficient linear combination of basis frame patches. Two of the proposed approaches were developed in collaboration with industry; in these, deep learning frameworks are used to handle large sets of features and to learn high-level features from the data.
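As a side note on the chroma-subsampled formats mentioned in the compression work above, the number of samples stored per pixel follows directly from the J:a:b notation. The short sketch below computes it; the helper name `samples_per_pixel` is hypothetical, and the figures are raw sample counts only. They are not the computation savings reported in the abstract, which depend on the network architecture.

```python
from fractions import Fraction

def samples_per_pixel(j, a, b):
    """Average stored samples per pixel for J:a:b subsampling.

    The reference block is j pixels wide and 2 rows high. Each of the two
    chroma planes keeps a samples in the first row and b in the second.
    """
    luma = 2 * j
    chroma = 2 * (a + b)
    return Fraction(luma + chroma, luma)

for name, scheme in [("4:4:4", (4, 4, 4)), ("4:2:2", (4, 2, 2)), ("4:2:0", (4, 2, 0))]:
    # 4:4:4 -> 3, 4:2:2 -> 2, 4:2:0 -> 3/2
    print(name, samples_per_pixel(*scheme))
```

Relative to 4:4:4, a 4:2:2 frame therefore carries two thirds of the samples and a 4:2:0 frame carries half.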
Light Field Super-Resolution Via Graph-Based Regularization
Light field cameras capture the 3D information in a scene with a single
exposure. This special feature makes light field cameras very appealing for a
variety of applications: from post-capture refocus, to depth estimation and
image-based rendering. However, light field cameras suffer by design from
strong limitations in their spatial resolution, which should therefore be
augmented by computational methods. On the one hand, off-the-shelf single-frame
and multi-frame super-resolution algorithms are not ideal for light field data,
as they do not consider its particular structure. On the other hand, the few
super-resolution algorithms explicitly tailored for light field data exhibit
significant limitations, such as the need to estimate an explicit disparity map
at each view. In this work we propose a new light field super-resolution
algorithm meant to address these limitations. We adopt a multi-frame-like
super-resolution approach, where the complementary information in the different
light field views is used to augment the spatial resolution of the whole light
field. We show that coupling the multi-frame approach with a graph regularizer
that enforces the light field structure via nonlocal self-similarities makes it
possible to avoid the costly and challenging disparity estimation step for all the
views. Extensive experiments show that the new algorithm compares favorably to
the other state-of-the-art methods for light field super-resolution, both in
terms of PSNR and visual quality. Comment: This new version includes more material. In particular, we added: a
new section on the computational complexity of the proposed algorithm,
experimental comparisons with a CNN-based super-resolution algorithm, and new
experiments on a third dataset.
An Efficient Algorithm for Video Super-Resolution Based On a Sequential Model
In this work, we propose a novel procedure for video super-resolution, that
is, the recovery of a sequence of high-resolution images from its low-resolution
counterpart. Our approach is based on a "sequential" model (i.e., each
high-resolution frame is supposed to be a displaced version of the preceding
one) and considers the use of sparsity-enforcing priors. Both the recovery of
the high-resolution images and the motion fields relating them are tackled. This
leads to a large-dimensional, non-convex and non-smooth problem. We propose an
algorithmic framework to address the latter. Our approach relies on fast
gradient evaluation methods and modern optimization techniques for
non-differentiable/non-convex problems. Unlike some other previous works, we
show that there exists a provably-convergent method with a complexity linear in
the problem dimensions. We assess the proposed optimization method on several
video benchmarks and emphasize its good performance with respect to the state
of the art. Comment: 37 pages, SIAM Journal on Imaging Sciences, 201
Adaptive foveated single-pixel imaging with dynamic super-sampling
As an alternative to conventional multi-pixel cameras, single-pixel cameras
enable images to be recorded using a single detector that measures the
correlations between the scene and a set of patterns. However, to fully sample
a scene in this way requires at least the same number of correlation
measurements as there are pixels in the reconstructed image. Therefore
single-pixel imaging systems typically exhibit low frame-rates. To mitigate
this, a range of compressive sensing techniques have been developed which rely
on a priori knowledge of the scene to reconstruct images from an under-sampled
set of measurements. In this work we take a different approach and adopt a
strategy inspired by the foveated vision systems found in the animal kingdom -
a framework that exploits the spatio-temporal redundancy present in many
dynamic scenes. In our single-pixel imaging system a high-resolution foveal
region follows motion within the scene, but unlike a simple zoom, every frame
delivers new spatial information from across the entire field-of-view. Using
this approach we demonstrate a four-fold reduction in the time taken to record
the detail of rapidly evolving features, whilst simultaneously accumulating
detail of more slowly evolving regions over several consecutive frames. This
tiered super-sampling technique enables the reconstruction of video streams in
which both the resolution and the effective exposure-time spatially vary and
adapt dynamically in response to the evolution of the scene. The methods
described here can complement existing compressive sensing approaches and may
be applied to enhance a variety of computational imagers that rely on
sequential correlation measurements. Comment: 13 pages, 5 figures
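The sampling requirement stated in this abstract (as many correlation measurements as reconstructed pixels) can be illustrated with a toy single-pixel measurement model. The sketch below assumes ±1 Hadamard patterns and a flattened 8-pixel scene; both are illustrative choices and are not taken from the paper, and the function names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def measure(patterns, scene):
    # Each measurement is the correlation of the scene with one pattern,
    # as recorded by a single detector.
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]

def reconstruct(patterns, measurements):
    # Back-project every measurement through its pattern. This is exact when
    # all n orthogonal patterns are used, because H^T H = n * I.
    n = len(patterns[0])
    return [
        sum(patterns[k][i] * measurements[k] for k in range(len(patterns))) / n
        for i in range(n)
    ]

scene = [3, 1, 4, 1, 5, 9, 2, 6]
H = hadamard(len(scene))

full = reconstruct(H, measure(H, scene))          # 8 measurements for 8 pixels
half = reconstruct(H[:4], measure(H[:4], scene))  # only 4 measurements
print(full)
print(half)
```

With all eight patterns the scene is recovered exactly; with only four, the estimate is a projection onto the subspace those patterns span, which is the gap that compressive sensing priors or, in this paper, foveated sampling aim to close.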