Light Field Super-Resolution Via Graph-Based Regularization
Light field cameras capture the 3D information in a scene with a single
exposure. This special feature makes light field cameras very appealing for a
variety of applications: from post-capture refocus, to depth estimation and
image-based rendering. However, light field cameras suffer by design from a
strong limitation in spatial resolution, which must therefore be enhanced by
computational methods. On the one hand, off-the-shelf single-frame
and multi-frame super-resolution algorithms are not ideal for light field data,
as they do not consider its particular structure. On the other hand, the few
super-resolution algorithms explicitly tailored for light field data exhibit
significant limitations, such as the need to estimate an explicit disparity map
at each view. In this work we propose a new light field super-resolution
algorithm designed to address these limitations. We adopt a multi-frame-style
super-resolution approach, where the complementary information in the different
light field views is used to augment the spatial resolution of the whole light
field. We show that coupling the multi-frame approach with a graph regularizer
that enforces the light field structure via nonlocal self-similarities makes it
possible to avoid the costly and challenging disparity estimation step at every
view. Extensive experiments show that the new algorithm compares favorably to
other state-of-the-art methods for light field super-resolution, in terms of
both PSNR and visual quality.

Comment: This new version includes more material. In particular, we added: a
new section on the computational complexity of the proposed algorithm,
experimental comparisons with a CNN-based super-resolution algorithm, and new
experiments on a third dataset.
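The graph-regularized multi-frame idea above can be illustrated with a toy least-squares problem. This is a minimal 1-D sketch built on assumptions of my own: plain decimation stands in for the blur-plus-sampling operator, a simple path-graph Laplacian stands in for the nonlocal similarity graph the paper constructs, and gradient descent replaces the paper's actual solver; none of the names below come from the paper.

```python
import numpy as np

# Hedged sketch: graph-regularized multi-frame super-resolution in 1-D.
# Illustrative objective (not the paper's exact formulation):
#   minimize  sum_i ||D x - y_i||^2  +  lam * x^T L x
# where D is a downsampling operator for each low-res view y_i and L is a
# graph Laplacian standing in for a nonlocal-similarity graph.

def downsample(x, factor=2):
    """Simple decimation standing in for blur + sampling."""
    return x[::factor]

def upsample_adjoint(r, n, factor=2):
    """Adjoint of decimation: scatter a low-res residual onto the high-res grid."""
    x = np.zeros(n)
    x[::factor] = r
    return x

def graph_laplacian_chain(n):
    """Path-graph Laplacian as a stand-in for a nonlocal similarity graph."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1.0; L[i + 1, i + 1] += 1.0
        L[i, i + 1] -= 1.0; L[i + 1, i] -= 1.0
    return L

def super_resolve(views, n, lam=0.1, lr=0.1, iters=500):
    """Gradient descent on the regularized least-squares objective."""
    L = graph_laplacian_chain(n)
    x = np.zeros(n)
    for _ in range(iters):
        grad = 2.0 * lam * (L @ x)            # graph-regularizer gradient
        for y in views:
            grad += 2.0 * upsample_adjoint(downsample(x) - y, n)  # data term
        x -= lr * grad
    return x

# Toy usage: two low-res "views" of a high-res ramp signal.
truth = np.linspace(0.0, 1.0, 16)
views = [downsample(truth), downsample(truth)]
est = super_resolve(views, n=16)
```

The data term pins the samples each view actually observed, while the Laplacian term fills in the remaining high-res samples by smoothness over the graph; in the paper the graph edges come from nonlocal self-similarities across views rather than from a fixed chain.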
ChiTransformer: Towards Reliable Stereo from Cues
Current stereo matching techniques are challenged by a restricted search
space, occluded regions, and sheer size. While single-image depth estimation is
spared from these challenges and can achieve satisfactory results with the
extracted monocular cues, the lack of stereoscopic relationship renders the
monocular prediction less reliable on its own, especially in highly dynamic or
cluttered environments. To address these issues in both scenarios, we present
an optic-chiasm-inspired self-supervised binocular depth estimation method,
wherein a vision transformer (ViT) with gated positional cross-attention (GPCA)
layers is designed to enable feature-sensitive pattern retrieval between views
while retaining the extensive context information aggregated through
self-attention. Monocular cues from a single view are then conditionally
rectified by a blending layer with the retrieved pattern pairs. This crossover
design is biologically analogous to the optic chiasma in the human
visual system, hence the name ChiTransformer. Our experiments show that
this architecture outperforms state-of-the-art self-supervised stereo
approaches by 11%, and can be used on both rectilinear
and non-rectilinear (e.g., fisheye) images.

Comment: 11 pages, 3 figures, CVPR 2022
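The gated cross-attention idea, where tokens from one view retrieve matching patterns from the other view and a gate decides how much the retrieved stereo evidence overrides the monocular features, can be sketched as follows. This is a minimal NumPy illustration with made-up shapes and a plain per-token sigmoid gate; it is not the paper's GPCA layer.

```python
import numpy as np

# Hedged sketch of gated cross-attention between two views. All weight
# matrices and dimensions are illustrative assumptions, not the paper's.

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: queries from one view,
    keys/values retrieved from the other view."""
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def gated_blend(mono_feats, retrieved, gate_w):
    """Per-token sigmoid gate deciding how much retrieved stereo
    evidence overrides the view's own (monocular) features."""
    gate_in = np.concatenate([mono_feats, retrieved], axis=-1)
    g = 1.0 / (1.0 + np.exp(-(gate_in @ gate_w)))   # shape (n, 1)
    return g * retrieved + (1.0 - g) * mono_feats

rng = np.random.default_rng(0)
d, n = 8, 5                         # feature dim, tokens per view
left = rng.normal(size=(n, d))      # monocular features, "left eye"
right = rng.normal(size=(n, d))     # monocular features, "right eye"
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
gate_w = rng.normal(size=(2 * d, 1))

retrieved = cross_attention(left, right, Wq, Wk, Wv)  # left queries right
fused = gated_blend(left, retrieved, gate_w)
```

Because the gate produces a convex combination, each fused token lies between its monocular features and the features retrieved from the other view; a learned gate can thus fall back to the monocular cue where cross-view matching is unreliable (e.g., in occluded regions).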