OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation of Road Scenes
Light field cameras can provide rich angular and spatial information to
enhance image semantic segmentation for scene understanding in the field of
autonomous driving. However, the extensive angular information of light field
cameras contains a large amount of redundant data, which overwhelms the
limited hardware resources of intelligent vehicles. Moreover, inappropriate
compression leads to information corruption and data loss. To extract
representative information, we propose an Omni-Aperture Fusion model (OAFuser),
which leverages dense context from the central view and discovers the angular
information from sub-aperture images to generate a semantically-consistent
result. To avoid feature loss during network propagation and simultaneously
streamline the redundant information from the light field camera, we present a
simple yet very effective Sub-Aperture Fusion Module (SAFM) to embed
sub-aperture images into angular features without any additional memory cost.
Furthermore, to address the mismatched spatial information across viewpoints,
we present a Center Angular Rectification Module (CARM) that realizes feature
resorting and prevents feature occlusion caused by asymmetric information. Our
proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and
-Syn datasets and sets a new record of 84.93% in mIoU on the UrbanLF-Real
Extended dataset, with a gain of +4.53%. The source code of OAFuser will be
made publicly available at https://github.com/FeiBryantkit/OAFuser.
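As a rough illustration of this kind of sub-aperture fusion, a PyTorch sketch follows; the module name, the running-mean formulation, and the channel sizes are our own assumptions, not the paper's SAFM implementation.

import torch
import torch.nn as nn

class SubApertureFusion(nn.Module):
    """Illustrative sketch: embed each sub-aperture image and fold it into
    the central-view features via a running mean, so memory usage stays
    independent of the number of views (hypothetical, not the paper's SAFM)."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)

    def forward(self, center, sais):
        # center: (B, C, H, W); sais: (B, N, C, H, W) sub-aperture images
        fused = self.embed(center)
        for i in range(sais.shape[1]):
            # incremental mean over the central view and views 0..i
            fused = fused + (self.embed(sais[:, i]) - fused) / (i + 2)
        return fused

lf = torch.randn(1, 9, 3, 64, 64)        # a 3x3 light field flattened to 9 views
out = SubApertureFusion()(lf[:, 4], lf)  # index 4 is the central view

The incremental mean is one simple way to accumulate an arbitrary number of views into a single feature buffer, which is what makes the memory footprint constant in the number of sub-aperture images.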
Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution
The effective extraction of spatial-angular features plays a crucial role in
light field image super-resolution (LFSR) tasks, and the introduction of
convolution and Transformers leads to significant improvement in this area.
Nevertheless, due to the large 4D data volume of light field images, many
existing methods opt to decompose the data into a number of lower-dimensional
subspaces and apply Transformers within each subspace individually. As a side
effect, these methods inadvertently restrict the self-attention mechanisms to a
One-to-One scheme accessing only a limited subset of LF data, explicitly
preventing comprehensive optimization on all spatial and angular cues. In this
paper, we identify this limitation as subspace isolation and introduce a novel
Many-to-Many Transformer (M2MT) to address it. M2MT aggregates angular
information in the spatial subspace before performing the self-attention
mechanism. It enables complete access to all information across all
sub-aperture images (SAIs) in a light field image. Consequently, M2MT is
enabled to comprehensively capture long-range correlation dependencies. With
M2MT as the pivotal component, we develop a simple yet effective M2MT network
for LFSR. Our experimental results demonstrate that M2MT achieves
state-of-the-art performance across various public datasets. We further conduct
in-depth analysis using local attribution maps (LAM) to obtain visual
interpretability, and the results validate that M2MT captures a truly
non-local context in both the spatial and angular subspaces, mitigating
subspace isolation and acquiring an effective spatial-angular representation.
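A rough PyTorch sketch of the many-to-many idea follows; folding the angular axis into the channel dimension, so that every spatial token carries information from all SAIs before self-attention, is our simplified reading, and the actual M2MT design may differ.

import torch
import torch.nn as nn

class ManyToManyAttention(nn.Module):
    """Sketch: aggregate angular information into the spatial subspace,
    then run self-attention so each token accesses every sub-aperture
    image, rather than one-to-one attention inside a single SAI."""
    def __init__(self, n_views, dim, heads=4):
        super().__init__()
        # embed_dim = n_views * dim must be divisible by heads
        self.attn = nn.MultiheadAttention(n_views * dim, heads, batch_first=True)

    def forward(self, lf_feat):
        # lf_feat: (B, N, C, H, W) features of N sub-aperture images
        b, n, c, h, w = lf_feat.shape
        # fold the angular axis into channels: each spatial token now
        # sees all views at once
        tokens = lf_feat.reshape(b, n * c, h * w).transpose(1, 2)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.transpose(1, 2).reshape(b, n, c, h, w)

feats = torch.randn(1, 25, 16, 32, 32)    # 5x5 views, 16 feature channels
out = ManyToManyAttention(25, 16)(feats)  # embed_dim 400, 4 heads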
Light Field Reconstruction via Attention-Guided Deep Fusion of Hybrid Lenses
This paper explores the problem of reconstructing high-resolution light field
(LF) images from a hybrid lens system consisting of a high-resolution camera
surrounded by multiple low-resolution cameras. The performance of existing methods is
still limited, as they produce either blurry results on plain textured areas or
distortions around depth discontinuous boundaries. To tackle this challenge, we
propose a novel end-to-end learning-based approach, which can comprehensively
utilize the specific characteristics of the input from two complementary and
parallel perspectives. Specifically, one module regresses a spatially
consistent intermediate estimation by learning a deep multidimensional and
cross-domain feature representation, while the other module warps another
intermediate estimation, which maintains the high-frequency textures, by
propagating the information of the high-resolution view. We finally leverage
the advantages of the two intermediate estimations adaptively via the learned
attention maps, leading to the final high-resolution LF image with satisfactory
results on both plain textured areas and depth discontinuous boundaries.
Besides, to promote the effectiveness of our method trained with simulated
hybrid data on real hybrid data captured by a hybrid LF imaging system, we
carefully design the network architecture and the training strategy. Extensive
experiments on both real and simulated hybrid data demonstrate the significant
superiority of our approach over state-of-the-art ones. To the best of our
knowledge, this is the first end-to-end deep learning method for LF
reconstruction from a real hybrid input. We believe our framework could
potentially decrease the cost of high-resolution LF data acquisition and
benefit LF data storage and transmission.Comment: 14 pages, 8 figures. arXiv admin note: text overlap with
arXiv:1907.0964
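The adaptive blending of the two intermediate estimations could be sketched as follows in PyTorch; the mask network and channel widths are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch: fuse two intermediate estimations with a learned per-pixel
    attention map, so plain textures come from the spatially consistent
    branch and sharp details from the texture-preserving warped branch."""
    def __init__(self, ch=3):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(2 * ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, est_consistent, est_textured):
        a = self.mask(torch.cat([est_consistent, est_textured], dim=1))
        # per-pixel convex combination of the two estimations
        return a * est_consistent + (1 - a) * est_textured

hr = AttentionFusion()(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))

A sigmoid-gated convex combination is a common way to realize such learned attention maps: wherever the mask saturates toward 1 or 0, the output falls back entirely to one branch.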
Real-World Light Field Image Super-Resolution via Degradation Modulation
Recent years have witnessed the great advances of deep neural networks (DNNs)
in light field (LF) image super-resolution (SR). However, existing DNN-based LF
image SR methods are developed on a single fixed degradation (e.g., bicubic
downsampling), and thus cannot be applied to super-resolve real LF images with
diverse degradations. In this paper, we propose a simple yet effective method
for real-world LF image SR. In our method, a practical LF degradation model is
developed to formulate the degradation process of real LF images. Then, a
convolutional neural network is designed to incorporate the degradation prior
into the SR process. By training on LF images using our formulated degradation,
our network learns to modulate different degradations while incorporating
both spatial and angular information in LF images. Extensive experiments on
both synthetically degraded and real-world LF images demonstrate the
effectiveness of our method. Compared with existing state-of-the-art single and
LF image SR methods, our method achieves superior SR performance under a wide
range of degradations and generalizes better to real LF images. Codes and
models are available at https://yingqianwang.github.io/LF-DMnet/.
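A toy version of such a degradation pipeline (blur, downsampling, noise) might look as follows; the kernel, scale factor, and noise level are placeholders, and the paper's actual degradation model may be formulated differently.

import torch
import torch.nn.functional as F

def degrade_lf(lf, kernel, scale=4, noise_std=0.01):
    # lf: (B, N, C, H, W) light field; kernel: (k, k) blur kernel
    b, n, c, h, w = lf.shape
    x = lf.reshape(b * n * c, 1, h, w)
    pad = kernel.shape[-1] // 2
    x = F.conv2d(x, kernel[None, None], padding=pad)  # spatial blur
    x = x[:, :, ::scale, ::scale]                     # s-fold downsampling
    x = x + noise_std * torch.randn_like(x)           # additive sensor noise
    return x.reshape(b, n, c, h // scale, w // scale)

# isotropic Gaussian kernel as a stand-in for the estimated blur
g = torch.exp(-(torch.arange(5.0) - 2) ** 2 / 2)
kernel = torch.outer(g, g); kernel /= kernel.sum()
low_res = degrade_lf(torch.rand(1, 25, 3, 128, 128), kernel)

Training on pairs synthesized this way, with the degradation parameters varied per sample, is the usual route to networks that modulate their behavior to the degradation at hand.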
Enhanced processing methods for light field imaging
The light field camera provides rich textural and geometric information, but it remains challenging to use this information efficiently and accurately to solve computer vision problems. Light field image processing is divided into multiple levels. First, low-level processing mainly covers the acquisition of light field images and their preprocessing. Second, mid-level processing consists of depth estimation, light field encoding, and the extraction of cues from the light field. Third, high-level processing involves 3D reconstruction, target recognition, visual odometry, image reconstruction, and other advanced applications. We propose a series of improved algorithms for each of these levels.
The light field signal contains rich angular information. By contrast, traditional computer vision methods designed for 2D images often cannot make full use of the high-frequency part of this angular information. We propose a fast pre-estimation algorithm that enhances light field features, improving speed and accuracy while making full use of the angular information.

Light field filtering and refocusing are essential cues in light field signal processing. Modern frequency-domain filtering and wavelet techniques have effectively improved light field filtering accuracy but may fail at object edges. We adapt sub-window filtering to the light field to improve the reconstruction of object edges. Light field images can be used to analyze scattering and refraction phenomena, but there are still insufficient metrics to evaluate the results. Therefore, we propose a physically based rendering light field dataset that simulates light field images distorted by a transparent medium, such as atmospheric turbulence or a water surface. Neural networks are an essential tool for processing complex light field data. We propose an efficient 3D convolutional autoencoder network for the light field structure; a sketch of this kind of architecture is given below. This network overcomes the severe distortion caused by high-intensity turbulence with limited angular resolution and solves the difficulty of pixel matching between distorted images.
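A minimal 3D convolutional autoencoder over the (view, height, width) axes could be sketched as follows; the depths and channel counts are illustrative, not those of the proposed network.

import torch
import torch.nn as nn

class LF3DAutoencoder(nn.Module):
    """Sketch: 3D convolutions treat the angular axis as depth, so the
    bottleneck can exploit cross-view structure to undo distortions."""
    def __init__(self, ch=1, feat=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(ch, feat, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(feat, 2 * feat, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(2 * feat, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(feat, ch, 4, stride=2, padding=1))

    def forward(self, x):
        # x: (B, C, N, H, W); N, H, W assumed divisible by 4
        return self.dec(self.enc(x))

restored = LF3DAutoencoder()(torch.rand(1, 1, 8, 64, 64))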
This work emphasizes the application and usefulness of light field imaging in computer vision whilst improving light field image processing speed and accuracy through signal processing, computer graphics, computer vision, and artificial neural networks.