9 research outputs found

    OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation of Road Scenes

    Light field cameras can provide rich angular and spatial information to enhance image semantic segmentation for scene understanding in the field of autonomous driving. However, the extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelming for the limited hardware resources of intelligent vehicles. Besides, inappropriate compression leads to information corruption and data loss. To excavate representative information, we propose an Omni-Aperture Fusion model (OAFuser), which leverages dense context from the central view and discovers the angular information from sub-aperture images to generate a semantically consistent result. To avoid feature loss during network propagation and simultaneously streamline the redundant information from the light field camera, we present a simple yet very effective Sub-Aperture Fusion Module (SAFM) to embed sub-aperture images into angular features without any additional memory cost. Furthermore, to address the mismatched spatial information across viewpoints, we present a Center Angular Rectification Module (CARM), which re-sorts features and prevents feature occlusion caused by asymmetric information. Our proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and UrbanLF-Syn datasets and sets a new record of 84.93% in mIoU on the UrbanLF-Real Extended dataset, with a gain of +4.53%. The source code of OAFuser will be made publicly available at https://github.com/FeiBryantkit/OAFuser.
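The mIoU figure quoted above is the mean intersection-over-union across classes. As a rough illustration only, the metric can be sketched as follows; this is not the OAFuser evaluation code, the function name `mean_iou` is hypothetical, and benchmark protocols may differ in how they treat ignored labels or accumulate statistics over a dataset.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[i, j] counts pixels whose true class is i and predicted class is j
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    # union = predicted-as-class + truly-class - overlap
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both pred and target
    return float((intersection[valid] / union[valid]).mean())
```

For example, with predictions `[0, 0, 1, 1]` and labels `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.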

    Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution

    The effective extraction of spatial-angular features plays a crucial role in light field image super-resolution (LFSR) tasks, and the introduction of convolution and Transformers has led to significant improvement in this area. Nevertheless, due to the large 4D data volume of light field images, many existing methods decompose the data into a number of lower-dimensional subspaces and apply Transformers in each subspace individually. As a side effect, these methods inadvertently restrict the self-attention mechanisms to a One-to-One scheme that accesses only a limited subset of LF data, explicitly preventing comprehensive optimization on all spatial and angular cues. In this paper, we identify this limitation as subspace isolation and introduce a novel Many-to-Many Transformer (M2MT) to address it. M2MT aggregates angular information in the spatial subspace before performing the self-attention mechanism. It enables complete access to all information across all sub-aperture images (SAIs) in a light field image. Consequently, M2MT can comprehensively capture long-range correlation dependencies. With M2MT as the pivotal component, we develop a simple yet effective M2MT network for LFSR. Our experimental results demonstrate that M2MT achieves state-of-the-art performance across various public datasets. We further conduct in-depth analysis using local attribution maps (LAM) to obtain visual interpretability, and the results validate that M2MT is empowered with a truly non-local context in both spatial and angular subspaces to mitigate subspace isolation and acquire effective spatial-angular representation.

    Light Field Reconstruction via Attention-Guided Deep Fusion of Hybrid Lenses

    This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
    Comment: 14 pages, 8 figures. arXiv admin note: text overlap with arXiv:1907.0964

    Real-World Light Field Image Super-Resolution via Degradation Modulation

    Recent years have witnessed great advances of deep neural networks (DNNs) in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed on a single fixed degradation (e.g., bicubic downsampling), and thus cannot be applied to super-resolve real LF images with diverse degradations. In this paper, we propose a simple yet effective method for real-world LF image SR. In our method, a practical LF degradation model is developed to formulate the degradation process of real LF images. Then, a convolutional neural network is designed to incorporate the degradation prior into the SR process. By training on LF images using our formulated degradation, our network can learn to modulate different degradations while incorporating both spatial and angular information in LF images. Extensive experiments on both synthetically degraded and real-world LF images demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradations, and generalizes better to real LF images. Codes and models are available at https://yingqianwang.github.io/LF-DMnet/.
    Comment: 15 pages, 10 figures

    Enhanced processing methods for light field imaging

    The light field camera provides rich textural and geometric information, but it is still challenging to use it efficiently and accurately to solve computer vision problems. Light field image processing is divided into multiple levels. First, low-level processing mainly includes the acquisition of light field images and their preprocessing. Second, middle-level processing consists of depth estimation, light field encoding, and the extraction of cues from the light field. Third, high-level processing involves 3D reconstruction, target recognition, visual odometry, image reconstruction, and other advanced applications. We propose a series of improved algorithms for each of these levels. The light field signal contains rich angular information. By contrast, traditional computer vision methods, as used for 2D images, often cannot make full use of the high-frequency part of the light field angular information. We propose a fast pre-estimation algorithm that enhances light field features to improve speed and accuracy while making full use of the angular information. Light field filtering and refocusing are essential operations in light field signal processing. Modern frequency domain filtering and wavelet techniques have effectively improved light field filtering accuracy but may fail at object edges. We adapted sub-window filtering to the light field to improve the reconstruction of object edges. Light field images can be used to analyze the effects of scattering and refraction phenomena, but there are still insufficient metrics to evaluate the results. Therefore, we propose a physical rendering-based light field dataset that simulates the distorted light field image through a transparent medium, such as atmospheric turbulence or a water surface. The neural network is an essential method to process complex light field data. We propose an efficient 3D convolutional autoencoder network for the light field structure. This network overcomes the severe distortion caused by high-intensity turbulence with limited angular resolution and solves the difficulty of pixel matching between distorted images. This work emphasizes the application and usefulness of light field imaging in computer vision whilst improving light field image processing speed and accuracy through signal processing, computer graphics, computer vision, and artificial neural networks.