
    iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

    We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded. Recent RGB-D-based methods are robust to moderate degrees of occlusion. For RGB inputs, however, no previous method works well for partly occluded objects. Our main contribution is the first deep learning-based system that estimates accurate poses for partly occluded objects from RGB-D and RGB input. We achieve this with a new instance-aware pipeline that decomposes 6D object pose estimation into a sequence of simpler steps, where each step removes a specific aspect of the problem. The first step localizes all known objects in the image using an instance segmentation network, eliminating surrounding clutter and occluders. The second step densely maps pixels to 3D object surface positions, the so-called object coordinates, using an encoder-decoder network, eliminating object appearance. The third and final step predicts the 6D pose using geometric optimization. We demonstrate that we significantly outperform the state of the art in pose estimation of partly occluded objects for both RGB and RGB-D input.
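    As a rough illustration of the final geometric-optimization step, the sketch below recovers a pose from predicted object coordinates with PnP + RANSAC. The function name, shapes, and RANSAC parameters are our assumptions for illustration, not the authors' exact implementation.

```python
# Sketch: 6D pose from dense object-coordinate predictions via PnP-RANSAC.
import numpy as np
import cv2

def pose_from_object_coordinates(obj_coords, mask, K):
    """obj_coords: (H, W, 3) predicted 3D surface positions in the object frame.
    mask: (H, W) boolean instance mask from the segmentation step.
    K: (3, 3) camera intrinsic matrix."""
    ys, xs = np.nonzero(mask)
    image_pts = np.stack([xs, ys], axis=1).astype(np.float32)   # 2D pixel locations
    object_pts = obj_coords[ys, xs].astype(np.float32)          # matching 3D coordinates
    # RANSAC makes the estimate robust to wrongly predicted coordinates,
    # e.g. on occluded or poorly segmented pixels.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        object_pts, image_pts, K, distCoeffs=None,
        reprojectionError=3.0, iterationsCount=200)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec               # object-to-camera transform
```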

    Toward Flare-Free Images: A Survey

    Lens flare is a common image artifact, caused by a strong light source shining toward the camera, that can significantly degrade image quality and the performance of computer vision systems. This survey provides a comprehensive overview of the multifaceted domain of lens flare, encompassing its underlying physics, influencing factors, types, and characteristics. It delves into the complex optics of flare formation, which arises from factors such as internal reflection, scattering, diffraction, and dispersion within the camera lens system. The diverse categories of flare are explored, including scattering, reflective, glare, orb, and starburst types. Key properties such as shape, color, and localization are analyzed. The numerous factors impacting flare appearance are discussed, spanning light source attributes, lens features, camera settings, and scene content. The survey extensively covers the wide range of methods proposed for flare removal, including hardware optimization strategies, classical image processing techniques, and deep learning-based methods. It describes the pioneering flare datasets created for training and evaluation, as well as how they were constructed. Commonly employed performance metrics such as PSNR, SSIM, and LPIPS are explored. Challenges posed by flare's complex and data-dependent characteristics are highlighted. The survey provides insights into best practices, limitations, and promising future directions for flare removal research. Reviewing the state of the art enables an in-depth understanding of the inherent complexities of the flare phenomenon and the capabilities of existing solutions, which can inform and inspire new innovations for handling lens flare artifacts and improving visual quality across various applications.
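    For concreteness, here is a minimal sketch of how the metrics named above (PSNR, SSIM, LPIPS) are typically computed for a flare-removal result against a flare-free ground truth. The library choices are ours, not prescribed by the survey.

```python
# Sketch: evaluating a restored image with PSNR, SSIM, and LPIPS.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # learned perceptual distance; lower is better

def fidelity_metrics(restored, ground_truth):
    """Both images: (H, W, 3) float arrays in [0, 1]. Higher PSNR/SSIM is better."""
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=1.0)
    ssim = structural_similarity(ground_truth, restored,
                                 channel_axis=-1, data_range=1.0)
    return psnr, ssim

def perceptual_distance(restored, ground_truth):
    """LPIPS expects NCHW float tensors scaled to [-1, 1]."""
    to_t = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() * 2.0 - 1.0
    return lpips_fn(to_t(restored), to_t(ground_truth)).item()
```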

    Depth Estimation from Monocular Images and Sparse Radar Data

    In this paper, we explore the possibility of achieving more accurate depth estimation by fusing monocular images and Radar points using a deep neural network. We give a comprehensive study of the fusion between RGB images and Radar measurements from different aspects and propose a working solution based on our observations. We find that the noise in Radar measurements is one of the main reasons that prevents the existing fusion methods developed for LiDAR data and images from being applied directly to the new fusion problem between Radar data and images. The experiments are conducted on the nuScenes dataset, one of the first datasets featuring Camera, Radar, and LiDAR recordings in diverse scenes and weather conditions. Extensive experiments demonstrate that our method outperforms existing fusion methods. We also provide detailed ablation studies to show the effectiveness of each component in our method.
    Comment: 9 pages, 6 figures, accepted to the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020).
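    A common starting point for this kind of camera-Radar fusion is to project the sparse Radar points into the image plane and stack the resulting depth map as an extra input channel. The sketch below shows that projection step only; the paper's network design and handling of Radar noise are not reproduced here, and the function name and shapes are our assumptions.

```python
# Sketch: turn sparse Radar points into an image-aligned depth channel.
import numpy as np

def radar_to_depth_channel(points_cam, K, height, width):
    """points_cam: (N, 3) Radar points already in the camera frame.
    K: (3, 3) camera intrinsics. Returns an (H, W, 1) sparse depth map."""
    depth = np.zeros((height, width, 1), dtype=np.float32)
    z = points_cam[:, 2]
    valid = z > 0                        # keep points in front of the camera
    uvw = (K @ points_cam[valid].T).T    # pinhole projection to homogeneous pixels
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[inside], u[inside], 0] = z[valid][inside]
    return depth

# Fused network input: RGB plus the (mostly empty) Radar depth channel.
# rgbd_input = np.concatenate([rgb, radar_to_depth_channel(pts, K, H, W)], axis=-1)
```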

    Exploring Subtasks of Scene Understanding: Challenges and Cross-Modal Analysis

    Scene understanding is one of the most important problems in computer vision. It consists of many subtasks, such as image classification (describing an image with one word), object detection (finding and localizing objects of interest in the image and assigning a category to each of them), semantic segmentation (assigning a category to each pixel of an image), instance segmentation (finding and localizing objects of interest and marking all the pixels belonging to each object), and depth estimation (estimating the distance of each pixel in the image from the camera). Each of these tasks has its advantages and limitations, but they share a common goal: to understand and describe a scene captured in an image or a set of images. A natural question is whether there is any synergy between these tasks; therefore, alongside single-task approaches, there is a line of research on how to learn multiple tasks jointly. In this thesis, we explore different subtasks of scene understanding and propose mainly deep learning-based approaches to improve them. First, we propose a modular Convolutional Neural Network (CNN) architecture for jointly training semantic segmentation and depth estimation, and provide a setup suitable for analyzing the cross-modality influence between these tasks for different architecture designs. Then, we utilize object detection and instance segmentation as auxiliary tasks for focusing on target objects in the complex tasks of scene flow estimation and object 6D pose estimation. Furthermore, we propose a novel deep approach for object co-segmentation, the task of segmenting common objects in a set of images. Finally, we introduce a novel pooling layer that preserves spatial information while capturing a large receptive field, designed to improve dense prediction tasks such as semantic segmentation and depth estimation.
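    To make the joint semantic-segmentation and depth-estimation setup concrete, the toy sketch below shows the basic pattern of one shared encoder with two task-specific decoder heads. The thesis' modular architecture is more elaborate; this only illustrates the cross-task weight sharing, and all layer sizes here are arbitrary.

```python
# Sketch: shared-encoder multi-task network for segmentation + depth.
import torch
import torch.nn as nn

class JointSegDepthNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(            # shared representation
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.seg_head = nn.Sequential(           # per-pixel class logits
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, num_classes, 4, stride=2, padding=1))
        self.depth_head = nn.Sequential(         # per-pixel depth values
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1))

    def forward(self, x):
        feats = self.encoder(x)                  # both heads reuse these features
        return self.seg_head(feats), self.depth_head(feats)

# Joint training minimizes a weighted sum of the per-task losses, e.g.
# loss = cross_entropy(seg_logits, labels) + lam * l1(depth_pred, depth_gt)
```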