
    Direct structure estimation for 3D reconstruction


    Learning Human Poses from Monocular Images

    In this research, we focus on estimating the 2D human pose from a monocular image and reconstructing the 3D human pose from that 2D pose. Here a 3D pose is the set of locations of the human joints in 3D space, and a 2D pose is the projection of a 3D pose onto an image. Unlike many previous works that explicitly use hand-crafted physiological models, both our 2D pose estimation and 3D pose reconstruction approaches implicitly learn the structure of the human body from human pose data. Without prior knowledge, 3D pose reconstruction is an ill-posed problem. In this research, we propose a new approach, Pose Locality Constrained Representation (PLCR), to constrain the search space for the underlying 3D human pose and use it to improve 3D human pose reconstruction. In this approach, an over-complete pose dictionary is constructed by hierarchically clustering the 3D pose space into many subspaces. PLCR then exploits the structure of the over-complete dictionary to constrain the 3D pose solution to a set of highly related subspaces. Finally, PLCR is incorporated into a matching-pursuit algorithm for 3D human-pose reconstruction. The 2D human pose used in 3D pose reconstruction can be manually annotated or automatically estimated from a single image. In this research, we develop a new learning-based 2D human pose estimation approach based on a Dual-Source Deep Convolutional Neural Network (DS-CNN). The proposed DS-CNN model learns the appearance of each local body part and the relations between parts simultaneously, whereas most existing approaches treat them as two separate steps. In our experiments, the proposed DS-CNN model produces performance superior or comparable to state-of-the-art 2D human-pose estimation approaches based on pose priors learned from hand-crafted models or holistic perspectives. Finally, we use our 2D human pose estimation approach to recognize human attributes by exploiting the strong correspondence between human attributes and human body parts. We then probe whether and when a CNN can find such correspondence by itself, on human attribute recognition and bird species recognition. We find a direct correlation between recognition accuracy and the correctness of the correspondence the CNN finds.
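
    A minimal sketch may help make the PLCR pipeline concrete. The fragment below is not the authors' implementation: it assumes the over-complete dictionary (atoms grouped by hierarchically clustered subspace labels) and the set of active subspaces are already given, and it uses a toy weak-perspective camera. It shows only the locality-constrained matching-pursuit step, where atoms are greedily selected from the allowed subspaces to explain the 2D reprojection residual; all names are illustrative.

        import numpy as np

        def project(P3, scale=1.0):
            """Toy weak-perspective projection: drop Z, apply a uniform scale."""
            return scale * P3[:, :2]

        def plcr_matching_pursuit(x2d, mean_pose, dictionary, subspace_index,
                                  active_subspaces, n_atoms=3, scale=1.0):
            """Greedy matching pursuit over a pose dictionary, restricted to
            atoms whose subspace label lies in active_subspaces (the pose
            locality constraint). Shapes: x2d (J,2), mean_pose (J,3),
            dictionary (K,J,3), subspace_index (K,)."""
            residual = x2d - project(mean_pose, scale)
            coeffs = np.zeros(len(dictionary))
            allowed = [k for k in range(len(dictionary))
                       if subspace_index[k] in active_subspaces]
            chosen = []
            for _ in range(n_atoms):
                best_k, best_gain, best_c = None, -np.inf, 0.0
                for k in allowed:
                    if k in chosen:
                        continue
                    a = project(dictionary[k], scale).ravel()
                    denom = a @ a
                    if denom < 1e-12:
                        continue
                    c = (residual.ravel() @ a) / denom   # least-squares coefficient
                    gain = c * c * denom                 # drop in squared residual
                    if gain > best_gain:
                        best_k, best_gain, best_c = k, gain, c
                if best_k is None:
                    break
                coeffs[best_k] += best_c
                residual -= best_c * project(dictionary[best_k], scale)
                chosen.append(best_k)
            pose3d = mean_pose + np.tensordot(coeffs, dictionary, axes=1)
            return pose3d, coeffs

        # Toy usage: 15 joints, 40 atoms grouped into 4 hypothetical subspaces.
        rng = np.random.default_rng(0)
        mean = rng.normal(size=(15, 3))
        D = rng.normal(size=(40, 15, 3))
        labels = np.repeat(np.arange(4), 10)
        gt = mean + 0.5 * D[3] - 0.2 * D[7]     # atoms 3 and 7 lie in subspace 0
        pose, c = plcr_matching_pursuit(project(gt), mean, D, labels,
                                        active_subspaces={0}, n_atoms=3)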

    A Linear Approach to Absolute Pose Estimation for Light Fields

    This paper presents the first absolute pose estimation approach tailored to Light Field cameras. It builds on the observation that the ratio between the disparity arising in different sub-aperture images and their corresponding baseline is constant. Hence, we augment the 2D pixel coordinates with the corresponding normalised disparity to obtain the Light Field feature. This new representation reduces the effect of noise by aggregating multiple projections and allows for linear estimation of the absolute pose of a Light Field camera using the well-known Direct Linear Transformation algorithm. We evaluate the resulting absolute pose estimates with extensive simulations and experiments involving real Light Field datasets, demonstrating the competitive performance of our linear approach. Furthermore, we integrate our approach in a state-of-the-art Light Field Structure from Motion pipeline and demonstrate accurate multi-view 3D reconstruction.
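
    As a hedged illustration of how a normalised-disparity feature can linearise the problem: if the normalised disparity is treated as (proportional to) inverse depth rho, then for calibrated pixel coordinates x_n each correspondence gives rho * ([R|t] X~) = x_n, three equations that are linear in the twelve entries of [R|t]. This is one plausible linearisation, not necessarily the paper's exact DLT formulation; the sketch below assumes known intrinsics, and all variable names are illustrative.

        import numpy as np

        def lf_linear_pose(xn, rho, Xw):
            """Linear absolute pose from Light-Field-style features.
            xn:  (N,2) calibrated pixel coordinates (K^-1 already applied)
            rho: (N,)  normalised disparity, assumed proportional to inverse depth
            Xw:  (N,3) corresponding world points
            Stacks rho_i * ([R|t] Xw_i~) = (x_i, y_i, 1) into a linear system in
            the 12 entries of [R|t], then projects the 3x3 block onto SO(3)."""
            N = len(Xw)
            A = np.zeros((3 * N, 12))
            b = np.zeros(3 * N)
            Xh = np.hstack([Xw, np.ones((N, 1))])    # homogeneous world points
            for i in range(N):
                for r in range(3):                   # one equation per row of [R|t]
                    A[3 * i + r, 4 * r:4 * r + 4] = rho[i] * Xh[i]
                b[3 * i:3 * i + 2] = xn[i]
                b[3 * i + 2] = 1.0
            p, *_ = np.linalg.lstsq(A, b, rcond=None)
            Rt = p.reshape(3, 4)
            U, S, Vt = np.linalg.svd(Rt[:, :3])      # orthogonalise the rotation
            if np.linalg.det(U @ Vt) < 0:
                U[:, -1] *= -1
            R = U @ Vt
            t = Rt[:, 3] / S.mean()                  # absorb the unknown global scale
            return R, t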

    CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction

    Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM. Our fusion scheme privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice versa. We demonstrate the use of depth prediction for estimating the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, yielding semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach.
    Comment: 10 pages, 6 figures, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, June 2017. The first two authors contribute equally to this paper.
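
    To make the fusion idea concrete, here is a toy per-pixel blend, not the paper's scheme (which fuses depths with per-pixel uncertainties inside a keyframe-based SLAM pipeline): SLAM depth is trusted where it exists and the image gradient is strong, and the CNN prediction fills in low-textured regions. The gradient threshold and all names are assumptions.

        import numpy as np

        def fuse_depths(depth_cnn, depth_slam, slam_valid, image_gray,
                        grad_thresh=0.02):
            """Toy fusion of CNN-predicted dense depth with semi-dense SLAM depth:
            trust SLAM where it has a measurement and the image is textured
            (strong gradient), fall back to the CNN elsewhere."""
            gy, gx = np.gradient(image_gray.astype(np.float64))
            grad_mag = np.hypot(gx, gy)
            # Weight in [0,1]: high gradient -> trust SLAM, low -> trust CNN.
            w = np.clip(grad_mag / grad_thresh, 0.0, 1.0) * slam_valid
            # Blend in inverse-depth space, as is common in monocular SLAM.
            inv_cnn = 1.0 / np.maximum(depth_cnn, 1e-6)
            inv_slam = np.where(slam_valid, 1.0 / np.maximum(depth_slam, 1e-6), 0.0)
            return 1.0 / (w * inv_slam + (1.0 - w) * inv_cnn)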