    Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video

    Thin structures, such as wire-frame sculptures, fences, cables, power lines, and tree branches, are common in the real world. It is extremely challenging to acquire their 3D digital models using traditional image-based or depth-based reconstruction methods because thin structures often lack distinct point features and have severe self-occlusion. We propose the first approach that simultaneously estimates camera motion and reconstructs high-quality geometry of complex 3D thin structures from a color video captured by a handheld camera. Specifically, we present a new curve-based approach that estimates accurate camera poses by establishing correspondences between featureless thin objects in the foreground across consecutive video frames, without requiring visual texture in the background scene to lock onto. Enabled by this effective curve-based camera pose estimation strategy, we develop an iterative optimization method with tailored measures for geometry, topology, and self-occlusion handling to reconstruct 3D thin structures. Extensive validation on a variety of thin structures shows that our method achieves accurate camera pose estimation and faithful reconstruction of 3D thin structures with complex shape and topology at a level that has not been attained by other existing reconstruction methods. Comment: Accepted by SIGGRAPH 2020
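    The curve-based pose estimation summarized above lends itself to a compact sketch: project the current 3D curve samples into a candidate view and minimize their distance to the nearest detected thin-structure pixel. The sketch below is our own illustration of that idea, not the paper's solver; the distance-transform correspondence, the Huber loss, and all names are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_pose(curve_pts_3d, curve_mask_2d, K, rvec0, t0):
    """Refine one camera pose by minimizing the image-space distance
    from projected 3D curve samples to observed curve pixels.

    curve_pts_3d : (N, 3) samples on the current 3D curve estimate.
    curve_mask_2d: (H, W) binary map of detected thin-structure pixels.
    K            : (3, 3) camera intrinsics.
    rvec0, t0    : initial pose as axis-angle rotation and translation.
    """
    # Distance transform: each pixel stores its distance to the nearest
    # curve pixel, standing in for explicit point-to-curve correspondences.
    dist = distance_transform_edt(~curve_mask_2d.astype(bool))
    h, w = dist.shape

    def residuals(p):
        R = Rotation.from_rotvec(p[:3]).as_matrix()
        cam = curve_pts_3d @ R.T + p[3:]           # X_cam = R X + t
        uv = cam @ K.T
        uv = uv[:, :2] / uv[:, 2:3]                # perspective division
        u = np.clip(np.round(uv[:, 0]), 0, w - 1).astype(int)
        v = np.clip(np.round(uv[:, 1]), 0, h - 1).astype(int)
        return dist[v, u]                          # nearest-curve distances

    sol = least_squares(residuals, np.concatenate([rvec0, t0]),
                        loss="huber", f_scale=2.0)  # robust to outliers
    return sol.x[:3], sol.x[3:]
```

    Bilinear interpolation of the distance field would give smoother residuals; in an actual system a refinement like this would alternate with updating the 3D curves themselves.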

    3D Human Face Reconstruction and 2D Appearance Synthesis

    3D human face reconstruction has been an active research area for decades due to its wide range of applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains highly valuable because images are much easier to acquire and store. In this dissertation, we first propose three image-based face reconstruction approaches, each built on different assumptions about the input. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; the camera is assumed to be calibrated. Since the first approach is limited to videos, our second approach focuses on a single image. It also refines the geometry by adding fine-grained detail using shading cues, for which we propose a novel albedo estimation and linear optimization algorithm. In the third approach, we further relax the constraints on the input to arbitrary in-the-wild images. The proposed approach robustly reconstructs high-quality models even under extreme expressions and large poses. We then explore the applicability of our face reconstructions in four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization, and video face replacement, demonstrating the great potential of our reconstruction approaches in these real-world settings. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs); however, the large facial occlusion they cause is a major obstacle to face-to-face communication. In a further application, we explore hardware/software solutions for synthesizing face images in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem; with our algorithm and prototype, we achieve photo-realistic results. Finally, we propose a deep neural network that treats HMD removal as a face inpainting problem. This approach does not require special hardware and runs in real time with satisfactory results
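    As a concrete illustration of the shading-based refinement step mentioned above, the sketch below solves the standard linear sub-problem that appears in many shape-from-shading pipelines: recovering second-order spherical-harmonic (SH) lighting from per-pixel intensity, albedo, and normals under a Lambertian assumption. This is a generic textbook formulation, not the dissertation's specific albedo-estimation algorithm; all names are ours.

```python
import numpy as np

def sh_basis(normals):
    """First nine spherical-harmonic basis terms for unit normals (N, 3).
    Constant factors are folded into the lighting coefficients."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.ones_like(x), x, y, z,
        x * y, x * z, y * z,
        x ** 2 - y ** 2, 3 * z ** 2 - 1,
    ], axis=1)

def solve_sh_lighting(intensity, albedo, normals):
    """Recover SH lighting l from the Lambertian model
    I = albedo * (B(n) @ l) by linear least squares."""
    B = sh_basis(normals) * albedo[:, None]
    l, *_ = np.linalg.lstsq(B, intensity, rcond=None)
    return l
```

    A typical pipeline alternates this step with updating the albedo and refining per-pixel depth or normals, each of which stays tractable when the other unknowns are held fixed.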

    Zernike velocity moments for sequence-based description of moving features

    The increasing interest in processing sequences of images motivates the development of techniques for sequence-based object analysis and description. Accordingly, new velocity moments have been developed to allow a statistical description of both shape and associated motion through an image sequence. Through a generic framework, motion information is determined using the established centralised moments, enabling statistical moments to be applied to motion-based time series analysis. The translation-invariant Cartesian velocity moments suffer from highly correlated descriptions due to their non-orthogonality. The new Zernike velocity moments overcome this by using orthogonal spatial descriptions through the proven orthogonal Zernike basis; further, they are translation and scale invariant. To illustrate their benefits and application, the Zernike velocity moments have been applied to gait recognition, an emergent biometric. Good recognition results have been achieved on multiple datasets using relatively few spatial and/or motion features and basic feature selection and classification techniques. The prime aim of this new technique is to allow the generation of statistical features which encode shape and motion information, with generic application capability. Applied performance analyses illustrate the properties of the Zernike velocity moments, which exploit temporal correlation to improve a shape's description. It is demonstrated how the temporal correlation improves the performance of the descriptor under more generalised application scenarios, including reduced-resolution imagery and occlusion
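    The construction can be made concrete in a short sketch: each frame's silhouette contributes an orthogonal Zernike description over the unit disk, weighted by a Cartesian velocity term built from the centroid displacement between consecutive frames. This follows the published formulation in spirit only; the exact normalisation and all helper names below are our assumptions.

```python
import numpy as np
from math import factorial

def zernike_radial(n, m, rho):
    """Radial polynomial R_nm(rho); requires n - |m| even and non-negative."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s) /
             (factorial(s) * factorial((n + m) // 2 - s)
              * factorial((n - m) // 2 - s)))
        R = R + c * rho ** (n - 2 * s)
    return R

def zernike_velocity_moment(frames, n, m, alpha, beta):
    """Zernike velocity moment of a binary silhouette sequence.

    frames     : list of (H, W) 0/1 arrays, one silhouette per frame.
    n, m       : Zernike order and repetition (spatial description).
    alpha, beta: exponents of the centroid-velocity term (0, 0 recovers
                 a plain accumulated Zernike moment).
    """
    h, w = frames[0].shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    centroids = [(xs[f > 0].mean(), ys[f > 0].mean()) for f in frames]
    total = 0.0 + 0.0j
    for i in range(1, len(frames)):
        f = frames[i]
        cx, cy = centroids[i]
        # Map pixels into the unit disk centred on the silhouette,
        # giving translation and scale invariance of the spatial term.
        x, y = xs - cx, ys - cy
        r = np.sqrt(x ** 2 + y ** 2)
        rho = r / r[f > 0].max()
        theta = np.arctan2(y, x)
        mask = (f > 0) & (rho <= 1.0)
        V = zernike_radial(n, m, rho) * np.exp(-1j * m * theta)  # conjugate basis
        # Cartesian velocity term from inter-frame centroid motion.
        ux = cx - centroids[i - 1][0]
        uy = cy - centroids[i - 1][1]
        total += (ux ** alpha) * (uy ** beta) * V[mask].sum()
    # Normalise by the accumulated silhouette area (one plausible choice).
    area = sum(float(f.sum()) for f in frames)
    return (n + 1) / np.pi * total / area
```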

    3D scene and object parsing from a single image

    The term 3D parsing refers to the process of segmenting and labeling 3D space into expressive categories of voxels, point clouds, or surfaces. Humans can effortlessly perceive a 3D scene, including the unseen parts of objects, from a single image with a limited field of view. In the same sense, a robot designed to execute human-like actions should be able to infer the 3D visual world from a single snapshot of a 2D sensor such as a camera, or a 2.5D sensor such as a Kinect. In this thesis, we focus on 3D scene and object parsing from a single image, aiming to produce a 3D parse that can support applications such as robotics and navigation. Our goal is an expressive 3D parse: what is it, where is it, and how can humans move around and interact with it? Inferring such a 3D parse from a single image is not trivial. The main challenges are: the unknown separation between layout surfaces and objects; the high degree of occlusion and the diverse classes of objects in cluttered scenes; and how to represent 3D object geometry in a way that can be predicted from noisy or partial observations and can support reasoning about contact, support, and extent. In this thesis, we put forward the hypothesis, and demonstrate experimentally, that a data-driven approach can directly produce a complete 3D recovery from partial 2D observations. Moreover, we show that by imposing constraints from 3D patterns and priors on the learned model (e.g., layout surfaces are flat and orthogonal to adjacent surfaces, support height can reveal the full extent of an occluded object, complete 2D silhouettes can guide reconstruction beyond partial foreground occlusion, and a shape can be decomposed into a set of simple parts), we obtain a more accurate reconstruction of the scene and a structural representation of each object. We present our approaches at increasing levels of detail, from a rough layout level to a more complex scene level and finally to the most detailed object level. We start by estimating the 3D room layout from a single RGB image, proposing an approach that generalizes across panoramas and perspective images, and across cuboid layouts and more general layouts (e.g., “L”-shaped rooms). We then make use of an additional depth image and work at the scene level, recovering the complete 3D scene with the layout and all objects jointly. At the object level, we recover each 3D object with robustness to possible partial foreground occlusion. Finally, we represent each 3D object as a composite of sets of primitives, recurrently parsing each shape into primitives given a single depth view. We demonstrate the efficacy of each proposed approach with extensive quantitative and qualitative experiments on public datasets
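    One of the priors listed above, that the height of a supporting surface can reveal the full extent of an occluded object, is simple enough to state directly in code. The sketch below is purely illustrative of that idea under our own assumptions (a y-up coordinate frame and hypothetical names), not the thesis' implementation.

```python
import numpy as np

def complete_extent_with_support(visible_pts, support_height):
    """Complete an occluded object's 3D bounding box using the support
    prior: an object resting on a surface extends down to that surface,
    even when its lower part is hidden.

    visible_pts   : (N, 3) observed points of the object, y-up, metres.
    support_height: y-coordinate of the supporting surface (floor, table).
    """
    lo = visible_pts.min(axis=0)
    hi = visible_pts.max(axis=0)
    lo[1] = min(lo[1], support_height)  # snap the bottom to the support plane
    return lo, hi

# e.g. a chair whose legs are hidden behind a desk: extending the visible
# box down to the known floor height recovers its full vertical extent.
```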