42 research outputs found
Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors
We present a method to infer 3D pose and shape of vehicles from a single
image. To tackle this ill-posed problem, we optimize two-scale projection
consistency between the generated 3D hypotheses and their 2D
pseudo-measurements. Specifically, we use a morphable wireframe model to
generate a fine-scaled representation of vehicle shape and pose. To reduce its
sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse
representation which improves robustness. We also integrate three task priors,
including unsupervised monocular depth, a ground plane constraint as well as
vehicle shape priors, with forward projection errors into an overall energy
function.
Comment: Proc. of the AAAI, September 201
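The coarse-scale projection consistency described above can be illustrated with a minimal sketch. It assumes a pinhole camera with focal length `f` and principal point `(u0, v0)`, a box parameterized by center, dimensions and yaw, and a squared-difference error against a detected 2D box. All function names and the parameterization are hypothetical and are not taken from the paper.

```python
import math

def box_corners(center, dims, yaw):
    """Eight corners of a 3D box in camera coordinates (x right, y down, z forward)."""
    cx, cy, cz = center
    length, height, width = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate about the vertical (y) axis, then translate to the center.
                x = c * dx + s * dz + cx
                z = -s * dx + c * dz + cz
                corners.append((x, cy + dy, z))
    return corners

def project(point, f, u0, v0):
    """Pinhole projection of a 3D point with positive depth."""
    x, y, z = point
    return (f * x / z + u0, f * y / z + v0)

def projected_bbox(center, dims, yaw, f, u0, v0):
    """Axis-aligned 2D box (u_min, v_min, u_max, v_max) enclosing the projected corners."""
    pts = [project(p, f, u0, v0) for p in box_corners(center, dims, yaw)]
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return (min(us), min(vs), max(us), max(vs))

def box_consistency_error(hypothesis, detected_bbox, f, u0, v0):
    """Squared difference between the projected 3D hypothesis and a detected 2D box."""
    center, dims, yaw = hypothesis
    predicted = projected_bbox(center, dims, yaw, f, u0, v0)
    return sum((a - b) ** 2 for a, b in zip(predicted, detected_bbox))
```

A term of this kind would be one component of a larger energy; the fine-scale wireframe term and the three task priors mentioned in the abstract are not modeled here.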
Hierarchical Object Parsing from Structured Noisy Point Clouds
Object parsing and segmentation from point clouds are challenging tasks
because the relevant data is available only as thin structures along object
boundaries or other features, and is corrupted by large amounts of noise. To
handle this kind of data, flexible shape models are desired that can accurately
follow the object boundaries. Popular models such as Active Shape and Active
Appearance models lack the necessary flexibility for this task, while recent
approaches such as the Recursive Compositional Models make model
simplifications in order to obtain computational guarantees. This paper
investigates a hierarchical Bayesian model of shape and appearance in a
generative setting. The input data is explained by an object parsing layer,
which is a deformation of a hidden PCA shape model with Gaussian prior. The
paper also introduces a novel efficient inference algorithm that uses informed
data-driven proposals to initialize local searches for the hidden variables.
Applied to the problem of object parsing from structured point clouds such as
edge detection images, the proposed approach obtains state of the art parsing
errors on two standard datasets without using any intensity information.
Comment: 13 pages, 16 figures
Multi-Scale 3D Scene Flow from Binocular Stereo Sequences
Scene flow methods estimate the three-dimensional motion field for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene flow estimation that provides reliable results using only two cameras by fusing stereo and optical flow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical flow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene flow than previous methods allow. To handle the aperture problems inherent in the estimation of optical flow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization, two problems commonly associated with basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.
National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108)
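The geometric link between disparity, optical flow and 3D scene flow can be sketched as follows. This is a deterministic point-estimate illustration that assumes a rectified stereo pair with focal length `f`, baseline `baseline` and principal point `(u0, v0)`. It does not reproduce the probability distributions, multi-scale scheme or regularization described in the abstract, and all names are hypothetical.

```python
def backproject(u, v, disparity, f, baseline, u0, v0):
    """Triangulate a pixel of a rectified stereo pair into a 3D point (X, Y, Z)."""
    z = f * baseline / disparity          # depth from disparity
    return ((u - u0) * z / f, (v - v0) * z / f, z)

def scene_flow(u, v, disparity_t0, flow, disparity_t1, f, baseline, u0, v0):
    """3D motion of one pixel between two frames.

    flow is the optical flow (du, dv) of the pixel in the reference camera;
    disparity_t1 is the disparity measured at the displaced pixel in the next frame.
    """
    p0 = backproject(u, v, disparity_t0, f, baseline, u0, v0)
    p1 = backproject(u + flow[0], v + flow[1], disparity_t1, f, baseline, u0, v0)
    return tuple(b - a for a, b in zip(p0, p1))
```

For example, a pixel whose disparity halves between frames while its image position stays fixed is moving away from the camera along its viewing ray.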
Virtual Objects on Real Oceans
Augmented Reality (AR) aims to provide means to integrate virtual objects in a real scene. In that context it is often necessary to recover geometrical information, such as object shapes, from the scene in order to add new objects. This paper proposes a semiautomatic method to reconstruct the surface of the ocean from a real ocean scene. A detection algorithm is applied to identify significant wave crestlines. A virtual ocean is then reconstructed using the Gerstner model; its parameters are inferred and adjusted by the user to match the crestlines and to provide a smooth reconstruction between adjacent waves. An application is presented that inserts a virtual object in the real ocean scene, computes correct occlusions between the ocean surface and the object, and uses OpenGL for real-time rendering.
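For readers unfamiliar with the Gerstner model, a minimal sketch of the standard trochoidal-wave displacement is given below. It assumes deep-water dispersion (angular frequency `sqrt(g * k)`), unit-length horizontal wave directions, and a hypothetical tuple layout for the wave parameters. It is not the paper's reconstruction procedure, only the surface model it refers to.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def gerstner_point(x0, z0, t, waves):
    """Displace a rest position (x0, z0) on the horizontal plane at time t.

    waves is a list of (amplitude, wavelength, (dir_x, dir_z), phase) tuples,
    where (dir_x, dir_z) is assumed to be a unit vector.
    Returns the displaced point (x, y, z) with y pointing up.
    """
    x, y, z = x0, 0.0, z0
    for amplitude, wavelength, (dir_x, dir_z), phase in waves:
        k = 2.0 * math.pi / wavelength            # wavenumber
        omega = math.sqrt(G * k)                  # deep-water dispersion (assumption)
        theta = k * (dir_x * x0 + dir_z * z0) - omega * t + phase
        # Horizontal motion bunches points toward crests; vertical motion gives height.
        x -= dir_x * amplitude * math.sin(theta)
        z -= dir_z * amplitude * math.sin(theta)
        y += amplitude * math.cos(theta)
    return (x, y, z)
```

With a single wave and zero phase, the rest position at the origin sits on a crest at time zero, so its height equals the amplitude.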
Learning monocular 3D reconstruction of articulated categories from motion
Monocular 3D reconstruction of articulated object categories is challenging
due to the lack of training data and the inherent ill-posedness of the problem.
In this work we use video self-supervision, forcing the consistency of
consecutive 3D reconstructions by a motion-based cycle loss. This largely
improves both optimization-based and learning-based 3D mesh reconstruction. We
further introduce an interpretable model of 3D template deformations that
controls a 3D surface through the displacement of a small number of local,
learnable handles. We formulate this operation as a structured layer relying on
mesh-laplacian regularization and show that it can be trained in an end-to-end
manner. We finally introduce a per-sample numerical optimisation approach that
jointly optimises over mesh displacements and cameras within a video, boosting
accuracy both for training and also as test time post-processing. While relying
exclusively on a small set of videos collected per category for supervision, we
obtain state-of-the-art reconstructions with diverse shapes, viewpoints and
textures for multiple articulated object categories.
Comment: For project website see https://fkokkinos.github.io/video_3d_reconstruction
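The role of mesh-Laplacian regularization in a deformation model can be illustrated with a small sketch. It assumes a uniform graph Laplacian and penalizes each vertex displacement for deviating from the mean displacement of its neighbors. The function name and data layout are hypothetical, and the paper's handle-based structured layer is not reproduced here.

```python
def laplacian_energy(displacements, neighbors):
    """Uniform-Laplacian smoothness penalty on per-vertex 3D displacements.

    displacements: list of (dx, dy, dz) tuples, one per vertex.
    neighbors: dict mapping a vertex index to a list of adjacent vertex indices.
    """
    energy = 0.0
    for i, d in enumerate(displacements):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            continue  # isolated vertices contribute nothing
        mean = [sum(displacements[j][a] for j in nbrs) / len(nbrs) for a in range(3)]
        energy += sum((d[a] - mean[a]) ** 2 for a in range(3))
    return energy
```

A rigid translation of the whole mesh has zero energy under this penalty, while a displacement applied to a single vertex is penalized, which is why such a term favors smooth deformations.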
To The Point: Correspondence-driven monocular 3D category reconstruction
We present To The Point (TTP), a method for reconstructing 3D objects from a single image using 2D to 3D correspondences learned from weak supervision. We recover a 3D shape from a 2D image by first regressing the 2D positions corresponding to the 3D template vertices and then jointly estimating a rigid camera transform and non-rigid template deformation that optimally explain the 2D positions through the 3D shape projection. By relying on 3D-2D correspondences we use a simple per-sample optimization problem to replace CNN-based regression of camera pose and non-rigid deformation and thereby obtain substantially more accurate 3D reconstructions. We treat this optimization as a differentiable layer and train the whole system in an end-to-end manner. We report systematic quantitative improvements on multiple categories and provide qualitative results comprising diverse shape, pose and texture prediction examples. Project website: https://fkokkinos.github.io/to_the_point