Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
Recovering the 3D representation of an object from single-view or multi-view
RGB images by deep neural networks has attracted increasing attention in the
past few years. Several mainstream works (e.g., 3D-R2N2) use recurrent neural
networks (RNNs) to fuse multiple feature maps extracted from input images
sequentially. However, when given the same set of input images with different
orders, RNN-based approaches are unable to produce consistent reconstruction
results. Moreover, due to long-term memory loss, RNNs cannot fully exploit
input images to refine reconstruction results. To solve these problems, we
propose a novel framework for single-view and multi-view 3D reconstruction,
named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse
3D volume from each input image. Then, a context-aware fusion module is
introduced to adaptively select high-quality reconstructions for each part
(e.g., table legs) from different coarse 3D volumes to obtain a fused 3D
volume. Finally, a refiner further refines the fused 3D volume to generate the
final output. Experimental results on the ShapeNet and Pix3D benchmarks
indicate that the proposed Pix2Vox outperforms state-of-the-art methods by a
large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2
in terms of backward inference time. Experiments on unseen ShapeNet categories
demonstrate the superior generalization ability of our method.

Comment: ICCV 2019
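One way to read the context-aware fusion step is as a per-voxel softmax over view-specific quality scores: each coarse volume contributes most where its score map says it reconstructed that part well. The sketch below is an illustrative simplification, not the paper's implementation; in Pix2Vox the score maps come from a learned scoring branch, which is assumed here as a given input. Note that, unlike an RNN, this fusion is invariant to the order of the input views.

```python
import numpy as np

def context_aware_fusion(coarse_volumes, score_maps):
    """Fuse per-view coarse volumes with softmax-normalized score maps.

    coarse_volumes, score_maps: arrays of shape (n_views, D, H, W).
    Each voxel of the fused volume is a weighted sum over views, so the
    view that reconstructs a given part best dominates for that part.
    """
    # Softmax across the view axis (axis 0) gives per-voxel fusion weights.
    scores = score_maps - score_maps.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * coarse_volumes).sum(axis=0)

# Toy example: two 2x2x2 coarse volumes; the first view scores much higher,
# so the fused volume should closely follow its occupancies (~0.9).
vols = np.stack([np.full((2, 2, 2), 0.9), np.full((2, 2, 2), 0.1)])
scores = np.stack([np.full((2, 2, 2), 5.0), np.full((2, 2, 2), -5.0)])
fused = context_aware_fusion(vols, scores)
```

Because the softmax treats views symmetrically, permuting the inputs leaves the fused volume unchanged, which is exactly the consistency property the abstract contrasts with RNN-based fusion.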
Motion sequence analysis in the presence of figural cues
Published in final edited form as: Neurocomputing. 2015 January 5; 147: 485–491.

The perception of 3-D structure in dynamic sequences is believed to be subserved primarily through the use of motion cues. However, real-world sequences contain many figural shape cues besides the dynamic ones. We hypothesize that if figural cues are perceptually significant during sequence analysis, then inconsistencies in these cues over time would lead to percepts of non-rigidity in sequences showing physically rigid objects in motion. We develop an experimental paradigm to test this hypothesis and present results with two patients with impairments in motion perception due to focal neurological damage, as well as two control subjects. Consistent with our hypothesis, the data suggest that figural cues strongly influence the perception of structure in motion sequences, even to the extent of inducing non-rigid percepts in sequences where motion information alone would yield rigid structures. Beyond helping to probe the issue of shape perception, our experimental paradigm might also serve as a possible perceptual assessment tool in a clinical setting.

The authors wish to thank all observers who participated in the experiments reported here. This research and the preparation of this manuscript were supported by National Institutes of Health grant R01 NS064100 to LMV.

Accepted manuscript
CAGD based 3-D visual recognition
Journal Article

A coherent automated manufacturing system needs to include CAD/CAM, computer vision, and object manipulation. Currently, most systems which support CAD/CAM do not provide for vision or manipulation, and similarly, vision and manipulation systems incorporate no explicit relation to CAD/CAM models. CAD/CAM systems have emerged which allow the designer to conceive and model an object and automatically manufacture the object to the prescribed specifications. If recognition or manipulation is to be performed, existing vision systems rely on models generated in an ad hoc manner for the vision or recognition process. Although both vision and CAD/CAM systems rely on models of the objects involved, different modeling schemes are used in each case. A more unified system will allow vision models to be generated from the CAD database. We are implementing a framework in which objects are designed using an existing CAGD system and recognition strategies based on these design models are used for visual recognition and manipulation. An example of its application is given.
The synthesis of visual recognition strategies
Journal Article

A coherent automated manufacturing system needs to include CAD/CAM, computer vision, and object manipulation. Currently, most systems which support CAD/CAM do not provide for vision or manipulation, and similarly, vision and manipulation systems incorporate no explicit relation to CAD/CAM models. CAD/CAM systems have emerged which allow the designer to conceive and model an object and automatically manufacture the object to the prescribed specifications. If recognition or manipulation is to be performed, existing vision systems rely on models generated in an ad hoc manner for the vision or recognition process. Although both vision and CAD/CAM systems rely on models of the objects involved, different modeling schemes are used in each case. A more unified system will allow vision models to be generated from the CAD database. The model generation should be guided by the class of object being constructed, the constraints of the vision algorithms used, and the constraints imposed by the robotic workcell environment (fixtures, sensors, manipulators and effectors). We are implementing a framework in which objects are designed using an existing CAGD system and recognition strategies (logical sensor specifications) are automatically synthesized and used for visual recognition and manipulation.
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate
scene depths from a single image, by using the information provided by a
camera's aperture as supervision. Prior works use a depth sensor's outputs or
images of the same scene from alternate viewpoints as supervision, while our
method instead uses images from the same viewpoint taken with a varying camera
aperture. To enable learning algorithms to use aperture effects as supervision,
we introduce two differentiable aperture rendering functions that use the input
image and predicted depths to simulate the depth-of-field effects caused by
real camera apertures. We train a monocular depth estimation network end-to-end
to predict the scene depths that best explain these finite aperture images as
defocus-blurred renderings of the input all-in-focus image.

Comment: To appear at CVPR 2018 (updated to camera-ready version)
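The core idea, a differentiable rendering function mapping an all-in-focus image plus a predicted depth map to a defocus-blurred image, can be sketched with a thin-lens circle-of-confusion model and a soft blend over a small stack of pre-blurred images. This is a deliberately simplified stand-in, not the paper's actual rendering functions; the blur-level stack, the Gaussian-weighted blending, and all parameter values below are assumptions for illustration. Because the blend weights vary smoothly with the circle of confusion, the output is differentiable with respect to the predicted depth.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur via 1-D convolutions (kernel must be
    shorter than the image side for np.convolve's 'same' mode)."""
    if sigma == 0:
        return img
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

def circle_of_confusion(depth, focus_depth, aperture):
    """Per-pixel blur radius from the thin-lens model (up to constants)."""
    return aperture * np.abs(depth - focus_depth) / np.maximum(depth, 1e-6)

def render_defocus(image, depth, focus_depth, aperture,
                   sigmas=(0.0, 1.0, 2.0, 4.0)):
    """Blend a stack of pre-blurred images with weights that vary
    smoothly (hence differentiably) with each pixel's circle of confusion."""
    coc = circle_of_confusion(depth, focus_depth, aperture)
    # Soft assignment of each pixel's CoC to the discrete blur levels.
    dist = (coc[None, ...] - np.array(sigmas)[:, None, None]) ** 2
    w = np.exp(-dist)
    w /= w.sum(axis=0, keepdims=True)
    stack = np.stack([gaussian_blur(image, s) for s in sigmas])
    return (w * stack).sum(axis=0)

# A point light at the focus depth stays sharp; moving it off the focal
# plane spreads its energy, which is the supervisory signal for depth.
img = np.zeros((31, 31))
img[15, 15] = 1.0
focused = render_defocus(img, np.full((31, 31), 2.0), focus_depth=2.0, aperture=8.0)
defocused = render_defocus(img, np.full((31, 31), 4.0), focus_depth=2.0, aperture=8.0)
```

Training would compare such renderings against real shallow-depth-of-field photographs and backpropagate the error into the depth network.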
Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors
The impressive performance of deep convolutional neural networks in
single-view 3D reconstruction suggests that these models perform non-trivial
reasoning about the 3D structure of the output space. However, recent work has
challenged this belief, showing that complex encoder-decoder architectures
perform similarly to nearest-neighbor baselines or simple linear decoder models
that exploit large amounts of per-category data in standard benchmarks. On the
other hand, settings where 3D shape must be inferred for new categories from few
examples are more natural and require models that generalize across shapes. In
this work we demonstrate experimentally that naive baselines do not apply when
the goal is to learn to reconstruct novel objects using very few examples, and
that in a \emph{few-shot} learning setting, the network must learn concepts
that can be applied to new categories, avoiding rote memorization. To address
deficiencies in existing approaches to this problem, we propose three
approaches that efficiently integrate a class prior into a 3D reconstruction
model, allowing it to account for intra-class variability and imposing an
implicit compositional structure that the model should learn. Experiments on
the popular ShapeNet database demonstrate that our method significantly
outperforms existing baselines on this task in the few-shot setting.
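One simple way to picture "integrating a class prior" is prototype conditioning: summarize the few support examples of a category into a prototype code and feed it to the decoder alongside the per-image code. The sketch below is a hypothetical illustration, not any of the paper's three approaches; both the concatenation and the FiLM-style modulation variant are assumptions.

```python
import numpy as np

def class_prototype(support_codes):
    """Class prior as the mean latent code of the few support shapes.

    support_codes: (k, d) array of encoder outputs for the k support examples.
    """
    return support_codes.mean(axis=0)

def condition_concat(image_code, prototype):
    """Simplest integration: concatenate the prior with the image code
    before the 3D decoder, doubling the decoder's input dimension."""
    return np.concatenate([image_code, prototype])

def condition_film(image_code, prototype, w_gamma, w_beta):
    """FiLM-style alternative (also hypothetical): the prior predicts a
    per-channel scale and shift applied to the image code."""
    gamma = w_gamma @ prototype
    beta = w_beta @ prototype
    return gamma * image_code + beta

# Toy usage: a 3-shot support set with 8-dim latent codes.
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 8))
image_code = rng.normal(size=8)
proto = class_prototype(support)
code_cat = condition_concat(image_code, proto)
code_film = condition_film(image_code, proto, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```

The point of either variant is that the same image code decodes differently under different class priors, letting one decoder cover many categories with few examples each.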
Variational Uncalibrated Photometric Stereo under General Lighting
Photometric stereo (PS) techniques nowadays remain constrained to an ideal
laboratory setup where modeling and calibration of lighting is amenable. To
eliminate such restrictions, we propose an efficient principled variational
approach to uncalibrated PS under general illumination. To this end, the
Lambertian reflectance model is approximated through a spherical harmonic
expansion, which preserves the spatial invariance of the lighting. The joint
recovery of shape, reflectance and illumination is then formulated as a single
variational problem. There the shape estimation is carried out directly in
terms of the underlying perspective depth map, thus implicitly ensuring
integrability and bypassing the need for a subsequent normal integration. To
tackle the resulting nonconvex problem numerically, we undertake a two-phase
procedure to initialize a balloon-like perspective depth map, followed by a
"lagged" block coordinate descent scheme. The experiments validate efficiency
and robustness of this approach. Across a variety of evaluations, we are able
to reduce the mean angular error consistently by a factor of 2-3 compared to
the state-of-the-art.

Comment: Haefner and Ye contributed equally
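The first-order spherical harmonic approximation mentioned in the abstract turns Lambertian shading under general lighting into a dot product: intensity ≈ ρ · ⟨l, h(n)⟩ with basis h(n) = [1, nx, ny, nz] and a 4-vector l of lighting coefficients. The sketch below shows this shading model and a least-squares lighting update, one plausible building block of a block coordinate descent scheme; it is an illustration under these assumptions, not the paper's variational solver (which additionally parameterizes normals through a perspective depth map).

```python
import numpy as np

def sh_basis(normals):
    """First-order spherical harmonic basis [1, nx, ny, nz] per pixel,
    which models arbitrary low-frequency lighting, not just a point source."""
    n = normals.reshape(-1, 3)
    return np.concatenate([np.ones((n.shape[0], 1)), n], axis=1)

def shade(albedo, normals, light):
    """Rendered intensity rho * <l, h(n)> for lighting coefficients l in R^4."""
    return albedo.reshape(-1) * (sh_basis(normals) @ light)

def estimate_light(intensities, albedo, normals):
    """Lighting sub-step: with shape and albedo fixed, the 4 coefficients
    follow from an ordinary linear least-squares problem."""
    a = sh_basis(normals) * albedo.reshape(-1, 1)
    l, *_ = np.linalg.lstsq(a, intensities.reshape(-1), rcond=None)
    return l

# Round trip: render a synthetic surface, then recover the lighting.
rng = np.random.default_rng(0)
normals = rng.normal(size=(100, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=100)
light = np.array([0.8, 0.1, -0.2, 0.5])
img = shade(albedo, normals, light)
recovered = estimate_light(img, albedo, normals)
```

In the full alternating scheme, analogous sub-steps would update albedo and the perspective depth map while the lighting is held fixed.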
Exploratory Procedure for Computer Vision
This paper deals with exploratory procedures for computer vision. The assumptions are that we have a mobile camera system with controllable focus, a closable aperture, and the ability to record its position, orientation, and movement. Furthermore, we assume an unknown and unstructured environment. For our analysis we consider two types of illumination sources: the point source and the extended sky-like source. The exploratory procedures determine the illumination energy, in some cases the illumination orientation, the albedo, and the differentiation between the true 3D scene and a picture of it. The key idea is the mobile active observer.