107,606 research outputs found

    Teaching Compositionality to CNNs

    Full text link
    Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.Comment: Preprint appearing in CVPR 201

    PIXOR: Real-time 3D Object Detection from Point Clouds

    Full text link
    We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical as detection is a necessary component for safety. Existing approaches are, however, expensive in computation due to high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird's Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are especially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark, and a large-scale 3D vehicle detection benchmark. In both datasets we show that the proposed detector surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still runs at >28 FPS.Comment: Update of CVPR2018 paper: correct timing, fix typos, add acknowledgemen

    DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image

    Full text link
    3D reconstruction from a single image is a key problem in multiple applications ranging from robotic manipulation to augmented reality. Prior methods have tackled this problem through generative models which predict 3D reconstructions as voxels or point clouds. However, these methods can be computationally expensive and miss fine details. We introduce a new differentiable layer for 3D data deformation and use it in DeformNet to learn a model for 3D reconstruction-through-deformation. DeformNet takes an image input, searches the nearest shape template from a database, and deforms the template to match the query image. We evaluate our approach on the ShapeNet dataset and show that - (a) the Free-Form Deformation layer is a powerful new building block for Deep Learning models that manipulate 3D data (b) DeformNet uses this FFD layer combined with shape retrieval for smooth and detail-preserving 3D reconstruction of qualitatively plausible point clouds with respect to a single query image (c) compared to other state-of-the-art 3D reconstruction methods, DeformNet quantitatively matches or outperforms their benchmarks by significant margins. For more information, visit: https://deformnet-site.github.io/DeformNet-website/ .Comment: 11 pages, 9 figures, NIP

    Compact Model Representation for 3D Reconstruction

    Full text link
    3D reconstruction from 2D images is a central problem in computer vision. Recent works have been focusing on reconstruction directly from a single image. It is well known however that only one image cannot provide enough information for such a reconstruction. A prior knowledge that has been entertained are 3D CAD models due to its online ubiquity. A fundamental question is how to compactly represent millions of CAD models while allowing generalization to new unseen objects with fine-scaled geometry. We introduce an approach to compactly represent a 3D mesh. Our method first selects a 3D model from a graph structure by using a novel free-form deformation FFD 3D-2D registration, and then the selected 3D model is refined to best fit the image silhouette. We perform a comprehensive quantitative and qualitative analysis that demonstrates impressive dense and realistic 3D reconstruction from single images.Comment: 9 pages, 6 figure

    Hybrid Bayesian Eigenobjects: Combining Linear Subspace and Deep Network Methods for 3D Robot Vision

    Full text link
    We introduce Hybrid Bayesian Eigenobjects (HBEOs), a novel representation for 3D objects designed to allow a robot to jointly estimate the pose, class, and full 3D geometry of a novel object observed from a single viewpoint in a single practical framework. By combining both linear subspace methods and deep convolutional prediction, HBEOs efficiently learn nonlinear object representations without directly regressing into high-dimensional space. HBEOs also remove the onerous and generally impractical necessity of input data voxelization prior to inference. We experimentally evaluate the suitability of HBEOs to the challenging task of joint pose, class, and shape inference on novel objects and show that, compared to preceding work, HBEOs offer dramatically improved performance in all three tasks along with several orders of magnitude faster runtime performance.Comment: To appear in the International Conference on Intelligent Robots (IROS) - Madrid, 201

    Multi-body Non-rigid Structure-from-Motion

    Get PDF
    Conventional structure-from-motion (SFM) research is primarily concerned with the 3D reconstruction of a single, rigidly moving object seen by a static camera, or a static and rigid scene observed by a moving camera --in both cases there are only one relative rigid motion involved. Recent progress have extended SFM to the areas of {multi-body SFM} (where there are {multiple rigid} relative motions in the scene), as well as {non-rigid SFM} (where there is a single non-rigid, deformable object or scene). Along this line of thinking, there is apparently a missing gap of "multi-body non-rigid SFM", in which the task would be to jointly reconstruct and segment multiple 3D structures of the multiple, non-rigid objects or deformable scenes from images. Such a multi-body non-rigid scenario is common in reality (e.g. two persons shaking hands, multi-person social event), and how to solve it represents a natural {next-step} in SFM research. By leveraging recent results of subspace clustering, this paper proposes, for the first time, an effective framework for multi-body NRSFM, which simultaneously reconstructs and segments each 3D trajectory into their respective low-dimensional subspace. Under our formulation, 3D trajectories for each non-rigid structure can be well approximated with a sparse affine combination of other 3D trajectories from the same structure (self-expressiveness). We solve the resultant optimization with the alternating direction method of multipliers (ADMM). We demonstrate the efficacy of the proposed framework through extensive experiments on both synthetic and real data sequences. Our method clearly outperforms other alternative methods, such as first clustering the 2D feature tracks to groups and then doing non-rigid reconstruction in each group or first conducting 3D reconstruction by using single subspace assumption and then clustering the 3D trajectories into groups.Comment: 21 pages, 16 figure

    3D ShapeNets: A Deep Representation for Volumetric Shapes

    Full text link
    3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape representation in the loop. Apart from category recognition, recovering full 3D shapes from view-based 2.5D depth maps is also a critical part of visual understanding. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses from raw CAD data, and discovers hierarchical compositional part representations automatically. It naturally supports joint object recognition and shape completion from 2.5D depth maps, and it enables active object recognition through view planning. To train our 3D deep learning model, we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive experiments show that our 3D deep representation enables significant performance improvement over the-state-of-the-arts in a variety of tasks.Comment: to be appeared in CVPR 201
    corecore