Redemption from Range-view for Accurate 3D Object Detection
Most recent approaches for 3D object detection predominantly rely on
point-view or bird's-eye view representations, with limited exploration of
range-view-based methods. The range-view representation suffers from scale
variation and surface texture deficiency, both of which pose significant
limitations for developing corresponding methods. Notably, the surface texture
loss problem has been largely ignored by all existing methods, despite its
significant impact on the accuracy of range-view-based 3D object detection. In
this study, we propose Redemption from Range-view R-CNN (R2 R-CNN), a novel and
accurate approach that comprehensively explores the range-view representation.
Our proposed method addresses scale variation through the HD Meta Kernel, which
captures range-view geometry information in multiple scales. Additionally, we
introduce Feature Points Redemption (FPR) to recover the lost 3D surface
texture information from the range view, and Synchronous-Grid RoI Pooling
(S-Grid RoI Pooling), a multi-scaled approach with multiple receptive fields
for accurate box refinement. Our R2 R-CNN outperforms existing range-view-based
methods, achieving state-of-the-art performance on both the KITTI benchmark and
the Waymo Open Dataset. Our study highlights the critical importance of
addressing the surface texture loss problem for accurate 3D object detection in
range-view-based methods. Code will be made publicly available.
Pointwise Convolutional Neural Networks
Deep learning with 3D data such as reconstructed point clouds and CAD models
has received great research interest recently. However, the capability of
using point clouds with convolutional neural networks has so far not been fully
explored. In this paper, we present a convolutional neural network for semantic
segmentation and object recognition with 3D point clouds. At the core of our
network is pointwise convolution, a new convolution operator that can be
applied at each point of a point cloud. Our fully convolutional network design,
while being surprisingly simple to implement, can yield competitive accuracy in
both semantic segmentation and object recognition tasks.
Comment: 10 pages, 6 figures, 10 tables. Paper accepted to CVPR 201
Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling
Unlike on images, semantic learning on 3D point clouds using a deep network
is challenging due to the naturally unordered data structure. Among existing
works, PointNet has achieved promising results by directly learning on point
sets. However, it does not take full advantage of a point's local neighborhood,
which contains fine-grained structural information that turns out to be helpful
for better semantic learning. In this regard, we present two new operations
to improve PointNet with a more efficient exploitation of local structures. The
first one focuses on local 3D geometric structures. In analogy to a convolution
kernel for images, we define a point-set kernel as a set of learnable 3D points
that jointly respond to a set of neighboring data points according to their
geometric affinities measured by kernel correlation, adapted from a similar
technique for point cloud registration. The second one exploits local
high-dimensional feature structures by recursive feature aggregation on a
nearest-neighbor-graph computed from 3D positions. Experiments show that our
network can efficiently capture local information and robustly achieve better
performance on major datasets. Our code is available at
http://www.merl.com/research/license#KCNet
Comment: Accepted in CVPR'18. *indicates equal contributio
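The point-set kernel described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian affinity with a hand-picked width `sigma` and averages over the neighbors; the function name, the argument layout, and the toy values are illustrative assumptions, not taken from the authors' released code.

```python
import math

def kernel_correlation(kernel_points, anchor, neighbors, sigma=1.0):
    """Average Gaussian affinity between learnable kernel points and the
    offsets of the neighbors relative to the anchor point (assumed form)."""
    if not neighbors:
        return 0.0
    total = 0.0
    for nb in neighbors:
        # Express the neighbor in the anchor's local coordinate frame.
        offset = [n - a for n, a in zip(nb, anchor)]
        for kp in kernel_points:
            sq_dist = sum((k - o) ** 2 for k, o in zip(kp, offset))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / len(neighbors)

# Toy example: a one-point kernel that matches the local offset exactly,
# versus a neighborhood whose offset lies one unit away from the kernel point.
kernel = [(1.0, 0.0, 0.0)]
matched = kernel_correlation(kernel, (0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)])
mismatched = kernel_correlation(kernel, (0.0, 0.0, 0.0), [(0.0, 0.0, 0.0)])
```

In this sketch the response is largest when the kernel points coincide with the neighbor offsets and decays smoothly as the local geometry departs from the kernel's shape. In a network the kernel points would be learned parameters rather than fixed tuples.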
Geometric modeling of non-rigid 3D shapes : theory and application to object recognition.
One of the major goals of computer vision is the development of flexible and efficient methods for shape representation. This is especially true for non-rigid 3D shapes, where a great variety of shapes is produced by deformations of a non-rigid object. Modeling these non-rigid shapes is a very challenging problem. Being able to analyze the properties of such shapes and describe their behavior is the key issue in research. In addition, photometric features can play an important role in many shape analysis applications, such as shape matching and correspondence, because they contain rich information about the visual appearance of real objects. This new information (contained in photometric features) and its important applications add another dimension to the problem's difficulty. Two main approaches have been adopted in the literature for shape modeling for the matching and retrieval problem: local and global. Local matching is performed between sparse points or regions of the shape, while in global approaches similarity is measured between entire models. These methods have an underlying assumption that shapes are rigidly transformed. Moreover, most descriptors proposed so far are confined to shape, that is, they analyze only geometric and/or topological properties of 3D models. A shape descriptor or model should be isometry invariant and scale invariant, be able to capture the fine details of the shape, be computationally efficient, and have many other good properties. In particular, the descriptor needed here should be able to deal with non-rigid shape deformation, handle the scale variation problem with less sensitivity to noise, match shapes of the same class even when parts are missing, and encode both photometric and geometric information in one descriptor.
This dissertation will address the problem of 3D non-rigid shape representation and textured 3D non-rigid shapes based on local features. Two approaches will be proposed for non-rigid shape matching and retrieval based on the Heat Kernel (HK) and the Scale-Invariant Heat Kernel (SI-HK), and one approach for modeling textured 3D non-rigid shapes based on the scale-invariant Weighted Heat Kernel Signature (WHKS). For the first approach, the Laplace-Beltrami eigenfunctions are used to detect a small number of critical points on the shape surface. Then a shape descriptor is formed based on the heat kernels at the detected critical points for different scales. Sparse representation is used to reduce the dimensionality of the calculated descriptor. The proposed descriptor is used for classification via the Collaborative Representation-based Classification with a Regularized Least Square (CRC-RLS) algorithm. The experimental results have shown that the proposed descriptor can achieve state-of-the-art results on two benchmark data sets. For the second approach, an improved method to introduce scale invariance has also been proposed to avoid noise-sensitive operations in the original transformation method. Then a new 3D shape descriptor is formed based on the histograms of the scale-invariant HK for a number of critical points on the shape at different time scales. A Collaborative Classification (CC) scheme is then employed for object classification. The experimental results have shown that the proposed descriptor can achieve high performance on the two benchmark data sets. An important observation from the experiments is that the proposed approach handles data under several distortion scenarios (noise, shot noise, scale, and missing parts) better than the well-known approaches. For modeling textured 3D non-rigid shapes, this dissertation introduces, for the first time, a mathematical framework for the diffusion geometry on textured shapes.
This dissertation presents an approach for shape matching and retrieval based on a weighted heat kernel signature. It shows how to include photometric information as a weight over the shape manifold, and it also proposes a novel formulation for heat diffusion over weighted manifolds. Then this dissertation presents a new discretization method for the weighted heat kernel induced by the linear FEM weights. The weighted heat kernel signature is then used as a shape descriptor. The proposed descriptor encodes both the photometric and geometric information based on the solution of one equation. Finally, this dissertation proposes an approach for 3D face recognition based on the front contours of heat propagation over the face surface. The front contours are extracted automatically as heat propagates from a detected set of landmarks. The propagation contours are used to successfully discriminate the various faces. The proposed approach is evaluated on the largest publicly available database of 3D facial images and compares favorably with the state-of-the-art approaches in the literature. This work can be extended to the problem of dense correspondence between non-rigid shapes. The proposed approaches, together with the properties of the Laplace-Beltrami eigenfunctions, can be utilized for 3D mesh segmentation. Another possible application of the proposed approach is viewpoint selection for 3D objects, by selecting the most informative views that collectively provide the most descriptive presentation of the surface.
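As a rough illustration of the heat-kernel descriptors mentioned in this abstract, the sketch below evaluates the standard (unweighted) heat kernel signature, HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2, from a precomputed eigendecomposition. It is not the dissertation's weighted or scale-invariant variant. The function name and the toy two-vertex graph Laplacian, used here in place of a discretized Laplace-Beltrami operator, are assumptions made for the example.

```python
import math

def heat_kernel_signature(eigenvalues, eigenvectors, vertex, times):
    """Standard HKS at one vertex for each diffusion time in `times`.

    eigenvalues[i] pairs with eigenvectors[i], a list holding the value of
    the i-th eigenfunction at every vertex.
    """
    return [
        sum(math.exp(-lam * t) * phi[vertex] ** 2
            for lam, phi in zip(eigenvalues, eigenvectors))
        for t in times
    ]

# Toy spectrum (assumed): graph Laplacian [[1, -1], [-1, 1]] of two connected
# vertices has eigenvalues 0 and 2 with the orthonormal eigenvectors below.
s = 1.0 / math.sqrt(2.0)
eigenvalues = [0.0, 2.0]
eigenvectors = [[s, s], [s, -s]]

signature = heat_kernel_signature(eigenvalues, eigenvectors,
                                  vertex=0, times=[0.0, 0.5, 1.0])
```

Sampling the signature at several times gives a multi-scale descriptor per point: small times reflect local geometry, and larger times reflect increasingly global structure.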
Online learning and fusion of orientation appearance models for robust rigid object tracking
We introduce a robust framework for learning and fusing of orientation appearance models based on both texture and depth information for rigid object tracking. Our framework fuses data obtained from a standard visual camera and dense depth maps obtained by low-cost consumer depth cameras such as the Kinect. To combine these two completely different modalities, we propose to use features that do not depend on the data representation: angles. More specifically, our framework combines image gradient orientations as extracted from intensity images with the directions of surface normals computed from dense depth fields. We propose to capture the correlations between the obtained orientation appearance models using a fusion approach motivated by the original Active Appearance Models (AAMs). To incorporate these features in a learning framework, we use a robust kernel based on the Euler representation of angles which does not require off-line training, and can be efficiently implemented online. The robustness of learning from orientation appearance models is presented both theoretically and experimentally in this work. This kernel enables us to cope with gross measurement errors, missing data as well as other typical problems such as illumination changes and occlusions. By combining the proposed models with a particle filter, the proposed framework was used for performing 2D plus 3D rigid object tracking, achieving robust performance in very difficult tracking scenarios including extreme pose variations. © 2014 Elsevier B.V. All rights reserved
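The Euler representation of angles mentioned in this abstract can be sketched as follows: each orientation theta is mapped to the unit complex number exp(j*theta), so the real part of the inner product of two such vectors is a sum of cosines of angle differences. The function names and the normalization by the number of samples are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math

def euler_map(angles):
    """Map angles (radians) to unit complex numbers, scaled so that the
    resulting vector has unit norm (normalization chosen for this sketch)."""
    scale = 1.0 / math.sqrt(len(angles))
    return [scale * cmath.exp(1j * a) for a in angles]

def orientation_similarity(angles_a, angles_b):
    """Real part of the complex inner product, i.e. the mean cosine of the
    per-element angle differences."""
    za, zb = euler_map(angles_a), euler_map(angles_b)
    return sum((x * y.conjugate()).real for x, y in zip(za, zb))

# Hypothetical orientation samples (e.g. gradient or normal directions).
reference = [0.1, 0.7, 1.3, 2.9]
same = orientation_similarity(reference, reference)
flipped = orientation_similarity(reference, [a + math.pi for a in reference])
quarter = orientation_similarity(reference, [a + math.pi / 2 for a in reference])
```

Because each element contributes a bounded cosine term, a single corrupted measurement can change the score by only a limited amount, and angle differences spread uniformly over a full turn tend to cancel. This is consistent with the robustness to gross errors that the abstract describes.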