Frustum PointNets for 3D Object Detection from RGB-D Data
In this work, we study 3D object detection from RGB-D data in both indoor and
outdoor scenes. While previous methods focus on images or 3D voxels, often
obscuring natural 3D patterns and invariances of 3D data, we directly operate
on raw point clouds by popping up RGB-D scans. However, a key challenge of this
approach is how to efficiently localize objects in point clouds of large-scale
scenes (region proposal). Instead of solely relying on 3D proposals, our method
leverages both mature 2D object detectors and advanced 3D deep learning for
object localization, achieving efficiency as well as high recall for even small
objects. Benefiting from learning directly on raw point clouds, our method is
also able to precisely estimate 3D bounding boxes even under strong occlusion
or with very sparse points. Evaluated on KITTI and SUN RGB-D 3D detection
benchmarks, our method outperforms the state of the art by remarkable margins
while having real-time capability.
Comment: 15 pages, 12 figures, 14 tables
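The abstract's idea of "popping up" an RGB-D scan inside a 2D detection can be illustrated with a minimal sketch. It assumes a pinhole camera model and a depth map stored as a nested list; the function name `frustum_points`, the box layout `(u_min, v_min, u_max, v_max)`, and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical choices for illustration, not the authors' implementation.

```python
def frustum_points(depth, box, fx, fy, cx, cy):
    """Back-project the depth pixels inside a 2D box into 3D camera coordinates.

    depth: nested list indexed as depth[v][u], in metres (0 means no reading).
    box:   (u_min, v_min, u_max, v_max), max bounds exclusive (assumed layout).
    fx, fy, cx, cy: pinhole intrinsics (assumed camera model).
    """
    u_min, v_min, u_max, v_max = box
    points = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0:  # skip pixels with missing depth
                continue
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 4x4 depth map with a near object in the centre and one missing pixel.
depth = [
    [0.0, 2.0, 2.0, 0.0],
    [2.0, 1.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 2.0],
    [0.0, 2.0, 2.0, 0.0],
]
pts = frustum_points(depth, box=(1, 1, 3, 3), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Only the points that fall inside the 2D box are kept, so a downstream 3D network would see a small frustum-shaped subset of the scene rather than the full point cloud.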
A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes
Constructing molecular structural models from Cryo-Electron Microscopy
(Cryo-EM) density volumes is the critical last step of structure determination
by Cryo-EM technologies. Methods have evolved from manual construction by
structural biologists to performing 6D translation-rotation searches, which is
extremely compute-intensive. In this paper, we propose a learning-based method
and formulate this problem as a vision-inspired 3D detection and pose
estimation task. We develop a deep learning framework for amino acid
determination in a 3D Cryo-EM density volume. We also design a sequence-guided
Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form
the molecular structure. This framework achieves 91% coverage on our newly
proposed dataset and takes only a few minutes for a typical structure with a
thousand amino acids. Our method is hundreds of times faster and several times
more accurate than existing automated solutions without any human intervention.
Comment: 8 pages, 5 figures, 4 tables
Learning to Find Good Correspondences
We develop a deep architecture to learn to find good correspondences for
wide-baseline stereo. Given a set of putative sparse matches and the camera
intrinsics, we train our network in an end-to-end fashion to label the
correspondences as inliers or outliers, while simultaneously using them to
recover the relative pose, as encoded by the essential matrix. Our architecture
is based on a multi-layer perceptron operating on pixel coordinates rather than
directly on the image, and is thus simple and small. We introduce a novel
normalization technique, called Context Normalization, which allows us to
process each data point separately while imbuing it with global information,
and also makes the network invariant to the order of the correspondences. Our
experiments on multiple challenging datasets demonstrate that our method is
able to drastically improve the state of the art with little training data.
Comment: CVPR 2018 (Oral)
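One reading of the Context Normalization idea described above is to normalize each feature dimension by its mean and standard deviation taken across all correspondences of a single image pair. The sketch below follows that reading in plain Python; the function name `context_normalize` and the `eps` stabilizer are assumptions for illustration and are not taken from the authors' code.

```python
import math


def context_normalize(features, eps=1e-5):
    """Normalize each feature dimension across the set of correspondences.

    features: list of N feature vectors (one per correspondence), equal length.
    eps:      small constant to avoid division by zero (assumed value).
    """
    n = len(features)
    dims = len(features[0])
    # Statistics are computed over the N correspondences, not over a batch.
    means = [sum(row[d] for row in features) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n)
        for d in range(dims)
    ]
    return [
        [(row[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for row in features
    ]


feats = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
out = context_normalize(feats)
# Reordering the inputs only reorders the outputs.
out_reversed = context_normalize(feats[::-1])
```

Because the mean and standard deviation are symmetric functions of the inputs, each point receives global context while the per-point computation stays independent of the input order.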