Is That a Chair? Imagining Affordances Using Simulations of an Articulated Human Body
For robots to exhibit a high level of intelligence in the real world, they
must be able to assess objects for which they have no prior knowledge.
Therefore, it is crucial for robots to perceive object affordances by reasoning
about physical interactions with the object. In this paper, we propose a novel
method to provide robots with an ability to imagine object affordances using
physical simulations. The class of chair is chosen here as an initial category
of objects to illustrate a more general paradigm. In our method, the robot
"imagines" the affordance of an arbitrarily oriented object as a chair by
simulating a physical sitting interaction between an articulated human body and
the object. This object affordance reasoning is used as a cue for object
classification (chair vs non-chair). Moreover, if an object is classified as a
chair, the affordance reasoning can also predict the upright pose of the object
which allows the sitting interaction to take place. We call this type of pose
the functional pose. We demonstrate our method in chair classification on
synthetic 3D CAD models. Although our method uses only 30 models for training,
it outperforms appearance-based deep learning methods, which require a large
amount of training data, when the upright orientation is not assumed to be
known a priori. In addition, we showcase that the functional pose predictions
of our method align well with human judgments on both synthetic models and real
objects scanned by a depth camera.
Comment: 7 pages, 6 figures. Accepted to ICRA202
PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning
Contrastive Language-Image Pre-training (CLIP) has shown promising open-world
performance on 2D image tasks, while its transferred capacity on 3D point
clouds, i.e., PointCLIP, is still far from satisfactory. In this work, we
propose PointCLIP V2, a powerful 3D open-world learner, to fully unleash the
potential of CLIP on 3D point cloud data. First, we introduce a realistic shape
projection module to generate more realistic depth maps for CLIP's visual
encoder, which is quite efficient and narrows the domain gap between projected
point clouds and natural images. Second, we leverage large-scale language
models to automatically design a more descriptive 3D-semantic prompt for CLIP's
textual encoder, instead of the previous hand-crafted one. Without introducing
any training in 3D domains, our approach significantly surpasses PointCLIP by
+42.90%, +40.44%, and +28.75% accuracy on three datasets for zero-shot 3D
classification. Furthermore, PointCLIP V2 can be extended to few-shot
classification, zero-shot part segmentation, and zero-shot 3D object detection
in a simple manner, demonstrating our superior generalization ability for 3D
open-world learning. Code will be available at
https://github.com/yangyangyang127/PointCLIP_V2
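
The abstract above describes projecting point clouds into depth maps for CLIP's visual encoder. As a rough illustration of that general idea only, the following sketch performs a plain orthographic projection with NumPy. It is not the realistic shape projection module from the paper, whose details are not given in the abstract; the function name `project_depth_map`, the `resolution` parameter, and the choice of viewing axis are all assumptions made for this example.

```python
import numpy as np


def project_depth_map(points, resolution=32):
    """Orthographic depth map of an (N, 3) point cloud viewed along the z axis.

    Hypothetical helper for illustration; not the PointCLIP V2 projection.
    Each pixel stores the largest normalized z value that falls into it,
    and pixels that receive no point stay at 0.
    """
    points = np.asarray(points, dtype=np.float64)

    # Scale the cloud uniformly into the unit cube so the aspect ratio is kept.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0  # degenerate cloud: all points coincide
    unit = (points - mins) / extent

    # Map x and y to pixel indices, clamping the upper boundary into range.
    cols = np.minimum((unit[:, 0] * resolution).astype(int), resolution - 1)
    rows = np.minimum((unit[:, 1] * resolution).astype(int), resolution - 1)

    # Keep the maximum z per pixel; ufunc.at handles repeated indices correctly.
    depth = np.zeros((resolution, resolution))
    np.maximum.at(depth, (rows, cols), unit[:, 2])
    return depth


if __name__ == "__main__":
    cloud = np.random.default_rng(0).random((1000, 3))
    print(project_depth_map(cloud, resolution=16).shape)
```

One limitation of this simple version is that a point at the minimum z value is indistinguishable from an empty pixel, and sparse clouds leave holes in the image; the abstract suggests the paper's module addresses realism of the projected maps in ways this sketch does not attempt.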
Fast Hybrid Cascade for Voxel-based 3D Object Classification
Voxel-based 3D object classification has been frequently studied in recent
years. Previous methods often directly convert the classic 2D convolution
into a 3D form applied to an object with binary voxel representation. In this
paper, we investigate the reason why binary voxel representation is not very
suitable for 3D convolution and how to simultaneously improve the performance
both in accuracy and speed. We show that giving each voxel a signed distance
value improves accuracy by about 30% compared with the binary voxel
representation when using a two-layer fully connected network. We then propose a
fast fully connected and convolution hybrid cascade network for voxel-based 3D
object classification. This three-stage cascade network can divide 3D models
into three categories: easy, moderate and hard. Consequently, the mean
inference time (0.3 ms) is about 5x and 2x faster than the
state-of-the-art point cloud and voxel based methods respectively, while
achieving the highest accuracy among voxel-based methods (92%).
Experiments with ModelNet and MNIST verify the performance of the proposed
hybrid cascade network.
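
The abstract above contrasts binary voxels with voxels that carry a signed distance value. The following minimal sketch shows one way such a grid could be derived from a binary occupancy grid. The abstract does not say how the authors compute the distances, so the brute-force search, the sign convention (negative inside, positive outside), the voxel-center distance measure, and the name `signed_distance_grid` are all assumptions made for illustration.

```python
import numpy as np


def signed_distance_grid(occupancy):
    """Convert a binary occupancy grid into a signed distance grid.

    Hypothetical helper for illustration. Occupied voxels get the negative
    distance to the nearest empty voxel; empty voxels get the positive
    distance to the nearest occupied voxel. Distances are Euclidean,
    measured between voxel centers in units of one voxel.
    """
    occ = np.asarray(occupancy, dtype=bool)
    if occ.all() or not occ.any():
        raise ValueError("grid needs both occupied and empty voxels")

    # Index coordinates of every voxel, in the same C order as occ.ravel().
    coords = np.argwhere(np.ones_like(occ))
    inside = coords[occ.ravel()]
    outside = coords[~occ.ravel()]

    def nearest(src, dst):
        # Brute-force pairwise distances: O(len(src) * len(dst)) memory,
        # so this is only suitable for small grids.
        diff = src[:, None, :] - dst[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

    sdf = np.empty(occ.shape, dtype=np.float64)
    sdf[occ] = -nearest(inside, outside)
    sdf[~occ] = nearest(outside, inside)
    return sdf


if __name__ == "__main__":
    grid = np.zeros((5, 5, 5), dtype=bool)
    grid[1:4, 1:4, 1:4] = True  # a solid 3x3x3 cube
    print(signed_distance_grid(grid)[2, 2, 2])
```

Unlike the binary grid, where every occupied voxel looks the same, the signed values tell a network how far each voxel lies from the surface, which is one plausible reason the representation could be easier to classify. For larger grids a dedicated distance transform would be preferable to the pairwise search used here.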