10,126 research outputs found
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. Building on recent work [32, 6], we model humans using a graphical
model with a tree structure, and exploit the connectivity prior that, even
in the presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently, requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmark "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms.
Comment: CVPR 15 Camera Ready
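To make the flexible-composition idea concrete, here is a minimal sketch (not the authors' implementation) of the core dynamic program: each child subtree is either kept visible or cut at a learned occlusion cost, so one bottom-up pass scores the best connected subtree containing a given root. The names `unary`, `pair`, and `occ` are hypothetical stand-ins for the learned appearance, pairwise, and occlusion-cue scores, with part locations abstracted away.

```python
# Hedged sketch of flexible-composition inference on a tree-structured model.
# unary[v]: appearance score of part v; pair[(v, c)]: edge score between v and
# child c; occ[c]: learned occlusion cue for cutting the subtree at c.
def best_composition(children, root, unary, pair, occ):
    def score(v):
        s = unary[v]
        for c in children.get(v, []):
            # Keep the child's subtree visible, or cut it (mark as occluded).
            s += max(pair[(v, c)] + score(c), occ[c])
        return s
    return score(root)

# Toy chain torso -> arm -> hand: the poorly scoring hand gets cut.
children = {"torso": ["arm"], "arm": ["hand"]}
unary = {"torso": 2.0, "arm": 1.0, "hand": -3.0}
pair = {("torso", "arm"): 0.5, ("arm", "hand"): 0.5}
occ = {"arm": -0.2, "hand": -0.2}
print(best_composition(children, "torso", unary, pair, occ))  # 3.3
```

Full inference also maximizes over which part serves as the root of the composition; the part sharing mentioned in the abstract lets those passes reuse subtree scores, which is where the roughly 2x overhead comes from.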
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
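As a toy illustration of the kind of statistical primitive the cited language-learning literature starts from (my choice of example, not the paper's pipeline): pointwise mutual information between co-occurring words, a common first signal for proposing unsupervised links between word pairs.

```python
# Toy sketch, not the proposed system: PMI over word pairs within a small
# window, log P(w, v) / (P(w) P(v)); high-PMI pairs are candidate links.
import math
from collections import Counter

def pmi_table(sentences, window=3):
    words, pairs = Counter(), Counter()
    for s in sentences:
        toks = s.lower().split()
        words.update(toks)
        for i, w in enumerate(toks):
            for v in toks[i + 1 : i + 1 + window]:
                pairs[(w, v)] += 1
    n_w, n_p = sum(words.values()), sum(pairs.values())
    return {(w, v): math.log((c / n_p) / ((words[w] / n_w) * (words[v] / n_w)))
            for (w, v), c in pairs.items()}

print(pmi_table(["the dog chased the cat", "the dog barked"]))
```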
MultiBodySync: Multi-Body Segmentation and Motion Estimation via 3D Scan Synchronization
We present MultiBodySync, a novel, end-to-end trainable multi-body motion
segmentation and rigid registration framework for multiple input 3D point
clouds. The two non-trivial challenges posed by this multi-scan multibody
setting that we investigate are: (i) guaranteeing correspondence and
segmentation consistency across multiple input point clouds capturing different
spatial arrangements of bodies or body parts; and (ii) obtaining robust
motion-based rigid body segmentation applicable to novel object categories. We
propose an approach to address these issues that incorporates spectral
synchronization into an iterative deep declarative network, so as to
simultaneously recover consistent correspondences as well as motion
segmentation. At the same time, by explicitly disentangling the correspondence
and motion segmentation estimation modules, we achieve strong generalizability
across different object categories. Our extensive evaluations demonstrate that
our method is effective on various datasets ranging from rigid parts in
articulated objects to individually moving objects in a 3D scene, be it
single-view or full point clouds.
Comment: Contact: huang-jh18@mails.tsinghua.edu.cn
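For context on the spectral synchronization the pipeline builds on, below is a hedged numpy sketch of classical permutation synchronization: stack the noisy pairwise correspondence matrices into one block matrix, take its top eigenvectors, and round to cycle-consistent permutations. The setup (m clouds of n points each, P[j][i] = P[i][j].T, P[i][i] = I) is my assumption; the paper embeds this idea in an iterative deep declarative network rather than using it verbatim.

```python
# Hedged sketch of classical spectral permutation synchronization (not the
# authors' network module). P[i][j]: noisy n x n correspondence matrix
# between clouds i and j, for m clouds; assumes P[j][i] = P[i][j].T and
# P[i][i] = I, so the block matrix A is symmetric.
import numpy as np

def synchronize(P, m, n):
    A = np.block([[P[i][j] for j in range(m)] for i in range(m)])
    _, V = np.linalg.eigh(A)                  # eigenvalues in ascending order
    U = V[:, -n:]                             # top-n eigenvectors
    blocks = [U[i * n:(i + 1) * n] for i in range(m)]
    maps = []
    for B in blocks:
        M = B @ blocks[0].T                   # soft map between cloud i and 0
        perm = np.zeros_like(M)               # greedy rounding to a permutation
        for _ in range(n):                    # (the Hungarian method is better)
            i, j = np.unravel_index(np.argmax(M), M.shape)
            perm[i, j] = 1.0
            M[i, :], M[:, j] = -np.inf, -np.inf
        maps.append(perm)
    # maps[i] aligns cloud i with cloud 0; composing any two of them yields a
    # cycle-consistent pairwise correspondence.
    return maps
```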
CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer
The ability to estimate joint parameters is essential for various
applications in robotics and computer vision. In this paper, we propose CAPT:
category-level articulation estimation from a point cloud using Transformer.
CAPT uses an end-to-end transformer-based architecture for joint parameter and
state estimation of articulated objects from a single point cloud. The proposed
method estimates joint parameters and states for a wide range of
articulated objects with high precision and robustness. The paper also
introduces a motion loss approach, which improves articulation estimation
performance by emphasizing the dynamic features of articulated objects.
Additionally, the paper presents a double voting strategy to provide the
framework with coarse-to-fine parameter estimation. Experimental results on
datasets spanning several object categories demonstrate that our method outperforms existing
alternatives for articulation estimation. Our research provides a promising
solution for applying Transformer-based architectures in articulated object
analysis.
Comment: Accepted to ICRA 2024
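As a rough illustration of what an end-to-end transformer for this task can look like (a minimal PyTorch stand-in under my own assumptions, not the released CAPT code, and omitting the motion loss and double voting): per-point tokens pass through a transformer encoder and are pooled into a single regression head for the joint axis, a pivot point, and the joint state.

```python
# Minimal hedged sketch of a transformer head mapping one point cloud to a
# revolute joint's parameters (unit axis, pivot) and scalar state (angle).
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEstimator(nn.Module):
    def __init__(self, d_model=128, nhead=4, nlayers=3):
        super().__init__()
        self.embed = nn.Linear(3, d_model)             # per-point xyz token
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, 3 + 3 + 1)      # axis, pivot, state

    def forward(self, pts):                            # pts: (B, N, 3)
        tokens = self.encoder(self.embed(pts))         # (B, N, d_model)
        out = self.head(tokens.mean(dim=1))            # global average pool
        axis = F.normalize(out[:, :3], dim=-1)         # constrain to unit norm
        pivot, state = out[:, 3:6], out[:, 6]
        return axis, pivot, state

est = JointEstimator()
axis, pivot, state = est(torch.randn(2, 1024, 3))      # two clouds, 1024 pts
```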
Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery
Articulated objects are abundant in daily life. Discovering their parts,
joints, and kinematics is crucial for robots to interact with these objects. We
introduce Structure from Action (SfA), a framework that discovers the 3D part
geometry and joint parameters of unseen articulated objects via a sequence of
inferred interactions. Our key insight is that 3D interaction and perception
should be considered in conjunction to construct 3D articulated CAD models,
especially in the case of categories not seen during training. By selecting
informative interactions, SfA discovers parts and reveals initially occluded
surfaces, like the inside of a closed drawer. By aggregating visual
observations in 3D, SfA accurately segments multiple parts, reconstructs part
geometry, and infers all joint parameters in a canonical coordinate frame. Our
experiments demonstrate that a single SfA model trained in simulation can
generalize to many unseen object categories with unknown kinematic structures
and to real-world objects. Code and data will be publicly available.
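One concrete sub-step implied by "infers all joint parameters": once a part's rigid motion (R, t) between two aggregated observations is known, a revolute joint's axis, angle, and pivot follow in closed form. The sketch below is the standard derivation under my own assumptions (numpy, a single revolute joint, non-degenerate rotation angle), not SfA's code.

```python
# Hedged sketch: revolute joint parameters from one part's rigid motion.
# For points on the part, x' = R @ x + t; a pivot p on the axis is fixed,
# so x' = R @ (x - p) + p, giving (I - R) @ p = t (solvable up to the axis).
import numpy as np

def revolute_joint_from_motion(R, t):
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    # Rotation axis from the skew-symmetric part of R (assumes 0 < angle < pi).
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    axis /= np.linalg.norm(axis)
    # (I - R) is singular along the axis; least squares picks one pivot on it.
    pivot, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
    return axis, angle, pivot
```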
Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation
A truly generalizable approach to rigid segmentation and motion estimation is
fundamental to 3D understanding of articulated objects and moving scenes. In
view of the tightly coupled relationship between segmentation and motion
estimates, we present an SE(3) equivariant architecture and a training strategy
to tackle this task in an unsupervised manner. Our architecture comprises two
lightweight and inter-connected heads that predict segmentation masks using
point-level invariant features and motion estimates from SE(3) equivariant
features, without requiring category information. Our unified
training strategy can be performed online while jointly optimizing the two
predictions by exploiting the interrelations among scene flow, segmentation
mask, and rigid transformations. Experiments on four datasets demonstrate the
superiority of our method in both model performance
and computational efficiency, with only 0.25M parameters and 0.92G FLOPs. To the
best of our knowledge, this is the first work designed for category-agnostic
part-level SE(3) equivariance in dynamic point clouds.
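The interrelation the training strategy exploits can be made concrete with a standard building block: given points, predicted scene flow, and a (soft) segmentation weight vector for one rigid body, that body's SE(3) motion is a weighted Procrustes fit, and the fitted motion in turn predicts the flow. The numpy sketch below shows the classical fit under my own assumptions; it is not the paper's heads or training loop.

```python
# Hedged sketch: weighted Kabsch/Procrustes fit linking scene flow F,
# a segmentation weight vector w, and one body's rigid motion (R, t).
import numpy as np

def weighted_procrustes(P, F, w):
    """P: (N, 3) points, F: (N, 3) flow, w: (N,) nonnegative mask weights."""
    Q = P + F                                     # flowed points
    w = w / w.sum()
    mu_p, mu_q = w @ P, w @ Q                     # weighted centroids
    H = (P - mu_p).T @ (w[:, None] * (Q - mu_q))  # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_q - R @ mu_p
    return R, t  # the body's flow is then reconstructed as P @ R.T + t - P
```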