Deep Part Induction from Articulated Object Pairs
Object functionality is often expressed through part articulation -- as when the two rigid parts of a pair of scissors pivot against each other to perform the
cutting function. Such articulations are often similar across objects within
the same functional category. In this paper, we explore how the observation of
different articulation states provides evidence for part structure and motion
of 3D objects. Our method takes as input a pair of unsegmented shapes
representing two different articulation states of two functionally related
objects, and induces their common parts along with their underlying rigid
motion. This is a challenging setting, as we assume no prior shape structure, no prior shape category information, and no consistent shape orientation; the articulation states may belong to objects of different geometry, and the inputs may be noisy and partial scans, or point clouds lifted from RGB images.
Our method learns a neural network architecture with three modules that
respectively propose correspondences, estimate 3D deformation flows, and
perform segmentation. To achieve optimal performance, our architecture
alternates between correspondence, deformation flow, and segmentation
prediction iteratively in an ICP-like fashion. Our results demonstrate that our
method significantly outperforms state-of-the-art techniques in the task of
discovering articulated parts of objects. In addition, our part induction is
object-class agnostic and successfully generalizes to new and unseen objects.
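The alternation the abstract describes has a classical, non-learned analogue that can be sketched in a few lines. The following toy is an assumption-laden illustration, not the paper's method: it assumes exactly two parts, translation-only motion, and replaces each learned module with nearest-neighbour correspondence, explicit flow, and 2-means clustering on the flow vectors.

```python
import numpy as np

def induce_parts(src, dst, iters=1):
    """Toy ICP-like alternation: correspondence -> deformation flow ->
    segmentation. Assumes two parts and translation-only motion; the
    paper replaces every step with a learned network module."""
    labels = np.zeros(len(src), dtype=int)
    for _ in range(iters):
        # 1) Correspondence: nearest destination point for each source point.
        d = np.linalg.norm(src[:, None] - dst[None, :], axis=-1)
        flow = dst[d.argmin(axis=1)] - src           # 2) deformation flow
        # 3) Segmentation: 2-means on flow vectors, seeded at the flow extremes.
        norms = np.linalg.norm(flow, axis=1)
        centers = np.stack([flow[norms.argmin()], flow[norms.argmax()]])
        for _ in range(10):
            labels = np.linalg.norm(flow[:, None] - centers[None],
                                    axis=-1).argmin(axis=1)
            centers = np.stack([flow[labels == k].mean(axis=0)
                                if (labels == k).any() else centers[k]
                                for k in range(2)])
        src = src + centers[labels]                  # move each part rigidly
    return labels
```

On two noisy clusters where only one cluster moves between the articulation states, the flow vectors separate cleanly and the segmentation recovers the two rigid parts.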
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns in a compact way, and to discriminate between them.
In an unsupervised context, i.e., no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches in which the motion is captured
from a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground due to attractive features such as
inherent view-invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally-coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body, and track motion cues that may be later
exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve "protrusions",
i.e., high-curvature regions of the 3D volume, of articulated shapes, while
improving their separation in a lower dimensional space, making them in this
way easier to cluster. In this paper we therefore propose a spectral approach
to unsupervised and temporally-coherent body-protrusion segmentation along time
sequences. Volumetric shapes are clustered in an embedding space, clusters are
propagated in time to ensure coherence, and merged or split to accommodate
changes in the body's topology. Experiments on both synthetic and real
sequences of dense voxel-set data are shown. This supports the ability of the
proposed method to cluster body-parts consistently over time in a totally
unsupervised fashion, its robustness to sampling density and shape quality, and
its potential for bottom-up model construction.
Comment: 31 pages, 26 figures
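The spectral idea above can be roughly illustrated with Laplacian eigenmaps, a close relative of LLE (this substitution, and all names below, are assumptions for the sketch, not the paper's pipeline): embed the voxel/point set via the eigenvectors of a kNN-graph Laplacian, where protrusions spread out and become easier to cluster.

```python
import numpy as np

def laplacian_embedding(points, k=6, dim=2):
    """Spectral embedding (Laplacian eigenmaps) of a point/voxel set.
    A stand-in for the LLE-style embedding described above: protrusions
    spread out along the leading non-trivial eigenvectors."""
    n = len(points)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]        # k nearest neighbours
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                         # symmetrise the kNN graph
    L = np.diag(W.sum(axis=1)) - W                 # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]                      # drop the constant eigenvector
```

The embedded coordinates can then be clustered (e.g. by k-means) and the clusters propagated frame to frame, as the abstract describes.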
Dependent landmark drift: robust point set registration with a Gaussian mixture model and a statistical shape model
The goal of point set registration is to find point-by-point correspondences
between point sets, each of which characterizes the shape of an object. Because
local preservation of object geometry is assumed, prevalent algorithms in the
area can often elegantly solve the problems without using geometric information
specific to the objects. This means that registration performance can be
further improved by using prior knowledge of object geometry. In this paper, we
propose a novel point set registration method using the Gaussian mixture model
with prior shape information encoded as a statistical shape model. Our
transformation model is defined as a combination of a similarity transformation,
motion coherence, and the statistical shape model. Therefore, the proposed
method works effectively if the target point set includes outliers and missing
regions, or if it is rotated. The computational cost can be reduced to linear in the number of points,
and therefore the method is scalable to large point sets. The effectiveness of
the method will be verified through comparisons with existing algorithms using
datasets concerning human body shapes, hands, and faces.
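A minimal EM sketch of the GMM registration idea (translation-only, without the similarity transform, motion coherence, or shape prior of the paper; all names are illustrative): the moved source points act as mixture centroids for the destination set, and each M-step re-estimates the translation from the responsibility-weighted displacements.

```python
import numpy as np

def gmm_register_translation(src, dst, iters=30, sigma2=1.0):
    """CPD-style EM sketch: estimate a translation aligning src to dst,
    treating the moved source points as isotropic Gaussian centroids."""
    t = np.zeros(src.shape[1])
    for _ in range(iters):
        moved = src + t
        d2 = ((dst[:, None] - moved[None, :]) ** 2).sum(-1)   # (N_dst, N_src)
        P = np.exp(-d2 / (2.0 * sigma2))
        P /= P.sum(axis=1, keepdims=True)                     # E-step
        Np = P.sum()
        # M-step: translation = responsibility-weighted mean displacement.
        t = (P[..., None] * (dst[:, None] - src[None, :])).sum((0, 1)) / Np
        # Shrink the variance from the (pre-update) weighted residuals.
        sigma2 = max((P * d2).sum() / (Np * src.shape[1]), 1e-3)
    return t
```

On a regular grid displaced by a sub-spacing translation, the soft assignments sharpen as the variance shrinks and the estimate converges to the true offset.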
Learning Articulated Motion Models from Visual and Lingual Signals
In order for robots to operate effectively in homes and workplaces, they must
be able to manipulate the articulated objects common within environments built
for and by humans. Previous work learns kinematic models that prescribe this
manipulation from visual demonstrations. Lingual signals, such as natural
language descriptions and instructions, offer a complementary means of
conveying knowledge of such manipulation models and are suited to a wide
range of interactions (e.g., remote manipulation). In this paper, we present a
multimodal learning framework that incorporates both visual and lingual
information to estimate the structure and parameters that define kinematic
models of articulated objects. The visual signal takes the form of an RGB-D
image stream that opportunistically captures object motion in an unprepared
scene. Accompanying natural language descriptions of the motion constitute the
lingual signal. We present a probabilistic language model that uses word
embeddings to associate lingual verbs with their corresponding kinematic
structures. By exploiting the complementary nature of the visual and lingual
input, our method infers correct kinematic structures for various multiple-part
objects on which the previous state-of-the-art, visual-only system fails. We
evaluate our multimodal learning framework on a dataset comprising a variety
of household objects, and demonstrate a 36% improvement in model accuracy over
the vision-only baseline.
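The verb-to-structure association can be sketched with a nearest-prototype rule over word embeddings. The tiny hand-made vectors below are illustrative stand-ins for learned embeddings (the paper's model is probabilistic; this is only the core similarity idea):

```python
import numpy as np

# Toy word vectors: stand-ins for learned embeddings, values illustrative only.
EMB = {
    "rotate": np.array([0.9, 0.1]), "swing": np.array([0.8, 0.2]),
    "open":   np.array([0.7, 0.3]), "slide": np.array([0.1, 0.9]),
    "pull":   np.array([0.2, 0.8]),
}
PROTOTYPES = {"revolute": EMB["rotate"], "prismatic": EMB["slide"]}

def classify_verb(verb):
    """Pick the kinematic model whose prototype embedding is most
    similar (cosine) to the verb's embedding."""
    v = EMB[verb]
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(PROTOTYPES, key=lambda m: cos(v, PROTOTYPES[m]))
```

With real embeddings, verbs like "swing" and "open" land near the revolute prototype while "slide" and "pull" land near the prismatic one, which is what lets language disambiguate structures the visual signal alone cannot.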
Learning 3D Deformation of Animals from 2D Images
Understanding how an animal can deform and articulate is essential for a
realistic modification of its 3D model. In this paper, we show that such
information can be learned from user-clicked 2D images and a template 3D model
of the target animal. We present a volumetric deformation framework that
produces a set of new 3D models by deforming a template 3D model according to a
set of user-clicked images. Our framework is based on a novel locally-bounded
deformation energy, where every local region has its own stiffness value that
bounds how much distortion is allowed at that location. We jointly learn the
local stiffness bounds as we deform the template 3D mesh to match each
user-clicked image. We show that this seemingly complex task can be solved as a
sequence of convex optimization problems. We demonstrate the effectiveness of
our approach on cats and horses, which are highly deformable and articulated
animals. Our framework produces new 3D models of animals that are significantly
more plausible than methods without learned stiffness.
Comment: 10 pages, Eurographics 2016 (Best Paper Award)
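The "sequence of convex optimization problems" can be illustrated with the simplest such subproblem. In this 1-D toy (names and setup are assumptions; the paper works with volumetric meshes and alternately learns the stiffness), the stiffness is held fixed and the deformation that matches clicked targets while penalising local distortion reduces to linear least squares:

```python
import numpy as np

def deform_template(template, targets, stiffness, lam=1.0):
    """One convex subproblem of the alternating scheme: stiffness fixed,
    solve for the deformation. 1-D toy: vertices on a line, distortion =
    change of neighbour spacing, a clicked target for every vertex."""
    n = len(template)
    D = np.diff(np.eye(n), axis=0)             # finite-difference operator
    w = np.sqrt(lam * stiffness)               # per-edge stiffness weights
    # Stack the data term (match targets) and the weighted smoothness term
    # (preserve template edge lengths) into one least-squares system.
    A = np.vstack([np.eye(n), w[:, None] * D])
    b = np.concatenate([targets, w * np.diff(template)])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

A stiff edge resists the stretch demanded by the targets while soft edges follow them, which is exactly the role the locally-bounded energy plays in the abstract.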
Modeling and Correspondence of Topologically Complex 3D Shapes
3D shape creation and modeling remains a challenging task especially for
novice users. Many methods in the field of computer graphics have been proposed
to automate the often repetitive and precise operations needed during the
modeling of detailed shapes. This report surveys different approaches of shape
modeling and correspondence especially for shapes exhibiting topological
complexity. We focus on methods designed to help generate or process shapes
with a large number of interconnected components often found in man-made shapes.
We first discuss a variety of modeling techniques that leverage existing shapes in easy-to-use creative modeling systems. We then discuss possible
correspondence strategies for topologically different shapes, as correspondence is a
requirement for such systems. Finally, we look at different shape
representations and tools that facilitate the modification of shape topology
and we focus on those particularly useful in free-form 3D modeling.
A Probabilistic Framework for Learning Kinematic Models of Articulated Objects
Robots operating in domestic environments generally need to interact with
articulated objects, such as doors, cabinets, dishwashers or fridges. In this
work, we present a novel, probabilistic framework for modeling articulated
objects as kinematic graphs. Vertices in this graph correspond to object parts,
while edges between them model their kinematic relationship. In particular, we
present a set of parametric and non-parametric edge models and how they can
robustly be estimated from noisy pose observations. We furthermore describe how
to estimate the kinematic structure and how to use the learned kinematic models
for pose prediction and for robotic manipulation tasks. We finally present how
the learned models can be generalized to new and previously unseen objects. In
various experiments using real robots with different camera systems as well as
in simulation, we show that our approach is valid, accurate and efficient.
Further, we demonstrate that our approach has a broad set of applications, in
particular for the emerging fields of mobile manipulation and service robotics.
SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control
In this work, we present an approach to deep visuomotor control using
structured deep dynamics models. Our deep dynamics model, a variant of
SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an
encoder-decoder structure. Unlike prior work, our dynamics model is structured:
given an input scene, our network explicitly learns to segment salient parts
and predict their pose-embedding along with their motion modeled as a change in
the pose space due to the applied actions. We train our model using a pair of
point clouds separated by an action, and show that, given supervision only in the form of point-wise data associations between the frames, our network is able to
learn a meaningful segmentation of the scene along with consistent poses. We
further show that our model can be used for closed-loop control directly in the
learned low-dimensional pose space, where the actions are computed by
minimizing error in the pose space using gradient-based methods, similar to
traditional model-based control. We present results on controlling a Baxter
robot from raw depth data in simulation and in the real world and compare
against two baseline deep networks. Our method runs in real-time, achieves good
prediction of scene dynamics and outperforms the baseline methods on multiple
control runs. Video results can be found at:
https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/
Comment: 8 pages, Initial submission to IEEE International Conference on Robotics and Automation (ICRA) 201
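The closed-loop control idea above can be sketched independently of the learned model: choose the action that minimises the predicted error in the low-dimensional pose space by gradient descent. The linear dynamics `dyn` below is a toy stand-in for the learned SE3-Pose-Net model, and finite-difference gradients are used so any dynamics function can be plugged in (all names are assumptions for this sketch).

```python
import numpy as np

def control_step(pose, target, dynamics, n_u=2, lr=0.1, iters=200):
    """Pick an action by gradient descent on the predicted pose error,
    mirroring the gradient-based control described above."""
    u = np.zeros(n_u)
    for _ in range(iters):
        e0 = np.sum((dynamics(pose, u) - target) ** 2)
        g = np.zeros_like(u)
        for i in range(n_u):
            du = np.zeros_like(u)
            du[i] = 1e-5                # finite-difference probe
            g[i] = (np.sum((dynamics(pose, u + du) - target) ** 2) - e0) / 1e-5
        u -= lr * g                     # descend on the pose-space error
    return u

# Toy linear pose dynamics standing in for the learned model.
B = np.array([[1.0, 0.0], [0.5, 1.0]])
dyn = lambda p, u: p + B @ u            # next pose = pose + effect of action
```

Because the error is measured in the compact pose space rather than in pixels or points, the descent is cheap enough to run inside a real-time control loop, which is the design point the abstract emphasises.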
Inner Space Preserving Generative Pose Machine
Image-based generative methods, such as generative adversarial networks
(GANs), have already been able to generate realistic images with considerable control over context, especially when they are conditioned. However, most successful frameworks share a common procedure that performs an image-to-image translation while leaving the pose of figures in the image untouched. When the objective is
reposing a figure in an image while preserving the rest of the image, the
state-of-the-art mainly assumes a single rigid body with simple background and
limited pose shift, which can hardly be extended to images captured under normal
settings. In this paper, we introduce an image "inner space" preserving model
that assigns an interpretable low-dimensional pose descriptor (LDPD) to an
articulated figure in the image. Figure reposing is then generated by passing
the LDPD and the original image through multi-stage augmented hourglass
networks in a conditional GAN structure, called inner space preserving
generative pose machine (ISP-GPM). We evaluated ISP-GPM on reposing human
figures, which are highly articulated and exhibit versatile variations. Testing a state-of-the-art pose estimator on our reposed dataset gave an accuracy of over 80% on the PCK0.5 metric. The results also show that our ISP-GPM is able to
preserve the background with high accuracy while reasonably recovering the area
blocked by the figure to be reposed.
Comment: http://www.northeastern.edu/ostadabbas/2018/07/23/inner-space-preserving-generative-pose-machine
Deep Affinity Network for Multiple Object Tracking
Multiple Object Tracking (MOT) plays an important role in solving many
fundamental problems in video analysis in computer vision. Most MOT methods
employ two steps: Object Detection and Data Association. The first step detects
objects of interest in every frame of a video, and the second establishes
correspondence between the detected objects in different frames to obtain their
tracks. Object detection has made tremendous progress in the last few years due
to deep learning. However, data association for tracking still relies on hand-crafted constraints such as appearance, motion, spatial proximity, and grouping to compute affinities between the objects in different frames. In this
paper, we harness the power of deep learning for data association in tracking
by jointly modelling object appearances and their affinities between different
frames in an end-to-end fashion. The proposed Deep Affinity Network (DAN)
learns compact yet comprehensive features of pre-detected objects at several
levels of abstraction, and performs exhaustive pairing permutations of those
features in any two frames to infer object affinities. DAN also accounts for
multiple objects appearing and disappearing between video frames. We exploit
the resulting efficient affinity computations to associate objects in the
current frame deep into the previous frames for reliable on-line tracking. Our
technique is evaluated on popular multiple object tracking challenges MOT15,
MOT17 and UA-DETRAC. Comprehensive benchmarking under twelve evaluation metrics
demonstrates that our approach is among the best performing techniques on the
leader board for these challenges. The open source implementation of our work
is available at https://github.com/shijieS/SST.git.
Comment: To appear in IEEE TPAM
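The affinity-then-association step can be sketched with a classical stand-in for the learned affinities (names and the greedy matcher are assumptions of this sketch, not DAN itself): compute pairwise cosine affinities between detection features in two frames, then greedily commit the highest-affinity pairs, leaving low-affinity detections unmatched to account for objects appearing or disappearing.

```python
import numpy as np

def associate(feats_a, feats_b, min_affinity=0.5):
    """Greedy data association from a pairwise affinity matrix.
    Affinity = cosine similarity; detections whose best affinity falls
    below `min_affinity` stay unmatched (new or vanished objects)."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    A = a @ b.T                                  # (n_a, n_b) affinity matrix
    matches = []
    while True:
        i, j = np.unravel_index(A.argmax(), A.shape)
        if A[i, j] < min_affinity:
            break
        matches.append((i, j))
        A[i, :] = -np.inf                        # each detection used once
        A[:, j] = -np.inf
    return matches
```

DAN replaces the cosine similarity with learned exhaustive pairings and the greedy step with an optimised assignment, but the interface is the same: an affinity matrix in, a set of cross-frame correspondences out.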