Defining the Pose of any 3D Rigid Object and an Associated Distance
The pose of a rigid object is usually regarded as a rigid transformation,
described by a translation and a rotation. However, equating the pose space
with the space of rigid transformations is, in general, imprecise, as it does
not account for objects with proper symmetries, which are common among
man-made objects. In this article, we define a pose as a distinguishable
static state of an object, and equate a pose with a set of rigid
transformations. Based solely on
geometric considerations, we propose a frame-invariant metric on the space of
possible poses, valid for any physical rigid object, and requiring no arbitrary
tuning. This distance can be evaluated efficiently using a representation of
poses within a Euclidean space of at most 12 dimensions, depending on the
object's symmetries. This makes it possible to efficiently perform neighborhood
queries such as radius searches or k-nearest neighbor searches within a large
set of poses using off-the-shelf methods. Pose averaging under this metric
can similarly be performed easily, using a projection function from the
Euclidean space onto the pose space. The practical value of these theoretical
developments is illustrated with an application to pose estimation of
instances of a 3D rigid object from an input depth map, via a Mean Shift
procedure.
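The quotient construction in this abstract can be illustrated with a toy version of such a metric: for an object whose proper symmetry group G is known, a frame-invariant distance between two poses is the minimum distance between their sets of rigid transformations. The sketch below uses a hypothetical object with 4-fold symmetry; the paper's actual metric and its low-dimensional Euclidean embedding are more refined than this minimum-over-group formulation.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix about the z axis by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical object with 4-fold symmetry about z:
# proper symmetry group G = {rot_z(k * pi/2)}.
G = [rot_z(k * np.pi / 2) for k in range(4)]

def pose_dist(R1, R2):
    """Distance between the poses reached by rotations R1 and R2:
    the minimum Frobenius distance over the symmetry group, so two
    transformations that differ only by a proper symmetry of the
    object are at distance zero (they produce the same static state)."""
    return min(np.linalg.norm(R1 @ S - R2) for S in G)

# A rotation composed with a symmetry yields the same pose.
R = rot_z(0.7) @ np.array([[1.0, 0, 0], [0, 0, -1.0], [0, 1.0, 0]])
print(pose_dist(R, R @ rot_z(np.pi / 2)))   # 0.0: same pose
print(pose_dist(R, R @ rot_z(0.3)) > 0.1)   # True: distinct poses
```

The embedding described in the abstract goes further: it maps each equivalence class to a single Euclidean point, so off-the-shelf KD-trees can answer radius and k-NN queries without evaluating the minimum explicitly.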
Image collection pop-up: 3D reconstruction and clustering of rigid and non-rigid categories
This paper introduces an approach to simultaneously estimate 3D shape, camera pose, and a clustering of objects by instance and type of deformation, from partial 2D annotations in a multi-instance collection of images. Furthermore, we can process rigid and non-rigid categories alike. This advances existing work, which only addresses the problem for a single object or, when multiple objects are considered, assumes they are clustered a priori. To handle this broader version of the problem, we model object deformation using a formulation based on multiple unions of subspaces, able to span from small rigid motions to complex deformations. The parameters of this model are learned via Augmented Lagrange Multipliers, in a completely unsupervised manner that requires no training data at all. Extensive validation is provided in a wide variety of synthetic and real scenarios, including rigid and non-rigid categories with small and large deformations. In all cases our approach outperforms the state of the art in 3D reconstruction accuracy, while also providing clustering results that allow segmenting the images into object instances and their associated types of deformation (or actions the objects are performing).
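The union-of-subspaces parameters are learned via Augmented Lagrange Multipliers. The paper's exact updates are not reproduced here, but a standard building block of ALM solvers for low-rank and subspace models is singular-value thresholding, the proximal operator of the nuclear norm; a minimal sketch:

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: the proximal operator of the
    nuclear norm. Shrinks every singular value by tau and zeroes
    those below it, promoting low rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# A noisy rank-3 matrix: thresholding suppresses the noise spectrum,
# recovering the low-rank structure a subspace model assumes.
rng = np.random.default_rng(1)
L = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 30))   # rank 3
X = L + 0.01 * rng.normal(size=L.shape)
print(np.linalg.matrix_rank(svt(X, tau=1.0)))  # 3
```

Inside an ALM loop this kind of proximal step alternates with Lagrange-multiplier updates until the constraints are satisfied; the paper's formulation over multiple unions of subspaces is richer but follows the same pattern.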
Articulation-aware Canonical Surface Mapping
We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that
indicates the mapping from 2D pixels to corresponding points on a canonical
template shape, and 2) inferring the articulation and pose of the template
corresponding to the input image. While previous approaches rely on keypoint
supervision for learning, we present an approach that can learn without such
annotations. Our key insight is that these tasks are geometrically related, and
we can obtain supervisory signal via enforcing consistency among the
predictions. We present results across a diverse set of animal object
categories, showing that our method can learn articulation and CSM prediction
from image collections using only foreground mask labels for training. We
empirically show that allowing articulation helps learn more accurate CSM
prediction, and that enforcing the consistency with predicted CSM is similarly
critical for learning meaningful articulation. Comment: To appear at CVPR 2020; project page:
https://nileshkulkarni.github.io/acsm
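The consistency insight can be made concrete with a schematic loss: the canonical 3D point predicted at a pixel, pushed back through the predicted camera and articulation, should land on that same pixel. The function names and the scaled-orthographic camera below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def csm_consistency(pixels, csm_points, project):
    """Schematic geometric-consistency loss: the canonical 3D point
    predicted at each pixel, mapped through the predicted camera
    `project`, should reproject onto that pixel.
    pixels: (N, 2) image coords; csm_points: (N, 3) canonical coords."""
    reproj = np.array([project(p) for p in csm_points])
    return float(np.mean(np.sum((reproj - pixels) ** 2, axis=1)))

# Hypothetical scaled-orthographic camera: drop z, scale x and y.
project = lambda p: 2.0 * p[:2]

pts3d = np.array([[0.1, 0.2, 0.9], [0.4, -0.3, 0.5]])
pix = np.array([project(p) for p in pts3d])          # consistent pair
print(csm_consistency(pix, pts3d, project))           # 0.0
print(csm_consistency(pix + 1.0, pts3d, project) > 0) # True
```

A loss of this shape needs no keypoint labels: it only ties the two predictions (CSM and camera/articulation) to each other, which is why the method can train from foreground masks alone.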
Extrinsic Calibration of a Camera-Arm System Through Rotation Identification
Determining extrinsic calibration parameters is a necessity in any robotic
system composed of actuators and cameras. Once a system is outside the lab
environment, parameters must be determined without relying on outside artifacts
such as calibration targets. We propose a method that relies on structured
motion of an observed arm to recover extrinsic calibration parameters. Our
method combines known arm kinematics with observations of conics in the image
plane to calculate maximum-likelihood estimates for calibration extrinsics.
This method is validated in simulation and tested against a real-world model,
yielding results consistent with ruler-based estimates. Our method shows
promise for estimating the pose of a camera relative to an articulated arm's
end effector without requiring tedious measurements or external artifacts.
Index Terms: robotics, hand-eye problem, self-calibration, structure from
motion
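The abstract's conic-based maximum-likelihood method is not reproduced here, but the underlying hand-eye problem has the classic formulation AX = XB. A hedged sketch of its rotation part, solved by orthogonal Procrustes alignment of rotation axes on synthetic data (all names hypothetical):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def handeye_rotation(arm_rots, cam_rots):
    """Rotation part of the hand-eye problem AX = XB: the rotation
    axes of paired relative motions satisfy a_i = R_X b_i, so R_X is
    recovered by orthogonal Procrustes (Kabsch) alignment."""
    A = np.array([R.as_rotvec() for R in arm_rots])
    B = np.array([R.as_rotvec() for R in cam_rots])
    U, _, Vt = np.linalg.svd(B.T @ A)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T

# Synthetic check: generate consistent pairs R_Ai = R_X R_Bi R_X^T
# for a known extrinsic rotation R_X, then recover it.
R_X = Rotation.random(random_state=3)
cam = [Rotation.random(random_state=i) for i in range(4, 10)]
arm = [R_X * Rb * R_X.inv() for Rb in cam]
print(np.allclose(handeye_rotation(arm, cam), R_X.as_matrix(), atol=1e-6))
```

With noiseless, generic motion pairs the recovery is exact; the paper's contribution is obtaining such constraints in the field, from structured arm motion and observed conics, without calibration targets.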
Capturing Hands in Action using Discriminative Salient Points and Physics Simulation
Hand motion capture is a popular research field, recently gaining more
attention due to the ubiquity of RGB-D sensors. However, even most recent
approaches focus on the case of a single isolated hand. In this work, we focus
on hands that interact with other hands or objects and present a framework that
successfully captures motion in such interaction scenarios for both rigid and
articulated objects. Our framework combines a generative model with
discriminatively trained salient points to achieve a low tracking error and
with collision detection and physics simulation to achieve physically plausible
estimates even in case of occlusions and missing visual data. Since all
components are unified in a single objective function which is almost
everywhere differentiable, it can be optimized with standard optimization
techniques. Our approach works for monocular RGB-D sequences as well as setups
with multiple synchronized RGB cameras. For a qualitative and quantitative
evaluation, we captured 29 sequences with a large variety of interactions and
up to 150 degrees of freedom. Comment: Accepted for publication by the International Journal of Computer
Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a
single framework of an ECCV'12 multi-camera RGB and a monocular RGB-D GCPR'14
hand-tracking paper, with several extensions, additional experiments and
details.
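As a toy illustration of such a unified, almost-everywhere-differentiable objective (not the paper's actual energy), the sketch below combines a quadratic data term with a smooth one-sided collision penalty and hands the sum to a standard optimizer:

```python
import numpy as np
from scipy.optimize import minimize

def objective(x, target, wall=0.0, w_col=100.0):
    """Toy unified objective: a quadratic data term pulling the
    estimate toward observed salient points, plus a smooth one-sided
    collision penalty (zero outside the obstacle, quadratic inside),
    so the sum remains almost everywhere differentiable."""
    data = np.sum((x - target) ** 2)
    penetration = np.minimum(x - wall, 0.0)   # negative means inside
    return data + w_col * np.sum(penetration ** 2)

# The observation lies inside the obstacle (x < 0); the optimum
# trades the two terms off and stays essentially at the surface.
target = np.array([-1.0])
res = minimize(objective, x0=np.array([1.0]), args=(target,))
print(res.x)  # just below 0: pushed back out of the obstacle
```

The point of the construction is the one named in the abstract: because every term is smooth almost everywhere, occlusion handling via physics does not require a special-purpose solver.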
Pose and Shape Reconstruction of a Noncooperative Spacecraft Using Camera and Range Measurements
Recent interest in on-orbit proximity operations has pushed towards the development of autonomous GNC strategies. In this context, optical navigation enables a wide variety of possibilities, as it can provide information not only about the kinematic state but also about the shape of the observed object. Various mission architectures have been either tested in space or studied on Earth. The present study deals with on-orbit relative pose and shape estimation using a monocular camera and a distance sensor. The goal is to develop a filter which estimates an observed satellite's relative position, velocity, attitude, and angular velocity, along with its shape, from the measurements obtained by a camera and a distance sensor mounted on board a chaser on a relative trajectory around the target. The filter's efficiency is demonstrated through a simulation on a virtual target object. The results of the simulation, even though relevant to a simplified scenario, show that the estimation process is successful and can be considered a promising strategy for a correct and safe docking maneuver.
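The abstract's filter jointly estimates pose and shape; its equations are not given, but the basic machinery of such relative-navigation filters is the linear Kalman predict/update cycle, sketched here on a toy range-tracking problem (the state, matrices, and noise levels are illustrative assumptions):

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict the state and covariance forward one step.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measurement z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Constant-velocity range tracking: state [position, velocity],
# measurement = range (e.g. from the distance sensor).
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = 1e-4 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.01]])

x, P = np.array([0.0, 0.0]), np.eye(2)
rng = np.random.default_rng(0)
for k in range(1, 101):
    true_pos = 5.0 - 0.3 * k * dt            # target closing at 0.3 m/s
    z = np.array([true_pos + 0.05 * rng.normal()])
    x, P = kf_step(x, P, z, F, Q, H, R)
print(x)  # position near 2.0 m, velocity near -0.3 m/s
```

The study's filter extends this pattern nonlinearly to attitude, angular velocity, and shape parameters, fusing the camera and range measurements in the update step.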
Shape Animation with Combined Captured and Simulated Dynamics
We present a novel volumetric animation framework for creating new types of
animations from raw 3D surface or point-cloud sequences of captured real
performances. The framework takes as input time-incoherent 3D observations of
a moving shape, and is thus particularly suited to the output of
performance-capture platforms. In our system, a virtual representation of the
actor is built from real captures, allowing seamless combination and
simulation with virtual external forces and objects, in which the original
captured actor can be reshaped, disassembled or reassembled by user-specified
virtual physics. Instead of the dominant surface-based geometric
representation of the capture, which is less suitable for volumetric effects,
our pipeline exploits Centroidal Voronoi tessellation decompositions as a
unified volumetric representation of the real captured actor, which we show
can be used seamlessly as a building block for all processing stages, from
capture and tracking to virtual physics simulation. The representation makes
no human-specific assumptions and can be used to capture and re-simulate the
actor with props or other moving scenery elements. We demonstrate the
potential of this pipeline for virtual reanimation of a real captured event
with various unprecedented volumetric visual effects, such as volumetric
distortion, erosion, morphing, gravity pull, or collisions.
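Centroidal Voronoi tessellations can be approximated on a sampled shape with Lloyd's algorithm; a discrete sketch (illustrative, not the pipeline's implementation):

```python
import numpy as np

def lloyd_cvt(samples, k, iters=50, seed=0):
    """Approximate a centroidal Voronoi tessellation of a sampled
    volume by Lloyd iteration: assign each sample to its nearest
    site, then move every site to the centroid of its cell."""
    rng = np.random.default_rng(seed)
    sites = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(samples[:, None] - sites[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            cell = samples[labels == j]
            if len(cell):
                sites[j] = cell.mean(axis=0)
    # Final assignment against the converged sites.
    labels = np.linalg.norm(samples[:, None] - sites[None],
                            axis=2).argmin(axis=1)
    return sites, labels

# Decompose a sampled unit cube into 8 roughly equal cells.
rng = np.random.default_rng(1)
pts = rng.random((2000, 3))
sites, labels = lloyd_cvt(pts, k=8)
print(np.bincount(labels))  # eight cells of comparable size
```

Because each cell is a compact, roughly isotropic chunk of volume, a decomposition of this kind lends itself naturally to the erosion, disassembly, and collision effects listed above, in a way a surface mesh does not.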
A survey of real-time crowd rendering
In this survey we review, classify and compare existing approaches for real-time crowd rendering. We first overview character animation techniques, as they are highly tied to crowd rendering performance, and then we analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss specific acceleration techniques for crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting the performance and realism of crowds, such as lighting, shadowing, clothing and variability. Finally, we provide an exhaustive comparison of the most relevant approaches in the field.
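Runtime LoD selection, the simplest of the criteria such surveys review, often reduces to thresholding the camera distance; a minimal sketch with hypothetical thresholds and level assignments:

```python
def select_lod(distance, thresholds=(10.0, 30.0, 80.0)):
    """Pick a level of detail from the camera distance (threshold
    values in metres are hypothetical): 0 = full geometry,
    1 = reduced mesh, 2 = image-based impostor, 3 = point-based
    or culled."""
    for lod, t in enumerate(thresholds):
        if distance < t:
            return lod
    return len(thresholds)

# Assign LoDs across a crowd: distant agents get cheap impostors.
crowd = [3.0, 12.0, 45.0, 120.0]
print([select_lod(d) for d in crowd])  # [0, 1, 2, 3]
```

Real systems refine this with screen-space projected size, hysteresis to avoid popping, and per-frame budgets, but the distance ladder above is the core mechanism the polygon-, point-, and image-based representations plug into.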