SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction
Recent hand-object interaction datasets show limited real object variability
and rely on fitting the MANO parametric model to obtain ground-truth hand
shapes. To go beyond these limitations and spur further research, we introduce
the SHOWMe dataset which consists of 96 videos, annotated with real and
detailed hand-object 3D textured meshes. Following recent work, we consider a
rigid hand-object scenario, in which the pose of the hand with respect to the
object remains constant during the whole video sequence. This assumption allows
us to register sub-millimetre-precise ground-truth 3D scans to the image
sequences in SHOWMe. Although simplifying, this hypothesis makes sense for
applications where high accuracy and level of detail are important, e.g.,
object hand-over in human-robot collaboration, object scanning, or manipulation
and contact point analysis. Importantly, the rigidity of the hand-object
system allows us to tackle video-based 3D reconstruction of unknown hand-held
objects using a two-stage pipeline consisting of a rigid registration step
followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set
of non-trivial baselines for these two stages and show that it is possible to
achieve promising object-agnostic 3D hand-object reconstructions employing an
SfM toolbox or a hand pose estimator to recover the rigid transforms and
off-the-shelf MVR algorithms. However, these methods remain sensitive to the
initial camera pose estimates which might be imprecise due to lack of textures
on the objects or heavy occlusions of the hands, leaving room for improvements
in the reconstruction. Code and dataset are available at
https://europe.naverlabs.com/research/showme
Comment: Paper and Appendix, accepted at the ACVR workshop at the ICCV conference
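The rigid registration stage described above can be illustrated with a minimal sketch. The snippet below uses the classical Kabsch algorithm on known 3D point correspondences; this is only a stand-in for illustration, since the paper's baselines recover the rigid transforms from an SfM toolbox or a hand pose estimator rather than from given correspondences.

```python
import numpy as np

def kabsch_rigid_transform(src, dst):
    """Estimate rotation R and translation t so that R @ src + t ~ dst.

    src, dst: (N, 3) arrays of corresponding 3D points.
    A minimal stand-in for a rigid-registration step; real pipelines
    estimate camera/hand poses without known correspondences.
    """
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t

# Toy usage: recover a known rotation and translation.
rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 3))
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
R_est, t_est = kabsch_rigid_transform(pts, pts @ R_true.T + t_true)
```

Once per-frame rigid transforms are available, the registered views can be handed to any off-the-shelf multi-view reconstruction method, which is precisely where the abstract notes the sensitivity to imprecise initial pose estimates.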
Self-Supervised Object-in-Gripper Segmentation from Robotic Motions
Accurate object segmentation is a crucial task in the context of robotic
manipulation. However, creating sufficient annotated training data for neural
networks is particularly time-consuming and often requires manual labeling. To
this end, we propose a simple, yet robust solution for learning to segment
unknown objects grasped by a robot. Specifically, we exploit motion and
temporal cues in RGB video sequences. Using optical flow estimation we first
learn to predict segmentation masks of our given manipulator. Then, these
annotations are used in combination with motion cues to automatically
distinguish between the background, the manipulator, and the unknown grasped
object. In contrast to existing systems, our approach is fully self-supervised
and independent of precise camera calibration, 3D models, or potentially
imperfect depth data. We perform a thorough comparison with alternative
baselines and approaches from the literature. The object masks and views are
shown to be suitable
training data for segmentation networks that generalize to novel environments
and also allow for watertight 3D reconstruction.
Comment: 15 pages, 11 figures. Video:
https://www.youtube.com/watch?v=srEwuuIIgz
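The motion-cue idea above, combining a predicted manipulator mask with optical flow to separate background, manipulator, and grasped object, can be sketched as follows. The `motion_labels` helper and its flow-magnitude threshold are illustrative assumptions, not the paper's actual pipeline, which learns the manipulator mask from flow in a self-supervised way.

```python
import numpy as np

def motion_labels(flow, manipulator_mask, threshold=1.0):
    """Label pixels as 0=background, 1=manipulator, 2=grasped object.

    flow: (H, W, 2) dense optical-flow field between two frames.
    manipulator_mask: (H, W) boolean mask of the robot manipulator.
    Pixels that move but are not the manipulator are assumed to be
    the object moving rigidly with the gripper (a simplification).
    """
    moving = np.linalg.norm(flow, axis=-1) > threshold
    labels = np.zeros(flow.shape[:2], dtype=np.uint8)
    labels[moving & manipulator_mask] = 1
    labels[moving & ~manipulator_mask] = 2  # moves with gripper, isn't gripper
    return labels

# Toy usage: a 4x4 frame where a 2x2 region moves and one of its
# pixels is known to belong to the manipulator.
flow = np.zeros((4, 4, 2))
flow[1:3, 1:3] = [2.0, 0.0]
manip = np.zeros((4, 4), dtype=bool)
manip[1, 1] = True
labels = motion_labels(flow, manip)
```

Such automatically derived labels are what the abstract proposes to reuse as training data for a segmentation network that then generalizes to novel environments.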
Integrating Vision and Physical Interaction for Discovery, Segmentation and Grasping of Unknown Objects
In this work, image processing methods and the ability of humanoid
robots to physically interact with their environment are used in close
interplay to identify unknown objects, separate them from the
background and from other objects, and ultimately grasp them.
In the course of this interactive exploration, properties of the
object such as its appearance and shape are also determined
- …