Search CORE

9 research outputs found

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

Author: Hampali Shreyas
Lepetit Vincent
Rad Mahdi
Sarkar Sayan Deb
Publication date: 01/01/2022

We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image. This is a very challenging problem, as large occlusions and frequent confusions between joints can occur. State-of-the-art methods solve this problem by regressing a heatmap for each joint, which requires solving two problems simultaneously: localizing the joints and recognizing them. In this work, we propose to separate these tasks by relying on a CNN to first localize joints as 2D keypoints, and on self-attention between the CNN features at these keypoints to associate them with the corresponding hand joints. The resulting architecture, which we call "Keypoint Transformer", is highly efficient: it achieves state-of-the-art performance on the InterHand2.6M dataset with roughly half the number of model parameters. We also show that it can easily be extended to estimate the 3D pose of an object manipulated by one or two hands with high accuracy. Moreover, we created a new dataset of more than 75,000 images of two hands manipulating an object, fully annotated in 3D, which we will make publicly available. Comment: Accepted at CVPR 2022.

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

HAL-Ecole des Ponts ParisTech
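The Keypoint Transformer abstract separates localization (a CNN finds 2D keypoints) from identification (self-attention over the features at those keypoints decides which joint each one is). The toy sketch below illustrates only the second idea: single-head scaled dot-product self-attention over per-keypoint feature vectors, followed by a hypothetical linear classifier that assigns each keypoint a joint index. The weight matrices, feature sizes, and the argmax decoding are illustrative assumptions, not the paper's architecture or trained parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> List[Vector]:
    """Single-head scaled dot-product self-attention: every keypoint
    attends to the feature vectors of all keypoints, itself included."""
    queries = [matvec(wq, f) for f in features]
    keys = [matvec(wk, f) for f in features]
    values = [matvec(wv, f) for f in features]
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    attended = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        attended.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return attended


def identify_joints(features: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix,
                    classifier: Matrix) -> List[int]:
    """Assign each keypoint the joint index with the highest score from a
    hypothetical linear classifier applied to its attended feature."""
    labels = []
    for a in self_attention(features, wq, wk, wv):
        scores = matvec(classifier, a)
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy usage: two keypoints with 2-D features and identity weights (made-up values).
identity = [[1.0, 0.0], [0.0, 1.0]]
keypoint_features = [[10.0, 0.0], [0.0, 10.0]]
print(identify_joints(keypoint_features, identity, identity, identity, identity))  # [0, 1]
```

In a real model the attention layers would be stacked and learned, so a keypoint's label can depend on the context provided by the other keypoints. That context is what the abstract relies on to resolve confusions between similar-looking joints.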

In-Hand 3D Object Scanning from an RGB Sequence

Author: Hampali Shreyas
Hodan Tomas
Keskin Cem
Lepetit Vincent
Ma Lingni
Tran Luan
Publication date: 22/06/2023

We propose a method for in-hand 3D scanning of an unknown object with a monocular camera. Our method relies on a neural implicit surface representation that captures both the geometry and the appearance of the object; however, unlike most NeRF-based methods, we do not assume that the camera-object relative poses are known. Instead, we simultaneously optimize both the object shape and the pose trajectory. As direct optimization over all shape and pose parameters is prone to fail without coarse-level initialization, we propose an incremental approach that starts by splitting the sequence into carefully selected overlapping segments within which the optimization is likely to succeed. We reconstruct the object shape and track its poses independently within each segment, then merge all the segments before performing a global optimization. We show that our method is able to reconstruct the shape and color of both textured and challenging texture-less objects, that it outperforms classical methods that rely only on appearance features, and that its performance is close to that of recent methods that assume known camera poses. Comment: CVPR 2023.

arXiv.org e-Print Archive
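The in-hand scanning abstract describes splitting the RGB sequence into overlapping segments, optimizing each one independently, and then merging them. The sketch below covers only the bookkeeping for that split. It assumes fixed-length windows with a fixed overlap, a simplification of the paper's "carefully selected" segments, whose selection criterion the abstract does not give. The helper names `overlapping_segments` and `shared_frames` are hypothetical.

```python
from typing import List, Tuple


def overlapping_segments(num_frames: int, length: int, overlap: int) -> List[Tuple[int, int]]:
    """Split frames [0, num_frames) into half-open windows [start, end) of at
    most `length` frames; consecutive windows share `overlap` frames."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    if num_frames <= 0:
        return []
    step = length - overlap
    segments = []
    start = 0
    while True:
        end = min(start + length, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start += step
    return segments


def shared_frames(a: Tuple[int, int], b: Tuple[int, int]) -> range:
    """Frames two windows have in common; a merge step could use these to
    align the per-segment pose estimates before a global optimization."""
    return range(max(a[0], b[0]), min(a[1], b[1]))


segments = overlapping_segments(num_frames=10, length=4, overlap=2)
print(segments)  # [(0, 4), (2, 6), (4, 8), (6, 10)]
print(list(shared_frames(segments[0], segments[1])))  # [2, 3]
```

The overlap matters because each segment's shape and poses are recovered in their own coordinate frame. Frames observed by two neighboring segments give a common reference for stitching them together.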

Honnotate: A method for 3D annotation of hand and object pose

Author: Hampali Shreyas
Lepetit Vincent
Oberweger Markus
Rad Mahdi
Publication venue: HAL CCSD
Publication date: 01/01/2020

International audience

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

Author: Hampali Shreyas
Lepetit Vincent
Rad Mahdi
Sarkar Sayan Deb
Publication venue: HAL CCSD
Publication date: 01/01/2022

Accepted at CVPR 2022. International audience. We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image. This is a very challenging problem, as large occlusions and frequent confusions between joints can occur. State-of-the-art methods solve this problem by regressing a heatmap for each joint, which requires solving two problems simultaneously: localizing the joints and recognizing them. In this work, we propose to separate these tasks by relying on a CNN to first localize joints as 2D keypoints, and on self-attention between the CNN features at these keypoints to associate them with the corresponding hand joints. The resulting architecture, which we call "Keypoint Transformer", is highly efficient: it achieves state-of-the-art performance on the InterHand2.6M dataset with roughly half the number of model parameters. We also show that it can easily be extended to estimate the 3D pose of an object manipulated by one or two hands with high accuracy. Moreover, we created a new dataset of more than 75,000 images of two hands manipulating an object, fully annotated in 3D, which we will make publicly available.

HAL-Ecole des Ponts ParisTech

Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose Estimation

Author: Hampali Shreyas
Lepetit Vincent
Rad Mahdi
Sarkar Sayan Deb
Publication venue: HAL CCSD
Publication date: 01/01/2022

Hal-Diderot

In-Hand 3D Object Scanning from an RGB Sequence

Author: Hampali Shreyas
Hodan Tomas
Keskin Cem
Lepetit Vincent
Ma Lingni
Tran Luan
Publication venue: HAL CCSD
Publication date: 01/01/2023

International audience

HAL-Ecole des Ponts ParisTech

Monte Carlo Scene Search for 3D Scene Understanding

Author: Fraundorfer Friedrich
Hampali Shreyas
Kumar Chetan
Lepetit Vincent
Sarkar Sayan
Stekovic Sinisa
Publication venue: HAL CCSD
Publication date: 01/01/2021

International audience

Institute of Transport Research:Publications

arXiv.org e-Print Archive

HAL-Ecole des Ponts ParisTech

DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

Author: Christen Sammy
Hampali Shreyas
Hodan Tomas
Ma Shugao
Remelli Edoardo
Sauser Eric
Sener Fadime
Tekin Bugra
Publication date: 26/03/2024

Generating natural hand-object interactions in 3D is challenging, as the resulting hand and object motions are expected to be physically plausible and semantically meaningful. Furthermore, generalization to unseen objects is hindered by the limited scale of available hand-object interaction datasets. We propose DiffH2O, a novel method to synthesize realistic one- or two-handed object interactions from text prompts and the object's geometry. The method introduces three techniques that enable effective learning from limited data. First, we decompose the task into a grasping stage and a text-based interaction stage and use a separate diffusion model for each. In the grasping stage, the model only generates hand motions, whereas in the interaction stage both hand and object poses are synthesized. Second, we propose a compact representation that tightly couples hand and object poses. Third, we propose two different guidance schemes to allow more control over the generated motions: grasp guidance and detailed textual guidance. Grasp guidance takes a single target grasping pose and guides the diffusion model to reach this grasp at the end of the grasping stage, which provides control over the grasping pose. Given a grasping motion from this stage, multiple different actions can be prompted in the interaction stage. For textual guidance, we contribute comprehensive text descriptions to the GRAB dataset and show that they give our method more fine-grained control over hand-object interactions. Our quantitative and qualitative evaluation demonstrates that the proposed method outperforms baseline methods and produces natural hand-object motions. Moreover, we demonstrate the practicality of our framework by using a hand pose estimate from an off-the-shelf pose estimator for guidance, and then sampling multiple different actions in the interaction stage. Comment: Project Page: https://diffh2o.github.io

arXiv.org e-Print Archive
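The DiffH2O abstract says grasp guidance steers the diffusion model so that the end of the grasping stage reaches a given target grasp. The abstract does not specify the mechanism. The sketch below illustrates one generic way such guidance can work, and it is an assumption on that point. After every denoising step, the final frame takes a gradient step on the loss 0.5 * ||last - target||^2. The denoiser here is a hypothetical placeholder that shrinks values. It is not a trained diffusion model and not DiffH2O's network.

```python
import math
from typing import Callable, List

Pose = List[float]
Motion = List[Pose]  # one pose vector per frame


def guided_sampling(motion: Motion, denoise_step: Callable[[Motion, int], Motion],
                    target_grasp: Pose, num_steps: int, guidance_scale: float) -> Motion:
    """Alternate denoising steps with a gradient step that pulls the final
    frame toward target_grasp (loss 0.5 * ||last - target||^2)."""
    for t in reversed(range(num_steps)):
        motion = denoise_step(motion, t)
        last = motion[-1]
        # The gradient of 0.5 * ||last - target||^2 w.r.t. last is (last - target).
        guided_last = [x - guidance_scale * (x - g) for x, g in zip(last, target_grasp)]
        motion = motion[:-1] + [guided_last]
    return motion


def toy_denoise_step(motion: Motion, t: int) -> Motion:
    """Hypothetical stand-in for a trained denoiser: shrinks every value by 10%."""
    return [[0.9 * x for x in pose] for pose in motion]


start = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
target = [2.0, 0.0]
unguided = guided_sampling(start, toy_denoise_step, target, num_steps=10, guidance_scale=0.0)
guided = guided_sampling(start, toy_denoise_step, target, num_steps=10, guidance_scale=0.5)
print(math.dist(guided[-1], target) < math.dist(unguided[-1], target))  # True
```

Only the last frame is constrained here, which mirrors the abstract's description of reaching the grasp "at the end of the grasping stage". The earlier frames are left entirely to the (placeholder) denoiser.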