Acquiring contact patterns between hands and nonrigid objects is a common
concern in the vision and robotics communities. However, existing learning-based
methods focus mainly on contact with rigid objects from monocular images. When
these methods are adopted for nonrigid contact, a major problem is that the
existing contact representation is restricted by the geometry of the object.
Consequently, contact neighborhoods are stored in an unordered manner and
contact features are difficult to align with image cues. At the core of our
approach lies a novel hand-object contact representation called RUPs (Region
Unwrapping Profiles), which unwraps the roughly estimated hand-object surfaces
into multiple high-resolution 2D regional profiles. The region grouping strategy
is consistent with the hand's kinematic bone division, because these bones are
the primitive initiators of a composite contact pattern. Based on this
representation, our Regional Unwrapping Transformer (RUFormer) learns the
correlation priors across regions from monocular inputs and predicts
the corresponding contact and deformation transformations. Our experiments demonstrate
that the proposed framework can robustly estimate the degrees of deformation and
the deformation transformations, which makes it suitable for both nonrigid and rigid
contact.

Comment: Accepted by ICCV 2023