Reconstructing hand-held objects from a single RGB image is an important and
challenging problem. Existing works that use Signed Distance Fields (SDF)
struggle to comprehensively capture complex hand-object interactions, since an
SDF is only reliable within the proximity of the target, which makes it
infeasible to simultaneously encode local hand and object cues. To
address this issue, we propose DDF-HO, a novel approach leveraging Directed
Distance Field (DDF) as the shape representation. Unlike SDF, DDF maps a ray in
3D space, consisting of an origin and a direction, to its corresponding DDF
values: a binary visibility signal indicating whether the ray intersects the
object, and a distance value measuring the distance from the origin to the target in
the given direction. We randomly sample multiple rays and collect local-to-global
geometric features for each of them by introducing a novel 2D ray-based feature
aggregation scheme and a 3D intersection-aware hand pose embedding, combining
2D and 3D features to model hand-object interactions. Extensive experiments on
synthetic and real-world datasets demonstrate that DDF-HO consistently
outperforms all baseline methods by a large margin, especially in Chamfer
Distance, with an improvement of about 80%. Code and trained models will be
released soon.
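
To make the ray-to-value mapping described above concrete, the following is a minimal sketch of a DDF query against an analytic unit sphere, which stands in for the object. It is not the learned DDF-HO network: the function names `sphere_sdf` and `sphere_ddf`, the sphere itself, and the use of infinity as the distance for non-intersecting rays are all assumptions made for illustration. The SDF takes a point, while the DDF takes an origin and a direction and returns a visibility flag together with a distance.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """SDF for comparison: signed distance from a point to the sphere surface."""
    return math.dist(point, center) - radius


def sphere_ddf(origin, direction, center=(0.0, 0.0, 0.0), radius=1.0):
    """Illustrative DDF: map a ray (origin, direction) to (visible, distance).

    visible  -- True if the ray hits the sphere in front of the origin.
    distance -- distance from the origin to the first hit along the direction;
                math.inf when the ray misses (an assumed convention).
    """
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    d = [x / norm for x in direction]  # unit direction

    # Solve |origin + t * d - center|^2 = radius^2 for t.
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * y for x, y in zip(oc, d))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return False, math.inf  # the ray's line never touches the sphere

    root = math.sqrt(disc)
    t_near, t_far = -b - root, -b + root
    if t_near >= 0.0:
        return True, t_near  # origin outside, sphere ahead
    if t_far >= 0.0:
        return True, t_far  # origin inside the sphere
    return False, math.inf  # sphere lies behind the origin


if __name__ == "__main__":
    print(sphere_sdf((0.0, 0.0, -3.0)))                   # 2.0
    print(sphere_ddf((0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))  # (True, 2.0)
    print(sphere_ddf((0.0, 0.0, -3.0), (0.0, 1.0, 0.0)))  # (False, inf)
```

From the same origin the SDF returns a single value regardless of viewing direction, whereas the DDF output changes with the ray direction, which is the property the abstract relies on when sampling multiple rays.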