24 research outputs found
Split, Merge, and Refine: Fitting Tight Bounding Boxes via Learned Over-Segmentation and Iterative Search
We present a novel framework for finding a set of tight bounding boxes of a
3D shape via neural-network-based over-segmentation and iterative merging and
refinement. Achieving tight bounding boxes of a shape while guaranteeing
complete coverage is an essential task for efficient geometric operations and
unsupervised semantic part detection, but previous methods fail to achieve both
full coverage and tightness. Neural-network-based methods are unsuited to
these goals due to the non-differentiability of the objective, while classic
iterative search methods suffer from sensitivity to initialization.
We demonstrate that a careful integration of learning-based and iterative
search methods can achieve bounding boxes with both properties. We employ
an existing unsupervised segmentation network to split the shape and obtain
over-segmentation. Then, we apply hierarchical merging with our novel
tightness-aware merging and stopping criteria. To overcome the sensitivity to
the initialization, we also refine the bounding box parameters in a game setup
with a soft reward function promoting wider exploration. Lastly, we further
improve the bounding boxes with an MCTS-based multi-action-space exploration.
Our experimental results demonstrate the full coverage, tightness, and
adequate number of the bounding boxes produced by our method.
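The hierarchical merging step can be illustrated with a toy sketch over axis-aligned boxes. Note that the actual method operates on learned over-segments with its own tightness-aware criteria; the `tightness` measure and the `stop_below` threshold here are illustrative assumptions, not the authors' formulation.

```python
# Toy sketch of tightness-aware hierarchical merging with axis-aligned
# bounding boxes given as ((xmin, ymin, zmin), (xmax, ymax, zmax)).
# The merging score and stopping threshold are illustrative placeholders.

def merge(a, b):
    """Smallest AABB enclosing both input boxes."""
    lo = tuple(min(a[0][i], b[0][i]) for i in range(3))
    hi = tuple(max(a[1][i], b[1][i]) for i in range(3))
    return (lo, hi)

def volume(box):
    lo, hi = box
    v = 1.0
    for i in range(3):
        v *= max(hi[i] - lo[i], 0.0)
    return v

def tightness(a, b):
    """Fraction of the merged box's volume accounted for by the two
    inputs (overlap ignored); 1.0 means merging loses no tightness."""
    return (volume(a) + volume(b)) / max(volume(merge(a, b)), 1e-12)

def hierarchical_merge(boxes, stop_below=0.5):
    """Greedily merge the tightest pair; stop when every remaining
    pair would fall below the tightness threshold."""
    boxes = list(boxes)
    while len(boxes) > 1:
        i, j = max(
            ((i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))),
            key=lambda ij: tightness(boxes[ij[0]], boxes[ij[1]]),
        )
        if tightness(boxes[i], boxes[j]) < stop_below:
            break
        merged = merge(boxes[i], boxes[j])
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]
    return boxes
```

In this toy version, two abutting unit cubes merge (their combined volume equals the merged volume), while a distant cube is left as its own box because merging would dilute tightness below the threshold.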
SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions
The remarkable capabilities of pretrained image diffusion models have been
utilized not only for generating fixed-size images but also for creating
panoramas. However, naive stitching of multiple images often results in visible
seams. Recent techniques have attempted to address this issue by performing
joint diffusions in multiple windows and averaging latent features in
overlapping regions. However, these approaches, which focus on seamless montage
generation, often yield incoherent outputs by blending different scenes within
a single image. To overcome this limitation, we propose SyncDiffusion, a
plug-and-play module that synchronizes multiple diffusions through gradient
descent from a perceptual similarity loss. Specifically, we compute the
gradient of the perceptual loss using the predicted denoised images at each
denoising step, providing meaningful guidance for achieving coherent montages.
Our experimental results demonstrate that our method produces significantly
more coherent outputs compared to previous methods (66.35% vs. 33.65% in our
user study) while still maintaining fidelity (as assessed by GIQA) and
compatibility with the input prompt (as measured by CLIP score).
Comment: Project page: https://syncdiffusion.github.i
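The synchronization idea in the abstract can be sketched in a few lines: at each denoising step, every window's latent is nudged by gradient descent toward perceptual agreement with an anchor window. In this minimal NumPy sketch, a plain MSE on the predicted denoised images stands in for the LPIPS perceptual loss, and `predict_x0` is an identity placeholder for the diffusion model's one-step denoised prediction; both are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of synchronized joint diffusion: descend on a
# similarity loss between each window's predicted denoised image and
# an anchor window's. MSE replaces the perceptual (LPIPS) loss here,
# and predict_x0 is a stand-in for the real diffusion model.

def predict_x0(latent, t):
    """Placeholder for the model's one-step denoised prediction."""
    return latent  # identity stand-in; a real model maps (x_t, t) -> x0_hat

def sync_step(latents, t, anchor=0, lr=0.1):
    """One synchronization update across a list of window latents."""
    x0 = [predict_x0(z, t) for z in latents]
    out = []
    for i, z in enumerate(latents):
        if i == anchor:
            out.append(z)
            continue
        # Gradient of 0.5 * ||x0_i - x0_anchor||^2 w.r.t. x0_i,
        # chained through the identity placeholder back to the latent.
        grad = x0[i] - x0[anchor]
        out.append(z - lr * grad)
    return out
```

Repeating `sync_step` across denoising steps pulls the non-anchor windows toward the anchor's content, which is the mechanism the method uses to keep overlapping windows depicting one coherent scene rather than blended fragments.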
Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes
We present Implicit Two Hands (Im2Hands), the first neural implicit
representation of two interacting hands. Unlike existing methods on two-hand
reconstruction that rely on a parametric hand model and/or low-resolution
meshes, Im2Hands can produce fine-grained geometry of two hands with high
hand-to-hand and hand-to-image coherency. To handle the shape complexity and
interaction context between two hands, Im2Hands models the occupancy volume of
two hands - conditioned on an RGB image and coarse 3D keypoints - by two novel
attention-based modules responsible for (1) initial occupancy estimation and
(2) context-aware occupancy refinement, respectively. Im2Hands first learns
per-hand neural articulated occupancy in the canonical space designed for each
hand using query-image attention. It then refines the initial two-hand
occupancy in the posed space to enhance the coherency between the two hand
shapes using query-anchor attention. In addition, we introduce an optional
keypoint refinement module to enable robust two-hand shape estimation from
predicted hand keypoints in a single-image reconstruction scenario. We
experimentally demonstrate the effectiveness of Im2Hands on two-hand
reconstruction in comparison to related methods, where ours achieves
state-of-the-art results. Our code is publicly available at
https://github.com/jyunlee/Im2Hands.
Comment: 6 figures, 14 pages, accepted to CVPR 2023, project page:
https://jyunlee.github.io/projects/implicit-two-hands
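The query-image attention step described above can be sketched as scaled dot-product attention from 3D query points over image feature tokens, followed by a small occupancy head. The random projection matrices below are untrained placeholders standing in for learned weights; this is a shape-level illustration of the mechanism, not the Im2Hands architecture.

```python
import numpy as np

# Toy sketch of query-image attention for occupancy estimation:
# 3D query points attend over image feature tokens, and the attended
# feature is mapped to an occupancy probability. All weights are
# random placeholders, not trained parameters.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_image_attention(queries, tokens, d=16, seed=0):
    """queries: (Q, 3) 3D points; tokens: (N, d) image features.
    Returns (Q,) occupancy probabilities in (0, 1)."""
    rng = np.random.default_rng(seed)
    w_q = rng.standard_normal((3, d)) / np.sqrt(3)   # query projection
    w_head = rng.standard_normal(d) / np.sqrt(d)     # occupancy head
    q = queries @ w_q                                # (Q, d) projected queries
    attn = softmax(q @ tokens.T / np.sqrt(d))        # (Q, N) attention weights
    feat = attn @ tokens                             # (Q, d) attended features
    return 1.0 / (1.0 + np.exp(-(feat @ w_head)))    # sigmoid occupancy
```

In the full method this per-point estimate is produced in each hand's canonical space and then refined in the posed space with query-anchor attention so that the two hands' occupancies stay mutually consistent.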