
    Split, Merge, and Refine: Fitting Tight Bounding Boxes via Learned Over-Segmentation and Iterative Search

    We present a novel framework for finding a set of tight bounding boxes of a 3D shape via neural-network-based over-segmentation and iterative merging and refinement. Obtaining tight bounding boxes of a shape while guaranteeing complete boundedness is essential for efficient geometric operations and unsupervised semantic part detection, but previous methods fail to achieve both full coverage and tightness. Neural-network-based methods are unsuitable on their own because the objective is non-differentiable, and classic iterative search methods suffer from sensitivity to initialization. We demonstrate that a careful integration of learning-based and iterative search methods can achieve bounding boxes with both properties. We first employ an existing unsupervised segmentation network to split the shape and obtain an over-segmentation. Then, we apply hierarchical merging with our novel tightness-aware merging and stopping criteria. To overcome sensitivity to initialization, we also refine the bounding box parameters in a game setup with a soft reward function that promotes wider exploration. Lastly, we further improve the bounding boxes with an MCTS-based multi-action-space exploration. Our experimental results demonstrate the full coverage, tightness, and adequate number of bounding boxes produced by our method.
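To make the merging stage concrete, here is a minimal Python sketch of a greedy tightness-aware hierarchical merge over point-sampled segments. It is an illustration under our own assumptions, not the paper's actual criteria: the volume-ratio score, the 0.6 stopping threshold, and all names (aabb_volume, merge_score, hierarchical_merge) are hypothetical.

```python
import numpy as np

def aabb_volume(points: np.ndarray) -> float:
    """Volume of the axis-aligned bounding box of a point set."""
    extent = points.max(axis=0) - points.min(axis=0)
    return float(np.prod(np.maximum(extent, 1e-9)))

def merge_score(a: np.ndarray, b: np.ndarray) -> float:
    """Tightness proxy: ratio of the children's box volumes to the
    merged box volume (close to 1.0 when merging wastes little volume)."""
    return (aabb_volume(a) + aabb_volume(b)) / aabb_volume(np.vstack([a, b]))

def hierarchical_merge(segments: list[np.ndarray], stop: float = 0.6) -> list[np.ndarray]:
    """Greedily merge the segment pair with the best tightness score;
    stop once no remaining merge keeps the boxes tight enough."""
    segs = list(segments)
    while len(segs) > 1:
        score, i, j = max((merge_score(segs[a], segs[b]), a, b)
                          for a in range(len(segs))
                          for b in range(a + 1, len(segs)))
        if score < stop:  # every candidate merge loosens the boxes too much
            break
        merged = np.vstack([segs[i], segs[j]])
        segs = [s for k, s in enumerate(segs) if k not in (i, j)] + [merged]
    return segs
```

In this toy version, the stopping criterion doubles as the box-count selector: a higher threshold yields more, tighter boxes, while a lower one yields fewer, looser ones.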

    SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions

    The remarkable capabilities of pretrained image diffusion models have been utilized not only for generating fixed-size images but also for creating panoramas. However, naively stitching multiple images often results in visible seams. Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows and averaging latent features in overlapping regions. However, these approaches, which focus on seamless montage generation, often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent on a perceptual similarity loss. Specifically, we compute the gradient of the perceptual loss using the predicted denoised images at each denoising step, providing meaningful guidance for achieving coherent montages. Our experimental results demonstrate that our method produces significantly more coherent outputs than previous methods (66.35% vs. 33.65% in our user study) while still maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score).
    Comment: Project page: https://syncdiffusion.github.io
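As a rough illustration of the synchronization idea, the following PyTorch sketch performs one denoising step in which each window's latent is nudged by gradient descent on an LPIPS loss between its predicted denoised image and that of an anchor window, before the usual scheduler update. The anchor choice, the step size, and the predict_x0 and scheduler_step callables are our assumptions for illustration, not the authors' implementation.

```python
import torch
import lpips  # perceptual similarity metric (pip install lpips)

perceptual = lpips.LPIPS(net="vgg")  # expects images in [-1, 1], shape (N, 3, H, W)

def sync_step(latents, predict_x0, scheduler_step, anchor_idx=0, lr=20.0):
    """One hypothetical SyncDiffusion-style update.

    latents: list of per-window noisy latents z_t
    predict_x0: maps z_t to the predicted denoised image x0_hat
    scheduler_step: the usual z_t -> z_{t-1} denoising update
    """
    with torch.no_grad():
        x0_anchor = predict_x0(latents[anchor_idx])
    synced = []
    for i, z in enumerate(latents):
        if i == anchor_idx:
            synced.append(z)
            continue
        z = z.detach().requires_grad_(True)
        loss = perceptual(predict_x0(z), x0_anchor).mean()
        (grad,) = torch.autograd.grad(loss, z)
        synced.append((z - lr * grad).detach())  # descend on the noisy latent
    return [scheduler_step(z) for z in synced]   # then denoise as usual
```

Computing the loss on predicted denoised images rather than on noisy latents is the point the abstract emphasizes: x0_hat is a meaningful image at every timestep, so the perceptual gradient provides useful guidance throughout sampling.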

    Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes

    We present Implicit Two Hands (Im2Hands), the first neural implicit representation of two interacting hands. Unlike existing methods for two-hand reconstruction that rely on a parametric hand model and/or low-resolution meshes, Im2Hands can produce fine-grained geometry of two hands with high hand-to-hand and hand-to-image coherency. To handle the shape complexity and interaction context between two hands, Im2Hands models the occupancy volume of two hands, conditioned on an RGB image and coarse 3D keypoints, using two novel attention-based modules responsible for (1) initial occupancy estimation and (2) context-aware occupancy refinement, respectively. Im2Hands first learns per-hand neural articulated occupancy in a canonical space designed for each hand using query-image attention. It then refines the initial two-hand occupancy in the posed space to enhance coherency between the two hand shapes using query-anchor attention. In addition, we introduce an optional keypoint refinement module to enable robust two-hand shape estimation from predicted hand keypoints in a single-image reconstruction scenario. We experimentally demonstrate the effectiveness of Im2Hands on two-hand reconstruction in comparison to related methods, where ours achieves state-of-the-art results. Our code is publicly available at https://github.com/jyunlee/Im2Hands.
    Comment: 6 figures, 14 pages, accepted to CVPR 2023. Project page: https://jyunlee.github.io/projects/implicit-two-hands
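To make the query-anchor attention concrete, here is a hedged PyTorch sketch of one plausible shape for a context-aware occupancy refinement module: query points attend over anchor features gathered from both hands, and a small MLP fuses the attended context with the initial occupancy. The module structure, dimensions, and names are illustrative assumptions; see the released code for the actual design.

```python
import torch
import torch.nn as nn

class QueryAnchorRefiner(nn.Module):
    """Hypothetical sketch: refine per-point occupancy using attention
    from 3D query points to anchor features from both hands."""

    def __init__(self, feat_dim: int = 128, num_heads: int = 4):
        super().__init__()
        self.embed_query = nn.Linear(3, feat_dim)  # lift xyz into feature space
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 1, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1), nn.Sigmoid(),  # refined occupancy in [0, 1]
        )

    def forward(self, queries, anchor_feats, init_occ):
        """queries: (B, Q, 3) points in posed space
        anchor_feats: (B, A, feat_dim) features sampled at both hands' anchors
        init_occ: (B, Q, 1) initial occupancy from the first-stage estimator"""
        q = self.embed_query(queries)                      # (B, Q, feat_dim)
        ctx, _ = self.attn(q, anchor_feats, anchor_feats)  # query-anchor attention
        return self.head(torch.cat([ctx, init_occ], dim=-1))
```

Refining in the posed space, as the abstract describes, lets each query see the other hand's geometry through the anchors, which is what enforces hand-to-hand coherency.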