
    MBW: Multi-view Bootstrapping in the Wild

    Labeling articulated objects in unconstrained settings has a wide variety of applications, including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled. The approach, however, is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates from videos with only two or three uncalibrated, handheld cameras. With just a few annotations (representing 1-2% of the frames), we are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches. Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates impressive results on standard human datasets, as well as tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo. We release the codebase for MBW as well as this challenging zoo dataset, consisting of image frames of tail-end distribution categories with corresponding 2D and 3D labels generated with minimal human intervention.
    Comment: NeurIPS 2022 conference. Project webpage and code: https://github.com/mosamdabhi/MB
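    The abstract describes the approach only at a high level: a few hand labels seed a detector, the detector's predictions across the uncalibrated views are filtered by a multi-view consistency check (in the paper, a non-rigid 3D neural prior plus deep flow), and the accepted predictions become pseudo-labels for the next training round. The sketch below illustrates that bootstrapping loop in those terms only; `train_detector`, `detect`, and `multiview_consistency` are hypothetical stand-ins, not functions from the MBW codebase, and the consistency score is a trivial cross-view agreement measure rather than the paper's neural prior.

```python
"""Illustrative sketch (not the MBW implementation) of a self-supervised
multi-view bootstrapping loop: train on a small labeled subset, predict on the
remaining frames in every view, keep predictions that are consistent across
views, and retrain. All function bodies below are toy stand-ins."""

import numpy as np


def train_detector(views, labels):
    # Toy stand-in for training a few-shot 2D landmark detector.
    return {"mean": np.mean([labels[t] for t in labels], axis=0)}


def detect(model, frame, rng):
    # Toy stand-in for running the detector on one frame -> (K, 2) landmarks.
    return model["mean"] + rng.normal(scale=0.01, size=model["mean"].shape)


def multiview_consistency(preds_per_view):
    # Placeholder for MBW's check: the paper uses a non-rigid 3D neural prior
    # plus deep flow; here we only score how much the uncalibrated views agree.
    stack = np.stack(preds_per_view)          # (num_views, K, 2)
    return -float(np.mean(np.var(stack, axis=0)))


def bootstrap(views, labels, rounds=3, thresh=-0.05, seed=0):
    # views: {view_id: list of frames}; labels: {frame_idx: (K, 2) array}
    rng = np.random.default_rng(seed)
    num_frames = len(next(iter(views.values())))
    for _ in range(rounds):
        model = train_detector(views, labels)
        for t in range(num_frames):
            if t in labels:
                continue
            preds = [detect(model, views[v][t], rng) for v in views]
            if multiview_consistency(preds) > thresh:
                labels[t] = np.mean(preds, axis=0)   # accept as a pseudo-label
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = {v: [None] * 100 for v in ("cam0", "cam1")}   # two handheld views
    labels = {t: rng.random((5, 2)) for t in (0, 50)}     # ~2% hand-labeled
    print(len(bootstrap(views, labels)), "frames labeled after bootstrapping")
```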

    The 2nd 3D Face Alignment In The Wild Challenge (3DFAW-video): Dense Reconstruction From Video

    3D face alignment approaches have strong advantages over 2D with respect to representational power and robustness to illumination and pose. Over the past few years, a number of research groups have made rapid advances in dense 3D alignment from 2D video and obtained impressive results. How these various methods compare is relatively unknown. Previous benchmarks addressed sparse 3D alignment and single-image 3D reconstruction; no commonly accepted evaluation protocol exists with which to compare dense 3D face reconstruction from video. The 2nd 3D Face Alignment in the Wild from Videos (3DFAW-Video) Challenge extends the previous 3DFAW 2016 competition to the estimation of dense 3D facial structure from video. It presents a new large corpus of profile-to-profile face videos recorded under different imaging conditions and annotated with corresponding high-resolution 3D ground-truth meshes. In this paper we outline the evaluation protocol, the data used, and the results. 3DFAW-Video is to be held in conjunction with the 2019 International Conference on Computer Vision in Seoul, Korea.
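    The abstract refers to an evaluation protocol for dense reconstructions against high-resolution ground-truth meshes without detailing it. As a generic illustration only, not the official 3DFAW-Video metric, the sketch below shows one common way such comparisons are set up: similarity (Procrustes) alignment of the predicted points to the ground-truth vertices, followed by a root-mean-square vertex error.

```python
"""Generic illustration of comparing a predicted dense 3D face reconstruction
against ground-truth mesh vertices. This is NOT the official 3DFAW-Video
protocol; it only sketches the kind of error computation such a benchmark
involves: rigid/similarity alignment, then a per-vertex RMSE."""

import numpy as np


def procrustes_align(pred, gt):
    # Similarity-align pred (N, 3) to gt (N, 3): scale, rotation, translation.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(U @ Vt) < 0:      # avoid reflections
        Vt[-1] *= -1
        S[-1] *= -1
    R = U @ Vt
    scale = S.sum() / (p ** 2).sum()
    return scale * p @ R + mu_g


def dense_rmse(pred, gt):
    # Root-mean-square vertex error after alignment, in the units of `gt`.
    aligned = procrustes_align(pred, gt)
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(5000, 3))                       # stand-in mesh vertices
    pred = 0.9 * gt + 0.02 * rng.normal(size=gt.shape) + 1.0
    print(f"dense RMSE after alignment: {dense_rmse(pred, gt):.4f}")
```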