45 research outputs found

    PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking

    Estimating the relative pose of a new object without prior knowledge is a hard problem, while it is an ability very much needed in robotics and Augmented Reality. We present a method for tracking the 6D motion of objects in RGB video sequences when neither the training images nor the 3D geometry of the objects are available. In contrast to previous works, our method can therefore consider unknown objects in open world instantly, without requiring any prior information or a specific training phase. We consider two architectures, one based on two frames, and the other relying on a Transformer Encoder, which can exploit an arbitrary number of past frames. We train our architectures using only synthetic renderings with domain randomization. Our results on challenging datasets are on par with previous works that require much more information (training images of the target objects, 3D models, and/or depth data). Our source code is available at https://github.com/nv-nguyen/pizza. Comment: 3DV Oral

    Expropriated from the hereafter: the fate of the landless in the Southern Highlands of Madagascar

    During the period following the abolition of slavery by the French colonial government in 1896, the Southern Highlands of Madagascar was settled by ex-slaves. These early settlers constructed a foundation myth of themselves as tompon-tany, or 'masters of the land', a discourse not only equating land with tombs, kinship and ancestors, but also coupled with a skilful deployment of 'Malagasy customs'. In order to exclude later migrants who also wanted to settle, the 'masters of the land' attempted to establish control over holdings in the area. To this end, and to reinforce their own legitimacy as landholders, the tompon-tany labelled subsequent migrants andevo ('slave' or of 'slave descent') who, as a tombless people, have no rights to land. Because they have neither tombs nor ancestors, the landless andevo are socially ostracised and economically marginalised. As an 'impure people', they are not entitled to a place in the hereafter.

    SparseFormer: Attention-based Depth Completion Network

    Most pipelines for Augmented and Virtual Reality estimate the ego-motion of the camera by creating a map of sparse 3D landmarks. In this paper, we tackle the problem of depth completion, that is, densifying this sparse 3D map using RGB images as guidance. This remains a challenging problem due to the low density, non-uniform and outlier-prone 3D landmarks produced by SfM and SLAM pipelines. We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth. The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks. To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks. Comment: Accepted at CV4ARVR 202

    Single Image Depth Prediction with Wavelet Decomposition

    We present a novel method for predicting accurate depths from monocular images with high efficiency. This optimal efficiency is achieved by exploiting wavelet decomposition, which is integrated in a fully differentiable encoder-decoder architecture. We demonstrate that we can reconstruct high-fidelity depth maps by predicting sparse wavelet coefficients. In contrast with previous works, we show that wavelet coefficients can be learned without direct supervision on coefficients. Instead we supervise only the final depth image that is reconstructed through the inverse wavelet transform. We additionally show that wavelet coefficients can be learned in fully self-supervised scenarios, without access to ground-truth depth. Finally, we apply our method to different state-of-the-art monocular depth estimation models, in each case giving similar or better results compared to the original model, while requiring less than half the multiply-adds in the decoder network.
