PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking
Estimating the relative pose of a new object without prior knowledge is a
hard problem, while it is an ability very much needed in robotics and Augmented
Reality. We present a method for tracking the 6D motion of objects in RGB video
sequences when neither the training images nor the 3D geometry of the objects
are available. In contrast to previous works, our method can therefore consider
unknown objects in open world instantly, without requiring any prior
information or a specific training phase. We consider two architectures, one
based on two frames, and the other relying on a Transformer Encoder, which can
exploit an arbitrary number of past frames. We train our architectures using
only synthetic renderings with domain randomization. Our results on challenging
datasets are on par with previous works that require much more information
(training images of the target objects, 3D models, and/or depth data). Our
source code is available at https://github.com/nv-nguyen/pizza
Comment: 3DV Oral
Expropriated from the hereafter: the fate of the landless in the Southern Highlands of Madagascar
During the period following the abolition of slavery by the French colonial government in 1896, the Southern Highlands of Madagascar was settled by ex-slaves. These early settlers constructed a foundation myth of themselves as tompon-tany, or 'masters of the land', a discourse not only equating land with tombs, kinship and ancestors, but also coupled with a skilful deployment of 'Malagasy customs'. In order to exclude later migrants who also wanted to settle, the 'masters of the land' attempted to establish control over holdings in the area. To this end, and to reinforce their own legitimacy as landholders, the tompon-tany labelled subsequent migrants andevo ('slave' or of 'slave descent') who - as a tombless people - have no rights to land. Because they have neither tombs nor ancestors, the landless andevo are socially ostracised and economically marginalised. As an 'impure people', they are not entitled to a place in the hereafter.
SparseFormer: Attention-based Depth Completion Network
Most pipelines for Augmented and Virtual Reality estimate the ego-motion of
the camera by creating a map of sparse 3D landmarks. In this paper, we tackle
the problem of depth completion, that is, densifying this sparse 3D map using
RGB images as guidance. This remains a challenging problem due to the low
density, non-uniform and outlier-prone 3D landmarks produced by SfM and SLAM
pipelines. We introduce a transformer block, SparseFormer, that fuses 3D
landmarks with deep visual features to produce dense depth. The SparseFormer
has a global receptive field, making the module especially effective for depth
completion with low-density and non-uniform landmarks. To address the issue of
depth outliers among the 3D landmarks, we introduce a trainable refinement
module that filters outliers through attention between the sparse landmarks.
Comment: Accepted at CV4ARVR 202
Single Image Depth Prediction with Wavelet Decomposition
International audience
We present a novel method for predicting accurate depths from monocular images with high efficiency. This optimal efficiency is achieved by exploiting wavelet decomposition, which is integrated in a fully differentiable encoder-decoder architecture. We demonstrate that we can reconstruct high-fidelity depth maps by predicting sparse wavelet coefficients. In contrast with previous works, we show that wavelet coefficients can be learned without direct supervision on coefficients. Instead we supervise only the final depth image that is reconstructed through the inverse wavelet transform. We additionally show that wavelet coefficients can be learned in fully self-supervised scenarios, without access to ground-truth depth. Finally, we apply our method to different state-of-the-art monocular depth estimation models, in each case giving similar or better results compared to the original model, while requiring less than half the multiply-adds in the decoder network.
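The idea that a depth map can be rebuilt from sparse wavelet coefficients through an inverse transform can be illustrated with a small sketch. This is not the authors' network: it assumes a single-level 2D Haar wavelet (the abstract does not name the wavelet family), and the function names `haar2d` and `inverse_haar2d` and the toy depth map are hypothetical. It shows only that a piecewise-flat depth map has mostly zero detail coefficients and is recovered exactly by the inverse transform.

```python
# Illustrative sketch only: single-level 2D Haar transform in pure Python.
# The wavelet family (Haar) and all names here are assumptions, not the paper's code.

def haar2d(depth):
    """Split a 2D map (even height/width) into LL, LH, HL, HH sub-bands."""
    h, w = len(depth), len(depth[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = depth[i][j], depth[i][j + 1]          # top row of the 2x2 block
            c, d = depth[i + 1][j], depth[i + 1][j + 1]  # bottom row of the block
            ll[i // 2][j // 2] = (a + b + c + d) / 2     # coarse approximation
            lh[i // 2][j // 2] = (a - b + c - d) / 2     # left-right difference
            hl[i // 2][j // 2] = (a + b - c - d) / 2     # top-bottom difference
            hh[i // 2][j // 2] = (a - b - c + d) / 2     # diagonal difference
    return ll, lh, hl, hh

def inverse_haar2d(ll, lh, hl, hh):
    """Rebuild the full-resolution map from the four sub-bands."""
    h, w = len(ll) * 2, len(ll[0]) * 2
    depth = [[0.0] * w for _ in range(h)]
    for i in range(h // 2):
        for j in range(w // 2):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            depth[2 * i][2 * j] = (s + x + y + z) / 2
            depth[2 * i][2 * j + 1] = (s - x + y - z) / 2
            depth[2 * i + 1][2 * j] = (s + x - y - z) / 2
            depth[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return depth

# Toy piecewise-flat depth map with one depth edge in the top-right block.
toy_depth = [
    [1.0, 1.0, 1.0, 3.0],
    [1.0, 1.0, 1.0, 3.0],
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]

ll, lh, hl, hh = haar2d(toy_depth)
nonzero_details = sum(v != 0 for band in (lh, hl, hh) for row in band for v in row)
reconstructed = inverse_haar2d(ll, lh, hl, hh)
```

In this toy case only the block containing the depth edge produces a nonzero detail coefficient, which is the kind of sparsity the abstract refers to. Supervising only the reconstructed depth, as the abstract describes, would correspond to applying a loss to the output of the inverse transform rather than to the sub-bands themselves.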
PIZZA: A Powerful Image-only Zero-Shot Zero-CAD Approach to 6 DoF Tracking
3DV Oral
International audience
Estimating the relative pose of a new object without prior knowledge is a hard problem, while it is an ability very much needed in robotics and Augmented Reality. We present a method for tracking the 6D motion of objects in RGB video sequences when neither the training images nor the 3D geometry of the objects are available. In contrast to previous works, our method can therefore consider unknown objects in open world instantly, without requiring any prior information or a specific training phase. We consider two architectures, one based on two frames, and the other relying on a Transformer Encoder, which can exploit an arbitrary number of past frames. We train our architectures using only synthetic renderings with domain randomization. Our results on challenging datasets are on par with previous works that require much more information (training images of the target objects, 3D models, and/or depth data). Our source code is available at https://github.com/nv-nguyen/pizz