Keypoint detection by wave propagation
We propose to rely on the wave equation to detect repeatable keypoints that are invariant to image scale and rotation and robust to viewpoint variations, blur, and lighting changes. The algorithm exploits local spatio-temporal extrema of the evolution of image intensities under wave propagation to highlight salient symmetries at different scales. While the image structures found by most state-of-the-art detectors, such as blobs and corners, typically occur on highly textured surfaces, salient symmetries are widespread in diverse kinds of images, including those depicting poorly textured objects, which are hardly dealt with by current pipelines based on local invariant features. We discuss the impact of different numerical wave simulation schemes and their parameters on the overall algorithm, and propose and validate a pyramidal approximation to speed up the simulation. Experiments on publicly available datasets show that the proposed algorithm offers state-of-the-art repeatability on a broad set of different images while detecting regions that can be distinctively described and robustly matched.
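The core idea above can be sketched numerically: evolve the image intensities with an explicit finite-difference scheme for the 2D wave equation, then look for local extrema in the resulting spatio-temporal volume. The snippet below is a toy illustration of the principle, not the paper's detector; the scheme, parameters, and brute-force extrema test are all simplifying assumptions.

```python
import numpy as np

def wave_evolve(img, steps=20, c=0.5):
    """Evolve image intensities under the 2D wave equation
    u_tt = c^2 * laplacian(u) with an explicit leapfrog scheme
    (c <= 1/sqrt(2) keeps it stable). Zero initial velocity is
    approximated by setting u_prev = u. Returns (steps+1, H, W)."""
    u_prev = img.astype(float)
    u = img.astype(float)
    frames = [u.copy()]
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        u_next = 2 * u - u_prev + (c ** 2) * lap
        u_prev, u = u, u_next
        frames.append(u.copy())
    return np.stack(frames)

def spatiotemporal_extrema(frames):
    """Flag voxels that are strict maxima of |u| within their
    3x3x3 spatio-temporal neighbourhood (toy keypoint test)."""
    v = np.abs(frames)
    keypoints = []
    T, H, W = v.shape
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                patch = v[t - 1:t + 2, y - 1:y + 2, x - 1:x + 2]
                m = patch.max()
                if v[t, y, x] == m and (patch == m).sum() == 1:
                    keypoints.append((t, y, x))
    return keypoints

# A single bright dot: the wave expands symmetrically from it.
img = np.zeros((16, 16))
img[8, 8] = 1.0
frames = wave_evolve(img, steps=6)
kps = spatiotemporal_extrema(frames)
```

A real detector would of course run this across a scale pyramid and score the extrema, but the skeleton above captures the evolve-then-search structure described in the abstract.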
Looking at words and points with attention: a benchmark for text-to-shape coherence
While text-conditional 3D object generation and manipulation have seen rapid
progress, the evaluation of coherence between generated 3D shapes and input
textual descriptions lacks a clear benchmark. The reason is twofold: a) the low
quality of the textual descriptions in the only publicly available dataset of
text-shape pairs; b) the limited effectiveness of the metrics used to
quantitatively assess such coherence. In this paper, we propose a comprehensive
solution that addresses both weaknesses. Firstly, we employ large language
models to automatically refine textual descriptions associated with shapes.
Secondly, we propose a quantitative metric to assess text-to-shape coherence,
through cross-attention mechanisms. To validate our approach, we conduct a user
study and compare quantitatively our metric with existing ones. The refined
dataset, the new metric and a set of text-shape pairs validated by the user
study comprise a novel, fine-grained benchmark that we publicly release to
foster research on text-to-shape coherence of text-conditioned 3D generative
models. Benchmark available at
https://cvlab-unibo.github.io/CrossCoherence-Web/.
Comment: ICCV 2023 Workshop "AI for 3D Content Creation", Project page: https://cvlab-unibo.github.io/CrossCoherence-Web/, 26 pages
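The cross-attention idea behind the proposed metric can be illustrated with a toy score: attend from text-token embeddings to per-point shape features and reward sharply peaked attention. Everything below (the `cross_attention_score` function, the single-head dot-product formulation, the max-over-points readout) is a hypothetical sketch, not the benchmark's actual metric.

```python
import numpy as np

def cross_attention_score(text_emb, shape_emb):
    """Toy coherence score: average of the maximum attention weight
    each text token places on the shape's point features.
    text_emb: (T, d) token embeddings; shape_emb: (P, d) point features."""
    d = text_emb.shape[1]
    logits = text_emb @ shape_emb.T / np.sqrt(d)   # (T, P) dot-product attention
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over points
    return float(attn.max(axis=1).mean())          # peaked attention -> coherent

rng = np.random.default_rng(0)
shape = rng.normal(size=(128, 32))                 # stand-in point features
matched = shape[:8] + 0.1 * rng.normal(size=(8, 32))  # tokens aligned with points
random_text = rng.normal(size=(8, 32))             # unrelated description
# A matched description attends far more sharply than a random one.
assert cross_attention_score(matched, shape) > cross_attention_score(random_text, shape)
```

The intuition carried over from the abstract is only that coherence is read off the attention pattern between the two modalities, rather than from a global embedding distance.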
Shallow Features Guide Unsupervised Domain Adaptation for Semantic Segmentation at Class Boundaries
Although deep neural networks have achieved remarkable results on semantic segmentation, they usually fail to generalize to new domains, especially when performing synthetic-to-real adaptation. Such domain shift is particularly noticeable along class boundaries, invalidating one of the main goals of semantic segmentation, which is to obtain sharp segmentation masks. In this work, we specifically address this core problem in the context of Unsupervised Domain Adaptation and present a novel low-level adaptation strategy that allows us to obtain sharp predictions. Moreover, inspired by recent self-training techniques, we introduce an effective data augmentation that alleviates the noise typically present at semantic boundaries when employing pseudo-labels for self-training. Our contributions can be easily integrated into other popular adaptation frameworks, and extensive experiments show that they effectively improve performance along class boundaries.
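One simple way to realize the boundary-noise idea from the abstract is to mask out pseudo-labeled pixels that sit on class boundaries before computing a self-training loss. The sketch below is a generic illustration of such filtering, assuming a 4-neighbourhood boundary test; it is not the augmentation actually proposed in the paper.

```python
import numpy as np

def boundary_mask(pseudo):
    """Flag pixels whose 4-neighbourhood crosses a pseudo-label
    boundary; the returned boolean mask keeps only interior pixels,
    which are the ones a self-training loss would trust."""
    edges = np.zeros(pseudo.shape, dtype=bool)
    diff_v = pseudo[:-1, :] != pseudo[1:, :]   # vertical label changes
    edges[:-1, :] |= diff_v
    edges[1:, :] |= diff_v
    diff_h = pseudo[:, :-1] != pseudo[:, 1:]   # horizontal label changes
    edges[:, :-1] |= diff_h
    edges[:, 1:] |= diff_h
    return ~edges

# Toy pseudo-label map: left half class 0, right half class 1.
pseudo = np.array([[0, 0, 1, 1]] * 4)
keep = boundary_mask(pseudo)
# Only the two columns touching the 0/1 boundary are discarded.
```

In practice this mask would weight the per-pixel cross-entropy on pseudo-labels, so that the noisy boundary band contributes nothing to the gradient.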
RefRec: Pseudo-labels Refinement via Shape Reconstruction for Unsupervised 3D Domain Adaptation
Unsupervised Domain Adaptation (UDA) for point cloud classification is an emerging research problem with relevant practical motivations. Reliance on multi-task learning to align features across domains has been the standard way to tackle it. In this paper, we take a different path and propose RefRec, the first approach to investigate pseudo-labels and self-training in UDA for point clouds. We present two main innovations to make self-training effective on 3D data: i) refinement of noisy pseudo-labels by matching shape descriptors that are learned by the unsupervised task of shape reconstruction on both domains; ii) a novel self-training protocol that learns domain-specific decision boundaries and reduces the negative impact of mislabelled target samples and in-domain intra-class variability. RefRec sets the new state of the art on both standard benchmarks used to test UDA for point cloud classification, showcasing the effectiveness of self-training for this important problem.
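The first innovation, refining noisy pseudo-labels by matching shape descriptors across domains, can be approximated with a simple k-nearest-neighbour vote in descriptor space. The code below is a toy stand-in: the descriptors are random vectors rather than features learned by shape reconstruction, and `refine_pseudo_labels` is a hypothetical helper, not RefRec's actual procedure.

```python
import numpy as np

def refine_pseudo_labels(tgt_desc, tgt_pseudo, src_desc, src_labels, k=3):
    """Toy refinement: replace each target pseudo-label with the
    majority label among its k nearest source shape descriptors,
    mimicking descriptor-matching-based label cleaning."""
    refined = tgt_pseudo.copy()
    for i, d in enumerate(tgt_desc):
        dist = np.linalg.norm(src_desc - d, axis=1)     # L2 in descriptor space
        nn = src_labels[np.argsort(dist)[:k]]           # k nearest source labels
        vals, counts = np.unique(nn, return_counts=True)
        refined[i] = vals[counts.argmax()]              # majority vote
    return refined

rng = np.random.default_rng(1)
# Two well-separated descriptor clusters standing in for two classes.
src_desc = np.concatenate([rng.normal(0, 0.2, (20, 16)),
                           rng.normal(3, 0.2, (20, 16))])
src_labels = np.array([0] * 20 + [1] * 20)
tgt_desc = np.concatenate([rng.normal(0, 0.2, (5, 16)),
                           rng.normal(3, 0.2, (5, 16))])
noisy = np.array([0, 1, 0, 0, 1, 1, 1, 0, 1, 1])  # some labels flipped
clean = refine_pseudo_labels(tgt_desc, noisy, src_desc, src_labels)
```

Because the clusters are well separated, the vote recovers the true labels despite the flips; the paper's point is that reconstruction-learned descriptors provide exactly this kind of domain-bridging similarity.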
Booster: a Benchmark for Depth from Images of Specular and Transparent Surfaces
Estimating depth from images nowadays yields outstanding results, both in
terms of in-domain accuracy and generalization. However, we identify two main
challenges that remain open in this field: dealing with non-Lambertian
materials and effectively processing high-resolution images. Purposely, we
propose a novel dataset that includes accurate and dense ground-truth labels at
high resolution, featuring scenes containing several specular and transparent
surfaces. Our acquisition pipeline leverages a novel deep space-time stereo
framework, enabling easy and accurate labeling with sub-pixel precision. The
dataset is composed of 606 samples collected in 85 different scenes; each
sample includes both a high-resolution pair (12 Mpx) and an unbalanced
stereo pair (Left: 12 Mpx, Right: 1.1 Mpx), typical of modern mobile devices
that mount sensors with different resolutions. Additionally, we provide
manually annotated material segmentation masks and 15K unlabeled samples. The
dataset is composed of a train set and two test sets, the latter devoted to the
evaluation of stereo and monocular depth estimation networks. Our experiments
highlight the open challenges and future research directions in this field.
Comment: Extension of the paper "Open Challenges in Deep Stereo: the Booster Dataset" presented at CVPR 2022. Accepted at TPAMI
Lightweight and Effective Convolutional Neural Networks for Vehicle Viewpoint Estimation From Monocular Images
Vehicle viewpoint estimation from monocular images is a crucial component for autonomous driving vehicles and for fleet management applications. In this paper, we make several contributions to advance the state of the art on this problem. We show the effectiveness of applying a smoothing filter to the output neurons of a Convolutional Neural Network (CNN) when estimating vehicle viewpoint. We point out the overlooked fact that, under the same viewpoint, the appearance of a vehicle is strongly influenced by its position in the image plane, which renders viewpoint estimation from appearance an ill-posed problem. We show how, by inserting a CoordConv layer into the model to provide the coordinates of the vehicle, we are able to resolve this ambiguity and greatly increase performance. Finally, we introduce a new data augmentation technique that improves viewpoint estimation on vehicles that are closer to the camera or partially occluded. All these improvements allow a lightweight CNN to achieve excellent results while keeping inference time low. An extensive evaluation on a viewpoint estimation benchmark and on actual vehicle camera data shows that our method significantly outperforms the state of the art in vehicle viewpoint estimation, both in terms of accuracy and memory footprint.
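The smoothing-filter idea, applied to the output neurons of a viewpoint classifier, amounts to blurring the per-bin scores while respecting the circular topology of the angle: mass from one bin should leak into its angular neighbours, wrapping around at 0/360 degrees. A minimal sketch, with a hypothetical kernel choice (the paper's exact filter is not specified here):

```python
import numpy as np

def smooth_viewpoint_scores(probs, kernel=(0.25, 0.5, 0.25)):
    """Circularly smooth per-bin viewpoint scores. Viewpoint wraps
    around, so the 1D filter is applied with circular padding via
    np.roll rather than zero padding."""
    k = np.asarray(kernel, dtype=float)
    r = len(k) // 2
    out = np.zeros_like(probs, dtype=float)
    for offset, w in zip(range(-r, r + 1), k):
        out += w * np.roll(probs, offset, axis=-1)
    return out

# A one-hot prediction on bin 0 of 8: smoothing spreads mass to the
# two adjacent bins, including bin 7 thanks to the circular wrap.
onehot = np.zeros(8)
onehot[0] = 1.0
sm = smooth_viewpoint_scores(onehot)
# sm -> [0.5, 0.25, 0, 0, 0, 0, 0, 0.25]
```

Training against such softened targets (or smoothing at inference) penalizes near-miss bins less than opposite viewpoints, which is the intuition behind filtering the output neurons.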