Surface analysis and visualization from multi-light image collections
Multi-Light Image Collections (MLICs) are stacks of photos of a scene acquired from a fixed viewpoint under varying surface illumination, which provide large amounts of visual and geometric information. Over the last decades, a wide variety of methods have been devised to extract information from MLICs and have shown their use in different application domains to support daily activities. In this thesis, we present methods that leverage MLICs for surface analysis and visualization. First, we provide background information: acquisition setup, light calibration and application areas where MLICs have been successfully used in research and daily analysis work. Next, we discuss the use of MLICs for surface visualization and analysis and the available tools that support such analysis. Here, we discuss methods that support the direct exploration of the captured MLIC, methods that generate relightable models from MLICs, non-photorealistic visualization methods that rely on MLICs, and methods that estimate normal maps from MLICs, and we point out visualization tools used for MLIC analysis. In chapter 3, we propose novel benchmark datasets (RealRTI, SynthRTI and SynthPS) that can be used to evaluate algorithms that rely on MLICs, and we discuss available benchmarks for the validation of photometric algorithms that can also be used to validate other MLIC-based algorithms. In chapter 4, we evaluate the performance of different photometric stereo algorithms using SynthPS for cultural heritage applications. RealRTI and SynthRTI have been used to evaluate the performance of (Neural)RTI methods. Then, in chapter 5, we present a neural network-based RTI method, NeuralRTI, a framework for pixel-based encoding and relighting of RTI data.
Using a simple autoencoder architecture, we show that it is possible to obtain a highly compressed representation that better preserves the original information and provides increased quality of virtual images relit from novel directions, particularly in the case of challenging glossy materials. Finally, in chapter 6, we present a method for detecting cracks on the surface of paintings from multi-light image acquisitions, which can also be used on single images, and conclude our presentation.
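To make the idea of estimating a normal map from an MLIC concrete, the following is a minimal sketch of classical Lambertian photometric stereo for a single pixel. It is not the thesis's method or any of the algorithms it benchmarks. It assumes calibrated unit light directions, a purely diffuse surface, and no shadows or highlights. The function names `solve3` and `lambertian_normal` are hypothetical.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system A x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(A)
    if abs(d) < 1e-12:
        raise ValueError("light directions are degenerate (e.g. coplanar)")
    x = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]  # replace one column with the right-hand side
        x.append(det(M) / d)
    return x

def lambertian_normal(lights, intensities):
    """Least-squares albedo and unit normal for one pixel.

    Assumed model (not from the source): intensity_k = albedo * dot(light_k, normal).
    lights: unit 3-vectors, one per image; intensities: one value per image.
    """
    # Normal equations (L^T L) g = L^T i, where g = albedo * normal.
    AtA = [[sum(l[r] * l[c] for l in lights) for c in range(3)] for r in range(3)]
    Atb = [sum(l[r] * i for l, i in zip(lights, intensities)) for r in range(3)]
    g = solve3(AtA, Atb)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("pixel is black under every light; normal is undefined")
    return albedo, [v / albedo for v in g]
```

Running `lambertian_normal` over every pixel of the stack yields an albedo map and a normal map. Real surfaces violate the Lambertian assumption, which is one reason benchmark datasets with known ground truth are needed to compare more robust algorithms.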
Distortion Estimation Through Explicit Modeling of the Refractive Surface
Precise calibration is a must for high reliance 3D computer vision
algorithms. A challenging case is when the camera is behind a protective glass
or transparent object: due to refraction, the image is heavily distorted; the
pinhole camera model alone cannot be used and a distortion correction step is
required. By directly modeling the geometry of the refractive media, we build
the image generation process by tracing individual light rays from the camera
to a target. Comparing the generated images to their distorted - observed -
counterparts, we estimate the geometry parameters of the refractive surface via
model inversion by employing an RBF neural network. We present an image
collection methodology that produces data suited for finding the distortion
parameters and test our algorithm on synthetic and real-world data. We analyze
the results of the algorithm.
Comment: Accepted to ICANN 201
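The core step of tracing a light ray through a refractive medium can be illustrated with the vector form of Snell's law. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `refract`, the flat-interface setup, and the refractive index 1.5 for glass are all illustrative choices.

```python
import math

def refract(d, n, eta):
    """Refract unit direction d at an interface with unit normal n.

    n must point toward the side the ray comes from (dot(d, n) < 0).
    eta = n1 / n2, the ratio of refractive indices (incoming / outgoing medium).
    Returns the refracted unit direction, or None on total internal reflection.
    """
    cos_i = -sum(a * b for a, b in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None  # total internal reflection, no transmitted ray
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * a + k * b for a, b in zip(d, n))

# Example: a ray hitting a flat glass pane at 30 degrees from the normal.
theta = math.radians(30.0)
incoming = (math.sin(theta), 0.0, -math.cos(theta))  # travelling toward -z
normal = (0.0, 0.0, 1.0)                             # pane normal, facing the ray
inside = refract(incoming, normal, 1.0 / 1.5)        # air -> glass (index 1.5 assumed)
outgoing = refract(inside, normal, 1.5 / 1.0)        # glass -> air
```

For a pane with parallel faces the outgoing direction equals the incoming one, so the ray is only shifted sideways. Curved or tilted refractive surfaces also change the direction, which is why the geometry parameters must be estimated rather than fixed.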
Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation
This paper addresses the simulation-to-real domain gap in 6DoF PE, and
proposes a novel self-supervised keypoint radial voting-based 6DoF PE
framework, effectively narrowing this gap using a learnable kernel in RKHS. We
formulate this domain gap as a distance in high-dimensional feature space,
distinct from previous iterative matching methods. We propose an adapter
network, which evolves the network parameters from the source domain, which has
been massively trained on synthetic data with synthetic poses, to the target
domain, which is trained on real data. Importantly, the real data training only
uses pseudo-poses estimated by pseudo-keypoints, and thereby requires no real
groundtruth data annotations. RKHSPose achieves state-of-the-art performance on
three commonly used 6DoF PE datasets including LINEMOD (+4.2%), Occlusion
LINEMOD (+2%), and YCB-Video (+3%). It also compares favorably to fully
supervised methods on all six applicable BOP core datasets, achieving within
-10.8% to -0.3% of the top fully supervised results.
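One standard way to measure a distance between two feature distributions in an RKHS is the maximum mean discrepancy (MMD). The abstract does not say which distance RKHSPose uses, so the following is only an illustrative sketch: it assumes a Gaussian kernel with a fixed bandwidth `sigma`, whereas the paper describes a learnable kernel. The names `rbf` and `mmd2` are hypothetical.

```python
import math

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X and Y.

    Equals the squared RKHS distance between the kernel mean embeddings
    of the two samples: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    kxx = sum(rbf(a, b, sigma) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(rbf(a, b, sigma) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(rbf(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy

# Toy features standing in for synthetic-domain and real-domain samples.
synthetic = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
real_near = [(x + 0.5, y + 0.5) for x, y in synthetic]
real_far = [(x + 3.0, y + 3.0) for x, y in synthetic]
```

In this toy setup `mmd2(synthetic, real_far)` is larger than `mmd2(synthetic, real_near)`, matching the intuition that a wider domain gap gives a larger distance. A quantity of this kind can in principle be minimized when adapting a network from synthetic to real data.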
Learning to See with Minimal Human Supervision
Deep learning has significantly advanced computer vision in the past decade, paving the way for practical applications such as facial recognition and autonomous driving. However, current techniques depend heavily on human supervision, limiting their broader deployment. This dissertation tackles this problem by introducing algorithms and theories to minimize human supervision in three key areas: data, annotations, and neural network architectures, in the context of various visual understanding tasks such as object detection, image restoration, and 3D generation.
First, we present self-supervised learning algorithms to handle in-the-wild images and videos that traditionally require time-consuming manual curation and labeling. We demonstrate that when a deep network is trained to be invariant to geometric and photometric transformations, representations from its intermediate layers are highly predictive of object semantic parts such as eyes and noses. This insight offers a simple unsupervised learning framework that significantly improves the efficiency and accuracy of few-shot landmark prediction and matching. We then present a technique for learning single-view 3D object pose estimation models by utilizing in-the-wild videos where objects turn (e.g., cars in roundabouts). This technique achieves competitive performance with respect to the existing state of the art without requiring any manual labels during training. We also contribute the Accidental Turntables Dataset, a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes that serves as a benchmark for 3D pose estimation.
Second, we address variations in labeling styles across different annotators, which lead to a type of noisy label referred to as heterogeneous labels. This variability in human annotation can cause subpar performance during both the training and testing phases. To mitigate this, we have developed a framework that models the labeling styles of individual annotators, reducing the impact of human annotation variations and enhancing the performance of standard object detection models. We have also applied this framework to analyze ecological data, which are often collected opportunistically across different case studies without consistent annotation guidelines. Through this application, we have obtained several insightful observations into large-scale bird migration behaviors and their relationship to climate change.
Our next study explores the challenges of designing neural networks, an area that lacks a comprehensive theoretical understanding. By linking deep neural networks with Gaussian processes, we propose a novel Bayesian interpretation of the deep image prior, which parameterizes a natural image as the output of a convolutional network with random parameters and random input. This approach offers valuable insights to optimize the design of neural networks for various image restoration tasks.
Lastly, we introduce several machine-learning techniques to reconstruct and edit 3D shapes from 2D images with minimal human effort. We first present a generic multi-modal generative model that bridges 2D images and 3D shapes via a shared latent space, and demonstrate its applications on versatile 3D shape generation and manipulation tasks. Additionally, we develop a framework for joint estimation of 3D neural scene representation and camera poses. This approach outperforms prior works and allows us to operate in the general SE(3) camera pose setting, unlike the baselines. The results also indicate this method can be complementary to classical structure-from-motion (SfM) pipelines as it compares favorably to SfM on low-texture and low-resolution images