754 research outputs found
Data Fusion of Objects Using Techniques Such as Laser Scanning, Structured Light and Photogrammetry for Cultural Heritage Applications
In this paper we present a semi-automatic 2D-3D local registration pipeline
capable of coloring 3D models obtained from 3D scanners by using uncalibrated
images. The proposed pipeline exploits the Structure from Motion (SfM)
technique in order to reconstruct a sparse representation of the 3D object and
obtain the camera parameters from image feature matches. We then coarsely
register the reconstructed 3D model to the scanned one through the Scale
Iterative Closest Point (SICP) algorithm. SICP provides the global scale,
rotation and translation parameters, using minimal manual user intervention. In
the final processing stage, a local registration refinement algorithm optimizes
the color projection of the aligned photos on the 3D object removing the
blurring/ghosting artefacts introduced due to small inaccuracies during the
registration. The proposed pipeline is capable of handling real world cases
with a range of characteristics from objects with low level geometric features
to complex ones
Structure from Articulated Motion: Accurate and Stable Monocular 3D Reconstruction without Training Data
Recovery of articulated 3D structure from 2D observations is a challenging
computer vision problem with many applications. Current learning-based
approaches achieve state-of-the-art accuracy on public benchmarks but are
restricted to specific types of objects and motions covered by the training
datasets. Model-based approaches do not rely on training data but show lower
accuracy on these datasets. In this paper, we introduce a model-based method
called Structure from Articulated Motion (SfAM), which can recover multiple
object and motion types without training on extensive data collections. At the
same time, it performs on par with learning-based state-of-the-art approaches
on public benchmarks and outperforms previous non-rigid structure from motion
(NRSfM) methods. SfAM is built upon a general-purpose NRSfM technique while
integrating a soft spatio-temporal constraint on the bone lengths. We use
alternating optimization strategy to recover optimal geometry (i.e., bone
proportions) together with 3D joint positions by enforcing the bone lengths
consistency over a series of frames. SfAM is highly robust to noisy 2D
annotations, generalizes to arbitrary objects and does not rely on training
data, which is shown in extensive experiments on public benchmarks and real
video sequences. We believe that it brings a new perspective on the domain of
monocular 3D recovery of articulated structures, including human motion
capture.Comment: 21 pages, 8 figures, 2 table
3D Volumetric Reconstruction and Characterization of Objects from Uncalibrated Images
Three-dimensional (3D) object reconstruction using only bi-dimensional (2D) images has been a major research topic in Computer Vision. However, it is still a complex problem to solve, when automation, speed and precision are required. In the work presented in this paper, we developed a computational platform with the main purpose of building 3D geometric models from uncalibrated images of objects. Simplicity and automation were our major guidelines to choose volumetric reconstruction methods, such as Generalized Voxel Coloring. This method uses photo-consistency measures to build an accurate 3D geometric model, without imposing any kind of restrictions on the relative motion between the camera used and the object to be reconstructed. Our final goal is to use our computational platform in building and characterize human external anatomical shapes using a single off-the-shelf camera
Incremental Non-Rigid Structure-from-Motion with Unknown Focal Length
The perspective camera and the isometric surface prior have recently gathered
increased attention for Non-Rigid Structure-from-Motion (NRSfM). Despite the
recent progress, several challenges remain, particularly the computational
complexity and the unknown camera focal length. In this paper we present a
method for incremental Non-Rigid Structure-from-Motion (NRSfM) with the
perspective camera model and the isometric surface prior with unknown focal
length. In the template-based case, we provide a method to estimate four
parameters of the camera intrinsics. For the template-less scenario of NRSfM,
we propose a method to upgrade reconstructions obtained for one focal length to
another based on local rigidity and the so-called Maximum Depth Heuristics
(MDH). On its basis we propose a method to simultaneously recover the focal
length and the non-rigid shapes. We further solve the problem of incorporating
a large number of points and adding more views in MDH-based NRSfM and
efficiently solve them with Second-Order Cone Programming (SOCP). This does not
require any shape initialization and produces results orders of times faster
than many methods. We provide evaluations on standard sequences with
ground-truth and qualitative reconstructions on challenging YouTube videos.
These evaluations show that our method performs better in both speed and
accuracy than the state of the art.Comment: ECCV 201
Multiple View Geometry For Video Analysis And Post-production
Multiple view geometry is the foundation of an important class of computer vision techniques for simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area. Examples include video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, which is the topic being addressed in this dissertation, computer analysis of the motion of the camera can replace the currently used manual methods for correctly aligning an artificially inserted object in a scene. However, existing single view methods typically require multiple vanishing points, and therefore would fail when only one vanishing point is available. In addition, current multiple view techniques, making use of either epipolar geometry or trifocal tensor, do not exploit fully the properties of constant or known camera motion. Finally, there does not exist a general solution to the problem of synchronization of N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is the necessary step for video post-production among different input videos. This dissertation proposes several advancements that overcome these limitations. These advancements are used to develop an efficient framework for video analysis and post-production in multiple cameras. In the first part of the dissertation, the novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state-of-the-art in single view geometry techniques to situations where only one vanishing point is available. The property of constant or known camera motion is also described in this dissertation for applications such as calibration of a network of cameras in video surveillance systems, and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus. We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking and zooming) and complex (e.g. hand-held) camera motions. Accuracy of these results is demonstrated by applying our approach to video post-production applications such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of camera geometry, the position and the orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Learning to predict scene depth from RGB inputs is a challenging task both
for indoor and outdoor robot navigation. In this work we address unsupervised
learning of scene depth and robot ego-motion where supervision is provided by
monocular videos, as cameras are the cheapest, least restrictive and most
ubiquitous sensor for robotics.
Previous work in unsupervised image-to-depth learning has established strong
baselines in the domain. We propose a novel approach which produces higher
quality results, is able to model moving objects and is shown to transfer
across data domains, e.g. from outdoors to indoor scenes. The main idea is to
introduce geometric structure in the learning process, by modeling the scene
and the individual objects; camera ego-motion and object motions are learned
from monocular videos as input. Furthermore an online refinement method is
introduced to adapt learning on the fly to unknown domains.
The proposed approach outperforms all state-of-the-art approaches, including
those that handle motion e.g. through learned flow. Our results are comparable
in quality to the ones which used stereo as supervision and significantly
improve depth prediction on scenes and datasets which contain a lot of object
motion. The approach is of practical relevance, as it allows transfer across
environments, by transferring models trained on data collected for robot
navigation in urban scenes to indoor navigation settings. The code associated
with this paper can be found at https://sites.google.com/view/struct2depth.Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19
- …