247 research outputs found

    Improved Multistage Learning for Multibody Motion Segmentation

    Get PDF
    We present an improved version of the MSL method of Sugaya and Kanatani for multibody motion segmentation. We replace their initial segmentation based on heuristic clustering by an analytical computation based on GPCA, fitting two 2-D affine spaces in 3-D by the Taubin method. This initial segmentation alone can segment most of the motions in natural scenes fairly correctly, and the result is successively optimized by the EM algorithm in 3-D, 5-D, and 7-D. Using simulated and real videos, we demonstrate that our method outperforms the previous MSL and other existing methods. We also illustrate its mechanism by our visualization technique

    Universal Geometric Camera Calibration with Statistical Model Selection

    Get PDF
    We propose a new universal camera calibration approach that uses statistical information criteria for automatic camera model selection. It requires the camera to observe a planar pattern from different positions, and then closed-form estimates for the intrinsic and extrinsic parameters are computed followed by nonlinear optimization. In lieu of modeling radial distortion, the lens projection of the camera is modeled, and in addition we include decentering distortion. This approach is particularly advantageous for wide angle (fisheye) camera calibration because it often reduces the complexity of the model compared to modeling radial distortion. We then apply statistical information criteria to automatically select the complexity of the camera model for any lens type. The complete algorithm is evaluated on synthetic and real data for several different lens projections, and a comparison between existing methods which use radial distortion is done

    Acquiring 3D scene information from 2D images

    Get PDF
    In recent years, people are becoming increasingly acquainted with 3D technologies such as 3DTV, 3D movies and 3D virtual navigation of city environments in their daily life. Commercial 3D movies are now commonly available for consumers. Virtual navigation of our living environment as used on a personal computer has become a reality due to well-known web-based geographic applications using advanced imaging technologies. To enable such 3D applications, many technological challenges such as 3D content creation, 3D displaying technology and 3D content transmission need to tackled and deployed at low cost. This thesis concentrates on the reconstruction of 3D scene information from multiple 2D images, aiming for an automatic and low-cost production of the 3D content. In this thesis, two multiple-view 3D reconstruction systems are proposed: a 3D modeling system for reconstructing the sparse 3D scene model from long video sequences captured with a hand-held consumer camcorder, and a depth reconstruction system for creating depth maps from multiple-view videos taken by multiple synchronized cameras. Both systems are designed to compute the 3D scene information in an automated way with minimum human interventions, in order to reduce the production cost of 3D contents. Experimental results on real videos of hundreds and thousands frames have shown that the two systems are able to accurately and automatically reconstruct the 3D scene information from 2D image data. The findings of this research are useful for emerging 3D applications such as 3D games, 3D visualization and 3D content production. Apart from designing and implementing the two proposed systems, we have developed three key scientific contributions to enable the two proposed 3D reconstruction systems. The first contribution is that we have designed a novel feature point matching algorithm that uses only a smoothness constraint for matching the points, which states that neighboring feature points in images tend to move with similar directions and magnitudes. The employed smoothness assumption is not only valid but also robust for most images with limited image motion, regardless of the camera motion and scene structure. Because of this, the algorithm obtains two major advan- 1 tages. First, the algorithm is robust to illumination changes, as the employed smoothness constraint does not rely on any texture information. Second, the algorithm has a good capability to handle the drift of the feature points over time, as the drift can hardly lead to a violation of the smoothness constraint. This leads to the large number of feature points matched and tracked by the proposed algorithm, which significantly helps the subsequent 3D modeling process. Our feature point matching algorithm is specifically designed for matching and tracking feature points in image/video sequences where the image motion is limited. Our extensive experimental results show that the proposed algorithm is able to track at least 2.5 times as many feature points compared with the state-of-the-art algorithms, with a comparable or higher accuracy. This contributes significantly to the robustness of the 3D reconstruction process. The second contribution is that we have developed algorithms to detect critical configurations where the factorization-based 3D reconstruction degenerates. Based on the detection, we have proposed a sequence-dividing algorithm to divide a long sequence into subsequences, such that successful 3D reconstructions can be performed on individual subsequences with a high confidence. The partial reconstructions are merged later to obtain the 3D model of the complete scene. In the critical configuration detection algorithm, the four critical configurations are detected: (1) coplanar 3D scene points, (2) pure camera rotation, (3) rotation around two camera centers, and (4) presence of excessive noise and outliers in the measurements. The configurations in cases (1), (2) and (4) will affect the rank of the Scaled Measurement Matrix (SMM). The number of camera centers in case (3) will affect the number of independent rows of the SMM. By examining the rank and the row space of the SMM, the abovementioned critical configurations are detected. Based on the detection results, the proposed sequence-dividing algorithm divides a long sequence into subsequences, such that each subsequence is free of the four critical configurations in order to obtain successful 3D reconstructions on individual subsequences. Experimental results on both synthetic and real sequences have demonstrated that the above four critical configurations are robustly detected, and a long sequence of thousands frames is automatically divided into subsequences, yielding successful 3D reconstructions. The proposed critical configuration detection and sequence-dividing algorithms provide an essential processing block for an automatical 3D reconstruction on long sequences. The third contribution is that we have proposed a coarse-to-fine multiple-view depth labeling algorithm to compute depth maps from multiple-view videos, where the accuracy of resulting depth maps is gradually refined in multiple optimization passes. In the proposed algorithm, multiple-view depth reconstruction is formulated as an image-based labeling problem using the framework of Maximum A Posterior (MAP) on Markov Random Fields (MRF). The MAP-MRF framework allows the combination of various objective and heuristic depth cues to define the local penalty and the interaction energies, which provides a straightforward and computationally tractable formulation. Furthermore, the global optimal MAP solution to depth labeli ing can be found by minimizing the local energies, using existing MRF optimization algorithms. The proposed algorithm contains the following three key contributions. (1) A graph construction algorithm to proposed to construct triangular meshes on over-segmentation maps, in order to exploit the color and the texture information for depth labeling. (2) Multiple depth cues are combined to define the local energies. Furthermore, the local energies are adapted to the local image content, in order to consider the varying nature of the image content for an accurate depth labeling. (3) Both the density of the graph nodes and the intervals of the depth labels are gradually refined in multiple labeling passes. By doing so, both the computational efficiency and the robustness of the depth labeling process are improved. The experimental results on real multiple-view videos show that the depth maps of for selected reference view are accurately reconstructed. Depth discontinuities are very well preserved

    Depth Estimation Using 2D RGB Images

    Get PDF
    Single image depth estimation is an ill-posed problem. That is, it is not mathematically possible to uniquely estimate the 3rd dimension (or depth) from a single 2D image. Hence, additional constraints need to be incorporated in order to regulate the solution space. As a result, in the first part of this dissertation, the idea of constraining the model for more accurate depth estimation by taking advantage of the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene is explored. Although deep learning based methods are very successful in computer vision and handle noise very well, they suffer from poor generalization when the test and train distributions are not close. While, the geometric methods do not have the generalization problem since they benefit from temporal information in an unsupervised manner. They are sensitive to noise, though. At the same time, explicitly modeling of a dynamic scenes as well as flexible objects in traditional computer vision methods is a big challenge. Considering the advantages and disadvantages of each approach, a hybrid method, which benefits from both, is proposed here by extending traditional geometric models’ abilities to handle flexible and dynamic objects in the scene. This is made possible by relaxing geometric computer vision rules from one motion model for some areas of the scene into one for every pixel in the scene. This enables the model to detect even small, flexible, floating debris in a dynamic scene. However, it makes the optimization under-constrained. To change the optimization from under-constrained to over-constrained while maintaining the model’s flexibility, ”moving object detection loss” and ”synchrony loss” are designed. The algorithm is trained in an unsupervised fashion. The primary results are in no way comparable to the current state of the art. Because the training process is so slow, it is difficult to compare it to the current state of the art. Also, the algorithm lacks stability. In addition, the optical flow model is extremely noisy and naive. At the end, some solutions are suggested to address these issues

    A Model-Selection Framework for Multibody Structure-and-Motion of Image Sequences

    Get PDF
    Given an image sequence of a scene consisting of multiple rigidly moving objects, multi-body structure-and-motion (MSaM) is the task to segment the image feature tracks into the different rigid objects and compute the multiple-view geometry of each object. We present a framework for multibody structure-and-motion based on model selection. In a recover-and-select procedure, a redundant set of hypothetical scene motions is generated. Each subset of this pool of motion candidates is regarded as a possible explanation of the image feature tracks, and the most likely explanation is selected with model selection. The framework is generic and can be used with any parametric camera model, or with a combination of different models. It can deal with sets of correspondences, which change over time, and it is robust to realistic amounts of outliers. The framework is demonstrated for different camera and scene model

    Theorems and algorithms for multiple view geometry with applications to electron tomography

    Get PDF
    The thesis considers both theory and algorithms for geometric computer vision. The framework of the work is built around the application of autonomous transmission electron microscope image registration. The theoretical part of the thesis first develops a consistent robust estimator that is evaluated in estimating two view geometry with both affine and projective camera models. The uncertainty of the fundamental matrix is similarly estimated robustly, and the previous observation whether the covariance matrix of the fundamental matrix contains disparity information of the scene is explained and its utilization in matching is discussed. For point tracking purposes, a reliable wavelet-based matching technique and two EM algorithms for the maximum likelihood affine reconstruction under missing data are proposed. The thesis additionally discusses identification of degeneracy as well as affine bundle adjustment. The application part of the thesis considers transmission electron microscope image registration, first with fiducial gold markers and thereafter without markers. Both methods utilize the techniques proposed in the theoretical part of the thesis and, in addition, a graph matching method is proposed for matching gold markers. Conversely, alignment without markers is disposed by tracking interest points of the intensity surface of the images. At the present level of development, the former method is more accurate but the latter is appropriate for situations where fiducial markers cannot be used. Perhaps the most significant result of the thesis is the proposed robust estimator because of consistence proof and its many application areas, which are not limited to the computer vision field. The other algorithms could be found useful in multiple view applications in computer vision that have to deal with uncertainty, matching, tracking, and reconstruction. From the viewpoint of image registration, the thesis further achieved its aims since two accurate image alignment methods are suggested for obtaining the most exact reconstructions in electron tomography.reviewe

    LiDAR-Based Object Tracking and Shape Estimation

    Get PDF
    Umfeldwahrnehmung stellt eine Grundvoraussetzung für den sicheren und komfortablen Betrieb automatisierter Fahrzeuge dar. Insbesondere bewegte Verkehrsteilnehmer in der unmittelbaren Fahrzeugumgebung haben dabei große Auswirkungen auf die Wahl einer angemessenen Fahrstrategie. Dies macht ein System zur Objektwahrnehmung notwendig, welches eine robuste und präzise Zustandsschätzung der Fremdfahrzeugbewegung und -geometrie zur Verfügung stellt. Im Kontext des automatisierten Fahrens hat sich das Box-Geometriemodell über die Zeit als Quasistandard durchgesetzt. Allerdings stellt die Box aufgrund der ständig steigenden Anforderungen an Wahrnehmungssysteme inzwischen häufig eine unerwünscht grobe Approximation der tatsächlichen Geometrie anderer Verkehrsteilnehmer dar. Dies motiviert einen Übergang zu genaueren Formrepräsentationen. In der vorliegenden Arbeit wird daher ein probabilistisches Verfahren zur gleichzeitigen Schätzung von starrer Objektform und -bewegung mittels Messdaten eines LiDAR-Sensors vorgestellt. Der Vergleich dreier Freiform-Geometriemodelle mit verschiedenen Detaillierungsgraden (Polygonzug, Dreiecksnetz und Surfel Map) gegenüber dem einfachen Boxmodell zeigt, dass die Reduktion von Modellierungsfehlern in der Objektgeometrie eine robustere und präzisere Parameterschätzung von Objektzuständen ermöglicht. Darüber hinaus können automatisierte Fahrfunktionen, wie beispielsweise ein Park- oder Ausweichassistent, von einem genaueren Wissen über die Fremdobjektform profitieren. Es existieren zwei Einflussgrößen, welche die Auswahl einer angemessenen Formrepräsentation maßgeblich beeinflussen sollten: Beobachtbarkeit (Welchen Detaillierungsgrad lässt die Sensorspezifikation theoretisch zu?) und Modell-Adäquatheit (Wie gut bildet das gegebene Modell die tatsächlichen Beobachtungen ab?). Auf Basis dieser Einflussgrößen wird in der vorliegenden Arbeit eine Strategie zur Modellauswahl vorgestellt, die zur Laufzeit adaptiv das am besten geeignete Formmodell bestimmt. Während die Mehrzahl der Algorithmen zur LiDAR-basierten Objektverfolgung ausschließlich auf Punktmessungen zurückgreift, werden in der vorliegenden Arbeit zwei weitere Arten von Messungen vorgeschlagen: Information über den vermessenen Freiraum wird verwendet, um über Bereiche zu schlussfolgern, welche nicht von Objektgeometrie belegt sein können. Des Weiteren werden LiDAR-Intensitäten einbezogen, um markante Merkmale wie Nummernschilder und Retroreflektoren zu detektieren und über die Zeit zu verfolgen. Eine ausführliche Auswertung auf über 1,5 Stunden von aufgezeichneten Fremdfahrzeugtrajektorien im urbanen Bereich und auf der Autobahn zeigen, dass eine präzise Modellierung der Objektoberfläche die Bewegungsschätzung um bis zu 30%-40% verbessern kann. Darüber hinaus wird gezeigt, dass die vorgestellten Methoden konsistente und hochpräzise Rekonstruktionen von Objektgeometrien generieren können, welche die häufig signifikante Überapproximation durch das einfache Boxmodell vermeiden

    Multiframe Motion Segmentation via Penalized MAP Estimation and Linear Programming

    Full text link
    Motion segmentation is an important topic in computer vision. In this paper, we study the problem of multi-body motion segmentation under the affine camera model. We use a mixture of subspace model to describe the multi-body motions. Then the motion segmentation problem is formulated as an MAP estimation problem with model complexity penalty. With several candidate motion models, the problem can be naturally converted into a linear programming problem, which guarantees a global optimality. The main advantages of our algorithm include: It needs no priori on the number of motions and it has comparable high segmentation accuracy with the best of motion-number-known algorithms. Experiments on benchmark data sets illustrate these points
    corecore