
    Optimal Spatial Registration of SLAM for Augmented Reality

    Augmented reality (AR) is a paradigm that aims at fusing the perceived real environment of a human with digital information located in 3D space. Typically, virtual 3D graphics are overlaid onto the captured images of a moving camera or directly into the user's field of view by means of optical see-through (OST) displays. For a correct perspective and view-dependent alignment of the visualization, various static and dynamic geometric registration problems must be solved in order to create the impression that the virtual and the real world are seamlessly interconnected. The advances of the last decade in the field of simultaneous localization and mapping (SLAM) represent an important contribution to this general problem. It is now possible to reconstruct the real environment and to simultaneously capture the dynamic movements of a camera from the images alone, without having to instrument the environment in advance. However, SLAM can only partly solve the entire registration problem, because the retrieved 3D scene geometry and the calculated motion path are spatially related only to an arbitrarily selected coordinate system. Without a proper reconciliation of coordinate systems (spatial registration), the real world of the human observer remains decoupled from the virtual world. Existing approaches to this problem either require a virtual 3D model that represents a real object with sufficient accuracy (model-based tracking), or they rely on use-case-specific assumptions and additional sensor data (such as GPS signals or the Manhattan-world assumption). These approaches are therefore bound to additional prerequisites, which limits their general applicability. The circumstance that automated registration is desirable but not always possible creates the need for techniques that allow a user to specify connections between the real and the virtual world when setting up AR applications, so that the registration process can be supported and controlled. Such techniques must be complemented by numerical algorithms that optimally exploit the provided information to obtain precise registration results. Within this context, the present thesis provides the following contributions.

    * We propose a novel, closed-form (non-iterative) algorithm for calculating a Euclidean or a similarity transformation. The algorithm generalizes recent state-of-the-art solvers for computing the camera pose from 2D measurement points in the image (the perspective-n-point problem), a fundamental problem in computer vision that has attracted research for many decades. The generalization consists of extending and unifying these algorithms so that they can handle other types of input correspondences than originally designed for. With this algorithm, it becomes possible to rigidly register a SLAM system to a target coordinate system based on heterogeneous and partially indeterminate input data.

    * We address the global refinement of structure and motion parameters by means of iterative sparse minimization (bundle adjustment, BA), which has become a standard technique inside SLAM systems. We propose a variant of BA in which information about the virtual domain is integrated as constraints by means of an optimization-on-manifold approach. This compensates for low-frequency deformations (non-rigid registration) of the estimated camera path and the reconstructed scene geometry caused by measurement error accumulation and the ill-conditioning of the BA problem.

    * We present two approaches in which a user can contribute their knowledge to register a SLAM system. In the first variant, the user places markers in the real environment with predefined connections to the virtual coordinate system. Precise positioning of the markers is not required; rather, they can be placed arbitrarily on surfaces or along edges, which notably reduces the preparative effort. During run-time, the dispersed information is collected and registration is accomplished automatically. In the second variant, the user marks salient points in an image sequence during a preparative preprocessing step and assigns corresponding points in the virtual 3D space via a simple point-and-click metaphor. The result of this preparative phase is a precisely registered, ready-to-use reference model for camera tracking at run-time.

    * Finally, we propose an approach for the geometric calibration of optical see-through displays. We present a parametric model that allows the rendering of virtual 3D content to be dynamically adapted to the current viewpoint of the human observer, including a pre-correction of image aberrations caused by the optics or irregularly curved combiners. To retrieve its parameters, we propose a camera-based approach in which elements of the real and the virtual domain are observed simultaneously. The calibration procedure was developed for a head-up display in a vehicle; a prototypical extension to head-mounted displays is also presented.
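
    To make the registration task concrete, the following is a minimal sketch of the classical closed-form absolute orientation solution (in the spirit of Horn and Umeyama), which covers only the simplest special case addressed by the first contribution: a similarity transformation from 3D point-to-point correspondences. The generalized solver of the thesis additionally handles heterogeneous correspondence types; the NumPy function below, including its name, is purely illustrative.

```python
import numpy as np

def similarity_from_points(src, dst, with_scale=True):
    """Closed-form similarity transform so that dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    S, D = src - mu_s, dst - mu_d
    cov = D.T @ S / len(src)                      # cross-covariance matrix
    U, sigma, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    e = np.array([1.0, 1.0, d])                   # guards against reflections
    R = U @ np.diag(e) @ Vt                       # optimal rotation
    var_src = (S ** 2).sum() / len(src)           # variance of source cloud
    s = (sigma * e).sum() / var_src if with_scale else 1.0
    t = mu_d - s * R @ mu_s
    return s, R, t
```

    Applying a known similarity transform to random points and feeding both clouds into the function recovers s, R and t up to numerical precision.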

    Unifying Algebraic Solvers for Scaled Euclidean Registration from Point, Line and Plane Constraints

    We investigate recent state-of-the-art algorithms for absolute pose problems (PnP and GPnP) and analyse their applicability to a more general problem type, namely scaled Euclidean registration from point-to-point, point-to-line and point-to-plane correspondences. Similar to previous formulations, we first compress the original set of equations into a least squares error function that depends only on the non-linear rotation parameters and a small symmetric coefficient matrix of fixed size. In a second step, the rotation is solved with algorithms derived using methods from algebraic geometry such as the Gröbner basis method. In previous approaches, the first compression step was usually tailored to specific correspondence types and problem instances. Here, we propose a unified formulation based on a representation with orthogonal complements, which allows different types of constraints to be combined elegantly in a single framework. We show that with our unified formulation existing polynomial solvers can be interchangeably applied to problem instances other than those they were originally proposed for. It becomes possible to compare them on various registration problems with respect to accuracy, numerical stability, and computational speed. Our compression procedure not only preserves linear complexity, it is even faster than previous formulations. For the second step we also derive our own algebraic equation solver, which can additionally handle registration from 3D point-to-point correspondences, where other solvers surprisingly fail.
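
    The unified representation can be made concrete with a small sketch: every correspondence contributes a residual of the form P (s R p + t - q), where the projector P encodes the constraint type. Note that the paper compresses this objective in closed form and solves the rotation with algebraic polynomial solvers; the sketch below substitutes a generic iterative least-squares solver purely to illustrate the shared residual structure, and all names are our own.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

# Projectors that unify the three constraint types:
#   point-to-point: P = I           (full 3D residual)
#   point-to-line:  P = I - d d^T   (component orthogonal to direction d)
#   point-to-plane: P = n n^T       (component along the unit normal n)
def line_proj(d):
    return np.eye(3) - np.outer(d, d)

def plane_proj(n):
    return np.outer(n, n)

def residuals(x, constraints):
    """x = (rotvec, t, s); constraints = list of (p, q, P) triples."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t, s = x[3:6], x[6]
    return np.concatenate([P @ (s * R @ p + t - q)
                           for p, q, P in constraints])

# Example: mix constraint types freely in one problem, then solve:
# sol = least_squares(residuals, x0=np.r_[np.zeros(6), 1.0],
#                     args=([(p1, q1, np.eye(3)),
#                            (p2, q2, line_proj(d2)),
#                            (p3, q3, plane_proj(n3))],))
```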

    Reconstruction and accurate alignment of feature maps for augmented reality

    This paper focuses on the preparative process of retrieving accurate feature maps for a camera-based tracking system. With this system it is possible to create ready-to-use Augmented Reality applications with a very easy setup workflow, which in practice involves only three steps: filming the object or environment from various viewpoints, defining a transformation between the reconstructed map and the target coordinate frame based on a small number of 3D-3D correspondences and, finally, initiating a feature learning and Bundle Adjustment step. Technically, the solution comprises several sub-algorithms. Given the image sequence provided by the user, a feature map is initially reconstructed and incrementally extended using a Simultaneous Localization and Mapping (SLAM) approach. For the automatic initialization of the SLAM module, a method for detecting the amount of translation is proposed. Since the initially reconstructed map is defined in an arbitrary coordinate system, we present a method for optimally aligning the feature map to the target coordinate frame of the augmentation models based on 3D-3D correspondences defined by the user. As an initial estimate we solve for a rigid transformation with scaling, known as Absolute Orientation. For refinement of the alignment we present a modification of the well-known Bundle Adjustment, in which we include these 3D-3D correspondences as constraints. Compared to ordinary Bundle Adjustment, we show that this leads to significantly more accurate reconstructions, since map deformations due to systematic errors, such as small camera calibration errors or outliers, are well compensated. This in turn results in a better alignment of the augmentations during run-time of the application, even in large-scale environments.
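
    A minimal sketch of the constrained Bundle Adjustment idea follows, assuming a plain pinhole camera and modelling the user-defined 3D-3D correspondences as weighted penalty (soft-constraint) residuals appended to the reprojection terms. The function names, parameter layout and penalty formulation are illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(K, rvec, tvec, X):
    """Pinhole projection of 3D points X (N, 3) into one camera."""
    Xc = Rotation.from_rotvec(rvec).as_matrix() @ X.T + tvec[:, None]
    uv = (K @ Xc).T
    return uv[:, :2] / uv[:, 2:]

def ba_residuals(params, K, obs, cam_idx, pt_idx, n_cams, anchors, weight):
    """Reprojection residuals plus weighted 3D-3D anchor residuals."""
    cams = params[:6 * n_cams].reshape(n_cams, 6)  # axis-angle + translation
    X = params[6 * n_cams:].reshape(-1, 3)         # map points
    res = []
    for i in range(n_cams):
        sel = cam_idx == i
        res.append((project(K, cams[i, :3], cams[i, 3:], X[pt_idx[sel]])
                    - obs[sel]).ravel())
    for j, g in anchors:                           # user 3D-3D correspondences
        res.append(weight * (X[j] - g))            # pull map to target frame
    return np.concatenate(res)

# least_squares(ba_residuals, x0, args=(K, obs, cam_idx, pt_idx, n_cams,
#               anchors, 100.0)) refines cameras and points jointly while
# the anchor terms keep the map aligned to the target coordinate frame.
```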

    Adaptable model-based tracking using analysis-by-synthesis techniques

    In this paper we present a novel analysis-by-synthesis approach for real-time camera tracking in industrial scenarios. The camera pose estimation is based on the tracking of line features which are generated dynamically in every frame by rendering a polygonal model and extracting contours from the rendered scene. Different methods of line model generation are investigated. Depending on the scenario and the given 3D model, either the image gradient of the frame buffer or discontinuities in the z-buffer and the normal map are used for the generation of a 2D edge map. The 3D control points on a contour are calculated using the depth value stored in the z-buffer. By aligning the generated features with edges in the current image, the extrinsic parameters of the camera are estimated. The camera pose used for rendering is predicted by a line-based frame-to-frame tracking which takes advantage of the generated edge features. The method is validated and evaluated with the help of ground-truth data as well as real image sequences.
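
    A rough sketch of how a 2D edge map could be derived from discontinuities in the rendered z-buffer and normal map, assuming both buffers have been read back as NumPy arrays; the forward-difference scheme and the thresholds are illustrative simplifications.

```python
import numpy as np

def edge_map(zbuf, normals, z_thresh=0.01, n_thresh=0.9):
    """Binary edge map from depth and normal discontinuities.

    zbuf:    (H, W) depth buffer of the rendered polygonal model
    normals: (H, W, 3) unit surface normals per pixel
    """
    # Depth discontinuities (silhouettes): large jumps in z.
    gz = np.abs(np.diff(zbuf, axis=0, append=zbuf[-1:])) \
       + np.abs(np.diff(zbuf, axis=1, append=zbuf[:, -1:]))
    # Normal discontinuities (creases): low dot product between neighbours.
    dot_v = (normals[:-1] * normals[1:]).sum(-1)
    dot_h = (normals[:, :-1] * normals[:, 1:]).sum(-1)
    crease = np.zeros(zbuf.shape, dtype=bool)
    crease[:-1] |= dot_v < n_thresh
    crease[:, :-1] |= dot_h < n_thresh
    return (gz > z_thresh) | crease

# Edge pixels can afterwards be back-projected with their z-buffer depth to
# obtain the 3D control points on the contour.
```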

    Composing the feature map retrieval process for robust and ready-to-use monocular tracking

    This paper focuses on the preparative process of natural feature map retrieval for a mobile camera-based tracking system. We cover the most important aspects of a general-purpose tracking system, including the acquisition of the scene's geometry, tracking initialization, and fast and accurate frame-by-frame tracking. To this end, several state-of-the-art techniques, each targeted at one particular subproblem, are fused together; their interplay and complementary benefits form the core of the system and the thread of our discussion. The choice of the individual sub-algorithms in our system reflects the scarcity of computational resources on mobile devices. In order to allow more accurate, more robust and faster tracking during run-time, we therefore transfer the computational load into the preparative customization step wherever possible. From the viewpoint of the user, the preparative stage is kept very simple: it only involves recording the scene from various viewpoints and defining a transformation into a target coordinate frame via manual definition of only a few 3D-3D point correspondences. Technically, the image sequence is used (1) to capture the scene's geometry with a SLAM method and subsequent refinement via constrained Bundle Adjustment, (2) to train a Randomized Trees classifier for wide-baseline tracking initialization, and (3) to analyze the viewpoint-dependent visibility of each feature. During run-time, robustness and performance of the frame-to-frame tracking are further increased by fusing inertial measurements within a combined pose estimation.
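
    As an illustration of step (3), the sketch below collects for each map feature the directions from which it was observed during the preparative sequence and predicts run-time visibility by an angular threshold against the current viewing direction; this criterion is our assumption, not necessarily the paper's exact method.

```python
import numpy as np

def viewing_directions(points, cam_centers, seen):
    """points: (N, 3) features; cam_centers: (M, 3) keyframe centers;
    seen: (M, N) bool, True if feature j was observed in keyframe i."""
    dirs = [[] for _ in range(len(points))]
    for i, c in enumerate(cam_centers):
        for j in np.nonzero(seen[i])[0]:
            d = c - points[j]
            dirs[j].append(d / np.linalg.norm(d))
    return [np.asarray(d) for d in dirs]

def predicted_visible(dirs_j, view_dir, max_angle_deg=30.0):
    """A feature is matched at run-time only if the current viewing direction
    lies close to one recorded during the preparative stage."""
    if len(dirs_j) == 0:
        return False
    cos_min = np.cos(np.radians(max_angle_deg))
    return bool((dirs_j @ view_dir).max() >= cos_min)
```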

    Linear-projection-based classification of human postures in time-of-flight data

    This paper presents a simple yet effective approach for classifying human postures using a time-of-flight camera. We investigate and adopt linear projection techniques such as Locality Preserving Projections (LPP), Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), which are more widespread in face recognition and other pattern recognition tasks. We analyze the relations between LPP and LDA and show experimentally that using LPP in a supervised manner effectively yields very similar results to LDA, implying that LPP may be regarded as a generalization of LDA. Features for offline training and online classification are created by applying common image processing techniques such as background subtraction and blob detection to the time-of-flight data.
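
    A minimal runnable sketch of the projection-then-classify scheme using scikit-learn: LPP is not available there, so only the PCA and LDA stages are shown, and the random data merely stands in for flattened foreground masks extracted from time-of-flight frames.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64 * 48))   # placeholder for 300 flattened masks
y = rng.integers(0, 4, size=300)      # 4 posture classes

# PCA compresses/denoises, LDA projects discriminatively (at most
# n_classes - 1 dimensions), nearest neighbours classifies in that space.
clf = make_pipeline(PCA(n_components=50),
                    LinearDiscriminantAnalysis(n_components=3),
                    KNeighborsClassifier(n_neighbors=5))
clf.fit(X[:200], y[:200])
print(clf.score(X[200:], y[200:]))    # near chance here, as the data is random
```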

    Augmented-Reality-basierte Interaktion mit Smartphone-Systemen zur Unterstützung von Servicetechnikern (Augmented-reality-based interaction with smartphone systems to support service technicians)

    The use of smartphone systems requires new interaction paradigms that can process the integrated sensors (GPS, inertial, compass) and, in particular, benefit from the integrated video camera used to capture the environment. In this context, Augmented Reality offers high potential to support industrial maintenance and service procedures.

    A camera-based calibration for automotive augmented reality Head-Up-Displays

    Using head-up displays (HUDs) for Augmented Reality requires an accurate internal model of the image generation process, so that 3D content can be visualized perspectively correctly from the viewpoint of the user. We present a generic and cost-effective camera-based calibration for an automotive HUD that uses the windshield as a combiner. Our calibration model encompasses the view-independent spatial geometry, i.e. the exact location, orientation and scaling of the virtual plane, and a view-dependent image warping transformation for correcting the distortions caused by the optics and the irregularly curved windshield. View dependency is achieved by extending the classical polynomial distortion model for cameras and projectors to a generic five-variate mapping with the head position of the viewer as additional input. The calibration involves capturing an image sequence from varying viewpoints while a known target pattern is displayed on the HUD. The accurate registration of the camera path is retrieved with state-of-the-art vision-based tracking. As all necessary data is acquired directly from the images, no external tracking equipment needs to be installed. After calibration, the HUD can be used together with a head tracker to form a head-coupled display which ensures a perspectively correct rendering of any 3D object in vehicle coordinates from a large range of possible viewpoints. We evaluate the accuracy of our model quantitatively and qualitatively.
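
    A sketch of how the view-dependent warping could be fitted as a five-variate polynomial mapping, with undistorted HUD coordinates plus the 3D head position as inputs and the observed (distorted) coordinates as outputs. The use of scikit-learn and the chosen degree are our illustrative assumptions; the paper's exact polynomial basis is not reproduced here.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def fit_view_dependent_warp(samples_in, samples_out, degree=3):
    """samples_in:  (N, 5) rows of (u, v, hx, hy, hz)
    samples_out: (N, 2) observed warped coordinates (u', v')."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(samples_in, samples_out)
    return model

# After fitting on calibration captures, pre-correct the rendering with
# model.predict([[u, v, hx, hy, hz]]) for the currently tracked head position.
```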

    A universal, closed-form approach for absolute pose problems

    We propose a general approach for absolute pose problems including the well-known perspective-n-point (PnP) problem, its generalized variant (GPnP) with and without scale, and pose from 2D line correspondences (PnL). These have received tremendous attention in the computer vision community during the last decades. However, it was only recently that efficient, globally optimal, closed-form solutions were proposed which can handle arbitrary numbers of correspondences, including minimal configurations as well as over-constrained cases, with linear complexity. We follow the general scheme by eliminating the linear parameters first, which results in a least squares error function that depends only on the non-linear rotation and a small symmetric coefficient matrix of fixed size. In a second step, the rotation is solved with algorithms derived using methods from algebraic geometry such as the Gröbner basis method. We propose a unified formulation based on a representation with orthogonal complements, which allows different types of constraints to be combined elegantly in a single framework. We show that with our unified formulation existing polynomial solvers can be interchangeably applied to problem instances other than those they were originally proposed for. It becomes possible to compare them on various registration problems with respect to accuracy, numerical stability, and computational speed. Our compression procedure not only preserves linear complexity, it is even faster than previous formulations. For the second step we also derive our own algebraic equation solver, which can additionally handle registration from 3D point-to-point correspondences, where other rotation solvers fail. Finally, we also present a marker-based SLAM approach with automatic registration to a target coordinate system based on partial and distributed reference information. It represents an application example that goes beyond classical camera pose estimation from image measurements and also serves for evaluation on real data.
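
    The elimination of the linear parameters can be demonstrated for the simplest case, rigid point-to-point registration: for any fixed rotation, the optimal translation has a closed form, leaving a compressed cost over the rotation alone. The papers then solve this compressed problem in closed form with polynomial solvers; the toy below uses a local minimizer merely to demonstrate the compression step.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def compressed_cost(rvec, p, q):
    """Cost of sum ||R p_i + t - q_i||^2 with t eliminated in closed form."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    t = (q - p @ R.T).mean(axis=0)        # optimal translation given R
    r = p @ R.T + t - q
    return (r ** 2).sum()

rng = np.random.default_rng(1)
p = rng.normal(size=(20, 3))
R_true = Rotation.from_rotvec([0.3, -0.2, 0.5]).as_matrix()
q = p @ R_true.T + np.array([1.0, 2.0, 3.0])
sol = minimize(compressed_cost, x0=np.zeros(3), args=(p, q))
print(sol.x)                              # ~ [0.3, -0.2, 0.5]
```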