117 research outputs found

    Autonomous vision-based terrain-relative navigation for planetary exploration

    Abstract: The interest of the world's major space agencies in vision sensors for their mission designs has been increasing over the years. Indeed, cameras offer an efficient solution to address the ever-increasing requirements in performance. In addition, these sensors are a multipurpose, lightweight, proven and low-cost technology. Several researchers in vision sensing for space applications currently focus on the navigation system for autonomous pin-point planetary landing and for sample-and-return missions to small bodies. In fact, without a Global Positioning System (GPS) or radio beacon around celestial bodies, high-accuracy navigation around them is a complex task. Most navigation systems are based only on accurate initialization of the states and on the integration of the acceleration and angular rate measurements from an Inertial Measurement Unit (IMU). This strategy can track sudden motions of short duration very accurately, but its estimate diverges over time and normally leads to a large landing error. In order to improve navigation accuracy, many authors have proposed to fuse those IMU measurements with vision measurements using state estimators, such as Kalman filters. The first proposed vision-based navigation approach relies on feature tracking between sequences of images taken in real time during orbiting and/or landing operations. In that case, image features are image pixels that have a high probability of being recognized between images taken from different camera locations. By detecting and tracking these features through a sequence of images, the relative motion of the spacecraft can be determined. This technique, referred to as Terrain-Relative Relative Navigation (TRRN), relies on relatively simple, robust and well-developed image processing techniques. It allows the determination of the relative motion (velocity) of the spacecraft.
Although this technology has been demonstrated with space-qualified hardware, its gain in accuracy remains limited since the spacecraft's absolute position is not observable from the vision measurements. The vision-based navigation techniques currently studied consist of identifying features and mapping them into an on-board cartographic database indexed by an absolute coordinate system, thereby providing absolute position determination. This technique, referred to as Terrain-Relative Absolute Navigation (TRAN), relies on very complex Image Processing Software (IPS) that clearly lacks robustness. In fact, this software often depends on the spacecraft attitude and position, it is sensitive to illumination conditions (the elevation and azimuth of the Sun when the geo-referenced database is built must be similar to those present during the mission), it is greatly influenced by image noise and, finally, it struggles to handle the multiple varieties of terrain seen during the same mission (the spacecraft can fly over plains as well as mountainous regions, and the images may contain old craters with noisy rims as well as young craters with clean rims, and so on). To date, no real-time hardware-in-the-loop experiment has been conducted to demonstrate the applicability of this technology to space missions. The main objective of the current study is to develop autonomous vision-based navigation algorithms that provide absolute position and surface-relative velocity during the proximity operations of a planetary mission (orbiting phase and landing phase) using a combined approach of TRRN and TRAN technologies. The contributions of the study are: (1) reference mission definition, (2) advancements in the TRAN theory (image processing as well as state estimation) and (3) practical implementation of vision-based navigation. Résumé: The interest of the main space agencies in technologies based on machine vision keeps growing.
Indeed, cameras offer an efficient solution for meeting the ever-higher performance requirements of space missions. Moreover, these sensors are multipurpose, lightweight, proven and inexpensive. Several researchers in machine vision currently focus on autonomous systems for precision landing on planets and on sampling missions to asteroids. Indeed, without a Global Positioning System (GPS) or radio beacons around these celestial bodies, precision navigation is a very complex task. Most navigation systems rely only on integrating the measurements from an inertial measurement unit. This strategy can track the motion of the spacecraft only over a short duration, because the estimates diverge quickly. To improve navigation accuracy, several authors have proposed fusing the inertial measurements with measurements from terrain images. The first terrain-imagery navigation algorithms to be proposed rely on extracting and tracking features in a sequence of images taken in real time during the orbiting and/or landing phases of the mission. In this case, image features are pixels that have a high probability of being recognized between images taken from different camera positions. By detecting and tracking these features, the relative displacement of the vehicle (its velocity) can be determined. These techniques, called relative navigation, use image processing algorithms that are robust, easy to implement and well developed.
Although this technology has been proven on space-qualified hardware, the gain in accuracy remains limited because the absolute position of the vehicle is not observable in the measurements extracted from the image. The vision-based navigation techniques currently under study identify features in the image and match them with those stored in a geo-referenced database so as to provide an absolute position measurement to the navigation filter. However, this technique, called absolute navigation, involves very complex image processing algorithms that currently suffer from robustness problems. Indeed, these algorithms often depend on the position and attitude of the vehicle. They are very sensitive to illumination conditions (the elevation and azimuth of the Sun when the geo-referenced database is built must be similar to those observed during the mission). They are strongly affected by image noise, and finally they cope poorly with the many varieties of terrain encountered during a single mission (the vehicle may fly over plains as well as mountainous regions, and the images may contain old craters with blurred rims as well as young craters with well-defined rims, etc.). Moreover, no real-time experiment on space-qualified hardware has yet been carried out to demonstrate the applicability of this technology to space missions. Consequently, the main objective of this research project is to develop an autonomous terrain-imagery navigation system that provides the absolute position and the terrain-relative velocity of a spacecraft during low-altitude operations at a planet.
The contributions of this work are: (1) the definition of a reference mission, (2) the advancement of the theory of terrain-imagery navigation (image processing algorithms and state estimation) and (3) the practical implementation of this technology.
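
The abstract above notes that pure IMU integration diverges over time and that fusing it with absolute position measurements in a Kalman filter bounds the error. The following is a minimal one-dimensional sketch of that idea, not the thesis's estimator: the state layout, the accelerometer bias of 0.05 m/s², the noise values `q` and `r`, and all function names are assumptions chosen for illustration, and the position fixes are taken to be noise-free so the run is deterministic.

```python
def predict(x, P, accel, dt, q):
    """Propagate [position, velocity] with one IMU acceleration sample."""
    pos, vel = x
    x_new = [pos + vel * dt + 0.5 * accel * dt * dt, vel + accel * dt]
    # Covariance propagation P <- F P F^T + Q with F = [[1, dt], [0, 1]]
    # and Q = diag(0, q) (assumed velocity process noise).
    (p00, p01), (p10, p11) = P
    P_new = [
        [p00 + dt * (p01 + p10) + dt * dt * p11, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_new, P_new


def update(x, P, z, r):
    """Correct the state with an absolute position fix z of variance r."""
    (p00, p01), (p10, p11) = P
    s = p00 + r                      # innovation variance, H = [1, 0]
    k0, k1 = p00 / s, p10 / s        # Kalman gain
    innovation = z - x[0]
    x_new = [x[0] + k0 * innovation, x[1] + k1 * innovation]
    P_new = [
        [(1.0 - k0) * p00, (1.0 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return x_new, P_new


def final_position_error(use_fixes, n_steps=100, dt=1.0, bias=0.05):
    """Absolute position error after n_steps, with or without position fixes."""
    true_pos, true_vel = 0.0, 10.0   # hypothetical constant-velocity truth
    x = [true_pos, true_vel]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        true_pos += true_vel * dt
        # True acceleration is zero; the IMU reports it plus a constant bias.
        x, P = predict(x, P, accel=0.0 + bias, dt=dt, q=0.1)
        if use_fixes:
            x, P = update(x, P, z=true_pos, r=1.0)
    return abs(x[0] - true_pos)


if __name__ == "__main__":
    print("IMU only      :", final_position_error(use_fixes=False))
    print("IMU + fixes   :", final_position_error(use_fixes=True))
```

Without fixes the bias is integrated twice, so the position error grows quadratically with time; with fixes it settles at a small bounded value even though this toy filter does not estimate the bias itself.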

    Quantization, Calibration and Planning for Euclidean Motions in Robotic Systems

    The properties of Euclidean motions are fundamental in all areas of robotics research. Over the past several decades, investigations of low-level tasks, such as parameterizing specific movements and generating effective motion plans, have fostered high-level operations in autonomous robotic systems. In typical applications, before robot motions are executed, a proper quantization of basic motion primitives can simplify online computations, and a precise calibration of sensor readings can improve the accuracy of the system controls. Of particular importance to the overall autonomous robotic task, a safe and efficient motion planning framework makes the whole system operate in a well-organized and effective way. All of these modules have motivated considerable effort on various fundamental problems, such as the uniformity of quantization in non-Euclidean manifolds, calibration errors on unknown rigid transformations due to noise and the lack of data correspondence, and the narrow-passage and curse-of-dimensionality bottlenecks in developing motion planning algorithms. The goal of this dissertation is therefore to tackle these challenges in the topics of quantization, calibration and planning for Euclidean motions.

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling

    Automatic video segmentation employing object/camera modeling techniques

    Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. In the case that the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process.
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene background. This approach is fully compatible with the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthesis of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains.
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, the algorithm exploits the fact that the times at which foreground objects appear in an image region are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is not even possible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower, and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration.
Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms. The first one is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm.
The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular-paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal-length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. 
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is projection onto a cylinder surface which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because they look in all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation enables virtual walk-throughs in the reconstructed room and therefore provides a better orientation for the user. In summary, we can conclude that all segmentation techniques employ some definition of foreground objects.
These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthesis in Part I. The results of this thesis show that implicit descriptions, which extract their definition from video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into the segmentation approaches that are based on implicit models. Instead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research.
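
The abstract above describes segmentation by synthesizing a pure background image from the video and then running change detection against it. The sketch below illustrates that pipeline in its simplest form for a static camera. It swaps in a per-pixel temporal median for the thesis's clustering-based background synthesis, and it omits camera-motion compensation and risk-maps entirely; the function names, the tiny test frames and the threshold are assumptions for illustration only.

```python
from statistics import median


def synthesize_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames.

    A simple stand-in for background synthesis: it works when each pixel
    shows the background in more than half of the frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold):
    """Change detection: mark pixels that differ strongly from the background."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# Hypothetical 1x5 sequence: a bright object (200) moves one pixel per frame
# across a uniform background (10).
frames = []
for t in range(5):
    row = [10] * 5
    row[t] = 200
    frames.append([row])

background = synthesize_background(frames)
mask = foreground_mask(frames[2], background, threshold=50)

if __name__ == "__main__":
    print(background)
    print(mask)
```

Because the object covers each pixel in only one of the five frames, the median recovers the background everywhere, and the mask for the third frame flags only the pixel the object occupies at that time.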

    Affine multi-view modelling for close range object measurement

    In photogrammetry, sensor modelling with 3D point estimation is a fundamental topic of research. Perspective frame cameras offer the mathematical basis for close range modelling approaches. The norm is to employ robust bundle adjustments for simultaneous parameter estimation and 3D object measurement. In 2D to 3D modelling strategies image resolution, scale, sampling and geometric distortion are prior factors. Non-conventional image geometries that implement uncalibrated cameras are established in computer vision approaches; these aim for fast solutions at the expense of precision. The projective camera is defined in homogeneous terms and linear algorithms are employed. An attractive sensor model disembodied from projective distortions is the affine. Affine modelling has been studied in the contexts of geometry recovery, feature detection and texturing in vision, however multi-view approaches for precise object measurement are not yet widely available. This project investigates affine multi-view modelling from a photogrammetric standpoint. A new affine bundle adjustment system has been developed for point-based data observed in close range image networks. The system allows calibration, orientation and 3D point estimation. It is processed as a least squares solution with high redundancy providing statistical analysis. Starting values are recovered from a combination of implicit perspective and explicit affine approaches. System development focuses on retrieval of orientation parameters, 3D point coordinates and internal calibration with definition of system datum, sensor scale and radial lens distortion. Algorithm development is supported with method description by simulation. Initialization and implementation are evaluated with the statistical indicators, algorithm convergence and correlation of parameters. Object space is assessed with evaluation of the 3D point correlation coefficients and error ellipsoids. 
Sensor scale is checked with comparison of camera systems utilizing quality and accuracy metrics. For independent method evaluation, testing is implemented over a perspective bundle adjustment tool with similar indicators. Test datasets are initialized from precise reference image networks. Real affine image networks are acquired with an optical system (~1M pixel CCD cameras with 0.16x telecentric lens). Analysis of tests ascertains that the affine method results in an RMS image misclosure at a sub-pixel level and precisions of a few tenths of microns in object space

    Towards 3D Scanning from Digital Images by Novice Users

    The uptake of hobbyist 3D printers is being held back, in part, due to the barriers associated with creating a computer model to be printed. One way of creating such a computer model is to take a 3D scan of a pre-existing object using multiple digital images of the object showing the object from different points of view. This document details one way of doing this, with particular emphasis on camera calibration: the process of estimating camera parameters for the camera that took an image. In common calibration scenarios, multiple images are used where it is assumed that the internal parameters, such as zoom and focus settings, are fixed between images and the relative placement of the camera between images needs to be estimated. This is not ideal for a novice doing 3D scanning with a “point and shoot” camera where these internal parameters may not have been held fixed between images. A common coordinate system between images with a known relationship to real-world measurements is also desirable. Additionally, in some 3D scanning scenarios that use digital images, where it is expected that a trained individual will be doing the photography and internal settings can be held constant throughout the process, the images used for doing the calibration are different from those that are used to do the object capture. A technique has been developed to overcome these shortcomings. It uses a known printed sheet of paper, called the calibration sheet, that the object to be scanned sits on so that object acquisition and camera calibration can be done from the same image. Each image is processed independently with reference to the known size of the calibration sheet so the output is automatically to scale and minor camera calibration errors with one image do not propagate and affect estimates of camera calibration parameters for other images. The calibration process developed is also one that will work where large parts of the calibration sheet are obscured

    Visual Perception for Manipulation and Imitation in Humanoid Robots

    This thesis deals with visual perception for manipulation and imitation in humanoid robots. In particular, real-time applicable methods for object recognition and pose estimation as well as for markerless human motion capture have been developed. The only sensor used was a small-baseline stereo camera system (approximately human eye distance). An extensive experimental evaluation has been performed on simulated as well as real image data from real-world scenarios using the humanoid robot ARMAR-III.

    Reconstructing plant architecture from 3D laser scanner data

    In computer graphics, virtual plant models are becoming increasingly realistic visually. However, in the context of biology and agronomy, acquiring accurate models of real plants remains a major problem for building quantitative models of plant development. Recently, 3D laser scanners have made it possible to acquire 3D images in which each pixel has a depth corresponding to the distance between the scanner and the surface of the targeted object. However, a plant is generally a large set of small surfaces on which classical reconstruction methods fail. In this thesis, we present a method for reconstructing virtual plant models from laser scans. Measuring plants with a laser scanner produces data with varying levels of precision. The scans are generally dense on the surface of the main branches but cover thin branches with only a few points. The core of our method is to iteratively build a skeleton of the plant structure according to the local point density. To this end, a locally adaptive method was developed that combines a contraction phase with a point-tracking algorithm. We also present a quantitative evaluation procedure for comparing our reconstructions with structures of real plants reconstructed by experts. For this, we first explore the use of an edit distance between tree structures. Finally, we formalize the comparison as an assignment problem to find the best matching between two structures and quantify their differences. (Author's abstract)