    A Study of Projections for Key Point Based Registration of Panoramic Terrestrial 3D Laser Scans

    This paper surveys state-of-the-art image features and descriptors for the task of 3D scan registration based on panoramic reflectance images. As modern terrestrial laser scanners digitize their environment in a spherical way, the sphere has to be projected onto a two-dimensional image. To this end, we evaluate the equirectangular, the cylindrical, the Mercator, the rectilinear, the Pannini, the stereographic, and the z-axis projection. We show that the Mercator and the Pannini projection outperform the other projection methods
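The abstract does not state the projection formulas. As a minimal sketch, the textbook forms of three of the listed projections can be written as below, assuming longitude `theta` and latitude `phi` in radians. The function names and the `to_pixel` helper are illustrative assumptions and are not taken from the paper.

```python
import math


def equirectangular(theta, phi):
    """Longitude and latitude map linearly to the image plane."""
    return theta, phi


def cylindrical(theta, phi):
    """Central cylindrical projection: the vertical axis grows with tan(phi)."""
    return theta, math.tan(phi)


def mercator(theta, phi):
    """Mercator projection: conformal, the vertical axis grows logarithmically."""
    return theta, math.log(math.tan(phi) + 1.0 / math.cos(phi))


def to_pixel(x, y, x_range, y_range, width, height):
    """Scale projected coordinates into integer pixel indices (col, row).

    Row 0 is the top of the image, so larger y values map to smaller rows.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    col = (x - x_min) / (x_max - x_min) * (width - 1)
    row = (y_max - y) / (y_max - y_min) * (height - 1)
    return int(round(col)), int(round(row))
```

All three share the horizontal mapping and differ only in how latitude is stretched, which is what changes the distortion seen by a feature detector near the top and bottom of the panorama.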

    Indoor Mapping and Reconstruction with Mobile Augmented Reality Sensor Systems

    Augmented reality (AR) makes it possible to display virtual, three-dimensional content directly within the real environment. Rather than showing arbitrary virtual objects at an arbitrary location, however, AR technology can also be used to display geodata in situ at the very location to which the data refer. AR thus opens up the possibility of enriching the real world with virtual, location-related information. In this thesis, this variety of AR is defined as "Fused Reality" and discussed in depth. The practical added value of the Fused Reality concept is well illustrated by its application to digital building models, where building-specific information, such as the course of pipes and cables inside the walls, can be displayed in its correct position on the real object. Several basic conditions must be met to realize the outlined concept of an indoor Fused Reality application. A given building can only be augmented with location-related information if a digital model of that building is available. Although larger construction projects today are often planned and carried out with the aid of Building Information Modelling (BIM), so that a digital model is created together with the real building, digital models are usually not available for older existing buildings. Creating a digital model of an existing building manually is possible but involves great effort. If a suitable building model is available, an AR device must additionally be able to determine its own position and orientation in the building relative to this model in order to display augmentations in their correct positions. This thesis investigates and discusses various aspects of these problems. 
First, various options for capturing indoor building geometry with sensor systems are discussed. Next, an investigation is presented into the extent to which modern AR devices, which generally also carry a large number of sensors, are likewise suitable for use as indoor mapping systems. The resulting indoor mapping datasets can then be used to reconstruct building models automatically. To this end, an automated, voxel-based indoor reconstruction method is presented. It is also evaluated quantitatively on four datasets captured for this purpose, together with corresponding reference data. Furthermore, various options for localizing mobile AR devices within a building and the associated building model are discussed. In this context, the evaluation of a marker-based indoor localization method is also presented. Finally, a new approach for aligning indoor mapping datasets with the axes of the coordinate system is presented
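The abstract names a voxel-based reconstruction method without describing it. The sketch below only illustrates the generic first step that voxel-based approaches commonly share, namely discretizing a point cloud into a set of occupied voxel indices. The function name `voxelize` and the choice of a Python set are assumptions for illustration and do not reproduce the thesis's method.

```python
import math


def voxelize(points, voxel_size):
    """Return the set of integer voxel indices occupied by at least one point.

    points: iterable of (x, y, z) coordinates in metres (assumed unit).
    voxel_size: edge length of a cubic voxel, in the same unit.
    """
    occupied = set()
    for x, y, z in points:
        index = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        occupied.add(index)
    return occupied
```

Using `math.floor` rather than `int` keeps negative coordinates in the correct voxel, since `int` would round towards zero.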

    User-oriented markerless augmented reality framework based on 3D reconstruction and loop closure detection

    An augmented reality (AR) system needs to track the user's view to register augmentations accurately. The present research proposes a conceptual markerless, natural-feature-based AR framework, whose process is divided into two stages: an offline database training session for application developers, and an online AR tracking and display session for end users. In the offline session, two types of 3D reconstruction application, RGBD-SLAM and SfM, are integrated into the development framework for building the reference template of a target environment. The performance and applicable conditions of these two methods are presented in the present thesis, and application developers can choose which method to apply according to their development needs. A general development user interface is provided to the developer for interaction, including a simple GUI tool for augmentation configuration. The present proposal also applies a Bag of Words strategy to enable rapid "loop-closure detection" in the online session, efficiently querying the application user's view against the trained database to locate the user pose. The rendering and display of augmentations is currently implemented within an OpenGL window, which is one result of the research that is worthy of future detailed investigation and development
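The Bag of Words query mentioned above can be illustrated with a minimal sketch. It assumes that each image has already been converted into a list of visual-word ids (the feature extraction and vocabulary are outside the sketch) and that similarity is measured by the cosine of normalized word histograms. The names `bow_vector`, `cosine`, and `query` are hypothetical, and the thesis may use a different scoring scheme.

```python
import math
from collections import Counter


def bow_vector(word_ids):
    """Build an L2-normalized sparse histogram of visual-word ids."""
    counts = Counter(word_ids)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0.0:
        return {}
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two normalized sparse vectors."""
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def query(database, word_ids):
    """Return the key of the database entry most similar to the query view.

    database: dict mapping a keyframe id to its bow_vector.
    """
    q = bow_vector(word_ids)
    return max(database, key=lambda key: cosine(database[key], q))
```

A usage example under the same assumptions: `query({"kf0": bow_vector([1, 1, 2]), "kf1": bow_vector([3, 4, 4])}, [4, 4, 5])` selects `"kf1"`, because only that keyframe shares a visual word with the query.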

    Detail Enhancing Denoising of Digitized 3D Models from a Mobile Scanning System

    The acquisition process of digitizing a large-scale environment produces an enormous amount of raw geometry data. This data is corrupted by system noise, which leads to 3D surfaces that are not smooth and details that are distorted. Any scanning system has noise associated with the scanning hardware, both digital quantization errors and measurement inaccuracies, but a mobile scanning system has additional system noise introduced by the pose estimation of the hardware during data acquisition. The combined system noise generates data that is not handled well by existing noise reduction and smoothing techniques. This research is focused on enhancing the 3D models acquired by mobile scanning systems used to digitize large-scale environments. These digitization systems combine a variety of sensors – including laser range scanners, video cameras, and pose estimation hardware – on a mobile platform for the quick acquisition of 3D models of real world environments. The data acquired by such systems are extremely noisy, often with significant details being on the same order of magnitude as the system noise. By utilizing a unique 3D signal analysis tool, a denoising algorithm was developed that identifies regions of detail and enhances their geometry, while removing the effects of noise on the overall model. The developed algorithm can be useful for a variety of digitized 3D models, not just those involving mobile scanning systems. The challenges faced in this study were the automatic processing needs of the enhancement algorithm, and the need to fill a hole in the area of 3D model analysis in order to reduce the effect of system noise on the 3D models. In this context, our main contributions are the automation and integration of a data enhancement method not well known to the computer vision community, and the development of a novel 3D signal decomposition and analysis tool. 
The new technologies featured in this document are intuitive extensions of existing methods to new dimensionality and applications. The totality of the research has been applied towards detail enhancing denoising of scanned data from a mobile range scanning system, and results from both synthetic and real models are presented

    Advances in top-down and bottom-up approaches to video-based camera tracking

    Video-based camera tracking consists of tracking the three-dimensional pose of a moving camera using video as the sole input. In order to estimate the pose of a camera with respect to a real scene, one or more three-dimensional references are needed. Examples of such references are landmarks with known geometric shape, or objects for which a model is generated beforehand. By comparing what is seen by a camera with what is geometrically known from reality, it is possible to recover the pose of the camera that is sensing these references. In this thesis, we investigate the problem of camera tracking at two levels. Firstly, we work at the low level of feature point recognition. Feature points are used as references for tracking and we propose a method to robustly recognise them. More specifically, we introduce a rotation-discriminative region descriptor and an efficient rotation-discriminative method to match feature point descriptors. The descriptor is based on orientation gradient histograms and template intensity information. Secondly, we work at the higher level of camera tracking and propose a fusion of top-down (TDA) and bottom-up approaches (BUA). We combine marker-based tracking using a BUA and feature points recognised from a TDA into a particle filter. Feature points are recognised with the method described before. We take advantage of the identification of the rotation of points for tracking purposes. The goal of the fusion is to take advantage of their complementary strengths. In particular, we are interested in covering the main capabilities that a camera tracker should provide. These capabilities are automatic initialisation, automatic recovery after loss of track, and tracking beyond references known a priori. Experiments have been performed at the two levels of investigation. Firstly, tests have been conducted to evaluate the performance of the proposed recognition method. 
The assessment uses a set of patches extracted from eight textured images. The images are rotated and matching is done for each patch. The results show that the method is capable of matching accurately despite the rotations. A comparison with similar state-of-the-art techniques shows that our method achieves equal or even higher precision at a much lower computational cost. Secondly, an experimental assessment of the tracking system is also conducted. The evaluation consists of four sequences with specific problematic situations, namely occlusions of the marker, illumination changes, and erratic and/or fast motion. Results show that the fusion tracker solves characteristic failure modes of the two combined approaches. A comparison with similar trackers shows competitive accuracy. In addition, the three capabilities stated earlier are fulfilled in our tracker, whereas the state of the art reveals that no other published tracker covers these three capabilities simultaneously. The camera tracking system has a potential application in the robotics domain. It has been successfully used as a man-machine interface and applied in Augmented Reality environments. In particular, the system has been used by students of the University of Art and Design Lausanne (ECAL) with the purpose of conceiving new interaction concepts. Moreover, in collaboration with ECAL and fabric | ch (studio for architecture & research), we have jointly developed the Augmented interactive Reality Toolkit (AiRToolkit). The system has also proved to be reliable in public events and is the basis of a game-oriented demonstrator installed in the Swiss National Museum of Audiovisual and Multimedia (Audiorama) in Montreux

    Room layout estimation on mobile devices

    Room layout generation is the problem of generating a drawing or a digital model of an existing room from a set of measurements such as laser data or images. The generation of floor plans can find application in the building industry to assess the quality and the correctness of an ongoing construction w.r.t. the initial model, or to quickly sketch the renovation of an apartment. The real estate industry can rely on automatic generation of floor plans to ease the process of checking the livable surface and to propose virtual visits to prospective customers. As for the general public, the room layout can be integrated into mixed reality games to provide a more immersive experience, or used in other related augmented reality applications such as room redecoration. The goal of this industrial thesis (CIFRE) is to investigate and take advantage of state-of-the-art mobile devices in order to automate the process of generating room layouts. Nowadays, modern mobile devices usually come with a wide range of sensors, such as an inertial measurement unit (IMU), RGB cameras and, more recently, depth cameras. Moreover, touchscreens offer a natural and simple way to interact with the user, thus favoring the development of interactive applications, in which the user can be part of the processing loop. This work aims at exploiting the richness of such devices to address the room layout generation problem. The thesis has three major contributions. We first show how the classic problem of detecting vanishing points in an image can benefit from a prior given by the IMU sensor. We propose a simple and effective algorithm for detecting vanishing points relying on the gravity vector estimated by the IMU. A new public dataset containing images and the relevant IMU data is introduced to help assess vanishing point algorithms and foster further studies in the field. 
As a second contribution, we explored the state of the art of real-time localization and map optimization algorithms for RGB-D sensors. Real-time localization is a fundamental task to enable augmented reality applications, and thus it is a critical component when designing interactive applications. We propose an evaluation of existing algorithms designed for the common desktop set-up with a view to employing them on a mobile device. For each considered method, we assess the accuracy of the localization as well as the computational performance when ported to a mobile device. Finally, we present a proof-of-concept application able to generate the room layout relying on a Project Tango tablet equipped with an RGB-D sensor. In particular, we propose an algorithm that incrementally processes and fuses the 3D data provided by the sensor in order to obtain the layout of the room. We show how our algorithm can rely on user interactions in order to correct the generated 3D model during the acquisition process
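The link between the IMU gravity vector and vanishing points can be made concrete with a short sketch. Under a pinhole camera model, the vanishing point of a 3D direction is the image of that direction, so the vertical vanishing point follows directly from gravity expressed in the camera frame. The sketch assumes known intrinsics (`fx`, `fy`, `cx`, `cy`) and no lens distortion; the function name is hypothetical, and this is only the geometric prior, not the detection algorithm proposed in the thesis.

```python
def vertical_vanishing_point(gravity_cam, fx, fy, cx, cy, eps=1e-9):
    """Project the gravity direction into the image to get the vertical vanishing point.

    gravity_cam: (gx, gy, gz) gravity direction in the camera frame (any scale).
    Returns (u, v) in pixels, or None when gravity is parallel to the image
    plane, in which case the vanishing point lies at infinity.
    """
    gx, gy, gz = gravity_cam
    if abs(gz) < eps:
        return None
    u = fx * gx / gz + cx
    v = fy * gy / gz + cy
    return u, v
```

Once the vertical vanishing point is fixed this way, the horizontal vanishing points are constrained to the horizon line orthogonal to gravity, which reduces the search space for an image-based detector.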

    Adaptive Vision Based Scene Registration for Outdoor Augmented Reality

    Augmented Reality (AR) involves adding virtual content into real scenes. Scenes are viewed using a Head-Mounted Display or other display type. In order to place content into the user's view of a scene, the user's position and orientation relative to the scene, commonly referred to as their pose, must be determined accurately. This allows the objects to be placed in the correct positions and to remain there when the user moves or the scene changes. It is achieved by tracking the user in relation to their environment using a variety of technologies. One technology which has proven to provide accurate results is computer vision. Computer vision involves a computer analysing images and achieving an understanding of them. This may be locating objects such as faces in the images, or in the case of AR, determining the pose of the user. One of the ultimate goals of AR systems is to be capable of operating under any condition. For example, a computer vision system must be robust under a range of different scene types, and under unpredictable environmental conditions due to variable illumination and weather. The majority of existing literature tests algorithms under the assumption of ideal or 'normal' imaging conditions. To ensure robustness under as many circumstances as possible it is also important to evaluate the systems under adverse conditions. This thesis seeks to analyse the effects that variable illumination has on computer vision algorithms. To enable this analysis, test data is required to isolate weather and illumination effects, without other factors such as changes in viewpoint that would bias the results. A new dataset is presented which also allows controlled viewpoint differences in the presence of weather and illumination changes. This is achieved by capturing video from a camera undergoing a repeatable motion sequence. 
Ground truth data is stored per frame, allowing images from the same position under differing environmental conditions to be easily extracted from the videos. An in-depth analysis of six detection algorithms and five matching techniques demonstrates the impact that non-uniform illumination changes can have on vision algorithms. Specifically, shadows can degrade performance and reduce confidence in the system, decrease reliability, or even completely prevent successful operation. An investigation into approaches to improve performance yields techniques that can help reduce the impact of shadows. A novel algorithm is presented that merges reference data captured at different times, resulting in reference data with minimal shadow effects. This can significantly improve performance and reliability when operating on images containing shadow effects. These advances improve the robustness of computer vision systems and extend the range of conditions in which they can operate. This can increase the usefulness of the algorithms and the AR systems that employ them

    Change Detection using Models derived from Point Clouds

    This thesis examines the detection of geometric changes in 3D data based on models derived from the point clouds. The process chain consists of the registration of point clouds, the derivation of a model and the detection of changes in these models. For the registration, AprilTags are used, which, in combination with the trajectories of the respective measuring runs, allow the estimation of a transformation between two local coordinate systems without a preceding assessment of the tag position. The point cloud is downsampled to a uniform distribution for faster processing, and the normals and curvatures are calculated for the remaining points. A region growing process uses this additional information to divide the point clouds into planar and non-planar areas. The former are modeled as planes through simplification and meshing using a modified quadtree structure. The non-planar points are clustered by supervoxels and the extracted clusters are approximated by Axis Aligned Bounding Boxes. Both planar and non-planar modeling is performed for all datasets which are to be compared. The bounding boxes are used for the detection of changes. The box model of a dataset can easily be examined for intersections with a second dataset, due to the axis alignment. The intersections allow the detection of changed areas in the derived models. The methods proposed are suitable for both indoor and outdoor applications, provided that the changed objects are well separated and the compared datasets cover overlapping areas. The accuracy depends on the chosen size of the bounding boxes as well as the size of the changed objects. The evaluation has also shown that the proposed method can be integrated into a real-time-capable system
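Axis alignment makes the intersection test cheap: two boxes overlap exactly when their extents overlap on every axis. The sketch below shows that test and one possible way to use it, flagging boxes from one dataset that overlap no box in the other as candidate changes. The box representation as a `(min_corner, max_corner)` pair and the function names are assumptions for illustration, and the thesis's actual comparison rule may differ.

```python
def boxes_overlap(a, b):
    """Return True if two axis-aligned boxes intersect (touching counts).

    Each box is a pair (min_corner, max_corner) of 3D coordinate tuples.
    """
    (a_min, a_max), (b_min, b_max) = a, b
    return all(a_min[i] <= b_max[i] and b_min[i] <= a_max[i] for i in range(3))


def unmatched_boxes(boxes_a, boxes_b):
    """Return the boxes of dataset A that intersect no box of dataset B.

    Assumption: such boxes are treated as candidate changes between epochs.
    """
    return [
        a for a in boxes_a
        if not any(boxes_overlap(a, b) for b in boxes_b)
    ]
```

The pairwise loop is quadratic in the number of boxes; a spatial index would be the natural next step for large scenes, which this sketch leaves out.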

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems. In this thesis, lidar-based obstacle detection and recognition in agricultural environments has been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and dynamic obstacles. For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other method showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen, as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied on mapped detections along the vehicle path, thus simulating an actual traversal. The proposed methods serve as a first step towards full autonomy for agricultural vehicles. 
The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain, when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE
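Occupancy grid mapping, as mentioned in the abstract, is commonly implemented with a per-cell log-odds update, which turns the fusion of repeated detections into a simple addition. The sketch below shows that standard formulation. The class name and the probabilities `p_hit` and `p_miss` are illustrative assumptions and are not values from the thesis.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


class OccupancyGrid:
    """2D grid storing the log-odds of each cell being occupied."""

    def __init__(self, width, height, p_hit=0.7, p_miss=0.4):
        # Log-odds 0.0 corresponds to an uninformed prior of 0.5.
        self.log_odds = [[0.0] * width for _ in range(height)]
        self.l_hit = logit(p_hit)    # added when an obstacle is detected
        self.l_miss = logit(p_miss)  # added (negative) when the cell is seen free

    def update(self, row, col, hit):
        """Fuse one detection for a cell; hit=True means 'obstacle observed'."""
        self.log_odds[row][col] += self.l_hit if hit else self.l_miss

    def probability(self, row, col):
        """Return the occupancy probability of a cell."""
        return 1.0 - 1.0 / (1.0 + math.exp(self.log_odds[row][col]))
```

Querying the cells along a planned vehicle path and comparing `probability` against a threshold would then give a simple runtime obstacle check, under the same assumptions.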