211 research outputs found

    Polylidar3D -- Fast Polygon Extraction from 3D Data

    Full text link
    Flat surfaces captured by 3D point clouds are often used for localization, mapping, and modeling. Dense point cloud processing has high computation and memory costs making low-dimensional representations of flat surfaces such as polygons desirable. We present Polylidar3D, a non-convex polygon extraction algorithm which takes as input unorganized 3D point clouds (e.g., LiDAR data), organized point clouds (e.g., range images), or user-provided meshes. Non-convex polygons represent flat surfaces in an environment with interior cutouts representing obstacles or holes. The Polylidar3D front-end transforms input data into a half-edge triangular mesh. This representation provides a common level of input data abstraction for subsequent back-end processing. The Polylidar3D back-end is composed of four core algorithms: mesh smoothing, dominant plane normal estimation, planar segment extraction, and finally polygon extraction. Polylidar3D is shown to be quite fast, making use of CPU multi-threading and GPU acceleration when available. We demonstrate Polylidar3D's versatility and speed with real-world datasets including aerial LiDAR point clouds for rooftop mapping, autonomous driving LiDAR point clouds for road surface detection, and RGBD cameras for indoor floor/wall detection. We also evaluate Polylidar3D on a challenging planar segmentation benchmark dataset. Results consistently show excellent speed and accuracy.Comment: 40 page

    VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS

    Get PDF
    This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as the advanced scanning techniques that are able to record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interests in the fields of computer vision and data mining. Structural Semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. In order to find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, locations and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance using a suite of polynomial-time algorithms that have been designed to identify consistent structural associations among visual items, representing frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations. New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models

    Visual Perception For Robotic Spatial Understanding

    Get PDF
    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot or any intelligent system that is meant to interact with the world that it is somewhat surprising we don\u27t have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don\u27t yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output in the previous step to generate temporally consistent segmentations with camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet

    Algorithms for the reconstruction, analysis, repairing and enhancement of 3D urban models from multiple data sources

    Get PDF
    Over the last few years, there has been a notorious growth in the field of digitization of 3D buildings and urban environments. The substantial improvement of both scanning hardware and reconstruction algorithms has led to the development of representations of buildings and cities that can be remotely transmitted and inspected in real-time. Among the applications that implement these technologies are several GPS navigators and virtual globes such as Google Earth or the tools provided by the Institut Cartogràfic i Geològic de Catalunya. In particular, in this thesis, we conceptualize cities as a collection of individual buildings. Hence, we focus on the individual processing of one structure at a time, rather than on the larger-scale processing of urban environments. Nowadays, there is a wide diversity of digitization technologies, and the choice of the appropriate one is key for each particular application. Roughly, these techniques can be grouped around three main families: - Time-of-flight (terrestrial and aerial LiDAR). - Photogrammetry (street-level, satellite, and aerial imagery). - Human-edited vector data (cadastre and other map sources). Each of these has its advantages in terms of covered area, data quality, economic cost, and processing effort. Plane and car-mounted LiDAR devices are optimal for sweeping huge areas, but acquiring and calibrating such devices is not a trivial task. Moreover, the capturing process is done by scan lines, which need to be registered using GPS and inertial data. As an alternative, terrestrial LiDAR devices are more accessible but cover smaller areas, and their sampling strategy usually produces massive point clouds with over-represented plain regions. A more inexpensive option is street-level imagery. A dense set of images captured with a commodity camera can be fed to state-of-the-art multi-view stereo algorithms to produce realistic-enough reconstructions. One other advantage of this approach is capturing high-quality color data, whereas the geometric information is usually lacking. In this thesis, we analyze in-depth some of the shortcomings of these data-acquisition methods and propose new ways to overcome them. Mainly, we focus on the technologies that allow high-quality digitization of individual buildings. These are terrestrial LiDAR for geometric information and street-level imagery for color information. Our main goal is the processing and completion of detailed 3D urban representations. For this, we will work with multiple data sources and combine them when possible to produce models that can be inspected in real-time. Our research has focused on the following contributions: - Effective and feature-preserving simplification of massive point clouds. - Developing normal estimation algorithms explicitly designed for LiDAR data. - Low-stretch panoramic representation for point clouds. - Semantic analysis of street-level imagery for improved multi-view stereo reconstruction. - Color improvement through heuristic techniques and the registration of LiDAR and imagery data. - Efficient and faithful visualization of massive point clouds using image-based techniques.Durant els darrers anys, hi ha hagut un creixement notori en el camp de la digitalització d'edificis en 3D i entorns urbans. La millora substancial tant del maquinari d'escaneig com dels algorismes de reconstrucció ha portat al desenvolupament de representacions d'edificis i ciutats que es poden transmetre i inspeccionar remotament en temps real. Entre les aplicacions que implementen aquestes tecnologies hi ha diversos navegadors GPS i globus virtuals com Google Earth o les eines proporcionades per l'Institut Cartogràfic i Geològic de Catalunya. En particular, en aquesta tesi, conceptualitzem les ciutats com una col·lecció d'edificis individuals. Per tant, ens centrem en el processament individual d'una estructura a la vegada, en lloc del processament a gran escala d'entorns urbans. Avui en dia, hi ha una àmplia diversitat de tecnologies de digitalització i la selecció de l'adequada és clau per a cada aplicació particular. Aproximadament, aquestes tècniques es poden agrupar en tres famílies principals: - Temps de vol (LiDAR terrestre i aeri). - Fotogrametria (imatges a escala de carrer, de satèl·lit i aèries). - Dades vectorials editades per humans (cadastre i altres fonts de mapes). Cadascun d'ells presenta els seus avantatges en termes d'àrea coberta, qualitat de les dades, cost econòmic i esforç de processament. Els dispositius LiDAR muntats en avió i en cotxe són òptims per escombrar àrees enormes, però adquirir i calibrar aquests dispositius no és una tasca trivial. A més, el procés de captura es realitza mitjançant línies d'escaneig, que cal registrar mitjançant GPS i dades inercials. Com a alternativa, els dispositius terrestres de LiDAR són més accessibles, però cobreixen àrees més petites, i la seva estratègia de mostreig sol produir núvols de punts massius amb regions planes sobrerepresentades. Una opció més barata són les imatges a escala de carrer. Es pot fer servir un conjunt dens d'imatges capturades amb una càmera de qualitat mitjana per obtenir reconstruccions prou realistes mitjançant algorismes estèreo d'última generació per produir. Un altre avantatge d'aquest mètode és la captura de dades de color d'alta qualitat. Tanmateix, la informació geomètrica resultant sol ser de baixa qualitat. En aquesta tesi, analitzem en profunditat algunes de les mancances d'aquests mètodes d'adquisició de dades i proposem noves maneres de superar-les. Principalment, ens centrem en les tecnologies que permeten una digitalització d'alta qualitat d'edificis individuals. Es tracta de LiDAR terrestre per obtenir informació geomètrica i imatges a escala de carrer per obtenir informació sobre colors. El nostre objectiu principal és el processament i la millora de representacions urbanes 3D amb molt detall. Per a això, treballarem amb diverses fonts de dades i les combinarem quan sigui possible per produir models que es puguin inspeccionar en temps real. La nostra investigació s'ha centrat en les següents contribucions: - Simplificació eficaç de núvols de punts massius, preservant detalls d'alta resolució. - Desenvolupament d'algoritmes d'estimació normal dissenyats explícitament per a dades LiDAR. - Representació panoràmica de baixa distorsió per a núvols de punts. - Anàlisi semàntica d'imatges a escala de carrer per millorar la reconstrucció estèreo de façanes. - Millora del color mitjançant tècniques heurístiques i el registre de dades LiDAR i imatge. - Visualització eficient i fidel de núvols de punts massius mitjançant tècniques basades en imatges

    Tecniche per la rilevazione automatica marker-less di persone e marker-based di robot all'interno di reti di telecamere RGB-Depth

    Get PDF
    OpenPTrack is a state of the art solution for people detection and tracking, in this work we extended some of the functionalities (detection from highly tilted camera) of the software and introduced new ones (automatic ground plane equation calculator). Also, we test the feasibility and the behaviour of a mobile camera mounted on a people-following robot and dynamically registered in the OPT network through a fiducial cubic marke
    • …
    corecore