
    Initial investigations into using an ensemble of deep neural networks for building façade image semantic segmentation

    Because of now-outdated construction technology, houses which have not been retrofitted since construction typically fail to meet modern energy performance levels. However, identifying at a city scale which houses could benefit the most from retrofit solutions is currently a labour-intensive process. In this paper, a system is presented that uses a vehicle-mounted camera to capture pictures of residential buildings and then performs semantic segmentation to differentiate components of the captured buildings. An ensemble of U-Net semantic segmentation models is trained to identify walls, roofs, chimneys, windows and doors in building façade images and to differentiate window and door instances which are partially visible or obscured. Results show that the ensemble of U-Net models achieved high accuracy in identifying walls, roofs and chimneys, moderate accuracy in identifying windows, and low accuracy in identifying doors and instances of windows and doors which were partially visible or obscured. When the U-Net models were retrained to identify doors or windows irrespective of partially visible and obscured instances, a significant rise in door and window identification accuracy was observed. It is believed that a larger training dataset would produce significantly improved results across all classes. The results presented here demonstrate the operational feasibility of the first part of a process that combines this model with high-resolution thermography and GPS to automate building retrofit evaluations.
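    As a rough illustration of the ensemble step described above (a sketch under assumed interfaces, not the authors' implementation), the snippet below averages per-pixel class probabilities from several independently trained U-Net models; the `models` objects, their `predict` method and the class list are hypothetical.

```python
import numpy as np

# Hypothetical class layout; the paper segments these facade components.
CLASSES = ["background", "wall", "roof", "chimney", "window", "door"]

def ensemble_segment(models, image):
    """Average per-pixel class probabilities from several trained U-Net models.

    `models` is assumed to be a list of objects exposing
    `predict(image) -> (H, W, len(CLASSES))` softmax maps.
    """
    probs = np.mean([m.predict(image) for m in models], axis=0)  # (H, W, C)
    return probs.argmax(axis=-1)  # per-pixel class index

# Usage (hypothetical): labels = ensemble_segment(unet_models, facade_image)
```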

    Extracting structured information from 2D images

    Convolutional neural networks can handle an impressive array of supervised learning tasks while relying on a single backbone architecture, suggesting that one solution fits all vision problems. But for many tasks, we can directly exploit the problem structure within neural networks to deliver more accurate predictions. In this thesis, we propose novel deep learning components that exploit the structured output space of an increasingly complex set of problems. We start from Optical Character Recognition (OCR) in natural scenes and leverage the constraints imposed by the spatial outline of letters and by language requirements. Conventional OCR systems do not work well in natural scenes due to distortions, blur, or letter variability. We introduce a new attention-based model, equipped with extra information about neuron positions to guide its focus across characters sequentially. It beats the previous state of the art by a significant margin. We then turn to dense labeling tasks employing encoder-decoder architectures. We start with an experimental study that documents the drastic impact that decoder design can have on task performance. Rather than optimizing one decoder per task separately, we propose new robust layers for the upsampling of high-dimensional encodings and show that these better suit structured per-pixel outputs across all tasks. Finally, we turn to the problem of urban scene understanding. There is an elaborate structure in both the input space (multi-view recordings, aerial and street-view scenes) and the output space (multiple fine-grained attributes for holistic building understanding). We design new models that benefit from the relatively simple cuboid-like geometry of buildings to create a single unified representation from multiple views. To benchmark our model, we build a new multi-view large-scale dataset of building images and fine-grained attributes and show systematic improvements when compared to a broad range of strong CNN-based baselines.
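    As a small sketch of the general idea of injecting explicit position information into an attention mechanism (a simplified stand-in, not the thesis architecture; all array names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def positional_attention(features, pos_enc, query):
    """One attention step over image features augmented with position encodings.

    features: (N, D) encoder activations, pos_enc: (N, D) position encodings,
    query: (D,) decoder state. Names and shapes are illustrative only.
    """
    keys = features + pos_enc              # inject "where" into "what"
    scores = keys @ query / np.sqrt(keys.shape[-1])
    weights = softmax(scores)              # attention over spatial locations
    return weights @ features              # context vector for the next character
```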

    BigSUR: Large-scale Structured Urban Reconstruction

    The creation of high-quality semantically parsed 3D models for dense metropolitan areas is a fundamental urban modeling problem. Although recent advances in acquisition techniques and processing algorithms have resulted in large-scale imagery or 3D polygonal reconstructions, such data sources are typically noisy and incomplete, with no semantic structure. In this paper, we present an automatic data fusion technique that produces high-quality structured models of city blocks. From coarse polygonal meshes, street-level imagery, and GIS footprints, we formulate a binary integer program that globally balances sources of error to produce semantically parsed mass models with associated facade elements. We demonstrate our system on four city regions of varying complexity; our examples typically contain densely built urban blocks spanning hundreds of buildings. In our largest example, we produce a structured model of 37 city blocks spanning a total of 1,011 buildings at a scale and quality previously impossible to achieve automatically.
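    To make the flavour of such a formulation concrete, here is a toy binary integer program (selecting exactly one candidate mass model per building so that the total fitting error is minimised); the costs and the single constraint are invented for illustration and are not the paper's actual objective.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical fitting errors of three candidate mass models for one building
# (these numbers are invented; the paper's objective is far richer).
costs = np.array([2.5, 1.1, 3.0])

# Pick exactly one candidate: x_0 + x_1 + x_2 == 1, each x_i in {0, 1}.
pick_one = LinearConstraint(np.ones((1, 3)), lb=1, ub=1)

res = milp(c=costs, constraints=[pick_one],
           integrality=np.ones(3), bounds=Bounds(0, 1))
print(res.x)  # -> [0. 1. 0.], the candidate with the lowest error
```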

    REVIEW OF WINDOW AND DOOR TYPE DETECTION APPROACHES

    The use of as-built Building Information Models (BIM) has become increasingly commonplace. The process of creating a BIM model from point cloud data, also referred to as Scan-to-BIM, is a mostly manual task. Due to the large amount of manual work, the entire Scan-to-BIM process is time-consuming and error-prone. Current research focuses on automating the Scan-to-BIM pipeline by applying state-of-the-art techniques to its consecutive steps, including data acquisition, data processing, data interpretation and modelling. By automating the matching and modelling of window and door objects, a considerable amount of time can be saved in the Scan-to-BIM process, because each window and door instance needs to be examined by the modeller and adapted to the actual on-site situation. Large object libraries containing predefined window and door objects exist, but matching to the best-fit predefined object remains time-consuming. The aim of this research is to examine ways to speed up the modelling of window and door objects. First, a literature review of existing methods for window and door detection and matching is presented. Second, the acquired data is examined to explore how well different remote sensing devices capture window and door information. Finally, some commonplace features are tested for window and door occurrence matching and clustering.

    Segmentation of building façade images acquired from a ground-level viewpoint

    Facade analysis (detection, understanding and reconstruction) from street-level imagery is currently a very active field of research in photogrammetry and computer vision due to its many industrial applications. This thesis presents progress made in the generic segmentation of large volumes of this type of image, containing one or more facade areas (whole or truncated). This kind of data is characterized by very rich architectural complexity as well as by problems related to lighting and to the acquisition viewpoint. Genericity of the processing is an important goal, and the main constraint is to introduce as few priors as possible. Our approaches are based on the alignment and repetitiveness properties of the main facade structures. We propose a hierarchical partitioning of the image contours together with a detection of grids of repetitive structures using marked point processes. In the results, the facade is separated from its neighbouring facades and from its environment (street, sky). In addition, elements such as windows, balconies or the wall background are extracted in a coherent way, without being explicitly recognized. Parameter setting is done in a single pass and applies to all architectural styles encountered. The problem sits upstream of many applications, such as facade separation, increasing the level of detail of 3D urban models generated from aerial or satellite imagery, compression, and indexing based on geometric primitives (grouping of structures and the spacing between them).
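    As a much simpler stand-in for the repetitive-structure detection described above (the thesis itself relies on marked point processes, not on this heuristic), one can project gradient energy onto the image axes and look for regularly spaced peaks:

```python
import numpy as np
from scipy import ndimage
from scipy.signal import find_peaks

def repetitive_structure_rows_cols(gray, min_spacing=20):
    """Rough heuristic: rows/columns with strong gradient energy often align
    with repeated facade openings (windows, balconies). `min_spacing` is an
    arbitrary minimum pixel distance between detected alignments."""
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    energy = np.hypot(gx, gy)
    rows, _ = find_peaks(energy.sum(axis=1), distance=min_spacing)  # y positions
    cols, _ = find_peaks(energy.sum(axis=0), distance=min_spacing)  # x positions
    return rows, cols
```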

    Visual positioning for augmented reality in planar environments

    Measuring in real time the pose of a camera relative to three-dimensional landmarks identified in a video image is one of, if not the, fundamental pillars of augmented reality. We propose to solve this problem in built environments using computer vision. We show that a positioning system that is more accurate than GPS, and moreover more stable, faster and less memory-hungry than other visual positioning systems introduced in the literature, can be obtained by making the following cooperate: a probabilistic approach and stochastic geometry (a contrario detection of the image's vanishing points), deep learning (proposal of boxes containing facades, design of a facade descriptor based on a convolutional neural network), Bayesian inference (expectation-maximization registration of a compact geometric and semantic model of the identified facades) and model selection (analysis of camera motion by tracking textured planes). We also describe an in situ modelling method which, through immediate confrontation with reality, reliably yields 3D models that are useful for the kind of pose computation we envision.
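    For context on the pose-estimation core of this work (a generic illustration only, not the thesis pipeline), a camera pose can be recovered from 2D-3D correspondences between image points and known façade landmarks, for example with OpenCV's PnP solver; the coordinates and intrinsics below are placeholders.

```python
import numpy as np
import cv2

# Placeholder 3D facade landmarks (metres) and their detected 2D projections (pixels).
object_pts = np.array([[0, 0, 0], [4, 0, 0], [4, 3, 0], [0, 3, 0]], dtype=np.float64)
image_pts = np.array([[120, 400], [520, 390], [515, 110], [125, 105]], dtype=np.float64)

K = np.array([[800.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
# If ok is True, rvec/tvec give the camera pose relative to the facade plane.
```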

    Feature preserving decimation of urban meshes

    Commercial buildings as well as residential houses represent core structures of any modern-day urban or semi-urban area. Consequently, 3D models of urban buildings are of paramount importance to a majority of digital urban applications such as city planning, 3D mapping and navigation, video games and movies, among others. However, current studies suggest that existing 3D modeling approaches often involve high computational cost and large storage volumes for processing the geometric details of the buildings. Therefore, it is essential to generate concise digital representations of urban buildings from 3D measurements or images, so that the acquired information can be efficiently utilized for various urban applications. Such concise representations, often referred to as “lightweight” models, strive to capture the details of the physical objects with less computational storage. Furthermore, lightweight models consume less bandwidth for online applications and facilitate accelerated visualization. In this thesis, we provide an assessment of state-of-the-art data structures for storing lightweight urban buildings. We then propose a method to generate lightweight yet highly detailed 3D building models from LiDAR scans. The lightweight modeling pipeline comprises the following stages: mesh reconstruction, feature point detection, and mesh decimation through gradient structure tensors. The gradient of each vertex of the reconstructed mesh is obtained by estimating the vertex confidence through eigen analysis and is further encoded into a 3x3 structure tensor. We analyze the eigenvalues of the structure tensor representing gradient variations and use them to classify vertices into feature classes, e.g., edges and corners. While decimating the mesh, feature points are preserved through a mean cost-based edge collapse operation. Experiments on different building facade models show that our method is effective in generating simplified models with a good trade-off between simplification and accuracy.
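    A minimal sketch of the eigenvalue-based classification idea (assuming gradient vectors over a vertex neighbourhood are already available; the threshold and labels are illustrative, not the thesis's actual criteria):

```python
import numpy as np

def classify_vertex(neighbour_gradients, edge_tol=0.1):
    """Classify a vertex from the eigenvalues of the 3x3 structure tensor built
    from gradient vectors over its neighbourhood (illustrative threshold).

    Roughly: one dominant eigenvalue -> planar region, two significant
    eigenvalues -> crease edge, three comparable eigenvalues -> corner."""
    G = np.asarray(neighbour_gradients, dtype=float)   # (k, 3) gradient vectors
    tensor = G.T @ G                                   # 3x3 structure tensor
    w = np.sort(np.linalg.eigvalsh(tensor))[::-1]      # eigenvalues, descending
    w = w / max(w[0], 1e-12)                           # normalise by the largest
    significant = int(np.sum(w > edge_tol))
    return {1: "planar", 2: "edge", 3: "corner"}[max(significant, 1)]
```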

    Algorithms for the reconstruction, analysis, repairing and enhancement of 3D urban models from multiple data sources

    Over the last few years, there has been a notable growth in the field of digitization of 3D buildings and urban environments. The substantial improvement of both scanning hardware and reconstruction algorithms has led to the development of representations of buildings and cities that can be remotely transmitted and inspected in real time. Among the applications that implement these technologies are several GPS navigators and virtual globes such as Google Earth or the tools provided by the Institut Cartogràfic i Geològic de Catalunya. In particular, in this thesis, we conceptualize cities as a collection of individual buildings. Hence, we focus on the individual processing of one structure at a time, rather than on the larger-scale processing of urban environments. Nowadays, there is a wide diversity of digitization technologies, and the choice of the appropriate one is key for each particular application. Roughly, these techniques can be grouped into three main families:
    - Time-of-flight (terrestrial and aerial LiDAR).
    - Photogrammetry (street-level, satellite, and aerial imagery).
    - Human-edited vector data (cadastre and other map sources).
    Each of these has its advantages in terms of covered area, data quality, economic cost, and processing effort. Aircraft- and car-mounted LiDAR devices are optimal for sweeping huge areas, but acquiring and calibrating such devices is not a trivial task. Moreover, the capturing process is done by scan lines, which need to be registered using GPS and inertial data. As an alternative, terrestrial LiDAR devices are more accessible but cover smaller areas, and their sampling strategy usually produces massive point clouds with over-represented planar regions. A more inexpensive option is street-level imagery. A dense set of images captured with a commodity camera can be fed to state-of-the-art multi-view stereo algorithms to produce realistic-enough reconstructions. Another advantage of this approach is that it captures high-quality color data, although the resulting geometric information is usually of lower quality. In this thesis, we analyze in depth some of the shortcomings of these data-acquisition methods and propose new ways to overcome them. Mainly, we focus on the technologies that allow high-quality digitization of individual buildings: terrestrial LiDAR for geometric information and street-level imagery for color information. Our main goal is the processing and completion of detailed 3D urban representations. For this, we work with multiple data sources and combine them when possible to produce models that can be inspected in real time. Our research has focused on the following contributions:
    - Effective and feature-preserving simplification of massive point clouds.
    - Normal estimation algorithms explicitly designed for LiDAR data (a generic baseline is sketched after this list).
    - A low-stretch panoramic representation for point clouds.
    - Semantic analysis of street-level imagery for improved multi-view stereo reconstruction.
    - Color improvement through heuristic techniques and the registration of LiDAR and imagery data.
    - Efficient and faithful visualization of massive point clouds using image-based techniques.
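    As a generic illustration of one of the listed contribution areas, point-cloud normal estimation is commonly bootstrapped with a PCA of each point's local neighbourhood; the sketch below shows that common baseline (not the thesis's LiDAR-specific algorithms), with k chosen arbitrarily.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=16):
    """Baseline normal estimation: for each point, fit a plane to its k nearest
    neighbours via PCA and take the direction of smallest variance as the normal.
    (The thesis develops LiDAR-specific estimators; this is only the baseline.)"""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # k nearest neighbours per point
    normals = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        patch = points[nbrs] - points[nbrs].mean(axis=0)
        _, _, vt = np.linalg.svd(patch, full_matrices=False)
        normals[i] = vt[-1]                   # smallest-variance direction
    return normals
```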

    Surveying and Three-Dimensional Modeling for Preservation and Structural Analysis of Cultural Heritage

    Dense point clouds can be used for three important steps in structural analysis in the field of cultural heritage, regardless of which instrument was used for data acquisition. Firstly, they allow deriving the geometric part of a finite element (FE) model automatically or semi-automatically. User input is mainly required to complement invisible parts and boundaries of the structure, and to assign meaningful approximate physical parameters. Secondly, an FE model obtained from point clouds can be used to estimate better and more precise parameters of the structural analysis, i.e., to train the FE model. Finally, defining a correct Level of Detail for the three-dimensional model derived from the initial point cloud makes it possible to establish the limit beyond which the structural analysis is compromised, or at least less precise. In this research, this is demonstrated using three different case studies of buildings, consisting mainly of masonry, measured through terrestrial laser scanning and photogrammetric acquisitions. This is not a typical geomatics study, but its challenges allow the benefits and limitations to be examined. The results and the proposed approaches could represent a step towards a multidisciplinary approach in which Geomatics plays a critical role in monitoring and civil engineering. Furthermore, through a geometrical reconstruction, different analyses and comparisons are possible, in order to evaluate how accurate the numerical model is. In fact, the discrepancies between the different results make it possible to evaluate how important details can be lost through simplified geometric modeling. This causes, for example, modifications in terms of mass and volume of the structure.