1,099 research outputs found

    Image-based window detection: an overview

    Get PDF
    Automated segmentation of buildings’ façade and detection of its elements is of high relevance in various fields of research as it, e. g., reduces the effort of 3 D reconstructing existing buildings and even entire cities or may be used for navigation and localization tasks. In recent years, several approaches were made concerning this issue. These can be mainly classified by their input data which are either images or 3 D point clouds. This paper provides a survey of image-based approaches. Particularly, this paper focuses on window detection and therefore groups related papers into the three major detection strategies. We juxtapose grammar based methods, pattern recognition and machine learning and contrast them referring to their generality of application. As we found out machine learning approaches seem most promising for window detection on generic façades and thus we will pursue these in future work

    Low-rank Based Algorithms for Rectification, Repetition Detection and De-noising in Urban Images

    Full text link
    In this thesis, we aim to solve the problem of automatic image rectification and repeated patterns detection on 2D urban images, using novel low-rank based techniques. Repeated patterns (such as windows, tiles, balconies and doors) are prominent and significant features in urban scenes. Detection of the periodic structures is useful in many applications such as photorealistic 3D reconstruction, 2D-to-3D alignment, facade parsing, city modeling, classification, navigation, visualization in 3D map environments, shape completion, cinematography and 3D games. However both of the image rectification and repeated patterns detection problems are challenging due to scene occlusions, varying illumination, pose variation and sensor noise. Therefore, detection of these repeated patterns becomes very important for city scene analysis. Given a 2D image of urban scene, we automatically rectify a facade image and extract facade textures first. Based on the rectified facade texture, we exploit novel algorithms that extract repeated patterns by using Kronecker product based modeling that is based on a solid theoretical foundation. We have tested our algorithms in a large set of images, which includes building facades from Paris, Hong Kong and New York

    High-Level Facade Image Interpretation using Marked Point Processes

    Get PDF
    In this thesis, we address facade image interpretation as one essential ingredient for the generation of high-detailed, semantic meaningful, three-dimensional city-models. Given a single rectified facade image, we detect relevant facade objects such as windows, entrances, and balconies, which yield a description of the image in terms of accurate position and size of these objects. Urban digital three-dimensional reconstruction and documentation is an active area of research with several potential applications, e.g., in the area of digital mapping for navigation, urban planning, emergency management, disaster control or the entertainment industry. A detailed building model which is not just a geometric object enriched with texture, allows for semantic requests as the number of floors or the location of balconies and entrances. Facade image interpretation is one essential step in order to yield such models. In this thesis, we propose the interpretation of facade images by combining evidence for the occurrence of individual object classes which we derive from data, and prior knowledge which guides the image interpretation in its entirety. We present a three-step procedure which generates features that are suited to describe relevant objects, learns a representation that is suited for object detection, and that enables the image interpretation using the results of object detection while incorporating prior knowledge about typical configurations of facade objects, which we learn from training data. According to these three sub-tasks, our major achievements are: We propose a novel method for facade image interpretation based on a marked point process. Therefor, we develop a model for the description of typical configurations of facade objects and propose an image interpretation system which combines evidence derived from data and prior knowledge about typical configurations of facade objects. In order to generate evidence from data, we propose a feature type which we call shapelets. They are scale invariant and provide large distinctiveness for facade objects. Segments of lines, arcs, and ellipses serve as basic features for the generation of shapelets. Therefor, we propose a novel line simplification approach which approximates given pixel-chains by a sequence of lines, circular, and elliptical arcs. Among others, it is based on an adaption to Douglas-Peucker's algorithm, which is based on circles as basic geometric elements We evaluate each step separately. We show the effects of polyline segmentation and simplification on several images with comparable good or even better results, referring to a state-of-the-art algorithm, which proves their large distinctiveness for facade objects. Using shapelets we provide a reasonable classification performance on a challenging dataset, including intra-class variations, clutter, and scale changes. Finally, we show promising results for the facade interpretation system on several datasets and provide a qualitative evaluation which demonstrates the capability of complete and accurate detection of facade objectsHigh-Level Interpretation von Fassaden-Bildern unter Benutzung von Markierten PunktprozessenDas Thema dieser Arbeit ist die Interpretation von Fassadenbildern als wesentlicher Beitrag zur Erstellung hoch detaillierter, semantisch reichhaltiger dreidimensionaler Stadtmodelle. In rektifizierten Einzelaufnahmen von Fassaden detektieren wir relevante Objekte wie Fenster, TĂŒren und Balkone, um daraus eine Bildinterpretation in Form von prĂ€zisen Positionen und GrĂ¶ĂŸen dieser Objekte abzuleiten. Die digitale dreidimensionale Rekonstruktion urbaner Regionen ist ein aktives Forschungsfeld mit zahlreichen Anwendungen, beispielsweise der Herstellung digitaler Kartenwerke fĂŒr Navigation, Stadtplanung, Notfallmanagement, Katastrophenschutz oder die Unterhaltungsindustrie. Detaillierte GebĂ€udemodelle, die nicht nur als geometrische Objekte reprĂ€sentiert und durch eine geeignete Textur visuell ansprechend dargestellt werden, erlauben semantische Anfragen, wie beispielsweise nach der Anzahl der Geschosse oder der Position der Balkone oder EingĂ€nge. Die semantische Interpretation von Fassadenbildern ist ein wesentlicher Schritt fĂŒr die Erzeugung solcher Modelle. In der vorliegenden Arbeit lösen wir diese Aufgabe, indem wir aus Daten abgeleitete Evidenz fĂŒr das Vorkommen einzelner Objekte mit Vorwissen kombinieren, das die Analyse der gesamten Bildinterpretation steuert. Wir prĂ€sentieren dafĂŒr ein dreistufiges Verfahren: Wir erzeugen Bildmerkmale, die fĂŒr die Beschreibung der relevanten Objekte geeignet sind. Wir lernen, auf Basis abgeleiteter Merkmale, eine ReprĂ€sentation dieser Objekte. Schließlich realisieren wir die Bildinterpretation basierend auf der zuvor gelernten ReprĂ€sentation und dem Vorwissen ĂŒber typische Konfigurationen von Fassadenobjekten, welches wir aus Trainingsdaten ableiten. Wir leisten dazu die folgenden wissenschaftlichen BeitrĂ€ge: Wir schlagen eine neuartige Me-thode zur Interpretation von Fassadenbildern vor, die einen sogenannten markierten Punktprozess verwendet. DafĂŒr entwickeln wir ein Modell zur Beschreibung typischer Konfigurationen von Fassadenobjekten und entwickeln ein Bildinterpretationssystem, welches aus Daten abgeleitete Evidenz und a priori Wissen ĂŒber typische Fassadenkonfigurationen kombiniert. FĂŒr die Erzeugung der Evidenz stellen wir Merkmale vor, die wir Shapelets nennen und die skaleninvariant und durch eine ausgesprochene DistinktivitĂ€t im Bezug auf Fassadenobjekte gekennzeichnet sind. Als Basismerkmale fĂŒr die Erzeugung der Shapelets dienen Linien-, Kreis- und Ellipsensegmente. DafĂŒr stellen wir eine neuartige Methode zur Vereinfachung von Liniensegmenten vor, die eine Pixelkette durch eine Sequenz von geraden LinienstĂŒcken und elliptischen Bogensegmenten approximiert. Diese basiert unter anderem auf einer Adaption des Douglas-Peucker Algorithmus, die anstelle gerader LinienstĂŒcke, Bogensegmente als geometrische Basiselemente verwendet. Wir evaluieren jeden dieser drei Teilschritte separat. Wir zeigen Ergebnisse der Liniensegmen-tierung anhand verschiedener Bilder und weisen dabei vergleichbare und teilweise verbesserte Ergebnisse im Vergleich zu bestehende Verfahren nach. FĂŒr die vorgeschlagenen Shapelets weisen wir in der Evaluation ihre diskriminativen Eigenschaften im Bezug auf Fassadenobjekte nach. Wir erzeugen auf einem anspruchsvollen Datensatz von skalenvariablen Fassadenobjekten, mit starker VariabilitĂ€t der Erscheinung innerhalb der Klassen, vielversprechende Klassifikationsergebnisse, die die Verwendbarkeit der gelernten Shapelets fĂŒr die weitere Interpretation belegen. Schließlich zeigen wir Ergebnisse der Interpretation der Fassadenstruktur anhand verschiedener DatensĂ€tze. Die qualitative Evaluation demonstriert die FĂ€higkeit des vorgeschlagenen Lösungsansatzes zur vollstĂ€ndigen und prĂ€zisen Detektion der genannten Fassadenobjekte

    Learning Grammars for Architecture-Specific Facade Parsing

    Get PDF
    International audienceParsing facade images requires optimal handcrafted grammar for a given class of buildings. Such a handcrafted grammar is often designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules, so that, rules corresponding to the same complex pattern are grouped together leading to a rich compact grammar. Experimental validation and comparison with the state-of-the-art grammar-based methods on four diff erent datasets show that the learned grammar helps in much faster convergence while producing equal or more accurate parsing results compared to handcrafted grammars as well as grammars learned by other methods. Besides, we release a new dataset of facade images from Paris following the Art-deco style and demonstrate the general applicability and extreme potential of the proposed framework

    Automatic Prediction of Building Age from Photographs

    Full text link
    We present a first method for the automated age estimation of buildings from unconstrained photographs. To this end, we propose a two-stage approach that firstly learns characteristic visual patterns for different building epochs at patch-level and then globally aggregates patch-level age estimates over the building. We compile evaluation datasets from different sources and perform an detailed evaluation of our approach, its sensitivity to parameters, and the capabilities of the employed deep networks to learn characteristic visual age-related patterns. Results show that our approach is able to estimate building age at a surprisingly high level that even outperforms human evaluators and thereby sets a new performance baseline. This work represents a first step towards the automated assessment of building parameters for automated price prediction.Comment: Preprint of paper to appear in ACM International Conference on Multimedia Retrieval (ICMR) 2018 Conferenc

    Holistic Multi-View Building Analysis in the Wild with Projection Pooling

    Get PDF
    We address six different classification tasks related to fine-grained building attributes: construction type, number of floors, pitch and geometry of the roof, facade material, and occupancy class. Tackling such a remote building analysis problem became possible only recently due to growing large-scale datasets of urban scenes. To this end, we introduce a new benchmarking dataset, consisting of 49426 images (top-view and street-view) of 9674 buildings. These photos are further assembled, together with the geometric metadata. The dataset showcases various real-world challenges, such as occlusions, blur, partially visible objects, and a broad spectrum of buildings. We propose a new projection pooling layer, creating a unified, top-view representation of the top-view and the side views in a high-dimensional space. It allows us to utilize the building and imagery metadata seamlessly. Introducing this layer improves classification accuracy -- compared to highly tuned baseline models -- indicating its suitability for building analysis.Comment: Accepted for publication at the 35th AAAI Conference on Artificial Intelligence (AAAI 2021
    • 

    corecore