
    Hierarchical and Spatial Structures for Interpreting Images of Man-made Scenes Using Graphical Models

    The task of semantic scene interpretation is to label the regions of an image and their relations with meaningful classes. This task is a key ingredient of many computer vision applications, including object recognition, 3D reconstruction and robotic perception. It is challenging partly due to ambiguities inherent in the image data. Images of man-made scenes, e.g. building facade images, exhibit strong contextual dependencies in the form of spatial and hierarchical structures, and modelling these structures is central to the interpretation task. Graphical models provide a consistent framework for such statistical modelling. Bayesian networks and random fields are two popular types of graphical model frequently used to capture this contextual information. Our work is motivated by the belief that a generic formulation for scene interpretation can be found that combines the benefits of random fields and Bayesian networks while retaining clear semantic interpretability. Our key contribution is therefore the development of a generic statistical graphical model for scene interpretation, which seamlessly integrates different types of image features with the spatial and hierarchical structural information defined over a multi-scale image segmentation. It unifies the ideas of existing approaches, e.g. the conditional random field (CRF) and the Bayesian network (BN), and has a clear statistical interpretation as the maximum a posteriori (MAP) estimate of a multi-class labelling problem. Given the graphical model structure, we derive the probability distribution of the model from the factorization property implied by that structure. The statistical model leads to an energy function that can be optimized approximately by either loopy belief propagation or a graph-cut based move-making algorithm.
The particular type of features, spatial structure, and hierarchical structure is not prescribed. In the experiments, we concentrate on terrestrial man-made scenes as a specifically difficult problem and demonstrate the application of the proposed graphical model to the multi-class classification of building facade image regions. By incorporating the spatial and hierarchical structures, the framework allows for significantly better classification results on man-made scenes than the standard local classification approach. We investigate the performance of the algorithms on a public dataset to show the relative importance of the information from the spatial structure and the hierarchical structure. As a baseline for the region classification, we use an efficient randomized decision forest classifier. Two specific models are derived from the proposed graphical model, namely the hierarchical CRF and the hierarchical mixed graphical model. We show that these two models produce better classification results than both the baseline region classifier and the flat CRF.
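The MAP labelling described above can be viewed as minimizing an energy with unary (per-region) and pairwise (contextual) terms. The following is a minimal illustrative sketch, not the authors' implementation: it uses a Potts pairwise penalty on a 4-neighbour grid and iterated conditional modes (ICM) in place of the loopy belief propagation or graph-cut optimizers the abstract mentions.

```python
import numpy as np

def icm(unary, n_iters=10, smooth=0.5):
    """Approximate MAP labelling by coordinate descent on the energy
    E(x) = sum_i unary_i(x_i) + smooth * sum_(i,j) [x_i != x_j].

    unary: (H, W, K) per-pixel, per-class costs; returns an (H, W) label map."""
    H, W, K = unary.shape
    labels = unary.argmin(axis=2)          # local (flat) classification baseline
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                costs = unary[i, j].copy()
                # Potts penalty: pay `smooth` for each 4-neighbour with a different label
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        costs += smooth * (np.arange(K) != labels[ni, nj])
                labels[i, j] = costs.argmin()
    return labels
```

A pixel whose local evidence weakly favours the "wrong" class is flipped by its neighbours, which is exactly the improvement over purely local classification the abstract reports.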

    Improving architectural 3D reconstruction by constrained modelling

    Institute of Perception, Action and Behaviour. This doctoral thesis presents new techniques for improving the structural quality of automatically acquired architectural 3D models. Common architectural properties such as the parallelism and orthogonality of walls and linear structures are exploited. The locations of features such as planes and 3D lines are extracted from the model using a probabilistic technique (RANSAC). The relationships between the planes and lines are inferred automatically using a knowledge-based architectural model. A numerical algorithm is then used to optimise the positions and orientations of the features while taking the constraints into account. Small irregularities in the model are removed by projecting them onto the features. Planes and lines in the resulting model are therefore properly aligned with each other, which improves the appearance of the model. Our approach is demonstrated using noisy data from both synthetic and real scenes.
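The RANSAC plane-extraction step mentioned above can be sketched generically as follows. This is an illustration of the standard algorithm, not the thesis code; the function name, iteration count, and inlier tolerance are assumptions.

```python
import numpy as np

def ransac_plane(points, n_iters=200, tol=0.05, rng=None):
    """Fit a plane n.p + d = 0 to a noisy (N, 3) cloud; return (n, d, inlier_mask)."""
    rng = np.random.default_rng(rng)
    best_count, best_mask, best_n, best_d = -1, None, None, None
    for _ in range(n_iters):
        # Hypothesise a plane from 3 random points
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                    # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ a
        # Score by the number of points within `tol` of the plane
        mask = np.abs(points @ n + d) < tol
        if mask.sum() > best_count:
            best_count, best_mask, best_n, best_d = mask.sum(), mask, n, d
    return best_n, best_d, best_mask
```

The inlier set of the winning hypothesis is then typically refined by a least-squares fit before the constrained optimisation stage.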

    Image-based window detection: an overview

    Automated segmentation of building façades and detection of their elements is highly relevant in various fields of research: it reduces, for example, the effort of 3D-reconstructing existing buildings and even entire cities, and can be used for navigation and localization tasks. In recent years, several approaches have addressed this issue. They can be classified mainly by their input data, which are either images or 3D point clouds. This paper provides a survey of image-based approaches. In particular, it focuses on window detection and therefore groups related papers into three major detection strategies. We juxtapose grammar-based methods, pattern recognition, and machine learning, and contrast them with respect to their generality of application. We find that machine learning approaches seem most promising for window detection on generic façades, and we will therefore pursue these in future work.

    From CAD models to soft point cloud labels: An automatic annotation pipeline for cheaply supervised 3D semantic segmentation

    We propose a fully automatic annotation scheme which takes a raw 3D point cloud with a set of fitted CAD models as input, and outputs convincing point-wise labels which can be used as cheap training data for point cloud segmentation. Compared to manual annotations, we show that our automatic labels are accurate while drastically reducing the annotation time and eliminating the need for manual intervention or dataset-specific parameters. Our labeling pipeline outputs semantic classes and soft point-wise object scores, which can either be binarized into standard one-hot-encoded labels, thresholded into weak labels with ambiguous points left unlabeled, or used directly as soft labels during training. We evaluate the label quality and segmentation performance of PointNet++ on a dataset of real industrial point clouds and on Scan2CAD, a public dataset of indoor scenes. Our results indicate that reducing supervision in areas which are more difficult to label automatically is beneficial, compared to the conventional approach of naively assigning a hard "best guess" label to every point.
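Two of the three ways of consuming the soft scores (hard argmax labels, and weak labels with ambiguous points ignored) can be illustrated with a small sketch. The function name, threshold value, and ignore index are assumptions for illustration, not the paper's API.

```python
import numpy as np

def soft_to_labels(scores, threshold=0.7, ignore_index=-1):
    """scores: (N, K) soft per-point class scores.

    Returns (hard, weak): `hard` is the argmax label per point (the one-hot
    binarization), `weak` keeps the argmax only where the top score clears
    `threshold` and marks ambiguous points with `ignore_index` so a loss
    function can skip them."""
    hard = scores.argmax(axis=1)
    weak = np.where(scores.max(axis=1) >= threshold, hard, ignore_index)
    return hard, weak
```

The third option, training directly on the soft scores, would instead feed `scores` into a cross-entropy loss as target distributions.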

    A geometrical-based approach to recognise structure of complex interiors

    3D modelling of building interiors has gained much interest recently, specifically since the rise of Building Information Modeling (BIM). A number of methods have been developed in the past, but most of them are limited to modelling non-complex interiors. 3D laser scanners are the preferred sensor for collecting 3D data, yet the cost of state-of-the-art laser scanners is prohibitive for many. Other types of sensors can also generate 3D data, but they have limitations, especially when dealing with clutter and occlusion. This research has developed a platform for 3D modelling of building interiors while adapting a low-cost, low-level laser scanner to generate the 3D interior data. The PreSuRe algorithm developed here, which introduces a new pipeline for modelling building interiors, combines novel methods with adaptations of existing approaches to model various interiors, from sparse rooms to complex, highly cluttered and occluded interiors with non-ideal geometrical structure. The approach reconstructs the structure of interiors with above 96% accuracy, even with large amounts of noise and clutter. The resulting model is produced in near real-time, whereas existing techniques may take hours to generate a reconstruction. The produced model is also equipped with semantic information, which distinguishes it from a regular 3D CAD drawing and allows it to be used to assist professionals and experts in related fields.

    INVESTIGATION OF POINTNET FOR SEMANTIC SEGMENTATION OF LARGE-SCALE OUTDOOR POINT CLOUDS

    Semantic segmentation of point clouds is indispensable for 3D scene understanding. Point clouds faithfully capture the geometry of objects, including their shape, size, and orientation. Deep learning (DL) has been recognized as the most successful approach for image semantic segmentation. Applied to point clouds, the performance of many DL algorithms degrades, because point clouds are often sparse and have an irregular data format. As a result, point clouds are usually first transformed into voxel grids or image collections. PointNet was the first promising algorithm to feed point clouds directly into a DL architecture. Although PointNet achieved remarkable performance on indoor point clouds, its performance has not been extensively studied on large-scale outdoor point clouds. To the best of our knowledge, no study on large-scale aerial point clouds has investigated the sensitivity of the hyper-parameters used in PointNet. This paper evaluates PointNet’s performance for semantic segmentation on three large-scale Airborne Laser Scanning (ALS) point clouds of urban environments. The reported results show that PointNet has potential for large-scale outdoor scene semantic segmentation. A notable limitation of PointNet is that it does not consider the local structure induced by the metric space formed by a point’s local neighbors. Experiments show that PointNet is highly sensitive to hyper-parameters such as batch size, block partition, and the number of points per block. For one ALS dataset, we obtain a significant difference between overall accuracies of 67.5% and 72.8% for block sizes of 5 m × 5 m and 10 m × 10 m, respectively. The results also show that the performance of PointNet depends on the selection of input vectors.
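The block-partition and points-per-block hyper-parameters studied above can be made concrete with a small preprocessing sketch. This is a generic illustration of how large outdoor clouds are commonly tiled for PointNet-style networks, not the paper's pipeline; the function name and defaults are assumptions.

```python
import numpy as np

def partition_blocks(points, block_size=5.0, n_points=1024, rng=None):
    """Tile an (N, 3) outdoor cloud into block_size x block_size ground-plane
    cells and sample a fixed n_points from each non-empty cell, so every block
    becomes a fixed-size network input.

    Sparse cells are upsampled with replacement; dense cells are subsampled."""
    rng = np.random.default_rng(rng)
    keys = np.floor(points[:, :2] / block_size).astype(int)  # (x, y) cell index
    blocks = {}
    for key in {tuple(k) for k in keys}:
        idx = np.flatnonzero((keys == key).all(axis=1))
        sel = rng.choice(idx, n_points, replace=len(idx) < n_points)
        blocks[key] = points[sel]
    return blocks
```

Varying `block_size` (e.g. 5 m vs 10 m) changes how much spatial context each network input sees, which is one plausible reason for the accuracy gap the abstract reports.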