79 research outputs found

    Semantic 3D Reconstruction with Finite Element Bases

    We propose a novel framework for the discretisation of multi-label problems on arbitrary, continuous domains. Our work bridges the gap between general FEM discretisations and labeling problems that arise in a variety of computer vision tasks, including, for instance, those derived from the generalised Potts model. Starting from the popular formulation of labeling as a convex relaxation by functional lifting, we show that FEM discretisation is valid for the most general case, where the regulariser is anisotropic and non-metric. While our findings are generic and applicable to different vision problems, we demonstrate their practical implementation in the context of semantic 3D reconstruction, where such regularisers have proved particularly beneficial. The proposed FEM approach leads to a smaller memory footprint as well as faster computation, and it constitutes a very simple way to enable variable, adaptive resolution within the same model.

    Multi-Scale Surface Reconstruction from Images

    Many surface reconstruction algorithms have been developed to process point data originating from laser scans. Because laser scanning is a very expensive technique and not available to everyone, 3D reconstruction from images (using, e.g., multi-view stereo) is a promising alternative. In recent years a lot of progress has been made in the computer vision domain and nowadays algorithms are capable of reconstructing large 3D scenes from consumer photographs. Whereas laser scans are very controlled and typically only a few scans are taken, images may be subject to more uncontrolled variations. Standard multi-view stereo algorithms give rise to multi-scale data points due to different camera resolutions, focal lengths, or various distances to the object. When reconstructing a surface from this data, the multi-scale property has to be taken into account because the assumption that the points are samples from the true surface might be violated. This thesis presents two surface reconstruction algorithms that take resolution and scale differences into account. In the first approach we model the uncertainty of each sample point according to its footprint, the surface area that was taken into account during multi-view stereo. With an adaptive volumetric resolution, also steered by the footprints of the sample points, we achieve detailed reconstructions even for large-scale scenes. Then, a general wavelet-based surface reconstruction framework is presented. The multi-scale sample points are characterized by a convolution kernel and the points are fused in frequency space while preserving locality. We suggest a specific implementation for 2.5D surfaces that incorporates our theoretic findings about sample points originating from multi-view stereo and shows promising results on real-world data sets. The other part of the thesis analyzes the scale characteristics of patch-based depth reconstruction as used in many (multi-view) stereo techniques. 
It is driven by the question of how well the reconstruction preserves surface details, that is, high frequencies. We introduce an intuitive model for the reconstruction process, prove that it yields a linear system, and determine the modulation transfer function. This allows us to predict the amplitude loss of high frequencies as a function of the patch size used and the internal and external camera parameters. Experiments on synthetic and real-world data demonstrate the accuracy of our model but also show its limitations. Finally, we propose a generalization of the model that allows for weighted patch fitting. The reconstructed points can then be described by a convolution of the original surface, and we show how weighting the pixels during photo-consistency optimization affects the smoothing kernel. In this way we are able to connect a standard notion of smoothing to multi-view stereo reconstruction. In summary, this thesis provides a profound analysis of patch-based (multi-view) stereo reconstruction and introduces new concepts for surface reconstruction from the resulting multi-scale sample points.
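    The link between patch size and amplitude loss can be illustrated with a minimal sketch. It assumes, purely for illustration, that patch fitting behaves like an unweighted average over a window of width `patch_width` on the surface; the thesis derives its own transfer function from the camera parameters, which is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def box_mtf(frequency, patch_width):
    """Predicted amplitude attenuation for a box-shaped smoothing kernel.

    Assumption: the patch acts as an unweighted average of the given width.
    np.sinc(x) is sin(pi*x)/(pi*x), the Fourier transform of a unit-area box.
    """
    return np.abs(np.sinc(frequency * patch_width))

def measured_attenuation(frequency, patch_width, samples=20001):
    """Numerically average a unit-amplitude cosine over a centred window."""
    x = np.linspace(-patch_width / 2, patch_width / 2, samples)
    return abs(np.mean(np.cos(2 * np.pi * frequency * x)))

if __name__ == "__main__":
    for width in (1.0, 2.0, 5.0):
        predicted = box_mtf(0.1, width)
        measured = measured_attenuation(0.1, width)
        print(f"patch width {width}: predicted {predicted:.4f}, measured {measured:.4f}")
```

    Under this assumption, larger patches attenuate a given surface frequency more strongly, and a frequency whose period equals the patch width is removed entirely.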

    Semantic Mapping of Road Scenes

    The problem of understanding road scenes has been at the forefront of the computer vision community for the last couple of years. Solving it enables autonomous systems to navigate and understand the surroundings in which they operate. It involves reconstructing the scene and estimating the objects present in it, such as ‘vehicles’, ‘road’, ‘pavements’ and ‘buildings’. This thesis focusses on these aspects and proposes solutions to address them. First, we propose a solution to generate a dense semantic map from multiple street-level images. This map can be imagined as the bird’s eye view of the region with associated semantic labels for tens of kilometres of street-level data. We generate the overhead semantic view from street-level images. This is in contrast to existing approaches that use satellite/overhead imagery for classification of urban regions, and it allows us to produce a detailed semantic map for a large-scale urban area. Then we describe a method to perform large-scale dense 3D reconstruction of road scenes with associated semantic labels. Our method fuses the depth maps generated from the stereo pairs across time into a global 3D volume in an online fashion, in order to accommodate arbitrarily long image sequences. The object class labels estimated from the street-level stereo image sequence are used to annotate the reconstructed volume. Then we exploit the scene structure in object class labelling by performing inference over the meshed representation of the scene. By performing labelling over the mesh we solve two issues. Firstly, images often contain redundant information, with multiple images describing the same scene. Labelling these images separately is slow, whereas our method is approximately an order of magnitude faster in the inference stage than standard inference in the image domain. Secondly, multiple images often result in inconsistent labelling even though they describe the same scene. 
By labelling a single mesh, we remove the inconsistency of labelling across the images. Our mesh-based labelling also takes into account the object layout in the scene, which is often ambiguous in the image domain, thereby increasing the accuracy of object labelling. Finally, we perform labelling and structure computation through a hierarchical robust P^N Markov Random Field defined on voxels and super-voxels given by an octree. This allows us to infer the 3D structure and the object-class labels in a principled manner, through bounded approximate minimisation of a well defined and studied energy functional. In this thesis, we also introduce two object-labelled datasets created from real-world data. The 15 kilometre Yotta Labelled dataset consists of 8,000 images per camera view of the roadways of the United Kingdom, with a subset of them annotated with object class labels; the second dataset comprises ground truth object labels for the publicly available KITTI dataset. Both datasets are publicly available, and we hope they will be helpful to the vision research community.

    Scene Reconstruction from Multi-Scale Input Data

    Geometry acquisition of real-world objects by means of 3D scanning or stereo reconstruction constitutes a very important and challenging problem in computer vision. 3D scanners and stereo algorithms usually provide geometry from one viewpoint only, and several of these scans need to be merged into one consistent representation. Scanner data generally has lower noise levels than data from stereo methods, and the scanning scenario is more controlled. In image-based stereo approaches, the aim is to reconstruct the 3D surface of an object solely from multiple photos of the object. In many cases, the stereo geometry is contaminated with noise and outliers, and exhibits large variations in scale. Approaches that fuse such data into one consistent surface must be resilient to such imperfections. In this thesis, we take a closer look at geometry reconstruction using both scanner data and the more challenging image-based scene reconstruction approaches. In particular, this work focuses on the uncontrolled setting where the input images are not constrained, may be taken with different camera models, under different lighting and weather conditions, and from vastly different points of view. A typical dataset contains many views that observe the scene from an overview perspective, and relatively few views that capture small details of the geometry. What results from these datasets are surface samples of the scene with vastly different resolutions. As we will show in this thesis, the multi-resolution, or "multi-scale", nature of the input is a relevant aspect for surface reconstruction, which has rarely been considered in the literature so far. Integrating scale as additional information in the reconstruction process can make a substantial difference in surface quality. We develop and study two different approaches for surface reconstruction that are able to cope with the challenges resulting from uncontrolled images. 
The first approach implements surface reconstruction by fusion of depth maps using a multi-scale hierarchical signed distance function. The hierarchical representation allows fusion of multi-resolution depth maps without mixing geometric information at incompatible scales, which preserves detail in high-resolution regions. An incomplete octree is constructed by incrementally adding triangulated depth maps to the hierarchy, which leads to scattered samples of the multi-resolution signed distance function. A continuous representation of the scattered data is defined by constructing a tetrahedral complex, and a final, highly-adaptive surface is extracted by applying the Marching Tetrahedra algorithm. A second, point-based approach is based on a more abstract, multi-scale implicit function defined as a sum of basis functions. Each input sample contributes a single basis function which is parameterized solely by the sample's attributes, effectively yielding a parameter-free method. Because the scale of each sample controls the size of the basis function, the method automatically adapts to data redundancy for noise reduction and is highly resilient to the quality-degrading effects of low-resolution samples, thus favoring high-resolution surfaces. Furthermore, we present a robust, image-based reconstruction system for surface modeling: MVE, the Multi-View Environment. The implementation provides all steps involved in the pipeline: Calibration and registration of the input images, dense geometry reconstruction by means of stereo, a surface reconstruction step and post-processing, such as remeshing and texturing. In contrast to other software solutions for image-based reconstruction, MVE handles large, uncontrolled, multi-scale datasets as well as input from more controlled capture scenarios. The reason lies in the particular choice of the multi-view stereo and surface reconstruction algorithms. 
The resulting surfaces are represented using a triangular mesh, which is a piecewise linear approximation to the real surface. The individual triangles are often so small that they barely contribute any geometric information, and they can be ill-shaped, which can cause numerical problems. A surface remeshing approach is introduced which changes the surface discretization such that more favorable triangles are created. It distributes the vertices of the mesh according to a density function, which is derived from the curvature of the geometry. Such a mesh is better suited for further processing and has reduced storage requirements. We thoroughly compare the developed methods against the state of the art and also perform a qualitative evaluation of the two surface reconstruction methods on a wide range of datasets with different properties. The usefulness of the remeshing approach is demonstrated on both scanner and multi-view stereo data.

    Leverage of lidar point cloud for segmentation and shape reconstruction

    Develop a method for efficiently annotating sparse 3D data (point clouds) with the help of deep neural network models and user corrections. Take a human-in-the-loop approach to refine an AI-generated fine annotation of the data. Focus on the task of self-driving cars and lidar sensor observations. The model generates a denser representation of the data and refines it by leveraging interactive human 2D annotations.

    Scalable 3D Surface Reconstruction by Local Stochastic Fusion of Disparity Maps

    Digital three-dimensional (3D) models are of significant interest to many application fields, such as medicine, engineering, simulation, and entertainment. Manual creation of 3D models is extremely time-consuming and data acquisition, e.g., through laser sensors, is expensive. In contrast, images captured by cameras mean cheap acquisition and high availability. Significant progress in the field of computer vision already allows for automatic 3D reconstruction using images. Nevertheless, many problems still exist, particularly for big sets of large images. In addition to the complex formulation necessary to solve an ill-posed problem, one has to manage extremely large amounts of data. This thesis targets 3D surface reconstruction using image sets, especially for large-scale, but also for high-accuracy applications. To this end, a processing chain for dense scalable 3D surface reconstruction using large image sets is defined consisting of image registration, disparity estimation, disparity map fusion, and triangulation of point clouds. The main focus of this thesis lies on the fusion and filtering of disparity maps, obtained by Semi-Global Matching, to create accurate 3D point clouds. For unlimited scalability, a Divide and Conquer method is presented that allows for parallel processing of subspaces of the 3D reconstruction space. The method for fusing disparity maps employs local optimization of spatial data. By this means, it avoids complex fusion strategies when merging subspaces. Although the focus is on scalable reconstruction, a high surface quality is obtained by several extensions to state-of-the-art local optimization methods. To this end, the seminal local volumetric optimization method by Curless and Levoy (1996) is interpreted from a probabilistic perspective. From this perspective, the method is extended through Bayesian fusion of spatial measurements with Gaussian uncertainty. 
In addition to the generation of an optimal surface, this probabilistic perspective allows for the estimation of surface probabilities. They are used for filtering outliers in 3D space by means of geometric consistency checks. A further improvement of the quality is obtained based on the analysis of the disparity uncertainty. To this end, Total Variation (TV)-based feature classes are defined that are highly correlated with the disparity uncertainty. The correlation function is learned from ground-truth data by means of an Expectation Maximization (EM) approach. Because a statistically estimated disparity error is considered within a probabilistic framework for the fusion of spatial data, this can be regarded as a stochastic fusion of disparity maps. In addition, the influence of image registration and polygonization on volumetric fusion is analyzed and used to extend the method. Finally, a multi-resolution strategy is presented that allows for the generation of surfaces from spatial data of widely varying quality. This method extends state-of-the-art methods by considering the spatial uncertainty of 3D points from stereo data. The evaluation on several well-known and novel datasets demonstrates the potential of the scalable stochastic fusion method. The strengths and weaknesses of the method are discussed and directions for future research are given.
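    The idea of fusing spatial measurements with Gaussian uncertainty can be sketched in one dimension. The following is a minimal illustration, assuming independent Gaussian measurements of the same quantity (for example, the distance to the surface along one ray); it shows the standard inverse-variance weighted mean and is not the thesis's actual volumetric implementation. The function name `fuse_gaussian` is hypothetical.

```python
import numpy as np

def fuse_gaussian(values, sigmas):
    """Fuse independent Gaussian measurements of one quantity.

    Each measurement is weighted by its precision 1/sigma^2. The result is
    the maximum-likelihood estimate and its standard deviation.
    """
    values = np.asarray(values, dtype=float)
    weights = 1.0 / np.square(np.asarray(sigmas, dtype=float))
    fused_mean = np.sum(weights * values) / np.sum(weights)
    fused_sigma = np.sqrt(1.0 / np.sum(weights))
    return fused_mean, fused_sigma

if __name__ == "__main__":
    # Hypothetical example: a precise and an imprecise measurement.
    mean, sigma = fuse_gaussian([0.0, 10.0], [1.0, 3.0])
    print(mean, sigma)
```

    With weights equal to the inverse variances, precise measurements dominate the fused estimate, and the fused uncertainty is never larger than that of the best single measurement.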

    Neural Kernel Surface Reconstruction

    We present a novel method for reconstructing a 3D implicit surface from a large-scale, sparse, and noisy point cloud. Our approach builds upon the recently introduced Neural Kernel Fields (NKF) representation. It enjoys similar generalization capabilities to NKF, while simultaneously addressing its main limitations: (a) We can scale to large scenes through compactly supported kernel functions, which enable the use of memory-efficient sparse linear solvers. (b) We are robust to noise, through a gradient fitting solve. (c) We minimize training requirements, enabling us to learn from any dataset of dense oriented points, and even mix training data consisting of objects and scenes at different scales. Our method is capable of reconstructing millions of points in a few seconds, and handling very large scenes in an out-of-core fashion. We achieve state-of-the-art results on reconstruction benchmarks consisting of single objects, indoor scenes, and outdoor scenes. Comment: CVPR 202
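    Why compact support leads to sparse linear systems can be seen in a small sketch. It uses the Wendland C2 function as a stand-in kernel, which is an assumption made for illustration only: the kernels in the paper are learned, and the point set, support radius, and function names below are hypothetical.

```python
import numpy as np

def wendland_c2(r):
    """Wendland C2 kernel: (1 - r)^4 (4r + 1) for r < 1, exactly zero beyond."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def gram_matrix(points, support_radius):
    """Kernel matrix between all point pairs, scaled by the support radius."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return wendland_c2(dists / support_radius)

# Hypothetical data: 200 random points in a 10 x 10 x 10 box.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(200, 3))
gram = gram_matrix(points, support_radius=1.0)

# Only pairs closer than the support radius give nonzero entries.
fill = np.count_nonzero(gram) / gram.size

if __name__ == "__main__":
    print(f"fraction of nonzero entries: {fill:.4f}")
```

    The matrix is built densely here for brevity. Since most entries are exactly zero, a practical solver would store only the nonzero entries and use a sparse method.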

    NeuS-PIR: Learning Relightable Neural Surface using Pre-Integrated Rendering

    Recent advances in neural implicit fields enable rapid reconstruction of 3D geometry from multi-view images. Beyond that, recovering physical properties such as material and illumination is essential for enabling more applications. This paper presents a new method that effectively learns a relightable neural surface using pre-integrated rendering, which simultaneously learns geometry, material and illumination within the neural implicit field. The key insight of our work is that these properties are closely related to each other, and optimizing them in a collaborative manner would lead to consistent improvements. Specifically, we propose NeuS-PIR, a method that factorizes the radiance field into a spatially varying material field and a differentiable environment cubemap, and jointly learns it with geometry represented by a neural surface. Our experiments demonstrate that the proposed method outperforms the state-of-the-art method on both synthetic and real datasets.