61 research outputs found

    Distributed bundle adjustment with block-based sparse matrix compression for super large scale datasets

    Full text link
    We propose a distributed bundle adjustment (DBA) method using the exact Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most of the existing methods partition the global map to small ones and conduct bundle adjustment in the submaps. In order to fit the parallel framework, they use approximate solutions instead of the LM algorithm. However, those methods often give sub-optimal results. Different from them, we utilize the exact LM algorithm to conduct global bundle adjustment where the formation of the reduced camera system (RCS) is actually parallelized and executed in a distributed way. To store the large RCS, we compress it with a block-based sparse matrix compression format (BSMC), which fully exploits its block feature. The BSMC format also enables the distributed storage and updating of the global RCS. The proposed method is extensively evaluated and compared with the state-of-the-art pipelines using both synthetic and real datasets. Preliminary results demonstrate the efficient memory usage and vast scalability of the proposed method compared with the baselines. For the first time, we conducted parallel bundle adjustment using LM algorithm on a real datasets with 1.18 million images and a synthetic dataset with 10 million images (about 500 times that of the state-of-the-art LM-based BA) on a distributed computing system.Comment: camera ready version for ICCV202

    3D Reconstruction using Active Illumination

    Get PDF
    In this thesis we present a pipeline for 3D model acquisition. Generating 3D models of real-world objects is an important task in computer vision with many applications, such as in 3D design, archaeology, entertainment, and virtual or augmented reality. The contribution of this thesis is threefold: we propose a calibration procedure for the cameras, we describe an approach for capturing and processing photometric normals using gradient illuminations in the hardware set-up, and finally we present a multi-view photometric stereo 3D reconstruction method. In order to obtain accurate results using multi-view and photometric stereo reconstruction, the cameras are calibrated geometrically and photometrically. For acquiring data, a light stage is used. This is a hardware set-up that allows to control the illumination during acquisition. The procedure used to generate appropriate illuminations and to process the acquired data to obtain accurate photometric normals is described. The core of the pipeline is a multi-view photometric stereo reconstruction method. In this method, we first generate a sparse reconstruction using the acquired images and computed normals. In the second step, the information from the normal maps is used to obtain a dense reconstruction of an object’s surface. Finally, the reconstructed surface is filtered to remove artifacts introduced by the dense reconstruction step

    Digital Stack Photography and Its Applications

    Get PDF
    <p>This work centers on digital stack photography and its applications.</p><p>A stack of images refer, in a broader sense, to an ensemble of</p><p>associated images taken with variation in one or more than one various </p><p>values in one or more parameters in system configuration or setting.</p><p>An image stack captures and contains potentially more information than</p><p>any of the constituent images. Digital stack photography (DST)</p><p>techniques explore the rich information to render a synthesized image</p><p>that oversteps the limitation in a digital camera's capabilities.</p><p>This work considers in particular two basic DST problems, which had</p><p>been challenging, and their applications. One is high-dynamic-range</p><p>(HDR) imaging of non-stationary dynamic scenes, in which the stacked</p><p>images vary in exposure conditions. The other</p><p>is large scale panorama composition from multiple images. In this</p><p>case, the image components are related to each other by the spatial</p><p>relation among the subdomains of the same scene they covered and</p><p>captured jointly. We consider the non-conventional, practical and</p><p>challenge situations where the spatial overlap among the sub-images is</p><p>sparse (S), irregular in geometry and imprecise from the designed</p><p>geometry (I), and the captured data over the overlap zones are noisy</p><p>(N) or lack of features. We refer to these conditions simply as the</p><p>S.I.N. conditions.</p><p>There are common challenging issues with both problems. For example,</p><p>both faced the dominant problem with image alignment for</p><p>seamless and artifact-free image composition. Our solutions to the</p><p>common problems are manifested differently in each of the particular</p><p>problems, as a result of adaption to the specific properties in each</p><p>type of image ensembles. For the exposure stack, existing</p><p>alignment approaches struggled to overcome three main challenges:</p><p>inconsistency in brightness, large displacement in dynamic scene and</p><p>pixel saturation. We exploit solutions in the following three</p><p>aspects. In the first, we introduce a model that addresses and admits</p><p>changes in both geometric configurations and optical conditions, while</p><p>following the traditional optical flow description. Previous models</p><p>treated these two types of changes one or the other, namely, with</p><p>mutual exclusions. Next, we extend the pixel-based optical flow model</p><p>to a patch-based model. There are two-fold advantages. A patch has</p><p>texture and local content that individual pixels fail to present. It</p><p>also renders opportunities for faster processing, such as via</p><p>two-scale or multiple-scale processing. The extended model is then</p><p>solved efficiently with an EM-like algorithm, which is reliable in the</p><p>presence of large displacement. Thirdly, we present a generative</p><p>model for reducing or eliminating typical artifacts as a side effect</p><p>of an inadequate alignment for clipped pixels. A patch-based texture</p><p>synthesis is combined with the patch-based alignment to achieve an</p><p>artifact free result.</p><p>For large-scale panorama composition under the S.I.N. conditions, we</p><p>have developed an effective solution scheme that significantly reduces</p><p>both processing time and artifacts. Previously existing approaches can</p><p>be roughly categorized as either geometry-based composition or feature</p><p>based composition. In the former approach, one relies on precise</p><p>knowledge of the system geometry, by design and/or calibration. It</p><p>works well with a far-away scene, in which case there is only limited</p><p>variation in projective geometry among the sub-images. However, the</p><p>system geometry is not invariant to physical conditions such as</p><p>thermal variation, stress variation and etc.. The composition with</p><p>this approach is typically done in the spatial space. The other</p><p>approach is more robust to geometric and optical conditions. It works</p><p>surprisingly well with feature-rich and stationary scenes, not well</p><p>with the absence of recognizable features. The composition based on</p><p>feature matching is typically done in the spatial gradient domain. In</p><p>short, both approaches are challenged by the S.I.N. conditions. With</p><p>certain snapshot data sets obtained and contributed by Brady et al, </p><p>these methods either fail in composition or render images with</p><p>visually disturbing artifacts. To overcome the S.I.N. conditions, we</p><p>have reconciled these two approaches and made successful and</p><p>complementary use of both priori and approximate information about</p><p>geometric system configuration and the feature information from the</p><p>image data. We also designed and developed a software architecture</p><p>with careful extraction of primitive function modules that can be</p><p>efficiently implemented and executed in parallel. In addition to a</p><p>much faster processing speed, the resulting images are clear and</p><p>sharper at the overlapping zones, without typical ghosting artifacts.</p>Dissertatio

    Monen näkymän stereon rekonstruktio tietokonegrafiikan sisällön taltiointiin

    Get PDF
    Rendering of photorealistic models is becoming increasingly feasible and important in computer graphics. Due to the high amount of work in creating such models by hand, required by the need of high level of detail in geometry, colors, and animation, it is desirable to automate the model creation. This task is realised by recovering content from photographs that describe the modeled subject from multiple viewpoints. In this thesis, elementary theory on image acquisition and photograph-based reconstruction of geometry and colors is studied and recent research and software for graphics content reconstruction are reviewed. Based on the studied background, a rig is built as a grid of nine off-the-shelf digital cameras, a custom remote shutter trigger, and supporting software. The purpose of the rig is to capture high-quality photographs and video in a reliable and practical manner, to be processed in multi-view reconstruction programs. The camera rig was configured to shoot small static subjects for experiments. The resulting photos and video were then processed directly for the subject's geometry and color, with little or no image enhancements done. Two typical reconstruction software pipelines were described and tested. The rig was found to perform well for several subjects with little subject-specific tuning; issues in the setup were analyzed and further improvements suggested based on the captured test models. Image quality of the cameras was found to be excellent for the task, and most problems arose from uneven lighting and practical issues. The developed rig was found to produce sub-millimeter scale accuracy in geometry and texture of subjects such as human faces. Future work was suggested for lighting, video synchronization and study of state-of-the-art image processing and reconstruction algorithms.Fotorealististen mallien renderöinti on yhä tärkeämpää ja mahdollisempaa tietokonegrafiikassa. Näiden mallien luominen käsityönä on työlästä vaaditun korkean tarkkuuden takia geometrian, värien ja animaation osalta, ja onkin haluttavaa korvata käsityö automatiikalla. Automatisointi voidaan suorittaa taltioimalla sisältö valokuvista, jotka kuvastavat samaa kohdetta useammasta näkymästä. Tässä diplomityössä tarkastellaan kuvantamista ja valokuvapohjaisen geometrian rekonstruktiota geometrian ja värien kannalta ja katsastetaan viimeaikainen tutkimus ja ohjelmistot grafiikkasisällön rekonstruointiin. Taustan pohjalta rakennetaan laitteisto, joka koostuu yhdeksästä valmiina saatavilla olevasta digikamerasta aseteltuna hilaksi, itse tehdystä etälaukaisimesta sekä ohjelmistoista. Laitteiston tarkoituksena on taltioida luotettavalla ja käytännöllisellä tavalla korkealaatuisia valokuvia ja videokuvaa, joita voi käsitellä monen näkymän stereon rekonstruktion tietokonesovelluksissa. Laitteisto säädettiin kuvaamaan pieniä staattisia kohteita kokeita varten. Tuloksena saadusta kuvamateriaalista laskettiin kohteen geometria ja värit ilman mainittavaa kuvanparannuksen käyttöä. Käytiin läpi kaksi tyypillistä rekonstruktio-ohjelmistoa testiksi ja laitteiston havaittiin soveltuvan hyvin useisiin kohteisiin ilman erityistä säätämistä. Kameroiden kuvanlaatu todettiin tehtävään erinomaiseksi ja useimmat haasteet johtuivat epätasaisesta valaistuksesta ja käytännön pulmista. Laitteiston todettiin tuottavan alle millimetriskaalan geometriaa ja pintavärikuvaa ihmiskasvojen kaltaisista kohteista. Jatkotyötä ehdotettiin valaistukseen, videon synkronointiin ja viimeisimpien kuvankäsittely- ja rekonstruktioalgoritmien tutkimiseen

    NASA Tech Briefs, July 1993

    Get PDF
    Topics include: Data Acquisition and Analysis: Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

    Holistic interpretation of visual data based on topology:semantic segmentation of architectural facades

    Get PDF
    The work presented in this dissertation is a step towards effectively incorporating contextual knowledge in the task of semantic segmentation. To date, the use of context has been confined to the genre of the scene with a few exceptions in the field. Research has been directed towards enhancing appearance descriptors. While this is unarguably important, recent studies show that computer vision has reached a near-human level of performance in relying on these descriptors when objects have stable distinctive surface properties and in proper imaging conditions. When these conditions are not met, humans exploit their knowledge about the intrinsic geometric layout of the scene to make local decisions. Computer vision lags behind when it comes to this asset. For this reason, we aim to bridge the gap by presenting algorithms for semantic segmentation of building facades making use of scene topological aspects. We provide a classification scheme to carry out segmentation and recognition simultaneously.The algorithm is able to solve a single optimization function and yield a semantic interpretation of facades, relying on the modeling power of probabilistic graphs and efficient discrete combinatorial optimization tools. We tackle the same problem of semantic facade segmentation with the neural network approach.We attain accuracy figures that are on-par with the state-of-the-art in a fully automated pipeline.Starting from pixelwise classifications obtained via Convolutional Neural Networks (CNN). These are then structurally validated through a cascade of Restricted Boltzmann Machines (RBM) and Multi-Layer Perceptron (MLP) that regenerates the most likely layout. In the domain of architectural modeling, there is geometric multi-model fitting. We introduce a novel guided sampling algorithm based on Minimum Spanning Trees (MST), which surpasses other propagation techniques in terms of robustness to noise. We make a number of additional contributions such as measure of model deviation which captures variations among fitted models

    Efficient Dense Registration, Segmentation, and Modeling Methods for RGB-D Environment Perception

    Get PDF
    One perspective for artificial intelligence research is to build machines that perform tasks autonomously in our complex everyday environments. This setting poses challenges to the development of perception skills: A robot should be able to perceive its location and objects in its surrounding, while the objects and the robot itself could also be moving. Objects may not only be composed of rigid parts, but could be non-rigidly deformable or appear in a variety of similar shapes. Furthermore, it could be relevant to the task to observe object semantics. For a robot acting fluently and immediately, these perception challenges demand efficient methods. This theses presents novel approaches to robot perception with RGB-D sensors. It develops efficient registration, segmentation, and modeling methods for scene and object perception. We propose multi-resolution surfel maps as a concise representation for RGB-D measurements. We develop probabilistic registration methods that handle rigid scenes, scenes with multiple rigid parts that move differently, and scenes that undergo non-rigid deformations. We use these methods to learn and perceive 3D models of scenes and objects in both static and dynamic environments. For learning models of static scenes, we propose a real-time capable simultaneous localization and mapping approach. It aligns key views in RGB-D video using our rigid registration method and optimizes the pose graph of the key views. The acquired models are then perceived in live images through detection and tracking within a Bayesian filtering framework. An assumption frequently made for environment mapping is that the observed scene remains static during the mapping process. Through rigid multi-body registration, we take advantage of releasing this assumption: Our registration method segments views into parts that move independently between the views and simultaneously estimates their motion. Within simultaneous motion segmentation, localization, and mapping, we separate scenes into objects by their motion. Our approach acquires 3D models of objects and concurrently infers hierarchical part relations between them using probabilistic reasoning. It can be applied for interactive learning of objects and their part decomposition. Endowing robots with manipulation skills for a large variety of objects is a tedious endeavor if the skill is programmed for every instance of an object class. Furthermore, slight deformations of an instance could not be handled by an inflexible program. Deformable registration is useful to perceive such shape variations, e.g., between specific instances of a tool. We develop an efficient deformable registration method and apply it for the transfer of robot manipulation skills between varying object instances. On the object-class level, we segment images using random decision forest classifiers in real-time. The probabilistic labelings of individual images are fused in 3D semantic maps within a Bayesian framework. We combine our object-class segmentation method with simultaneous localization and mapping to achieve online semantic mapping in real-time. The methods developed in this thesis are evaluated in experiments on publicly available benchmark datasets and novel own datasets. We publicly demonstrate several of our perception approaches within integrated robot systems in the mobile manipulation context.Effiziente Dichte Registrierungs-, Segmentierungs- und Modellierungsmethoden für die RGB-D Umgebungswahrnehmung In dieser Arbeit beschäftigen wir uns mit Herausforderungen der visuellen Wahrnehmung für intelligente Roboter in Alltagsumgebungen. Solche Roboter sollen sich selbst in ihrer Umgebung zurechtfinden, und Wissen über den Verbleib von Objekten erwerben können. Die Schwierigkeit dieser Aufgaben erhöht sich in dynamischen Umgebungen, in denen ein Roboter die Bewegung einzelner Teile differenzieren und auch wahrnehmen muss, wie sich diese Teile bewegen. Bewegt sich ein Roboter selbständig in dieser Umgebung, muss er auch seine eigene Bewegung von der Veränderung der Umgebung unterscheiden. Szenen können sich aber nicht nur durch die Bewegung starrer Teile verändern. Auch die Teile selbst können ihre Form in nicht-rigider Weise ändern. Eine weitere Herausforderung stellt die semantische Interpretation von Szenengeometrie und -aussehen dar. Damit intelligente Roboter unmittelbar und flüssig handeln können, sind effiziente Algorithmen für diese Wahrnehmungsprobleme erforderlich. Im ersten Teil dieser Arbeit entwickeln wir effiziente Methoden zur Repräsentation und Registrierung von RGB-D Messungen. Zunächst stellen wir Multi-Resolutions-Oberflächenelement-Karten (engl. multi-resolution surfel maps, MRSMaps) als eine kompakte Repräsentation von RGB-D Messungen vor, die unseren effizienten Registrierungsmethoden zugrunde liegt. Bilder können effizient in dieser Repräsentation aggregiert werde, wobei auch mehrere Bilder aus verschiedenen Blickpunkten integriert werden können, um Modelle von Szenen und Objekte aus vielfältigen Ansichten darzustellen. Für die effiziente, robuste und genaue Registrierung von MRSMaps wird eine Methode vorgestellt, die Rigidheit der betrachteten Szene voraussetzt. Die Registrierung schätzt die Kamerabewegung zwischen den Bildern und gewinnt ihre Effizienz durch die Ausnutzung der kompakten multi-resolutionalen Darstellung der Karten. Die Registrierungsmethode erzielt hohe Bildverarbeitungsraten auf einer CPU. Wir demonstrieren hohe Effizienz, Genauigkeit und Robustheit unserer Methode im Vergleich zum bisherigen Stand der Forschung auf Vergleichsdatensätzen. In einem weiteren Registrierungsansatz lösen wir uns von der Annahme, dass die betrachtete Szene zwischen Bildern statisch ist. Wir erlauben nun, dass sich rigide Teile der Szene bewegen dürfen, und erweitern unser rigides Registrierungsverfahren auf diesen Fall. Unser Ansatz segmentiert das Bild in Bereiche einzelner Teile, die sich unterschiedlich zwischen Bildern bewegen. Wir demonstrieren hohe Segmentierungsgenauigkeit und Genauigkeit in der Bewegungsschätzung unter Echtzeitbedingungen für die Verarbeitung. Schließlich entwickeln wir ein Verfahren für die Wahrnehmung von nicht-rigiden Deformationen zwischen zwei MRSMaps. Auch hier nutzen wir die multi-resolutionale Struktur in den Karten für ein effizientes Registrieren von grob zu fein. Wir schlagen Methoden vor, um aus den geschätzten Deformationen die lokale Bewegung zwischen den Bildern zu berechnen. Wir evaluieren Genauigkeit und Effizienz des Registrierungsverfahrens. Der zweite Teil dieser Arbeit widmet sich der Verwendung unserer Kartenrepräsentation und Registrierungsmethoden für die Wahrnehmung von Szenen und Objekten. Wir verwenden MRSMaps und unsere rigide Registrierungsmethode, um dichte 3D Modelle von Szenen und Objekten zu lernen. Die räumlichen Beziehungen zwischen Schlüsselansichten, die wir durch Registrierung schätzen, werden in einem Simultanen Lokalisierungs- und Kartierungsverfahren (engl. simultaneous localization and mapping, SLAM) gegeneinander abgewogen, um die Blickposen der Schlüsselansichten zu schätzen. Für das Verfolgen der Kamerapose bezüglich der Modelle in Echtzeit, kombinieren wir die Genauigkeit unserer Registrierung mit der Robustheit von Partikelfiltern. Zu Beginn der Posenverfolgung, oder wenn das Objekt aufgrund von Verdeckungen oder extremen Bewegungen nicht weiter verfolgt werden konnte, initialisieren wir das Filter durch Objektdetektion. Anschließend wenden wir unsere erweiterten Registrierungsverfahren für die Wahrnehmung in nicht-rigiden Szenen und für die Übertragung von Objekthandhabungsfähigkeiten von Robotern an. Wir erweitern unseren rigiden Kartierungsansatz auf dynamische Szenen, in denen sich rigide Teile bewegen. Die Bewegungssegmente in Schlüsselansichten werden zueinander in Bezug gesetzt, um Äquivalenz- und Teilebeziehungen von Objekten probabilistisch zu inferieren, denen die Segmente entsprechen. Auch hier liefert unsere Registrierungsmethode die Bewegung der Kamera bezüglich der Objekte, die wir in einem SLAM Verfahren optimieren. Aus diesen Blickposen wiederum können wir die Bewegungssegmente in dichten Objektmodellen vereinen. Objekte einer Klasse teilen oft eine gemeinsame Topologie von funktionalen Elementen, die durch Formkorrespondenzen ermittelt werden kann. Wir verwenden unsere deformierbare Registrierung, um solche Korrespondenzen zu finden und die Handhabung eines Objektes durch einen Roboter auf neue Objektinstanzen derselben Klasse zu übertragen. Schließlich entwickeln wir einen echtzeitfähigen Ansatz, der Kategorien von Objekten in RGB-D Bildern erkennt und segmentiert. Die Segmentierung basiert auf Ensemblen randomisierter Entscheidungsbäume, die Geometrie- und Texturmerkmale zur Klassifikation verwenden. Wir fusionieren Segmentierungen von Einzelbildern einer Szene aus mehreren Ansichten in einer semantischen Objektklassenkarte mit Hilfe unseres SLAM-Verfahrens. Die vorgestellten Methoden werden auf öffentlich verfügbaren Vergleichsdatensätzen und eigenen Datensätzen evaluiert. Einige unserer Ansätze wurden auch in integrierten Robotersystemen für mobile Objekthantierungsaufgaben öffentlich demonstriert. Sie waren ein wichtiger Bestandteil für das Gewinnen der RoboCup-Roboterwettbewerbe in der RoboCup@Home Liga in den Jahren 2011, 2012 und 2013
    corecore