
    ResDepth: Learned Residual Stereo Reconstruction

    We propose an embarrassingly simple but very effective scheme for high-quality dense stereo reconstruction: (i) generate an approximate reconstruction with your favourite stereo matcher; (ii) rewarp the input images with that approximate model; (iii) with the initial reconstruction and the warped images as input, train a deep network to enhance the reconstruction by regressing a residual correction; and (iv) if desired, iterate the refinement with the new, improved reconstruction. The strategy of learning only the residual greatly simplifies the learning problem. A standard U-Net without bells and whistles is enough to reconstruct even small surface details, like dormers and roof substructures in satellite images. We also investigate residual reconstruction with less information and find that even a single image is enough to greatly improve an approximate reconstruction. Our full model reduces the mean absolute error of state-of-the-art stereo reconstruction systems by >50%, both in our target domain of satellite stereo and on stereo pairs from the ETH3D benchmark. Comment: updated supplementary material
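The four-step residual scheme above can be sketched in miniature. Here `refine` and `correction_net` are hypothetical names: the toy closure stands in for the paper's U-Net, and the loop simply applies the additive residual correction repeatedly.

```python
import numpy as np

def refine(depth, correction_net, images, n_iters=3):
    """Iterative residual refinement: each pass regresses a correction
    to the current estimate instead of predicting depth from scratch."""
    for _ in range(n_iters):
        residual = correction_net(depth, images)  # network predicts the error
        depth = depth + residual                  # apply the residual correction
    return depth

# Toy stand-in for the network: pulls the estimate halfway toward a target.
target = np.full((4, 4), 10.0)
toy_net = lambda d, imgs: 0.5 * (target - d)

initial = np.zeros((4, 4))
refined = refine(initial, toy_net, images=None, n_iters=3)
```

With this toy "network" the error shrinks geometrically (0 → 5 → 7.5 → 8.75 against a target of 10), which mirrors why iterating the refinement in step (iv) keeps helping.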

    DeepLO: Geometry-Aware Deep LiDAR Odometry

    Recently, learning-based ego-motion estimation approaches have drawn strong interest, mostly in studies focusing on visual perception. These groundbreaking works address unsupervised learning for odometry estimation, but mostly for visual sensors. Compared to images, learning-based approaches using Light Detection and Ranging (LiDAR) have been reported in only a few studies, most often within a supervised learning framework. In this paper, we propose a novel approach to geometry-aware deep LiDAR odometry trainable via both supervised and unsupervised frameworks. We incorporate the Iterative Closest Point (ICP) algorithm into a deep-learning framework and show the reliability of the proposed pipeline. We provide two loss functions that allow switching between supervised and unsupervised learning, depending on the validity of the ground truth in the training phase. An evaluation on the KITTI and Oxford RobotCar datasets demonstrates the strong performance and efficiency of the proposed method in terms of pose accuracy. Comment: 8 pages
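The geometric building block the paper incorporates, a point-to-point ICP alignment step, can be sketched as follows. This is the classical Kabsch/SVD solution for known correspondences, not the authors' deep pipeline; `icp_step` is an illustrative name.

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP alignment step with known correspondences:
    find R, t minimising ||R @ src_i + t - dst_i|| via the Kabsch/SVD solution."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])            # guard against reflections
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Recover a known rotation about z and a translation from a noise-free cloud.
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
src = np.random.default_rng(0).normal(size=(50, 3))
dst = src @ R_true.T + t_true
R, t = icp_step(src, dst)
```

In a full ICP loop this step alternates with nearest-neighbour correspondence search; embedding such a step differentiably is the kind of geometric grounding the abstract refers to.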

    ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning

    The process of aligning a pair of shapes is a fundamental operation in computer graphics. Traditional approaches rely heavily on matching corresponding points or features to guide the alignment, a paradigm that falters when significant shape portions are missing. These techniques generally do not incorporate prior knowledge about expected shape characteristics, which can help compensate for any misleading cues left by inaccuracies in the input shapes. We present an approach based on a deep neural network, leveraging shape datasets to learn a shape-aware prior for source-to-target alignment that is robust to shape incompleteness. In the absence of ground-truth alignments for supervision, we train the network on the task of shape alignment using incomplete shapes generated from full shapes for self-supervision. Our network, called ALIGNet, is trained to warp complete source shapes to incomplete targets, as if the target shapes were complete, thus essentially rendering the alignment partial-shape agnostic. We aim for the network to develop specialized expertise over the common characteristics of the shapes in each dataset, thereby achieving a higher-level understanding of the expected shape space to which a local approach would be oblivious. We constrain ALIGNet through an anisotropic total variation identity regularization to promote piecewise-smooth deformation fields, facilitating both partial-shape agnosticism and post-deformation applications. We demonstrate that ALIGNet learns to align geometrically distinct shapes, and is able to infer plausible mappings even when the target shape is significantly incomplete. We show that our network learns the common expected characteristics of shape collections, without over-fitting or memorization, enabling it to produce plausible deformations on unseen data at test time. Comment: To be presented at SIGGRAPH Asia 201
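The anisotropic total variation idea used to promote piecewise-smooth deformation fields can be illustrated with a minimal sketch; `anisotropic_tv` is an illustrative stand-in, not the paper's exact regulariser.

```python
import numpy as np

def anisotropic_tv(field):
    """Anisotropic total variation of a field of shape (H, W, C): the sum of
    absolute finite differences along each spatial axis. It penalises
    oscillation while tolerating piecewise-constant (smooth) regions."""
    dy = np.abs(np.diff(field, axis=0)).sum()   # vertical differences
    dx = np.abs(np.diff(field, axis=1)).sum()   # horizontal differences
    return dx + dy

# A field with one sharp step has low TV; noise of the same range does not.
flat = np.zeros((8, 8, 2))
flat[4:, :, 0] = 1.0                            # single piecewise-constant jump
noisy = np.random.default_rng(1).uniform(0, 1, size=(8, 8, 2))
```

Minimising such a term during training pushes the predicted deformation toward large coherent regions with sharp boundaries, which is exactly the "piecewise smooth" behaviour the abstract describes.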

    Structured Indoor Modeling

    In this dissertation, we propose data-driven approaches to reconstruct 3D models of indoor scenes represented in a structured way (e.g., a wall is represented by a planar surface and two rooms are connected via that wall). The structured representation is more application-ready than dense representations (e.g., a point cloud), but poses additional challenges for reconstruction, since extracting structures requires high-level understanding of geometry. To address this challenging problem, we exploit two common structural regularities of indoor scenes: 1) most indoor structures consist of planar surfaces (planarity), and 2) structural surfaces (e.g., walls and floors) can be represented by a 2D floorplan as a top-down projection (orthogonality). With breakthroughs in data-capture techniques, we develop automated systems that tackle two structured modeling problems, piecewise planar reconstruction and floorplan reconstruction, by learning shape priors (i.e., planarity and orthogonality) from data. With structured representations and production-level quality, the reconstructed models have an immediate impact on many industrial applications.
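The planarity prior can be illustrated with a minimal least-squares plane fit; `fit_plane` is a generic SVD-based sketch of how a planar surface is extracted from points, not the dissertation's learned system.

```python
import numpy as np

def fit_plane(points):
    """Fit a plane n.x = d to a point cloud by SVD: the normal is the
    right singular vector of the centred points with the smallest
    singular value (the direction of least variance)."""
    centroid = points.mean(axis=0)
    _, _, Vt = np.linalg.svd(points - centroid)
    normal = Vt[-1]
    return normal, normal @ centroid

# Points sampled on the plane z = 2 should recover normal (0, 0, +/-1), |d| = 2.
rng = np.random.default_rng(2)
xy = rng.uniform(-1, 1, size=(30, 2))
pts = np.column_stack([xy, np.full(30, 2.0)])
normal, d = fit_plane(pts)
```

A piecewise-planar reconstruction pipeline repeats fits like this over segmented point groups (e.g., wall, floor, ceiling candidates) and then reasons about how the fitted planes intersect.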

    Unsupervised Bi-directional Flow-based Video Generation from one Snapshot

    Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions. In this work, we leverage an unsupervised variational model to learn rich motion patterns in the form of long-term bi-directional flow fields, and apply the predicted flows to generate high-quality video sequences. In contrast to the state-of-the-art approach, our method does not require external flow supervision for learning. This is achieved through a novel module that performs bi-directional flow prediction from a single image. In addition, with the bi-directional flow consistency check, our method can handle occlusion and warping artifacts in a principled manner. Our method can be trained end-to-end on arbitrarily sampled natural video clips, and it is able to capture multi-modal motion uncertainty and synthesize photo-realistic novel sequences. Quantitative and qualitative evaluations on synthetic and real-world datasets demonstrate the effectiveness of the proposed approach over the state-of-the-art methods. Comment: 11 pages, 12 figures. Technical report for a project in progress
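The bi-directional flow consistency check can be sketched as a forward-backward round trip. `consistency_mask` is an illustrative nearest-neighbour version under assumed conventions (flows as pixel offsets with channels in (x, y) order), not the paper's implementation.

```python
import numpy as np

def consistency_mask(flow_fw, flow_bw, tol=0.5):
    """Bi-directional flow consistency check: a pixel passes if following the
    forward flow and then the backward flow at the destination returns it
    (approximately) to where it started; failures indicate occlusion."""
    H, W, _ = flow_fw.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Destination of each pixel under the forward flow (nearest neighbour).
    xd = np.clip(np.round(xs + flow_fw[..., 0]).astype(int), 0, W - 1)
    yd = np.clip(np.round(ys + flow_fw[..., 1]).astype(int), 0, H - 1)
    round_trip = flow_fw + flow_bw[yd, xd]        # ~0 where flows agree
    return np.linalg.norm(round_trip, axis=-1) < tol

# A uniform shift of (+1, 0) forward and (-1, 0) backward is fully consistent.
fw = np.zeros((4, 4, 2))
fw[..., 0] = 1.0
bw = -fw
mask = consistency_mask(fw, bw)
```

Pixels failing this check are typically masked out of the warping loss, which is one principled way to keep occlusions from corrupting the synthesized frames.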

    Learning to Predict Indoor Illumination from a Single Image

    We propose an automatic method to infer high dynamic range illumination from a single, limited field-of-view, low dynamic range photograph of an indoor scene. In contrast to previous work that relies on specialized image capture, user input, and/or simple scene models, we train an end-to-end deep neural network that directly regresses a limited field-of-view photo to HDR illumination, without strong assumptions on scene geometry, material properties, or lighting. We show that this can be accomplished in a three-step process: 1) we train a robust lighting classifier to automatically annotate the location of light sources in a large dataset of LDR environment maps, 2) we use these annotations to train a deep neural network that predicts the location of lights in a scene from a single limited field-of-view photo, and 3) we fine-tune this network using a small dataset of HDR environment maps to predict light intensities. This allows us to automatically recover high-quality HDR illumination estimates that significantly outperform previous state-of-the-art methods. Consequently, using our illumination estimates for applications like 3D object insertion, we can achieve results that are photo-realistic, which is validated via a perceptual user study.

    Robust multimodal dense SLAM

    To enable increasingly intelligent behaviours, autonomous robots will need to be equipped with a deep understanding of their surrounding environment. It would be particularly desirable if this level of perception could be achieved automatically through vision-based sensing, as passive cameras make a compelling sensor choice for robotic platforms due to their low cost, low weight, and low power consumption. Fundamental to extracting a high-level understanding from a set of 2D images is an understanding of the underlying 3D geometry of the environment. In mobile robotics, the most popular and successful technique for building a representation of 3D geometry from 2D images is Visual Simultaneous Localisation and Mapping (SLAM). While sparse, landmark-based SLAM systems have demonstrated high levels of accuracy and robustness, they are only capable of producing sparse maps. In general, to move beyond simple navigation to scene understanding and interaction, dense 3D reconstructions are required. Dense SLAM systems naturally allow for online dense scene reconstruction, but suffer from a lack of robustness: the dense image alignment used in the tracking step has a narrow convergence basin, and the photometric depth estimation used in the mapping step is typically poorly constrained due to occlusions and homogeneous textures. This thesis develops methods that increase the robustness of dense SLAM by fusing additional sensing modalities into standard dense SLAM pipelines. In particular, it considers two modalities: acceleration and rotation-rate measurements from an inertial measurement unit (IMU) to address the tracking issue, and learned priors on dense reconstructions from deep neural networks (DNNs) to address the mapping issue.

    Self-Calibration of Multi-Camera Systems for Vehicle Surround Sensing

    Multi-camera systems are already deployed in a wide range of vehicles and mobile robots today. Applications range from simple assistance functions, such as generating a virtual surround view, to environment perception as required for partially and fully automated driving. To derive metric quantities such as distances and angles from the camera images and to build a consistent model of the environment, the imaging behaviour of the individual cameras as well as their relative poses must be known. Determining the relative poses of the cameras, described by the extrinsic calibration, is particularly laborious, since it can only be performed on the assembled system. Moreover, non-negligible changes due to external influences are to be expected over the vehicle's lifetime. To avoid the high time and cost overhead of regular maintenance, a self-calibration method is required that continuously re-estimates the extrinsic calibration parameters. Self-calibration typically exploits overlapping fields of view to estimate the extrinsic calibration from image correspondences. If the fields of view of several cameras do not overlap, however, the calibration parameters can also be derived from the relative motions observed by the individual cameras. The motion of typical road vehicles, though, does not allow all calibration parameters to be determined. To enable a complete estimate of the parameters, additional constraint equations, arising for example from observation of the ground plane, can be incorporated. In this work, a theoretical analysis shows which parameters can be uniquely determined from combinations of different constraint equations.
To fully cover a vehicle's surroundings, lenses with a very wide field of view, such as fisheye lenses, are typically used. This work proposes a method for establishing image correspondences that accounts for the geometric distortions caused by fisheye lenses and strongly changing viewpoints. Building on this, we present a robust method for tracking the parameters of the ground plane. Based on the theoretical observability analysis and the proposed methods, we present a robust, recursive calibration procedure built on an extended Kalman filter. The procedure is distinguished by its small number of internal parameters and its high flexibility with respect to the constraint equations it incorporates, and it relies solely on the image data of the multi-camera system. In an extensive experimental evaluation on real data, we compare the results of methods based on different constraint equations and motion models against the parameters obtained from a reference calibration. The best results were achieved by combining all proposed constraint equations. Using several examples, we show that the accuracy achieved is sufficient for a wide range of applications.
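The recursive estimation at the heart of such a calibration procedure, an extended Kalman filter measurement update, can be sketched generically. This is the textbook EKF update with a toy identity measurement, not the dissertation's full filter; `ekf_update` and the dimensions are illustrative.

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """One EKF measurement update: the state x with covariance P is corrected
    by observation z via measurement function h, its Jacobian H, and
    measurement noise covariance R."""
    y = z - h(x)                          # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Toy example: directly observing a 2-parameter calibration state pulls the
# estimate toward the measurement and shrinks the covariance.
x = np.array([0.0, 0.0])
P = np.eye(2)
z = np.array([1.0, -1.0])
H = np.eye(2)
R = 0.1 * np.eye(2)
x_new, P_new = ekf_update(x, P, z, lambda s: s, H, R)
```

In the calibration setting, the state would hold the extrinsic parameters and each constraint equation (epipolar, ground-plane, motion-based) would contribute its own measurement function and Jacobian, re-linearised at every step.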

    Cycle-IR: Deep Cyclic Image Retargeting

    Supervised deep learning techniques have achieved great success in various fields by shedding the limitations of handcrafted representations. However, most previous image retargeting algorithms still employ fixed design principles, such as using gradient maps or handcrafted features to compute a saliency map, which inevitably restricts their generality. Deep learning techniques may help to address this issue, but the challenge is that training deep retargeting models would normally require a large-scale image retargeting dataset, and building such a dataset demands huge human effort. In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, which is the first to implement image retargeting with a single deep model, without relying on any explicit user annotations. Our idea is built on the reverse mapping from the retargeted images to the given input images. If the retargeted image has serious distortion or excessive loss of important visual information, the reverse mapping is unlikely to restore the input image well. We constrain this forward-reverse consistency by introducing a cyclic perception coherence loss. In addition, we propose a simple yet effective image retargeting network (IRNet) to implement the image retargeting process. Our IRNet contains a spatial and channel attention layer, which is able to discriminate visually important regions of input images effectively, especially in cluttered images. Given arbitrary sizes of input images and desired aspect ratios, our Cycle-IR can produce visually pleasing target images directly. Extensive experiments on the standard RetargetMe dataset show the superiority of our Cycle-IR. In addition, our Cycle-IR outperforms the Multiop method and obtains the best result in the user study. Code is available at https://github.com/mintanwei/Cycle-IR. Comment: 12 pages
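The forward-reverse consistency idea behind the cyclic loss can be sketched in miniature. `retarget` and `reverse` are toy column-dropping and column-repeating stand-ins, not IRNet, and the squared-error penalty is an illustrative substitute for the paper's perception coherence loss.

```python
import numpy as np

def cycle_loss(x, retarget, reverse):
    """Cyclic consistency in miniature: retarget the input, map it back,
    and penalise the discrepancy with the original."""
    x_back = reverse(retarget(x))
    return np.mean((x - x_back) ** 2)

# Toy retargeting: halve the width by dropping every other column;
# toy reverse mapping: restore the width by repeating columns.
retarget = lambda img: img[:, ::2]
reverse = lambda img: np.repeat(img, 2, axis=1)

smooth = np.tile(np.arange(0, 8, dtype=float), (4, 1))  # gently varying columns
loss = cycle_loss(smooth, retarget, reverse)
```

The more information the retargeting operator destroys, the worse the round trip and the larger this loss, which is exactly the signal Cycle-IR uses in place of ground-truth retargeting annotations.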