Semi-automated Anatomical Labeling and Inter-subject Warping of High-Density Intracranial Recording Electrodes in Electrocorticography.
In this article, we introduce img_pipe, our open-source Python package for preprocessing of imaging data for use in intracranial electrocorticography (ECoG) and stereo-EEG analyses. The process of electrode localization, labeling, and warping for use in ECoG currently varies widely across laboratories, and it is usually performed with custom, lab-specific code. This Python package aims to provide a standardized interface for these procedures, as well as code to plot and display results on 3D cortical surface meshes. It gives the user an easy interface to create anatomically labeled electrodes that can also be warped to an atlas brain, starting with only a preoperative T1 MRI scan and a postoperative CT scan. We describe the full capabilities of our imaging pipeline and present a step-by-step protocol for users.
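The anatomical labeling step lends itself to a compact illustration. Below is a minimal sketch of one common approach, assigning each electrode the label of its nearest cortical surface vertex; this illustrates the idea only and is not img_pipe's actual implementation or API.

```python
import numpy as np

def label_electrodes(elec_xyz, vertex_xyz, vertex_labels):
    """Assign each electrode the anatomical label of its nearest cortical
    vertex (a sketch of the labeling step, not img_pipe's implementation).

    elec_xyz      : (E, 3) electrode coordinates in T1 space
    vertex_xyz    : (V, 3) cortical surface vertex coordinates
    vertex_labels : length-V sequence of anatomical label names
    """
    d = np.linalg.norm(elec_xyz[:, None, :] - vertex_xyz[None, :, :], axis=2)
    nearest = d.argmin(axis=1)                  # index of closest vertex per electrode
    return [vertex_labels[i] for i in nearest]
```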
ResDepth: Learned Residual Stereo Reconstruction
We propose an embarrassingly simple but very effective scheme for
high-quality dense stereo reconstruction: (i) generate an approximate
reconstruction with your favourite stereo matcher; (ii) rewarp the input images
with that approximate model; (iii) with the initial reconstruction and the
warped images as input, train a deep network to enhance the reconstruction by
regressing a residual correction; and (iv) if desired, iterate the refinement
with the new, improved reconstruction. The strategy to only learn the residual
greatly simplifies the learning problem. A standard U-Net without bells and
whistles is enough to reconstruct even small surface details, like dormers and
roof substructures in satellite images. We also investigate residual
reconstruction with less information and find that even a single image is
enough to greatly improve an approximate reconstruction. Our full model reduces
the mean absolute error of state-of-the-art stereo reconstruction systems by
>50%, both in our target domain of satellite stereo and on stereo pairs from
the ETH3D benchmark.
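The refinement loop is easy to state in code. The following is a minimal sketch of steps (ii)-(iv) under assumed interfaces (`warp` resamples an image with the current reconstruction; `unet` maps the stacked inputs to a residual correction); it is not the authors' implementation.

```python
import torch

def iterative_residual_refinement(depth, left, right, unet, warp, iters=2):
    """Steps (ii)-(iv) of the scheme above as a loop (a sketch under assumed
    interfaces, not the paper's exact input stacking)."""
    for _ in range(iters):
        right_warped = warp(right, depth)                  # (ii) rewarp input with current model
        x = torch.cat([depth, left, right_warped], dim=1)  # (iii) initial recon + warped images
        depth = depth + unet(x)                            # regress only the residual correction
    return depth                                           # (iv) iterate with improved recon
```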
DeepLO: Geometry-Aware Deep LiDAR Odometry
Recently, learning-based ego-motion estimation approaches have drawn strong
interest from studies mostly focusing on visual perception. These
groundbreaking works focus on unsupervised learning for odometry estimation but
mostly for visual sensors. Compared to images, learning-based approaches using Light Detection and Ranging (LiDAR) have been reported in only a few studies, most of which propose a supervised learning framework. In this paper, we
propose a novel approach to geometry-aware deep LiDAR odometry trainable via
both supervised and unsupervised frameworks. We incorporate the Iterative Closest Point (ICP) algorithm into a deep-learning framework and show the
reliability of the proposed pipeline. We provide two loss functions that allow
switching between supervised and unsupervised learning depending on the
ground-truth validity in the training phase. Evaluations on the KITTI and Oxford RobotCar datasets demonstrate the strong performance and efficiency of the proposed method in terms of pose accuracy.
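The switchable objective can be sketched as follows; the paper's exact loss terms are not reproduced, and a point-to-point residual with given correspondences stands in for the ICP-based geometric loss.

```python
import torch

def odometry_loss(pred_T, gt_T, src_pts, dst_pts, gt_valid):
    """Switchable objective as described above (a sketch, not the paper's
    exact losses). pred_T, gt_T: (4, 4) relative poses; src_pts, dst_pts:
    (N, 3) corresponding LiDAR points."""
    if gt_valid:
        # Supervised branch: penalize deviation from the ground-truth pose.
        return torch.norm(pred_T - gt_T)
    # Unsupervised branch: ICP-style point-to-point residual after applying
    # the predicted transform (assumes correspondences are given).
    R, t = pred_T[:3, :3], pred_T[:3, 3]
    return torch.mean(torch.norm(src_pts @ R.T + t - dst_pts, dim=1))
```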
ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning
The process of aligning a pair of shapes is a fundamental operation in
computer graphics. Traditional approaches rely heavily on matching
corresponding points or features to guide the alignment, a paradigm that
falters when significant shape portions are missing. These techniques generally
do not incorporate prior knowledge about expected shape characteristics, which can help compensate for misleading cues caused by inaccuracies in the input shapes. We present an approach based on a deep neural network,
leveraging shape datasets to learn a shape-aware prior for source-to-target
alignment that is robust to shape incompleteness. In the absence of ground
truth alignments for supervision, we train a network on the task of shape
alignment using incomplete shapes generated from full shapes for
self-supervision. Our network, called ALIGNet, is trained to warp complete
source shapes to incomplete targets, as if the target shapes were complete,
thus essentially rendering the alignment partial-shape agnostic. We aim for the
network to develop specialized expertise over the common characteristics of the
shapes in each dataset, thereby achieving a higher-level understanding of the
expected shape space to which a local approach would be oblivious. We constrain
ALIGNet through an anisotropic total variation identity regularization to
promote piecewise smooth deformation fields, facilitating both partial-shape
agnosticism and post-deformation applications. We demonstrate that ALIGNet
learns to align geometrically distinct shapes, and is able to infer plausible
mappings even when the target shape is significantly incomplete. We show that
our network learns the common expected characteristics of shape collections,
without over-fitting or memorization, enabling it to produce plausible
deformations on unseen data at test time.
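The anisotropic total variation regularizer has a standard form. Below is a minimal sketch for a dense 2D deformation field; the exact variant used by ALIGNet is not reproduced.

```python
import torch

def anisotropic_tv(displacement):
    """Anisotropic total variation of a 2D deformation field of shape
    (B, 2, H, W): the sum of absolute finite differences along each axis.
    Penalizing it promotes piecewise smooth deformation fields (a sketch
    of the regularizer named above, with details assumed)."""
    dx = (displacement[:, :, :, 1:] - displacement[:, :, :, :-1]).abs()
    dy = (displacement[:, :, 1:, :] - displacement[:, :, :-1, :]).abs()
    return dx.mean() + dy.mean()
```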
Structured Indoor Modeling
In this dissertation, we propose data-driven approaches to reconstruct 3D models of indoor scenes in a structured representation (e.g., a wall is represented by a planar surface and two rooms are connected via the wall). Structured representations are more application-ready than dense representations (e.g., a point cloud), but they pose additional challenges for reconstruction, since extracting structure requires high-level understanding of geometry. To address this challenging problem, we explore two common structural regularities of indoor scenes: 1) most indoor structures consist of planar surfaces (planarity), and 2) structural surfaces (e.g., walls and floors) can be represented by a 2D floorplan as a top-down view projection (orthogonality). Building on breakthroughs in data capture techniques, we develop automated systems that tackle structured modeling problems, namely piece-wise planar reconstruction and floorplan reconstruction, by learning shape priors (i.e., planarity and orthogonality) from data. With structured representations and production-level quality, the reconstructed models have an immediate impact on many industrial applications.
Unsupervised Bi-directional Flow-based Video Generation from one Snapshot
Imagining multiple consecutive frames given one single snapshot is
challenging, since it is difficult to simultaneously predict diverse motions
from a single image and faithfully generate novel frames without visual
distortions. In this work, we leverage an unsupervised variational model to
learn rich motion patterns in the form of long-term bi-directional flow fields,
and apply the predicted flows to generate high-quality video sequences. In
contrast to the state-of-the-art approach, our method does not require external
flow supervision for learning. This is achieved through a novel module that performs bi-directional flow prediction from a single image. In addition, with
the bi-directional flow consistency check, our method can handle occlusion and
warping artifacts in a principled manner. Our method can be trained end-to-end
based on arbitrarily sampled natural video clips, and it is able to capture
multi-modal motion uncertainty and synthesize photo-realistic novel sequences.
Quantitative and qualitative evaluations over synthetic and real-world datasets
demonstrate the effectiveness of the proposed approach over state-of-the-art methods.
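The bi-directional consistency check follows a well-known forward-backward pattern. The sketch below uses a common formulation with assumed thresholds; the paper's exact criterion may differ.

```python
import torch
import torch.nn.functional as F

def backward_warp(x, flow):
    """Sample x (B, C, H, W) at positions displaced by flow (B, 2, H, W),
    with flow given in pixels as (dx, dy)."""
    b, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=flow.device),
                            torch.arange(w, device=flow.device), indexing="ij")
    base = torch.stack((xs, ys)).float()             # (2, H, W) pixel coordinates
    pos = base.unsqueeze(0) + flow                   # (B, 2, H, W) sample positions
    grid = torch.stack((2 * pos[:, 0] / (w - 1) - 1,  # normalize to [-1, 1]
                        2 * pos[:, 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(x, grid, align_corners=True)

def occlusion_mask(flow_fw, flow_bw, alpha=0.01, beta=0.5):
    """Forward-backward consistency check (a common formulation; thresholds
    are assumptions): pixels where the warped backward flow fails to cancel
    the forward flow are marked occluded."""
    bw_warped = backward_warp(flow_bw, flow_fw)
    sq_diff = (flow_fw + bw_warped).pow(2).sum(dim=1)        # ~0 where consistent
    sq_mag = flow_fw.pow(2).sum(dim=1) + bw_warped.pow(2).sum(dim=1)
    return (sq_diff < alpha * sq_mag + beta).float()         # 1 = visible, 0 = occluded
```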
Learning to Predict Indoor Illumination from a Single Image
We propose an automatic method to infer high dynamic range illumination from
a single, limited field-of-view, low dynamic range photograph of an indoor
scene. In contrast to previous work that relies on specialized image capture,
user input, and/or simple scene models, we train an end-to-end deep neural
network that directly regresses a limited field-of-view photo to HDR
illumination, without strong assumptions on scene geometry, material
properties, or lighting. We show that this can be accomplished in a three-step
process: 1) we train a robust lighting classifier to automatically annotate the
location of light sources in a large dataset of LDR environment maps, 2) we use
these annotations to train a deep neural network that predicts the location of
lights in a scene from a single limited field-of-view photo, and 3) we
fine-tune this network using a small dataset of HDR environment maps to predict
light intensities. This allows us to automatically recover high-quality HDR
illumination estimates that significantly outperform previous state-of-the-art
methods. Consequently, using our illumination estimates for applications like
3D object insertion, we can achieve results that are photo-realistic, which is
validated via a perceptual user study.
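The three-step process maps naturally onto a staged training routine. The outline below is purely illustrative; every helper passed in is a hypothetical stand-in, not the authors' code.

```python
# Hypothetical outline of the three-step procedure described above; all
# helpers are illustrative stand-ins supplied by the caller.

def train(ldr_panoramas, hdr_panoramas, light_classifier, make_crops,
          fit_light_location, finetune_intensity):
    # 1) Annotate light-source locations on the large LDR panorama set
    #    with a robust lighting classifier.
    masks = [light_classifier(p) for p in ldr_panoramas]
    # 2) Train a network to predict those locations from limited
    #    field-of-view crops extracted from each panorama.
    net = fit_light_location(make_crops(ldr_panoramas), masks)
    # 3) Fine-tune on the small HDR set to regress light intensities.
    return finetune_intensity(net, make_crops(hdr_panoramas), hdr_panoramas)
```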
Robust multimodal dense SLAM
To enable increasingly intelligent behaviours, autonomous robots will need to be equipped with a deep understanding of their surrounding environment. It would be particularly desirable if this level of perception could be achieved automatically through the use of vision-based sensing, as passive cameras make a compelling sensor choice for robotic platforms due to their low cost, low weight, and low power consumption.
Fundamental to extracting a high-level understanding from a set of 2D images is an understanding of the underlying 3D geometry of the environment. In mobile robotics, the most popular and successful technique for building a representation of 3D geometry from 2D images is Visual Simultaneous Localisation and Mapping (SLAM). While sparse, landmark-based SLAM systems have demonstrated high levels of accuracy and robustness, they are only capable of producing sparse maps. In general, to move beyond simple navigation to scene understanding and interaction, dense 3D reconstructions are required.
Dense SLAM systems naturally allow for online dense scene reconstruction, but suffer from a lack of robustness due to the fact that the dense image alignment used in the tracking step has a narrow convergence basin and that the photometric-based depth estimation used in the mapping step is typically poorly constrained due to the presence of occlusions and homogeneous textures.
This thesis develops methods that can be used to increase the robustness of dense SLAM by fusing additional sensing modalities into standard dense SLAM pipelines. In particular, this thesis will look at two sensing modalities: acceleration and rotation rate measurements from an inertial measurement unit (IMU) to address the tracking issue, and learned priors on dense reconstructions from deep neural networks (DNNs) to address the mapping issue.
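The tracking-side fusion can be illustrated with a simple combined objective: a dense photometric term plus a Mahalanobis-weighted prior that keeps the pose estimate near the IMU prediction, widening the effective convergence basin of dense alignment. This is a sketch of the general idea, not the thesis' exact formulation.

```python
import numpy as np

def tracking_energy(photo_residuals, xi, xi_imu, info_imu, lam=1.0):
    """IMU-regularized dense tracking objective (an illustration of the
    fusion idea, not the thesis' formulation). xi, xi_imu: 6-DoF pose
    increments in minimal coordinates; info_imu: 6x6 information matrix
    of the IMU motion prior."""
    e_photo = np.sum(photo_residuals ** 2)   # dense photometric alignment term
    d = xi - xi_imu                          # deviation from IMU-predicted motion
    e_prior = d @ info_imu @ d               # Mahalanobis-weighted IMU prior
    return e_photo + lam * e_prior
```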
Self-Calibration of Multi-Camera Systems for Vehicle Surround Sensing
Multi-camera systems are already used in a wide range of vehicles and mobile robots today. Applications range from simple assistance functions, such as generating a virtual surround view, to environment perception as required for partially and fully automated driving. To derive metric quantities such as distances and angles from the camera images and to build a consistent model of the environment, the imaging behaviour of the individual cameras as well as their relative poses with respect to each other must be known.
Determining the relative poses of the cameras, described by the extrinsic calibration, is particularly laborious, since it can only be carried out on the assembled system. Moreover, non-negligible changes due to external influences must be expected over the lifetime of the vehicle. To avoid the high time and cost overhead of regular maintenance, a self-calibration method is required that continuously re-estimates the extrinsic calibration parameters.
Self-calibration typically exploits overlapping fields of view to estimate the extrinsic calibration from image correspondences. If the fields of view of several cameras do not overlap, however, the calibration parameters can also be derived from the relative motions observed by the individual cameras. The motion of typical road vehicles does not, however, allow all calibration parameters to be determined. To enable full estimation of the parameters, additional constraint equations, arising for example from observation of the ground plane, can be incorporated. In a theoretical analysis, this thesis shows which parameters can be uniquely determined from the combination of different constraint equations.
To capture a vehicle's surroundings completely, lenses with a very large field of view, such as fisheye lenses, are typically used. This thesis proposes a method for establishing image correspondences that accounts for the geometric distortions caused by fisheye lenses and strongly changing viewpoints. Building on this, we present a robust method for tracking the parameters of the ground plane.
Based on the theoretical observability analysis and the proposed methods, we present a robust, recursive calibration procedure built on an extended Kalman filter. The proposed calibration procedure is distinguished in particular by its small number of internal parameters and its high flexibility regarding the constraint equations it incorporates, and it relies solely on the image data of the multi-camera system.
In an extensive experimental evaluation on real data, we compare the results of the methods based on different constraint equations and motion models with the parameters determined from a reference calibration. The best results were achieved by combining all the proposed constraint equations. Several examples show that the achieved accuracy is sufficient for a wide range of applications.
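The recursive core of such a scheme is the standard extended Kalman filter measurement update; the sketch below gives the generic equations without reproducing the thesis' specific state vector or constraint equations.

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """One extended Kalman filter measurement update, the recursive core of
    the calibration scheme described above (generic EKF equations only).
    x: calibration state estimate, P: its covariance, z: measurement,
    h: measurement function, H: its Jacobian at x, R: measurement noise."""
    y = z - h(x)                             # innovation
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x + K @ y                        # updated state estimate
    P_new = (np.eye(len(x)) - K @ H) @ P     # updated covariance
    return x_new, P_new
```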
Cycle-IR: Deep Cyclic Image Retargeting
Supervised deep learning techniques have achieved great success in various fields by removing the limitations of handcrafted representations. However, most previous image retargeting algorithms still employ fixed design principles, such as using gradient maps or handcrafted features to compute saliency maps, which inevitably restricts their generality. Deep learning techniques may help to address this issue, but the challenge is that training deep retargeting models requires a large-scale image retargeting dataset, and building such a dataset demands enormous human effort.
In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, which is the first to implement image retargeting with a single deep model without relying on any explicit user annotations. Our idea is built on
the reverse mapping from the retargeted images to the given input images. If
the retargeted image has serious distortion or excessive loss of important
visual information, the reverse mapping is unlikely to restore the input image
well. We constrain this forward-reverse consistency by introducing a cyclic
perception coherence loss. In addition, we propose a simple yet effective image
retargeting network (IRNet) to implement the image retargeting process. Our
IRNet contains a spatial and channel attention layer, which is able to
discriminate visually important regions of input images effectively, especially
in cluttered images. Given arbitrary sizes of input images and desired aspect
ratios, our Cycle-IR can produce visually pleasing target images directly.
Extensive experiments on the standard RetargetMe dataset show the superiority
of our Cycle-IR. In addition, our Cycle-IR outperforms the Multiop method and
obtains the best result in the user study. Code is available at
https://github.com/mintanwei/Cycle-IR.
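The cyclic constraint can be sketched compactly: retarget forward, map back, and compare the reconstruction to the input in a perceptual feature space. `irnet` and `features` are assumed interfaces; the paper defines its own cyclic perception coherence loss.

```python
import torch

def cyclic_coherence_loss(x, irnet, target_hw, features):
    """Sketch of the forward-reverse consistency idea described above.
    irnet(image, (H, W)) -> retargeted image; features: any fixed perceptual
    feature extractor (both assumed interfaces, not the paper's code)."""
    y = irnet(x, target_hw)                  # forward retargeting to desired size
    x_rec = irnet(y, tuple(x.shape[-2:]))    # reverse mapping back to input size
    return (features(x_rec) - features(x)).pow(2).mean()
```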