6,670 research outputs found
Learned Multi-Patch Similarity
Estimating a depth map from multiple views of a scene is a fundamental task
in computer vision. As soon as more than two viewpoints are available, one
faces the very basic question how to measure similarity across >2 image
patches. Surprisingly, no direct solution exists, instead it is common to fall
back to more or less robust averaging of two-view similarities. Encouraged by
the success of machine learning, and in particular convolutional neural
networks, we propose to learn a matching function which directly maps multiple
image patches to a scalar similarity score. Experiments on several multi-view
datasets demonstrate that this approach has advantages over methods based on
pairwise patch similarity.Comment: 10 pages, 7 figures, Accepted at ICCV 201
Fast and Accurate Depth Estimation from Sparse Light Fields
We present a fast and accurate method for dense depth reconstruction from
sparsely sampled light fields obtained using a synchronized camera array. In
our method, the source images are over-segmented into non-overlapping compact
superpixels that are used as basic data units for depth estimation and
refinement. Superpixel representation provides a desirable reduction in the
computational cost while preserving the image geometry with respect to the
object contours. Each superpixel is modeled as a plane in the image space,
allowing depth values to vary smoothly within the superpixel area. Initial
depth maps, which are obtained by plane sweeping, are iteratively refined by
propagating good correspondences within an image. To ensure the fast
convergence of the iterative optimization process, we employ a highly parallel
propagation scheme that operates on all the superpixels of all the images at
once, making full use of the parallel graphics hardware. A few optimization
iterations of the energy function incorporating superpixel-wise smoothness and
geometric consistency constraints allows to recover depth with high accuracy in
textured and textureless regions as well as areas with occlusions, producing
dense globally consistent depth maps. We demonstrate that while the depth
reconstruction takes about a second per full high-definition view, the accuracy
of the obtained depth maps is comparable with the state-of-the-art results.Comment: 15 pages, 15 figure
AgriColMap: Aerial-Ground Collaborative 3D Mapping for Precision Farming
The combination of aerial survey capabilities of Unmanned Aerial Vehicles
with targeted intervention abilities of agricultural Unmanned Ground Vehicles
can significantly improve the effectiveness of robotic systems applied to
precision agriculture. In this context, building and updating a common map of
the field is an essential but challenging task. The maps built using robots of
different types show differences in size, resolution and scale, the associated
geolocation data may be inaccurate and biased, while the repetitiveness of both
visual appearance and geometric structures found within agricultural contexts
render classical map merging techniques ineffective. In this paper we propose
AgriColMap, a novel map registration pipeline that leverages a grid-based
multimodal environment representation which includes a vegetation index map and
a Digital Surface Model. We cast the data association problem between maps
built from UAVs and UGVs as a multimodal, large displacement dense optical flow
estimation. The dominant, coherent flows, selected using a voting scheme, are
used as point-to-point correspondences to infer a preliminary non-rigid
alignment between the maps. A final refinement is then performed, by exploiting
only meaningful parts of the registered maps. We evaluate our system using real
world data for 3 fields with different crop species. The results show that our
method outperforms several state of the art map registration and matching
techniques by a large margin, and has a higher tolerance to large initial
misalignments. We release an implementation of the proposed approach along with
the acquired datasets with this paper.Comment: Published in IEEE Robotics and Automation Letters, 201
Combining Features and Semantics for Low-level Computer Vision
Visual perception of depth and motion plays a significant role in understanding and navigating the environment.
Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving.
The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information.
Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions.
Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects.
Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.Die visuelle Wahrnehmung von Tiefe und Bewegung spielt eine wichtige Rolle bei dem VerstĂ€ndnis und der Navigation in unserer Umwelt. Die 3D Rekonstruktion von Szenen im Freien und die SchĂ€tzung der Bewegung von Videokameras sind von gröĂter Bedeutung fĂŒr Anwendungen, wie das autonome Fahren.
Die Erforschung der entsprechenden Probleme des maschinellen Sehens hat in den letzten Jahrzehnten enorme Fortschritte gemacht, jedoch bleiben einige Aspekte heute noch ungelöst. Beispiele hierfĂŒr sind reflektierende und texturlose OberflĂ€chen oder groĂe Bewegungen, bei denen herkömmliche lokale Methoden hĂ€ufig scheitern. Weitere Herausforderungen sind niedrige Bildraten, Verdeckungen, groĂe Verzerrungen und schwierige LichtverhĂ€ltnisse. In dieser Arbeit schlagen wir vor nicht-lokale Interaktionen zu modellieren, die semantische und kontextbezogene Informationen nutzen, um diese Herausforderungen zu meistern.
FĂŒr die binokulare Stereo SchĂ€tzung schlagen wir zuallererst vor zusammenhĂ€ngende Bereiche mit objektklassen-spezifischen DisparitĂ€ts VorschlĂ€gen zu regularisieren, die wir mit inversen Grafik Techniken auf der Grundlage einer spĂ€rlichen DisparitĂ€tsschĂ€tzung und semantischen Segmentierung des Bildes erhalten. Die DisparitĂ€ts VorschlĂ€ge kodieren die Tatsache, dass die GegenstĂ€nde bestimmter Kategorien nicht willkĂŒrlich geformt sind, sondern typischerweise regelmĂ€Ăige Strukturen aufweisen. Wir integrieren sie fĂŒr die komplexe Objektklasse 'Auto' in Form eines nicht-lokalen Regularisierungsterm in ein Superpixel-basiertes grafisches Modell und zeigen die Vorteile vor allem in reflektierenden Bereichen.
Zweitens nutzen wir fĂŒr die 3D-Rekonstruktion die Tatsache, dass mit der GröĂe der rekonstruierten FlĂ€che auch die Wahrscheinlichkeit steigt, Objekte von Ă€hnlicher Art und Form in der Szene zu enthalten. Dies gilt besonders fĂŒr Szenen im Freien, in denen GebĂ€ude und Fahrzeuge oft vorkommen, die unter fehlender Textur oder Reflexionen leiden aber Ă€hnlichkeit in der Form aufweisen. Wir nutzen diese Ă€hnlichkeiten zur Lokalisierung von Objekten mit Detektoren und zur gemeinsamen Rekonstruktion indem ein volumetrisches Modell ihrer Form erlernt wird. Dies ermöglicht auftretendes Rauschen zu reduzieren, wĂ€hrend fehlende FlĂ€chen vervollstĂ€ndigt werden, da Objekte Ă€hnlicher Form von allen Beobachtungen der jeweiligen Kategorie profitieren. Die Evaluierung auf einem neuen, herausfordernden vorstĂ€dtischen Datensatz in Anbetracht von LIDAR-Entfernungsdaten zeigt die Vorteile der Modellierung von strukturellen AbhĂ€ngigkeiten zwischen Objekten.
Zuletzt, motiviert durch den Erfolg von Deep Learning Techniken bei der Mustererkennung, prĂ€sentieren wir eine Methode zum Erlernen von kontextbezogenen Merkmalen zur Lösung des optischen Flusses mittels diskreter Optimierung. Dazu stellen wir eine effiziente Methode vor um zusĂ€tzlich zu einem Lokalen Netzwerk ein Kontext-Netzwerk zu erlernen, das mit Hilfe von erweiterter Faltung auf Patches ein groĂes rezeptives Feld besitzt. FĂŒr das Feature Matching vergleichen wir mit schnellen GPU-Matrixmultiplikation jedes Pixel im Referenzbild mit jedem Pixel im Zielbild. Das aus dem Netzwerk resultierende Matching Kostenvolumen bildet den Datenterm fĂŒr eine diskrete MAP Inferenz in einem paarweisen Markov Random Field. Eine umfangreiche Evaluierung zeigt die Relevanz des Kontextes fĂŒr das Feature Matching
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven as an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we advocate remote sensing
scientists to bring their expertise into deep learning, and use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.Comment: Accepted for publication IEEE Geoscience and Remote Sensing Magazin
SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization
In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo
(SD-MVS), a method that can effectively tackle challenges in 3D reconstruction
of textureless areas. We are the first to adopt the Segment Anything Model
(SAM) to distinguish semantic instances in scenes and further leverage these
constraints for pixelwise patch deformation on both matching cost and
propagation. Concurrently, we propose a unique refinement strategy that
combines spherical coordinates and gradient descent on normals and pixelwise
search interval on depths, significantly improving the completeness of
reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM)
algorithm to alternately optimize the aggregate matching cost and
hyperparameters, effectively mitigating the problem of parameters being
excessively dependent on empirical tuning. Evaluations on the ETH3D
high-resolution multi-view stereo benchmark and the Tanks and Temples dataset
demonstrate that our method can achieve state-of-the-art results with less time
consumption.Comment: 10 pages, 9 figures, published to AAAI202
Keyframe-based monocular SLAM: design, survey, and future directions
Extensive research in the field of monocular SLAM for the past fifteen years
has yielded workable systems that found their way into various applications in
robotics and augmented reality. Although filter-based monocular SLAM systems
were common at some time, the more efficient keyframe-based solutions are
becoming the de facto methodology for building a monocular SLAM system. The
objective of this paper is threefold: first, the paper serves as a guideline
for people seeking to design their own monocular SLAM according to specific
environmental constraints. Second, it presents a survey that covers the various
keyframe-based monocular SLAM systems in the literature, detailing the
components of their implementation, and critically assessing the specific
strategies made in each proposed solution. Third, the paper provides insight
into the direction of future research in this field, to address the major
limitations still facing monocular SLAM; namely, in the issues of illumination
changes, initialization, highly dynamic motion, poorly textured scenes,
repetitive textures, map maintenance, and failure recovery
Euclidean reconstruction of natural underwater scenes using optic imagery sequence
The development of maritime applications require monitoring, studying and preserving of detailed and close observation on the underwater seafloor and objects. Stereo vision offers advanced technologies to build 3D models from 2D still overlapping images in a relatively inexpensive way. However, while image stereo matching is a necessary step in 3D reconstruction procedure, even the most robust dense matching techniques are not guaranteed to work for underwater images due to the challenging aquatic environment. In this thesis, in addition to a detailed introduction and research on the key components of building 3D models from optic images, a robust modified quasi-dense matching algorithm based on correspondence propagation and adaptive least square matching for underwater images is proposed and applied to some typical underwater image datasets. The experiments demonstrate the robustness and good performance of the proposed matching approach
- âŠ