Search CORE

655 research outputs found

Cross-calibration of Time-of-flight and Colour Cameras

Author: Evangelidis Georgios
Hansard Miles
Horaud Radu
Pelorson Quentin
Publication venue
Publication date: 01/01/2014
Field of study

Time-of-flight cameras provide depth information, which is complementary to the photometric appearance of the scene in ordinary images. It is desirable to merge the depth and colour information, in order to obtain a coherent scene representation. However, the individual cameras will have different viewpoints, resolutions and fields of view, which means that they must be mutually calibrated. This paper presents a geometric framework for this multi-view and multi-modal calibration problem. It is shown that three-dimensional projective transformations can be used to align depth and parallax-based representations of the scene, with or without Euclidean reconstruction. A new evaluation procedure is also developed; this allows the reprojection error to be decomposed into calibration and sensor-dependent components. The complete approach is demonstrated on a network of three time-of-flight and six colour cameras. The applications of such a system, to a range of automatic scene-interpretation problems, are discussed.Comment: 18 pages, 12 figures, 3 table

arXiv.org e-Print Archive

CiteSeerX

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Queen Mary Research Online

Multi-Scale 3D Scene Flow from Binocular Stereo Sequences

Author: Li Rui
Sclaroff Stan
Publication venue: Boston University Computer Science Department
Publication date: 01/01/2007
Field of study

Scene ﬂow methods estimate the three-dimensional motion ﬁeld for points in the world, using multi-camera video data. Such methods combine multi-view reconstruction with motion estimation. This paper describes an alternative formulation for dense scene ﬂow estimation that provides reliable results using only two cameras by fusing stereo and optical ﬂow estimation into a single coherent framework. Internally, the proposed algorithm generates probability distributions for optical ﬂow and disparity. Taking into account the uncertainty in the intermediate stages allows for more reliable estimation of the 3D scene ﬂow than previous methods allow. To handle the aperture problems inherent in the estimation of optical ﬂow and disparity, a multi-scale method along with a novel region-based technique is used within a regularized solution. This combined approach both preserves discontinuities and prevents over-regularization – two problems commonly associated with the basic multi-scale approaches. Experiments with synthetic and real test data demonstrate the strength of the proposed approach.National Science Foundation (CNS-0202067, IIS-0208876); Office of Naval Research (N00014-03-1-0108

CiteSeerX

Boston University Institutional Repository (OpenBU)

Design strategies for direct multi-scale and multi-orientation feature extraction in the log-polar domain

Author: Chessa Manuela
Sabatini Silvio P.
Solari Fabio
Publication venue
Publication date: 01/01/2012
Field of study

Archivio istituzionale della ricerca - Università di Genova

Open Access Repository

Combining Features and Semantics for Low-level Computer Vision

Author: Güney Fatma
Publication venue: Universität Tübingen
Publication date: 01/01/2018
Field of study

Visual perception of depth and motion plays a significant role in understanding and navigating the environment. Reconstructing outdoor scenes in 3D and estimating the motion from video cameras are of utmost importance for applications like autonomous driving. The corresponding problems in computer vision have witnessed tremendous progress over the last decades, yet some aspects still remain challenging today. Striking examples are reflecting and textureless surfaces or large motions which cannot be easily recovered using traditional local methods. Further challenges include occlusions, large distortions and difficult lighting conditions. In this thesis, we propose to overcome these challenges by modeling non-local interactions leveraging semantics and contextual information. Firstly, for binocular stereo estimation, we propose to regularize over larger areas on the image using object-category specific disparity proposals which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The disparity proposals encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel-based graphical model and demonstrate its benefits especially in reflective regions. Secondly, for 3D reconstruction, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by localizing objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. Evaluations with respect to LIDAR ground-truth on a novel challenging suburban dataset show the advantages of modeling structural dependencies between objects. Finally, motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we present an efficient way of training a context network with a large receptive field size on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise Markov random field. Extensive evaluations reveal the importance of context for feature matching.Die visuelle Wahrnehmung von Tiefe und Bewegung spielt eine wichtige Rolle bei dem Verständnis und der Navigation in unserer Umwelt. Die 3D Rekonstruktion von Szenen im Freien und die Schätzung der Bewegung von Videokameras sind von größter Bedeutung für Anwendungen, wie das autonome Fahren. Die Erforschung der entsprechenden Probleme des maschinellen Sehens hat in den letzten Jahrzehnten enorme Fortschritte gemacht, jedoch bleiben einige Aspekte heute noch ungelöst. Beispiele hierfür sind reflektierende und texturlose Oberflächen oder große Bewegungen, bei denen herkömmliche lokale Methoden häufig scheitern. Weitere Herausforderungen sind niedrige Bildraten, Verdeckungen, große Verzerrungen und schwierige Lichtverhältnisse. In dieser Arbeit schlagen wir vor nicht-lokale Interaktionen zu modellieren, die semantische und kontextbezogene Informationen nutzen, um diese Herausforderungen zu meistern. Für die binokulare Stereo Schätzung schlagen wir zuallererst vor zusammenhängende Bereiche mit objektklassen-spezifischen Disparitäts Vorschlägen zu regularisieren, die wir mit inversen Grafik Techniken auf der Grundlage einer spärlichen Disparitätsschätzung und semantischen Segmentierung des Bildes erhalten. Die Disparitäts Vorschläge kodieren die Tatsache, dass die Gegenstände bestimmter Kategorien nicht willkürlich geformt sind, sondern typischerweise regelmäßige Strukturen aufweisen. Wir integrieren sie für die komplexe Objektklasse 'Auto' in Form eines nicht-lokalen Regularisierungsterm in ein Superpixel-basiertes grafisches Modell und zeigen die Vorteile vor allem in reflektierenden Bereichen. Zweitens nutzen wir für die 3D-Rekonstruktion die Tatsache, dass mit der Größe der rekonstruierten Fläche auch die Wahrscheinlichkeit steigt, Objekte von ähnlicher Art und Form in der Szene zu enthalten. Dies gilt besonders für Szenen im Freien, in denen Gebäude und Fahrzeuge oft vorkommen, die unter fehlender Textur oder Reflexionen leiden aber ähnlichkeit in der Form aufweisen. Wir nutzen diese ähnlichkeiten zur Lokalisierung von Objekten mit Detektoren und zur gemeinsamen Rekonstruktion indem ein volumetrisches Modell ihrer Form erlernt wird. Dies ermöglicht auftretendes Rauschen zu reduzieren, während fehlende Flächen vervollständigt werden, da Objekte ähnlicher Form von allen Beobachtungen der jeweiligen Kategorie profitieren. Die Evaluierung auf einem neuen, herausfordernden vorstädtischen Datensatz in Anbetracht von LIDAR-Entfernungsdaten zeigt die Vorteile der Modellierung von strukturellen Abhängigkeiten zwischen Objekten. Zuletzt, motiviert durch den Erfolg von Deep Learning Techniken bei der Mustererkennung, präsentieren wir eine Methode zum Erlernen von kontextbezogenen Merkmalen zur Lösung des optischen Flusses mittels diskreter Optimierung. Dazu stellen wir eine effiziente Methode vor um zusätzlich zu einem Lokalen Netzwerk ein Kontext-Netzwerk zu erlernen, das mit Hilfe von erweiterter Faltung auf Patches ein großes rezeptives Feld besitzt. Für das Feature Matching vergleichen wir mit schnellen GPU-Matrixmultiplikation jedes Pixel im Referenzbild mit jedem Pixel im Zielbild. Das aus dem Netzwerk resultierende Matching Kostenvolumen bildet den Datenterm für eine diskrete MAP Inferenz in einem paarweisen Markov Random Field. Eine umfangreiche Evaluierung zeigt die Relevanz des Kontextes für das Feature Matching

Publikationsserver der Universität Tübingen

Figure-Ground Segmentation Using Multiple Cues

Author: H Ögskolan
H Ögskolan
Kungl Tekniska Högskolan
Peter Nordlund
Peter Nordlund
Publication venue
Publication date: 01/01/1998
Field of study

The theme of this thesis is figure-ground segmentation. We address the problem in the context of a visual observer, e.g. a mobile robot, moving around in the world and capable of shifting its gaze to and fixating on objects in its environment. We are only considering bottom-up processes, how the system can detect and segment out objects because they stand out from their immediate background in some feature dimension. Since that implies that the distinguishing cues can not be predicted, but depend on the scene, the system must rely on multiple cues. The integrated use of multiple cues forms a major theme of the thesis. In particular, we note that an observer in our real environment has access to 3-D cues. Inspired by psychophysical findings about human vision we try to demonstrate their effectiveness in figure-ground segmentation and grouping also in machine vision

CiteSeerX

Perceptual modelling for 2D and 3D

Author: Bosc Emilie
Gautier Josselin
Le Meur Olivier
Ricordel Vincent
Wang Junle
Publication venue: HAL CCSD
Publication date: 01/11/2010
Field of study

Livrable D1.1 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D1.1 du projet

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Object-driven block-based algorithm for the compression of stereo image pairs

Author: Eran Edirisinghe (1257828)
Helmut Bez (1257312)
Jianmin Jiang (488872)
Publication venue
Publication date: 01/01/1999
Field of study

In this paper, we propose a novel object driven, block based algorithm for the compression of stereo image pairs. The algorithm effectively combines the simplicity and adaptability of the existing block based stereo image compression techniques [1-6] with an edge/contour based object extraction technique to determine appropriate compression strategy for various areas of the right image. Extensive experiments carried out support that significant improvements of up to 20% in compression ratio can be achieved by the proposed algorithm, compared with the existing stereo image compression techniques. Yet the reconstructed image quality is maintained at an equivalent level in terms of PSNR values. In terms of visual quality, the right image reconstructed by the proposed algorithm does not incur any noticeable effect compared with the outputs of the best algorithms. The proposed algorithm performs object extraction and matching between the reconstructed left frame and the original right frame to identify those objects that match but are displaced by varying amounts due to binocular parallax. Different coding strategies are then applied separately to internal areas and the bounding areas for each identified object. Based on the mean squared matching error of the internal blocks and a selected threshold, a decision is made whether or not to encode the predictive errors inside these objects. The output bit stream includes entropy coding of object disparity, block disparity and possibly some errors, which fail to meet the threshold requirement in the proposed algorith

Loughborough University Institutional Repository