69 research outputs found
Visually Augmented Navigation in an Unstructured Environment Using a Delayed State History
This paper describes a framework for sensor fusion
of navigation data with camera-based 5 DOF relative pose
measurements for 6 DOF vehicle motion in an unstructured
3D underwater environment. The fundamental goal of this work
is to concurrently sstimate online current vehicle position and
its past trajectory. This goal is framed within the context of
improving mobile robot navigation to support sub-sea science
and exploration. Vehicle trajectory is represented by a history
of poses in an augmented state Kalman filter. Camera spatial
constraints from overlapping imagery provide partial observation
of these posa and are used to enforce consislency and provide a
mechanism for loop-closure. The multi-sensor camera+navigation
framework is shown to have compelling advantages over a
camera-only based approach by 1) improving the robustness of
pairwise image registration, 2) setting the free gauge scale, and
3) allowing for a unconnected camera graph topology. Results
are shown for a real world data set collected by an autonomous
underwater vehicle in an unstructured undersea environment.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/86055/1/reustice-32.pd
Contextual cropping and scaling of TV productions
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright @ Springer Science+Business Media, LLC 2011.In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video analysis methods. This approach allows a context-based composition of cropped images. It provides a differentiation between the original SD version of the production and the processed one adapted to the requirements for mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Envisaged is the integration of the tool in post-production and live workflows
THREE-DIMENSIONAL VISION FOR STRUCTURE AND MOTION ESTIMATION
1997/1998Questa tesi, intitolata Visione Tridimensionale per la stima di Struttura e Moto, tratta di tecniche di Visione Artificiale per la stima delle proprietà geometriche del mondo tridimensionale a partire da immagini numeriche. Queste proprietà sono essenziali per il riconoscimento e la classificazione di oggetti, la navigazione di veicoli mobili autonomi, il reverse engineering e la sintesi di ambienti virtuali. In particolare, saranno descritti i moduli coinvolti nel calcolo della struttura della scena a partire dalle immagini, e verranno presentati contributi originali nei seguenti campi. Rettificazione di immagini steroscopiche. Viene presentato un nuovo algoritmo per la rettificazione, il quale trasforma una coppia di immagini stereoscopiche in maniera che punti corrispondenti giacciano su linee orizzontali con lo stesso indice. Prove sperimentali dimostrano il corretto comportamento del metodo, come pure la trascurabile perdita di accuratezza nella ricostruzione tridimensionale quando questa sia ottenuta direttamente dalle immagini rettificate. Calcolo delle corrispondenze in immagini stereoscopiche. Viene analizzato il problema della stereovisione e viene presentato un un nuovo ed efficiente algoritmo per l'identificazione di coppie di punti corrispondenti, capace di calcolare in modo robusto la disparità stereoscopica anche in presenza di occlusioni. L'algoritmo, chiamato SMW, usa uno schema multi-finestra adattativo assieme al controllo di coerenza destra-sinistra per calcolare la disparità e l'incertezza associata. Gli esperimenti condotti con immagini sintetiche e reali mostrano che SMW sortisce un miglioramento in accuratezza ed efficienza rispetto a metodi simili Inseguimento di punti salienti. L'inseguitore di punti salienti di Shi-Tomasi- Kanade viene migliorato introducendo uno schema automatico per lo scarto di punti spuri basato sulla diagnostica robusta dei campioni periferici ( outliers ). Gli esperimenti con immagini sintetiche e reali confermano il miglioramento rispetto al metodo originale, sia qualitativamente che quantitativamente. Ricostruzione non calibrata. Viene presentata una rassegna ragionata dei metodi per la ricostruzione di un modello tridimensionale della scena, a partire da una telecamera che si muove liberamente e di cui non sono noti i parametri interni. Il contributo consiste nel fornire una visione critica e unificata delle più recenti tecniche. Una tale rassegna non esiste ancora in letterarura. Moto tridimensionale. Viene proposto un algoritmo robusto per registrate e calcolare le corrispondenze in due insiemi di punti tridimensionali nei quali vi sia un numero significativo di elementi mancanti. Il metodo, chiamato RICP, sfrutta la stima robusta con la Minima Mediana dei Quadrati per eliminare l'effetto dei campioni periferici. Il confronto sperimentale con una tecnica simile, ICP, mostra la superiore robustezza e affidabilità di RICP.This thesis addresses computer vision techniques estimating geometrie properties of the 3-D world /rom digital images. Such properties are essential for object recognition and classification, mobile robots navigation, reverse engineering and synthesis of virtual environments. In particular, this thesis describes the modules involved in the computation of the structure of a scene given some images, and offers original contributions in the following fields. Stereo pairs rectification. A novel rectification algorithm is presented, which transform a stereo pair in such a way that corresponding points in the two images lie on horizontal lines with the same index. Experimental tests prove the correct behavior of the method, as well as the negligible decrease oLthe accuracy of 3-D reconstruction if performed from the rectified images directly. Stereo matching. The problem of computational stereopsis is analyzed, and a new, efficient stereo matching algorithm addressing robust disparity estimation in the presence of occlusions is presented. The algorithm, called SMW, is an adaptive, multi-window scheme using left-right consistency to compute disparity and its associated uncertainty. Experiments with both synthetic and real stereo pairs show how SMW improves on closely related techniques for both accuracy and efficiency. Features tracking. The Shi-Tomasi-Kanade feature tracker is improved by introducing an automatic scheme for rejecting spurious features, based on robust outlier diagnostics. Experiments with real and synthetic images confirm the improvement over the original tracker, both qualitatively and quantitatively. 111 Uncalibrated vision. A review on techniques for computing a three-dimensional model of a scene from a single moving camera, with unconstrained motion and unknown parameters is presented. The contribution is to give a critical, unified view of some of the most promising techniques. Such review does not yet exist in the literature. 3-D motion. A robust algorithm for registering and finding correspondences in two sets of 3-D points with significant percentages of missing data is proposed. The method, called RICP, exploits LMedS robust estimation to withstand the effect of outliers. Experimental comparison with a closely related technique, ICP, shows RICP's superior robustness and reliability.XI Ciclo1968Versione digitalizzata della tesi di dottorato cartacea
Recommended from our members
Intelligent image cropping and scaling
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 2011.Nowadays, there exist a huge number of end devices with different screen properties for
watching television content, which is either broadcasted or transmitted over the internet.
To allow best viewing conditions on each of these devices, different image formats have
to be provided by the broadcaster. Producing content for every single format is,
however, not applicable by the broadcaster as it is much too laborious and costly.
The most obvious solution for providing multiple image formats is to produce one high resolution format and prepare formats of lower resolution from this. One possibility to do this is to simply scale video images to the resolution of the target image format. Two significant drawbacks are the loss of image details through ownscaling and possibly unused image areas due to letter- or pillarboxes. A preferable solution is to find the contextual most important region in the high-resolution format at first and crop this area with an aspect ratio of the target image format afterwards. On the other hand, defining
the contextual most important region manually is very time consuming. Trying to apply that to live productions would be nearly impossible. Therefore, some approaches exist that automatically define cropping areas. To do so, they extract visual features, like moving reas in a video, and define regions of interest
(ROIs) based on those. ROIs are finally used to define an enclosing cropping area. The
extraction of features is done without any knowledge about the type of content. Hence,
these approaches are not able to distinguish between features that might be important in
a given context and those that are not.
The work presented within this thesis tackles the problem of extracting visual features based on prior knowledge about the content. Such knowledge is fed into the system in form of metadata that is available from TV production environments. Based on the
extracted features, ROIs are then defined and filtered dependent on the analysed
content. As proof-of-concept, this application finally adapts SDTV (Standard Definition Television) sports productions automatically to image formats with lower resolution through intelligent cropping and scaling. If no content information is available, the system can still be applied on any type of content through a default mode. The presented approach is based on the principle of a plug-in system. Each plug-in
represents a method for analysing video content information, either on a low level by
extracting image features or on a higher level by processing extracted ROIs. The
combination of plug-ins is determined by the incoming descriptive production metadata
and hence can be adapted to each type of sport individually. The application has been comprehensively evaluated by comparing the results of the system against alternative cropping methods. This evaluation utilised videos which were manually cropped by a professional video editor, statically cropped videos and simply scaled, non-cropped videos. In addition to and apart from purely subjective evaluations,
the gaze positions of subjects watching sports videos have been measured and compared
to the regions of interest positions extracted by the system
Large-area visually augmented navigation for autonomous underwater vehicles
Submitted to the Joint Program in Applied Ocean Science & Engineering
in partial fulfillment of the requirements for the degree of Doctor of Philosophy
at the Massachusetts Institute of Technology
and the Woods Hole Oceanographic Institution
June 2005This thesis describes a vision-based, large-area, simultaneous localization and mapping (SLAM) algorithm that respects the low-overlap imagery constraints typical of autonomous underwater vehicles (AUVs) while exploiting the inertial sensor information that is routinely available on such platforms. We adopt a systems-level approach exploiting the complementary aspects of inertial sensing and visual perception from a calibrated pose-instrumented platform. This systems-level strategy yields a robust solution to underwater imaging that
overcomes many of the unique challenges of a marine environment (e.g., unstructured terrain, low-overlap imagery, moving light source). Our large-area SLAM algorithm recursively incorporates relative-pose constraints using a view-based representation that exploits exact sparsity in the Gaussian canonical form. This sparsity allows for efficient O(n) update complexity in the number of images composing the view-based map by utilizing recent multilevel relaxation techniques. We show that our algorithmic formulation is inherently sparse unlike other feature-based canonical SLAM algorithms, which impose sparseness via pruning approximations. In particular, we investigate
the sparsification methodology employed by sparse extended information filters (SEIFs)
and offer new insight as to why, and how, its approximation can lead to inconsistencies in
the estimated state errors. Lastly, we present a novel algorithm for efficiently extracting consistent marginal covariances useful for data association from the information matrix. In summary, this thesis advances the current state-of-the-art in underwater visual navigation by demonstrating end-to-end automatic processing of the largest visually navigated dataset to date using data collected from a survey of the RMS Titanic (path length over 3 km and 3100 m2 of mapped area). This accomplishment embodies the summed contributions of this thesis to several current SLAM research issues including scalability, 6 degree of
freedom motion, unstructured environments, and visual perception.This work was funded in part by the CenSSIS ERC of the National Science Foundation
under grant EEC-9986821, in part by the Woods Hole Oceanographic Institution through a
grant from the Penzance Foundation, and in part by a NDSEG Fellowship awarded through
the Department of Defense
Theorems and algorithms for multiple view geometry with applications to electron tomography
The thesis considers both theory and algorithms for geometric computer vision. The framework of the work is built around the application of autonomous transmission electron microscope image registration.
The theoretical part of the thesis first develops a consistent robust estimator that is evaluated in estimating two view geometry with both affine and projective camera models. The uncertainty of the fundamental matrix is similarly estimated robustly, and the previous observation whether the covariance matrix of the fundamental matrix contains disparity information of the scene is explained and its utilization in matching is discussed. For point tracking purposes, a reliable wavelet-based matching technique and two EM algorithms for the maximum likelihood affine reconstruction under missing data are proposed. The thesis additionally discusses identification of degeneracy as well as affine bundle adjustment.
The application part of the thesis considers transmission electron microscope image registration, first with fiducial gold markers and thereafter without markers. Both methods utilize the techniques proposed in the theoretical part of the thesis and, in addition, a graph matching method is proposed for matching gold markers. Conversely, alignment without markers is disposed by tracking interest points of the intensity surface of the images. At the present level of development, the former method is more accurate but the latter is appropriate for situations where fiducial markers cannot be used.
Perhaps the most significant result of the thesis is the proposed robust estimator because of consistence proof and its many application areas, which are not limited to the computer vision field. The other algorithms could be found useful in multiple view applications in computer vision that have to deal with uncertainty, matching, tracking, and reconstruction. From the viewpoint of image registration, the thesis further achieved its aims since two accurate image alignment methods are suggested for obtaining the most exact reconstructions in electron tomography.reviewe
- …