
    Real-time Visual Flow Algorithms for Robotic Applications

    Vision offers important sensory cues to modern robotic platforms. Applications such as control of aerial vehicles, visual servoing, simultaneous localization and mapping, navigation and, more recently, learning are examples where visual information is fundamental to accomplishing tasks. However, the use of computer vision algorithms carries the computational cost of extracting useful information from the stream of raw pixel data. The most sophisticated algorithms use complex mathematical formulations that typically lead to computationally expensive, and consequently slow, implementations. Even with modern computing resources, high-speed, high-resolution video feeds can only be used for basic image-processing operations. For a vision algorithm to be integrated into a robotic system, its output should be provided in real time, that is, at least at the same frequency as the control logic of the robot. With robotic vehicles becoming more dynamic and ubiquitous, this places higher requirements on the vision-processing pipeline. This thesis addresses the problem of estimating dense visual flow information in real time. The contributions of this work are threefold. First, it introduces a new filtering algorithm for the estimation of dense optical flow at frame rates as high as 800 Hz for 640×480 image resolution. The algorithm follows an update-prediction architecture to estimate dense optical flow fields incrementally over time. A fundamental component of the algorithm is the modeling of the spatio-temporal evolution of the optical flow field by means of partial differential equations. Numerical predictors implement these PDEs to propagate the current flow estimate forward in time. Experimental validation of the algorithm is provided using a high-speed ground-truth image dataset as well as real-life video data at 300 Hz. The second contribution is a new type of visual flow named structure flow. Mathematically, structure flow is the three-dimensional scene flow scaled by the inverse depth at each pixel in the image. Intuitively, it is the complete velocity field associated with image motion, including both optical flow and the scale change, or apparent divergence, of the image. Analogously to optical flow, structure flow provides a robotic vehicle with perception of the motion of the environment as seen by the camera. However, structure flow encodes the full 3D image motion of the scene, whereas optical flow only encodes the component on the image plane. An algorithm to estimate structure flow from image and depth measurements is proposed based on the same filtering idea used to estimate optical flow. The final contribution is the spherepix data structure for processing spherical images. This data structure is the numerical back-end used for the real-time implementation of the structure flow filter. It consists of a set of overlapping patches covering the surface of the sphere. Each patch approximately preserves properties such as orthogonality and equidistance of points, thus allowing efficient implementations of low-level, classical 2D convolution-based image-processing routines such as Gaussian filters and numerical derivatives. These algorithms are implemented on GPU hardware and can be integrated into future robotic embedded vision systems to provide fast visual information to robotic vehicles.
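As a rough illustration of the filtering idea summarized above (not the thesis's actual implementation), the sketch below shows a minimal update-prediction loop for a dense flow field in Python/NumPy: the prediction step propagates the current estimate forward in time by advecting it along itself, a simple discretization of a transport-type PDE, and the update step blends in a new measurement. The function names, the constant blending gain and the use of SciPy interpolation are assumptions for illustration. The structure flow quantity mentioned in the abstract is the per-pixel 3D scene velocity divided by depth, w = V/Z, of which optical flow is the image-plane component.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def predict(flow, dt=1.0):
    """Propagate the flow field forward in time by advecting it along itself
    (a simple semi-Lagrangian discretization of the transport PDE)."""
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Sample the current flow at the locations pixels are predicted to come from.
    src_x = xs - dt * flow[..., 0]
    src_y = ys - dt * flow[..., 1]
    fx = map_coordinates(flow[..., 0], [src_y, src_x], order=1, mode='nearest')
    fy = map_coordinates(flow[..., 1], [src_y, src_x], order=1, mode='nearest')
    return np.stack([fx, fy], axis=-1)

def update(predicted, measured, gain=0.5):
    """Blend the propagated prediction with a new (possibly noisy) flow
    measurement; `gain` plays the role of a per-pixel filter gain."""
    return predicted + gain * (measured - predicted)

# One filter iteration: predict from the previous estimate, then correct it
# with the flow measurement obtained from the current frame pair:
#   flow_est = update(predict(flow_est), flow_measured)
```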

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the same accessibility as 2D images while preserving the surround representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object-placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
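The abstract does not detail how the video frames are fused with the panoramic imagery; as a hedged illustration of one common way to do it (not necessarily the systems built for these experiments), the sketch below reprojects a pinhole-camera video frame into an equirectangular panorama given the camera intrinsics and its rotation relative to the panorama frame. The function name, the axis conventions and the nearest-neighbour sampling are assumptions of this sketch.

```python
import numpy as np

def embed_frame_in_panorama(pano, frame, K, R):
    """Paste a perspective video frame into an equirectangular panorama.

    pano  : (H, W, 3) equirectangular image, modified in place
    frame : (h, w, 3) perspective video frame
    K     : (3, 3) pinhole intrinsics of the video camera
    R     : (3, 3) rotation taking panorama-frame rays to camera-frame rays
    """
    H, W = pano.shape[:2]
    h, w = frame.shape[:2]
    # Ray direction for every panorama pixel (longitude/latitude parametrization,
    # camera convention assumed: x right, y down, z forward).
    lon = (np.arange(W) + 0.5) / W * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(H) + 0.5) / H * np.pi
    lon, lat = np.meshgrid(lon, lat)
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     -np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)   # (H, W, 3)
    cam = dirs @ R.T                                         # rotate rays into camera frame
    in_front = cam[..., 2] > 1e-6
    # Pinhole projection of the rays that point into the camera.
    with np.errstate(divide='ignore', invalid='ignore'):
        u = K[0, 0] * cam[..., 0] / cam[..., 2] + K[0, 2]
        v = K[1, 1] * cam[..., 1] / cam[..., 2] + K[1, 2]
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    ui = np.clip(u, 0, w - 1).astype(int)
    vi = np.clip(v, 0, h - 1).astype(int)
    pano[valid] = frame[vi[valid], ui[valid]]                # nearest-neighbour paste
    return pano
```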

    Offshore stereo measurements of gravity waves

    Stereo video techniques are effective for estimating the space-time wave dynamics over an area of the ocean. Indeed, a stereo camera view allows retrieval of both spatial and temporal data whose statistical content is richer than that of time series retrieved from point wave probes. To demonstrate this, we consider an application of the Wave Acquisition Stereo System (WASS) for the analysis of offshore video measurements of gravity waves in the Northern Adriatic Sea. In particular, we deployed WASS at the oceanographic platform Acqua Alta, off the Venice coast, Italy. Three experimental studies were performed, and the overlapping field of view of the acquired stereo images covered an area of approximately 1100 m². Analysis of the WASS measurements shows that the sea surface can be accurately estimated in space and time together, yielding associated directional spectra and wave statistics that agree well with theoretical models. From the observed wavenumber-frequency spectrum one can also predict the vertical profile of the current flow underneath the wave surface. Finally, future improvements of WASS and applications are discussed.
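As a rough sketch of the kind of spectral analysis mentioned above (not the WASS processing chain itself), the following Python/NumPy fragment estimates a wavenumber-frequency power spectrum from a gridded space-time surface elevation field via a 3D FFT. The windowing, normalization and grid spacings are illustrative assumptions.

```python
import numpy as np

def wavenumber_frequency_spectrum(eta, dx, dy, dt):
    """Estimate the wavenumber-frequency power spectrum of a gridded
    sea-surface elevation field eta with shape (nt, ny, nx).

    Returns (kx, ky, omega, S), all fftshifted, with kx, ky in rad/m
    and omega in rad/s."""
    nt, ny, nx = eta.shape
    # Remove the mean and apply a Hann window in time to reduce leakage.
    eta = eta - eta.mean()
    window = np.hanning(nt)[:, None, None]
    spec = np.fft.fftn(eta * window)
    S = np.abs(spec) ** 2 / (nt * ny * nx)
    kx = np.fft.fftshift(2 * np.pi * np.fft.fftfreq(nx, d=dx))
    ky = np.fft.fftshift(2 * np.pi * np.fft.fftfreq(ny, d=dy))
    omega = np.fft.fftshift(2 * np.pi * np.fft.fftfreq(nt, d=dt))
    return kx, ky, omega, np.fft.fftshift(S)

# Energy in the (k, omega) domain concentrates near the deep-water dispersion
# relation omega^2 = g*|k|; a Doppler shift of that ridge, omega = sqrt(g*|k|) + k.U,
# is what allows inferring the near-surface current U from the spectrum.
```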

    Structure from motion using omni-directional vision and certainty grids

    This thesis describes a method to create local maps from an omni-directional vision system (ODVS) mounted on a mobile robot. Range finding is performed by a structure-from-motion method, which recovers the three-dimensional position of objects in the environment from omni-directional images. This leads to map-making, which is accomplished using certainty grids to fuse information from multiple readings into a two-dimensional world model. The system is demonstrated both on noise-free data from a custom-built simulator and on real data from an omni-directional vision system on board a mobile robot. Finally, to account for the particular error characteristics of a real omni-directional vision sensor, a new sensor model for the certainty-grid framework is also created and compared to the traditional sonar sensor model.
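The certainty-grid fusion step described above can be illustrated with a standard log-odds occupancy update. This is a generic sketch of the technique, not the thesis's specific sensor model for the omni-directional camera; the function names and clipping thresholds are assumptions.

```python
import numpy as np

def fuse_reading(log_odds_grid, cell_indices, p_occupied):
    """Fuse one range reading into a 2D certainty grid held in log-odds form.

    log_odds_grid : (H, W) array of accumulated log-odds of occupancy
    cell_indices  : (N, 2) integer row/col indices of the cells the reading covers
    p_occupied    : (N,) per-cell occupancy probability from the sensor model
                    (< 0.5 along the free ray, > 0.5 around the detected range)
    """
    p = np.clip(p_occupied, 1e-3, 1 - 1e-3)
    rows, cols = cell_indices[:, 0], cell_indices[:, 1]
    # Independent-evidence fusion: log-odds simply add across readings.
    log_odds_grid[rows, cols] += np.log(p / (1 - p))
    return log_odds_grid

def to_probability(log_odds_grid):
    """Convert the accumulated grid back to occupancy probabilities in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-log_odds_grid))
```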

    Modeling the environment with egocentric vision systems

    More and more autonomous systems, whether robots or assistive systems, are present in our daily lives. These systems interact with their environment and, to do so, need a model of it. Depending on the tasks they must perform, the information or level of detail required of the model varies, from detailed 3D models for autonomous navigation systems to semantic models that include information important to the user, such as the type of area or which objects are present. These models are built from the readings of the various sensors available on the system. Nowadays, thanks to their small size, low price and the rich information they capture, cameras are included in every autonomous system. The goal of this thesis is to develop and study new methods for building models of the environment at different semantic levels and with different levels of accuracy. Two key points characterize the work developed in this thesis: - The use of cameras with an egocentric, or first-person, point of view, whether on a robot or in a system worn by the user (wearable). In such systems, the cameras move rigidly with the mobile platform on which they are mounted. In recent years many wearable vision systems have appeared, used for a multitude of applications, from entertainment to personal assistance. - The use of omnidirectional vision systems, which are distinguished by their large field of view, capturing much more information in each image than conventional cameras, but which pose new difficulties due to distortions and more complex projection models. This thesis studies different types of environment models: - Metric models: the aim of these models is to create detailed representations of the environment in which the autonomous system can be localized accurately. This thesis focuses on adapting these models to omnidirectional vision, which captures more information in each image and improves localization results. - Topological models: these models structure the environment as nodes connected by arcs. This representation is less accurate than the metric one, but it offers a higher level of abstraction and can model the environment more richly. This thesis focuses on building topological models with additional information about the type of area of each node and connection (corridor, room, doors, stairs...). - Semantic models: this work also contributes new semantic models, oriented towards applications in which the system interacts with or assists a person. Such models represent the environment through concepts close to those used by people. In particular, this thesis develops techniques to obtain and propagate semantic information about the environment in image sequences.
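As a minimal, purely illustrative sketch of the topological representation described above (nodes labelled with an area type, arcs labelled with a connection type), such a model could be held in a structure like the following; all names and the dictionary-based layout are assumptions, not the thesis's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class TopologicalMap:
    """Topological environment model: nodes are places with an area label
    (e.g. 'corridor', 'room'); edges are labelled connections (e.g. 'door')."""
    nodes: dict = field(default_factory=dict)   # node_id -> area label
    edges: dict = field(default_factory=dict)   # (node_a, node_b) -> connection label

    def add_place(self, node_id, area_label):
        self.nodes[node_id] = area_label

    def connect(self, a, b, connection_label):
        self.edges[(a, b)] = connection_label

# Example: a corridor connected to a room through a door.
tmap = TopologicalMap()
tmap.add_place("n0", "corridor")
tmap.add_place("n1", "room")
tmap.connect("n0", "n1", "door")
```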

    Real-Time UAV Altitude, Attitude and Motion Estimation from Hybrid Stereovision

    Knowledge of altitude, attitude and motion is essential for an Unmanned Aerial Vehicle during critical maneuvers such as landing and take-off. In this paper we present a hybrid stereoscopic rig composed of a fisheye and a perspective camera for vision-based navigation. In contrast to classical stereoscopic systems based on feature matching, we propose methods that avoid matching between the hybrid views. A plane-sweeping approach is proposed for estimating altitude and detecting the ground plane. Rotation and translation are then estimated by decoupling: the fisheye camera contributes to evaluating attitude, while the perspective camera contributes to estimating the scale of the translation. The motion can be estimated robustly and at the correct scale thanks to the knowledge of the altitude. We propose a robust, real-time, accurate, exclusively vision-based approach with an embedded C++ implementation. Although this approach removes the need for any non-visual sensors, it can also be coupled with an Inertial Measurement Unit.
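To make the plane-sweeping idea above concrete, the sketch below scores a set of altitude hypotheses using the homography induced by the ground plane between two views. It is a simplified illustration assuming two pinhole cameras, unlike the paper's fisheye-perspective rig; the `warp` and `score` callbacks (e.g. SAD or NCC over the ground region) and the sign conventions of the plane parametrization are assumptions of this sketch.

```python
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography mapping pixels of camera 1 to camera 2 induced by the plane
    with unit normal n at distance d (the hypothesized altitude) from camera 1.
    Sign conventions for (n, d) vary between texts; adjust to the camera model used."""
    return K2 @ (R - np.outer(t, n) / d) @ np.linalg.inv(K1)

def sweep_altitude(img1, img2, K1, K2, R, t, n, candidates, warp, score):
    """Plane sweep over altitude hypotheses.

    `warp(img, H, out_shape)` is assumed to resample `img` at H @ x for every
    output pixel x (inverse warping); `score(a, b)` measures photometric
    consistency. The altitude whose induced homography best aligns the two
    views is returned."""
    best_d, best_s = None, -np.inf
    for d in candidates:
        H = plane_induced_homography(K1, K2, R, t, n, d)
        warped = warp(img2, H, img1.shape[:2])   # bring img2 into img1's frame
        s = score(img1, warped)
        if s > best_s:
            best_d, best_s = d, s
    return best_d
```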

    Locating moving objects in car-driving sequences
