
    A Fast and Robust Extrinsic Calibration for RGB-D Camera Networks

    From object tracking to 3D reconstruction, RGB-Depth (RGB-D) camera networks play an increasingly important role in many vision and graphics applications. Practical applications often use sparsely-placed cameras to maximize visibility, while using as few cameras as possible to minimize cost. In general, it is challenging to calibrate sparse camera networks due to the lack of shared scene features across different camera views. In this paper, we propose a novel algorithm that can accurately and rapidly calibrate the geometric relationships across an arbitrary number of RGB-D cameras on a network. Our work has a number of novel features. First, to cope with the wide separation between different cameras, we establish view correspondences by using a spherical calibration object. We show that this approach outperforms other techniques based on planar calibration objects. Second, instead of modeling camera extrinsic calibration using rigid transformation, which is optimal only for pinhole cameras, we systematically test different view transformation functions, including rigid transformation, polynomial transformation, and manifold regression, to determine the most robust mapping that generalizes well to unseen data. Third, we reformulate the celebrated bundle adjustment procedure to minimize the global 3D reprojection error so as to fine-tune the initial estimates. Finally, our scalable client-server architecture is computationally efficient: the calibration of a five-camera system, including data capture, can be done in minutes using only commodity PCs. Our proposed framework is compared with other state-of-the-art systems using both quantitative measurements and visual alignment results of the merged point clouds.
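
    The abstract's rigid-transformation baseline can be fit in closed form once corresponding sphere-center positions are observed in two camera frames. The sketch below is a generic Kabsch/SVD fit under that assumption, not the paper's implementation; the function name and synthetic data are illustrative.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    # Least-squares rigid transform (R, t) with dst ~ R @ src + t (Kabsch/SVD).
    # src, dst: (N, 3) corresponding 3D points, e.g. sphere-center estimates
    # seen by two RGB-D cameras at the same time instants.
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic check: recover a known pose from noisy corresponding points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R_true = Q * np.sign(np.linalg.det(Q))           # proper rotation (det = +1)
t_true = np.array([0.5, -0.2, 1.0])
obs = pts @ R_true.T + t_true + 1e-3 * rng.normal(size=(50, 3))
R_est, t_est = fit_rigid_transform(pts, obs)
print(np.allclose(R_est, R_true, atol=1e-3))     # True
```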

    Cross-layer Optimized Wireless Video Surveillance

    A wireless video surveillance system contains three major components: video capture and preprocessing, video compression and transmission over wireless sensor networks (WSNs), and video analysis at the receiving end. Coordinating these components is important for improving end-to-end video quality, especially under communication resource constraints. Cross-layer control proves to be an efficient mechanism for optimal system configuration. In this dissertation, we address the problem of implementing cross-layer optimization in a wireless video surveillance system. The thesis work is based on three research projects. In the first project, a single PTU (pan-tilt-unit) camera is used for video object tracking. The problem studied is how to improve the quality of the received video by jointly considering the coding and transmission processes. The cross-layer controller determines the optimal coding and transmission parameters according to the dynamic channel condition and the transmission delay. Multiple error concealment strategies are developed that exploit the special properties of the PTU camera motion. In the second project, a binocular PTU camera is adopted for video object tracking. The presented work studies fast disparity estimation and 3D video transcoding over the WSN for real-time applications. The disparity/depth information is estimated in a coarse-to-fine manner using both local and global methods. The transcoding is coordinated by the cross-layer controller based on the channel condition and the data rate constraint, in order to achieve the best view synthesis quality. The third project addresses multi-camera motion capture for remote healthcare monitoring. The challenge is resource allocation across multiple video sequences. The presented cross-layer design incorporates delay-sensitive, content-aware video coding and transmission, along with adaptive video coding and transmission, to ensure optimal and balanced quality across the multi-view videos. In these projects, interdisciplinary studies are conducted to integrate the surveillance system components under the cross-layer optimization framework. Experimental results demonstrate the efficiency of the proposed schemes. The challenges of cross-layer design in existing wireless video surveillance systems are also analyzed to inform future work. Adviser: Song C
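
    As a toy illustration of what such a cross-layer controller decides (not the dissertation's actual scheme), the sketch below selects the encoder operating point with the lowest distortion whose per-frame transmission delay fits the current channel; the operating points, units, and thresholds are invented for the example.

```python
# Hypothetical encoder operating points: (bitrate in kbps, distortion in MSE).
OPERATING_POINTS = [(200, 42.0), (400, 30.5), (800, 21.0), (1600, 15.2)]

def pick_operating_point(points, capacity_kbps, frame_period_s, max_delay_s):
    # Return the lowest-distortion (rate, distortion) pair whose per-frame
    # transmission delay over the current channel stays within the budget.
    best = None
    for rate_kbps, distortion in points:
        frame_kbits = rate_kbps * frame_period_s   # data produced per frame
        delay_s = frame_kbits / capacity_kbps      # time to transmit one frame
        if delay_s <= max_delay_s and (best is None or distortion < best[1]):
            best = (rate_kbps, distortion)
    return best

# e.g. a 1 Mbps channel, 30 fps video, 50 ms per-frame delay budget
print(pick_operating_point(OPERATING_POINTS, 1000, 1 / 30, 0.05))  # (800, 21.0)
```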

    Mapping and Merging Using Sound and Vision: Automatic Calibration and Map Fusion with Statistical Deformations

    Over the last couple of years, cameras as well as audio and radio sensors have become cheaper and more common in our everyday lives. Such sensors can be used to create maps of where the sensors are positioned and of the appearance of the surroundings. For sound and radio, the process of estimating the sender and receiver positions from time of arrival (TOA) or time-difference of arrival (TDOA) measurements is referred to as automatic calibration. The corresponding process for images is to estimate the camera positions as well as the positions of the objects captured in the images; this is called structure from motion (SfM) or visual simultaneous localisation and mapping (SLAM). In this thesis we present studies on how to create such maps, divided into three parts: finding accurate measurements; robust mapping; and merging of maps. The first part is treated in Paper I and involves finding precise – on a subsample level – TDOA measurements. These types of subsample refinements give high precision, but are sensitive to noise. We present an explicit expression for the variance of the TDOA estimate and study the impact that noise in the signals has. Accurate measurements are an important foundation for creating accurate maps. The second part of this thesis includes Papers II–V and covers the topic of robust self-calibration using one-dimensional signals, such as sound or radio. We estimate both sender and receiver positions using TOA and TDOA measurements. The estimation process is divided into two parts, where the first is specific to TOA or TDOA and involves solving a relaxed version of the problem. The second step is common to different types of problems and involves an upgrade from the relaxed solution to the sought parameters. In this thesis we present numerically stable minimal solvers for both of these steps for several different setups of senders and receivers. We also suggest frameworks for how to use these solvers together with RANSAC to achieve systems that are robust to outliers, noise and missing data. Additionally, in the last paper we focus on extending self-calibration results, especially for the sound source path, which often cannot be fully reconstructed immediately. The third part of the thesis, Papers VI–VIII, is concerned with the merging of already estimated maps. We mainly focus on maps created from image data, but the methods are applicable to sparse 3D maps coming from different sensor modalities. Merging of maps can be advantageous if there are several map representations of the same environment, or if there is a need to add new information to an already existing map. We suggest a compact map representation with a small memory footprint, which we then use to fuse maps efficiently. We suggest one method for fusion of maps that are pre-aligned, and one where we additionally estimate the coordinate system. The merging utilises a compact approximation of the residuals and allows for deformations in the original maps. Furthermore, we present minimal solvers for 3D point matching with statistical deformations, which increases the number of inliers when the original maps contain errors.
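
    A common route to subsample TDOA measurements of the kind used in the first part is FFT-based cross-correlation followed by a parabolic fit around the peak lag. The sketch below illustrates that generic idea on synthetic signals; it is not the thesis's estimator.

```python
import numpy as np

def tdoa_subsample(x, y, fs):
    # Cross-correlate via FFT, then refine the peak lag to subsample
    # precision with a parabolic fit; a positive result means x lags y.
    n = len(x) + len(y) - 1
    cc = np.fft.irfft(np.fft.rfft(x, n) * np.conj(np.fft.rfft(y, n)), n)
    cc = np.roll(cc, len(y) - 1)          # index 0 now holds lag -(len(y)-1)
    k = np.argmax(cc)
    if 0 < k < n - 1:                     # three-point parabolic refinement
        a, b, c = cc[k - 1], cc[k], cc[k + 1]
        k = k + 0.5 * (a - c) / (a - 2 * b + c)
    return (k - (len(y) - 1)) / fs

# Synthetic check: x is y delayed by 5 samples, plus noise.
rng = np.random.default_rng(1)
y = rng.normal(size=1024)
x = np.concatenate([np.zeros(5), y])[:1024] + 0.01 * rng.normal(size=1024)
print(tdoa_subsample(x, y, fs=16000))     # close to 5/16000 seconds
```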

    Computational Multimedia for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of oneself. This is the idea behind the psychological theory of self-efficacy - you can learn to perform certain tasks because you see yourself doing them, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism, and selective mutism to sports training. However, there is an inherent difficulty associated with the production of VSM material. Prolonged and persistent video recording is required to capture the rare, if not nonexistent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth maps captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainties in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
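
    The dissertation's layer-based probabilistic model for depth completion is more involved than can be shown here; as a deliberately naive baseline for the same task, the sketch below fills structured-light dropouts with the median of valid neighbours.

```python
import numpy as np

def fill_depth_holes(depth, iterations=5):
    # Naive hole filling for a structured-light depth map: zeros mark missing
    # measurements; each pass replaces a hole with the median of its valid
    # 8-neighbours. (Only a baseline, not the dissertation's model.)
    d = depth.astype(float).copy()
    for _ in range(iterations):
        holes = np.argwhere(d == 0)
        if holes.size == 0:
            break
        filled = d.copy()
        for i, j in holes:
            patch = d[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            valid = patch[patch > 0]
            if valid.size:
                filled[i, j] = np.median(valid)
        d = filled
    return d

depth = np.full((6, 6), 2.0)
depth[2:4, 2:4] = 0.0                      # a hole from sensor dropout
print(fill_depth_holes(depth)[2:4, 2:4])   # filled with ~2.0
```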

    Visual Perception System for Aerial Manipulation: Methods and Implementations

    Technology is evolving quickly, and autonomous systems are becoming a reality. Companies are increasingly demanding robotized solutions to improve the efficiency of their operations. This is also the case for aerial robots. Their unique capability of moving freely in space makes them suitable for many tasks that are tedious or even dangerous for human operators. Nowadays, the vast number of sensors and commercial drones makes them highly appealing solutions. However, substantial manual effort is still required to customize existing solutions to each particular task, owing to the number of possible environments, robot designs and missions. Researchers typically design different vision algorithms, hardware devices and sensor setups to tackle specific tasks. Currently, aerial manipulation is being intensively studied in order to extend the range of applications aerial robots can perform. These include inspection, maintenance, or even operating valves and other machines. This thesis presents an aerial manipulation system and a set of perception algorithms for the automation of aerial manipulation tasks. The complete design of the system is presented, and modular frameworks are shown that facilitate the development of these kinds of operations. First, the research on object analysis for manipulation and grasp planning considering different object models is presented. Depending on the object model, different state-of-the-art grasp analysis methods are reviewed, and planning algorithms for both single and dual manipulators are shown. Secondly, the development of perception algorithms for object detection and pose estimation is presented. These allow the system to identify many kinds of objects in any scene and locate them in order to perform manipulation tasks. These algorithms produce the necessary information for the manipulation analysis described above. Thirdly, it is shown how vision can be used to localize the robot in the environment while building local maps, which are beneficial for manipulation tasks. These maps are enhanced with semantic information from the perception algorithms mentioned above. Finally, the thesis presents the development of the hardware of the aerial platform, which includes lightweight manipulators and the invention of a novel tool that allows the aerial robot to operate in contact with rigid surfaces while serving as an estimator of the robot's position. All the techniques presented in this thesis have been validated through extensive experimentation with real aerial robotic platforms.
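
    As a minimal sketch of the object pose estimation step described above (assuming a known object model and 2D detections, and using generic PnP + RANSAC via OpenCV rather than the thesis's specific algorithms):

```python
import numpy as np
import cv2

# Synthetic check: project a cube with a known pose, then recover the pose.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])                      # assumed camera intrinsics
obj = np.array([[x, y, z] for x in (0.0, 0.1)
                          for y in (0.0, 0.1)
                          for z in (0.0, 0.1)])      # 10 cm cube corners
rvec_true = np.array([0.1, -0.2, 0.05])              # axis-angle rotation
tvec_true = np.array([0.02, -0.01, 0.6])             # object 0.6 m ahead
img_pts, _ = cv2.projectPoints(obj, rvec_true, tvec_true, K, None)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img_pts.reshape(-1, 2), K, None)
R, _ = cv2.Rodrigues(rvec)                           # object-to-camera rotation
print(ok, np.round(tvec.ravel(), 3))                 # ~ tvec_true
```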

    Optical Camera Communications: Principles, Modulations, Potential and Challenges

    Optical wireless communications (OWC) are emerging as cost-effective and practical solutions to congested radio-frequency-based wireless technologies. As part of OWC, optical camera communications (OCC) have become very attractive, considering recent developments in cameras and the use of fitted cameras in smart devices. OCC, together with visible light communications (VLC), is considered within the framework of the IEEE 802.15.7m standardization. OCC systems based on both organic and inorganic light sources, as well as cameras, are being considered for low-rate transmission and localization in indoor and outdoor short-range applications. This paper introduces the underlying principles of OCC and gives a comprehensive overview of this emerging technology, including recent standardization activities in OCC. It also outlines key technical issues such as mobility, coverage, interference, and performance enhancement. Future research directions and open issues are also presented.
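
    The basic OCC principle with a rolling-shutter camera can be illustrated with a toy on-off-keying decode: a fast-flickering LED leaves light/dark bands across image rows, and thresholding per-row intensity recovers the bits. The parameters below are invented for the example.

```python
import numpy as np

rows_per_bit = 8                        # assumed: image rows spanned by one bit
bits_tx = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Simulate a rolling-shutter frame: each transmitted bit becomes a band of
# bright or dark rows, plus sensor noise.
frame = np.repeat(bits_tx, rows_per_bit)[:, None] * np.ones((1, 64))
frame += 0.05 * np.random.default_rng(1).normal(size=frame.shape)

# Decode: threshold the per-row mean, then vote within each bit's band.
row_mean = frame.mean(axis=1)
binary = (row_mean > row_mean.mean()).astype(int)
bits_rx = binary.reshape(-1, rows_per_bit).mean(axis=1).round().astype(int)
assert np.array_equal(bits_rx, bits_tx)
```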

    Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

    Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera in real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source.
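
    A 3D Dynamic Scene Graph can be pictured as layered nodes plus typed edges. The sketch below is an illustrative data structure only; the names are not Kimera's actual open-source API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str                  # e.g. "object", "agent", "place", "room", "building"
    position: tuple             # metric 3D centroid
    attrs: dict = field(default_factory=dict)   # semantics, timestamps, ...

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src_id, dst_id, relation)

    def add(self, node):
        self.nodes[node.node_id] = node

    def connect(self, src, dst, relation):
        self.edges.append((src, dst, relation))

g = SceneGraph()
g.add(Node("room_1", "room", (2.0, 3.0, 0.0)))
g.add(Node("chair_7", "object", (2.4, 2.8, 0.4), {"class": "chair"}))
g.connect("chair_7", "room_1", "contained_in")   # inter-layer edge
```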

    Computational Imaging Approach to Recovery of Target Coordinates Using Orbital Sensor Data

    This dissertation addresses the components necessary for simulating image-based recovery of the position of a target using orbital image sensors. Each component is considered in detail, focusing on the effect that design choices and system parameters have on the accuracy of the position estimate. Changes in sensor resolution, varying amounts of blur, differences in image noise level, the selection of algorithms used for each component, and lag introduced by excessive processing time all contribute to the accuracy of the recovered target coordinates. Using physical targets and sensors in this scenario would be cost-prohibitive in the exploratory setting posed; therefore, a simulated target path is generated using Bezier curves which approximate representative paths followed by the targets of interest. Orbital trajectories for the sensors are designed on an elliptical model representative of the motion of physical orbital sensors. Images from each sensor are simulated based on the position and orientation of the sensor, the position of the target, and the imaging parameters selected for the experiment (resolution, noise level, blur level, etc.). Post-processing of the simulated imagery seeks to reduce noise and blur and increase resolution. The only information available for calculating the target position in a fully implemented system is the set of sensor position and orientation vectors together with the images from each sensor. From these data we develop a reliable method of recovering the target position and analyze the impact on near-realtime processing. We also discuss the influence of adjustments to system components on overall capabilities and address the potential system size, weight, and power requirements of realistic implementation approaches.
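
    The simulated target paths can be generated with De Casteljau evaluation of a Bezier curve; the sketch below shows that standard construction with made-up control points, not the dissertation's exact setup.

```python
import numpy as np

def bezier(control_pts, n=100):
    # Sample a Bezier curve of arbitrary degree by De Casteljau's algorithm:
    # repeatedly interpolate between adjacent control points at parameter t.
    pts = np.asarray(control_pts, float)
    samples = []
    for t in np.linspace(0.0, 1.0, n):
        p = pts.copy()
        while len(p) > 1:
            p = (1 - t) * p[:-1] + t * p[1:]
        samples.append(p[0])
    return np.array(samples)

# A hypothetical 3D target path from four control points (units arbitrary).
path = bezier([[0, 0, 0], [5, 2, 1], [8, -1, 0.5], [12, 3, 0]])
print(path.shape)   # (100, 3)
```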