The flow of baseline estimation using a single omnidirectional camera
The baseline is the distance between two cameras, a quantity that cannot be obtained directly from a single camera, yet it is one of the key parameters for recovering object depth by stereo triangulation. Here, the flow of the baseline is produced by moving a single camera along the horizontal axis from its original location; with the estimated baseline, object depth can be determined using only an omnidirectional camera. This research focuses on determining the flow of the baseline before calculating the disparity map. To estimate the flow and to track the object, three to four points on the surface of an object are selected in two different panoramic images. By moving the camera horizontally, we obtain the tracks of these points. The resulting tracks are visually similar, each representing the coordinates of one tracked point, and two of the four tracks closely follow a second-order polynomial.
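For context, the depth recovery that the estimated baseline enables is the standard stereo triangulation relation Z = f·B/d. The sketch below is a minimal illustration with made-up numbers; the function name and the example values are ours, not the paper's.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Standard rectified-stereo triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Illustrative values: a 700 px focal length, a 0.1 m baseline created by
# the horizontal camera motion, and a 14 px disparity give
# Z = 700 * 0.1 / 14 = 5.0 m.
print(depth_from_disparity(700.0, 0.1, 14.0))
```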
Vision Sensors and Edge Detection
The Vision Sensors and Edge Detection book reflects a selection of recent developments in the area of vision sensors and edge detection. The book is organised in two sections. The first presents vision sensors, with applications to panoramic vision sensors, wireless vision sensors, and automated vision sensor inspection; the second covers image processing techniques such as image measurements, image transformations, filtering, and parallel computing.
Omnidirectional Stereo Vision for Autonomous Vehicles
Environment perception with cameras is an important requirement for many autonomous vehicle and robot applications. This work presents a stereoscopic omnidirectional camera system for autonomous vehicles that resolves the problem of a limited field of view and provides a 360° panoramic view of the environment. We present a new projection model for these cameras and show that the camera setup overcomes major drawbacks of traditional perspective cameras in many applications.
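The abstract leaves the new projection model unspecified; as background, the sketch below shows the generic equirectangular (spherical) projection commonly used to form 360° panoramas. It is an illustrative stand-in, not the paper's proposed model.

```python
import numpy as np

def project_equirect(point_cam: np.ndarray, width: int, height: int) -> tuple[float, float]:
    """Map a 3D point in camera coordinates to equirectangular pixel coords.

    Azimuth (around the vertical axis) spans the image width; elevation
    spans the height. Generic spherical model, not the paper's projection.
    """
    x, y, z = point_cam
    azimuth = np.arctan2(x, z)                            # in [-pi, pi]
    elevation = np.arcsin(y / np.linalg.norm(point_cam))  # in [-pi/2, pi/2]
    u = (azimuth / (2 * np.pi) + 0.5) * width
    v = (elevation / np.pi + 0.5) * height
    return u, v

# A point 45 degrees to the right of the optical axis lands at u = 1280
# in a 2048-pixel-wide panorama.
print(project_equirect(np.array([1.0, 0.0, 1.0]), 2048, 1024))
```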
A Small Animal Optical Tomographic Imaging System with Omni-Directional, Non-Contact, Angular-Resolved Fluorescence Measurement Capabilities
The overall goal of this thesis is to develop a new non-contact, whole-body, fluorescence molecular tomography system for small animal imaging. Over the past decade, small animal in vivo imaging has led to a better understanding of many human diseases and improved our ability to develop and test new drugs and medical compounds. Among various imaging modalities, optical imaging techniques have emerged as important tools. In particular, fluorescence and bioluminescence imaging systems have opened new ways for visualizing many molecular pathways inside living animals including gene expression and protein functions.
While substantial progress has been made in available prototype and commercial optical imaging systems, there still exist areas for further improvement in the outcome of existing instrumentations. Currently, most small animal optical imaging systems rely on 2D planar imaging that provides limited ability to accurately locate lesions deep inside an animal. Furthermore, most existing tomographic imaging systems use a diffusion model of light propagation, which is of limited accuracy. While more accurate models using the equation of radiative transfer have become available, they have not been widely applied to small animal imaging yet.
To overcome the limitations of existing optical small animal imaging systems, a novel imaging system that makes use of the latest hardware and software advances in the field was developed. At the heart of the system is a new double-conical-mirror-based imaging head that enables a single fixed position camera to capture multi-directional views simultaneously. Therefore, the imaging head provides 360-degree measurement data from an entire animal surface in one step. Another benefit provided by this design is the substantial reduction of multiple back-reflections between the animal and mirror surfaces. These back reflections are common in existing mirror-based imaging heads and tend to degrade the quality of raw measurement data. Furthermore, the conical-mirror design offers the capability to measure angular-resolved data from the animal surface.
To make full use of this capability, a novel equation of radiative transfer-based ray-transfer operator was introduced to map the spatial and angular information of emitted light on the animal surface to the captured image data. As a result, more data points are involved in the image reconstructions, which leads to a higher image resolution. The performance of the imaging system was evaluated through numerical simulations, experiments using a well-defined tissue phantom, and live-animal studies. Finally, the double reflection mirror scheme presented in this dissertation can be cost-effectively employed with all camera-based imaging systems. The shapes and sizes of mirrors can be varied to accommodate imaging of other objects such as larger animals or human body parts, such as the breast, head, or feet.
REAL-TIME CAPTURE AND RENDERING OF PHYSICAL SCENE WITH AN EFFICIENTLY CALIBRATED RGB-D CAMERA NETWORK
From object tracking to 3D reconstruction, RGB-Depth (RGB-D) camera networks play an increasingly important role in many vision and graphics applications. With the recent explosive growth of Augmented Reality (AR) and Virtual Reality (VR) platforms, utilizing RGB-D camera networks to capture and render dynamic physical spaces can enhance immersive experiences for users. To maximize coverage and minimize cost, practical applications often use a small number of RGB-D cameras placed sparsely around the environment for data capture. While sparse color camera networks have been studied for decades, the problems of extrinsic calibration of, and rendering with, sparse RGB-D camera networks are less well understood. Extrinsic calibration is difficult because of inappropriate RGB-D camera models and a lack of shared scene features. Due to significant camera noise and sparse coverage of the scene, the quality of rendered 3D point clouds is much lower than that of synthetic models. Adding virtual objects whose rendering depends on the physical environment, such as those with reflective surfaces, further complicates the rendering pipeline.
In this dissertation, I propose novel solutions to tackle these challenges faced by RGB-D camera systems. First, I propose a novel extrinsic calibration algorithm that can accurately and rapidly calibrate the geometric relationships across an arbitrary number of RGB-D cameras on a network. Second, I propose a novel rendering pipeline that can capture and render, in real-time, dynamic scenes in the presence of arbitrary-shaped reflective virtual objects. Third, I have demonstrated a teleportation application that uses the proposed system to merge two geographically separated 3D captured scenes into the same reconstructed environment.
To provide fast and robust calibration for a sparse RGB-D camera network, first, the correspondences between different camera views are established using a spherical calibration object. We show that this approach outperforms other techniques based on planar calibration objects. Second, instead of modeling camera extrinsics with a rigid transformation, which is optimal only for pinhole cameras, different view transformation functions, including rigid transformation, polynomial transformation, and manifold regression, are systematically tested to determine the most robust mapping that generalizes well to unseen data. Third, the celebrated bundle adjustment procedure is reformulated to minimize the global 3D projection error so as to fine-tune the initial estimates. To achieve realistic mirror rendering, a robust eye detector is used to identify the viewer's 3D location and render the reflective scene accordingly. The limited field of view obtained from a single camera is overcome by our calibrated RGB-D camera network system, which is scalable to capture an arbitrarily large environment. The rendering is accomplished by ray tracing light rays from the viewpoint to the scene reflected by the virtual curved surface. To the best of our knowledge, the proposed system is the first to render reflective dynamic scenes from real 3D data in large environments. Our scalable client-server architecture is computationally efficient: the calibration of a camera network system, including data capture, can be done in minutes using only commodity PCs.
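As context for the rigid-transformation baseline mentioned above, here is a minimal sketch of the textbook Kabsch (orthogonal Procrustes) solve that aligns corresponding 3D points, such as sphere-center estimates shared by two RGB-D views. It illustrates the standard method, not necessarily the dissertation's exact implementation.

```python
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rigid transform (R, t) with dst ~ R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points, e.g. sphere-center
    detections observed by two cameras. Standard Kabsch algorithm.
    """
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)   # cross-covariance SVD
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t
```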
3D panoramic imaging for virtual environment construction
The project is concerned with the development of algorithms for the creation of photo-realistic 3D virtual environments, overcoming problems in mosaicing, colour and lighting changes, correspondence search speed, and correspondence errors due to a lack of surface texture. A number of related new algorithms have been investigated for image stitching, content-based colour correction, and efficient 3D surface reconstruction. All of the investigations were undertaken using multiple views from normal digital cameras, web cameras, and a "one-shot" panoramic system. In the process of 3D reconstruction, a new interest-point-based mosaicing method, a new interest-point-based colour correction method, a new hybrid feature- and area-based correspondence constraint, and a new structured-light-based 3D reconstruction method have been investigated. The major contributions and results can be summarised as follows:
• A new interest-point-based image stitching method has been proposed and investigated. The robustness of interest points has been tested and evaluated, and interest points have proved robust to changes in lighting, viewpoint, rotation, and scale.
• A new interest-point-based method for colour correction has been proposed and investigated. Linear and linear-plus-affine colour transforms proved more accurate than traditional diagonal transforms in matching colours across panoramic images (see the sketch after this list).
• A new structured-light-based method for correspondence-point-based 3D reconstruction has been proposed and investigated. The method has been shown to increase the accuracy of the correspondence search in areas with low texture. Correspondence speed has also been increased with a new hybrid feature- and area-based correspondence search constraint.
• Based on the investigation, a software framework has been developed for image-based 3D virtual environment construction. The GUI includes facilities for importing images, colour correction, mosaicing, 3D surface reconstruction, texture recovery, and visualisation.
• 11 research papers have been published.
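As a point of reference for the colour-correction result above, here is a minimal sketch of fitting a linear (optionally linear-plus-affine) colour transform to corresponding interest-point colours by least squares. The names are illustrative and the thesis's exact formulation may differ.

```python
import numpy as np

def fit_linear_colour_transform(src_rgb: np.ndarray, dst_rgb: np.ndarray,
                                affine: bool = True) -> np.ndarray:
    """Fit M minimising ||A @ M - dst_rgb|| over corresponding colours.

    src_rgb, dst_rgb: (N, 3) RGB samples taken at matched interest points
    in the overlap of two panorama tiles. With affine=True a constant
    column is appended, giving a linear-plus-offset model; either variant
    can outperform a diagonal gain model when channels are correlated.
    """
    A = np.hstack([src_rgb, np.ones((len(src_rgb), 1))]) if affine else src_rgb
    M, *_ = np.linalg.lstsq(A, dst_rgb, rcond=None)
    return M

# Usage: corrected = np.hstack([pixels, np.ones((len(pixels), 1))]) @ M
```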
Enhancing 3D Visual Odometry with Single-Camera Stereo Omnidirectional Systems
We explore low-cost solutions for efficiently improving the 3D pose estimation problem of a single camera moving in an unfamiliar environment. The visual odometry (VO) task -- as it is called when using computer vision to estimate egomotion -- is of particular interest to mobile robots as well as humans with visual impairments. The payload capacity of small robots like micro-aerial vehicles (drones) requires the use of portable perception equipment, which is constrained by size, weight, energy consumption, and processing power. Using a single camera as the passive sensor for the VO task satisfies these requirements, and it motivates the proposed solutions presented in this thesis.
To deliver the portability goal with a single off-the-shelf camera, we have taken two approaches. The first, and the most extensively studied here, revolves around an unorthodox camera-mirror configuration (catadioptrics) achieving a stereo omnidirectional system (SOS). The second approach relies on expanding the visual features from the scene into higher dimensionalities to track the pose of a conventional camera in a photogrammetric fashion. The first goal has many interdependent challenges, which we address as part of this thesis: SOS design, projection model, adequate calibration procedure, and application to VO. We show several practical advantages of the single-camera SOS due to its complete 360-degree stereo views, which other conventional 3D sensors lack due to their limited field of view. Since our omnidirectional stereo (omnistereo) views are captured by a single camera, a truly instantaneous pair of panoramic images is possible for 3D perception tasks. Finally, we address the VO problem as a direct multichannel tracking approach, which increases pose estimation accuracy over the baseline method (i.e., using only grayscale or color information), with photometric error minimization at the heart of the "direct" tracking algorithm. Currently, this solution has been tested on standard monocular cameras, but it could also be applied to an SOS.
We believe the challenges we attempted to solve have not previously been considered with the level of detail needed for successfully performing VO with a single camera as the ultimate goal, in both real-life and simulated scenes.
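To make the photometric objective of the direct tracking approach concrete, the sketch below computes the per-pixel residuals that a pose optimiser (e.g. Gauss-Newton) would minimise. The warp is left abstract and the nearest-neighbour lookup is a simplification; the multichannel variant simply stacks residuals from extra channels. This is standard direct-VO machinery, not the thesis's exact implementation.

```python
import numpy as np

def photometric_residuals(I_ref: np.ndarray, I_cur: np.ndarray,
                          pixels: np.ndarray, warp) -> np.ndarray:
    """Direct-tracking residuals r_i = I_ref(x_i) - I_cur(warp(x_i)).

    I_ref, I_cur: grayscale images (or one channel of a multichannel
    stack); pixels: (N, 2) integer reference coordinates; warp: maps a
    reference pixel into the current image under the candidate pose.
    """
    res = []
    for x, y in pixels:
        u, v = warp(x, y)
        ui, vi = int(round(u)), int(round(v))   # nearest-neighbour lookup
        if 0 <= vi < I_cur.shape[0] and 0 <= ui < I_cur.shape[1]:
            res.append(float(I_ref[y, x]) - float(I_cur[vi, ui]))
    return np.asarray(res)
```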
On unifying sparsity and geometry for image-based 3D scene representation
Demand has emerged for next generation visual technologies that go beyond conventional 2D imaging. Such technologies should capture and communicate all perceptually relevant three-dimensional information about an environment to a distant observer, providing a satisfying, immersive experience. Camera networks offer a low cost solution to the acquisition of 3D visual information, by capturing multi-view images from different viewpoints. However, the camera's representation of the data is not ideal for common tasks such as data compression or 3D scene analysis, as it does not make the 3D scene geometry explicit. Image-based scene representations fundamentally require a multi-view image model that facilitates extraction of underlying geometrical relationships between the cameras and scene components. Developing new, efficient multi-view image models is thus one of the major challenges in image-based 3D scene representation methods.
This dissertation focuses on defining and exploiting a new method for multi-view image representation, from which the 3D geometry information is easily extractable, and which is additionally highly compressible. The method is based on sparse image representation using an overcomplete dictionary of geometric features, where a single image is represented as a linear combination of a few fundamental image structure features (edges, for example). We construct the dictionary by applying a unitary operator to an analytic function, which introduces a composition of geometric transforms (translations, rotations, and anisotropic scaling) to that function. The advantage of this approach is that the features across multiple views can be related with a single composition of transforms. We then establish a connection between image components and scene geometry by defining the transforms that satisfy the multi-view geometry constraint, and obtain a new geometric multi-view correlation model.
We first address the construction of dictionaries for images acquired by omnidirectional cameras, which are particularly convenient for scene representation due to their wide field of view. Since most omnidirectional images can be uniquely mapped to spherical images, we form a dictionary by applying motions on the sphere, rotations, and anisotropic scaling to a function that lives on the sphere. We have used this dictionary and a sparse approximation algorithm, Matching Pursuit, for compression of omnidirectional images, and additionally for coding 3D objects represented as spherical signals. Both methods offer better rate-distortion performance than state of the art schemes at low bit rates.
The novel multi-view representation method and the dictionary on the sphere are then exploited for the design of a distributed coding method for multi-view omnidirectional images. In a distributed scenario, cameras compress acquired images without communicating with each other. Using a reliable model of correlation between views, distributed coding can achieve higher compression ratios than independent compression of each image. However, the lack of a proper model has been an obstacle for distributed coding in camera networks for many years. We propose to use our geometric correlation model for distributed multi-view image coding with side information. The encoder employs a coset coding strategy, developed by dictionary partitioning based on atom shape similarity and multi-view geometry constraints. Our method results in significant rate savings compared to independent coding.
An additional contribution of the proposed correlation model is that it gives information about the scene geometry, leading to a new camera pose estimation method that uses an extremely small amount of data from each camera. Finally, we develop a method for learning stereo visual dictionaries based on the new multi-view image model. Although dictionary learning for still images has received a lot of attention recently, dictionary learning for stereo images has been investigated only sparingly. Our method maximizes the likelihood that a set of natural stereo images is efficiently represented with the selected stereo dictionaries, where the multi-view geometry constraint is included in the probabilistic modeling. Experimental results demonstrate that including the geometric constraints in learning leads to stereo dictionaries with both better distributed stereo matching and better approximation properties than randomly selected dictionaries. We show that learning dictionaries for optimal scene representation based on the novel correlation model improves camera pose estimation and can be beneficial for distributed coding.
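As a point of reference for the sparse approximation step mentioned above, here is a minimal sketch of generic Matching Pursuit over a finite dictionary of unit-norm atoms. The dissertation's dictionaries live on the sphere, so this flat-vector version is an illustrative simplification.

```python
import numpy as np

def matching_pursuit(signal: np.ndarray, dictionary: np.ndarray,
                     n_atoms: int) -> tuple[np.ndarray, np.ndarray]:
    """Greedy Matching Pursuit: approximate `signal` with `n_atoms` atoms.

    dictionary: (n_features, n_total_atoms) matrix of unit-norm columns.
    At each step the atom most correlated with the residual is selected
    and its contribution subtracted. Returns coefficients and residual.
    """
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        k = int(np.argmax(np.abs(correlations)))   # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual
```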