Use of Microsoft Kinect in a dual camera setup for action recognition applications
Conventional human action recognition methods use a single light camera to extract all the information needed to perform the recognition. However, the use of a single light camera poses limitations that cannot be addressed without a hardware change. In this thesis, we propose a novel approach to the multi-camera setup. Our approach uses the skeletal pose estimation capabilities of the Microsoft Kinect camera and projects the estimated pose onto the image of the non-depth camera. The approach aims to improve the performance of image analysis across multiple cameras, which would not be as easy in a typical multi-camera setup. Depth information is shared between the cameras in the form of pose projection, which depends on location awareness between them; the locations can be found using chessboard pattern calibration techniques. Due to the limitations of pattern calibration, we propose a novel calibration refinement approach that increases the detection distance and simplifies the long calibration process. The two tests performed demonstrate that pose projection achieves good accuracy given a successful calibration and a good Kinect pose estimate, but not given a failed calibration. Three tests were performed to determine the calibration performance. Distance calculations were prone to error, with a mean accuracy of 96% for camera differences under 60cm and a drastic drop beyond that, while the orientation calculation was stable, with a mean accuracy of 97%. The last test also shows that our new refinement approach significantly improves the outcome of the projection after a failed pattern calibration, and allows detection at almost double the camera difference, about 120cm. While the mean orientation accuracy was similar to that of pattern calibration, the distance accuracy was lower, at around 92%; however, its standard deviation remained stable, whereas that of the pattern calibration increased with distance.
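The pose projection step described in this abstract can be illustrated with a minimal sketch, assuming a standard pinhole model. The function name `project_joints`, the intrinsic matrix `K`, and the extrinsics `R` and `t` (which a chessboard calibration would normally supply) are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def project_joints(joints_kinect, R, t, K):
    """Project N x 3 joint positions from the Kinect frame into the
    image of a second camera (pinhole model, no lens distortion).

    R (3x3) and t (3,) map Kinect coordinates into the second camera's
    frame; K (3x3) is that camera's intrinsic matrix. All three are
    assumed to come from a prior calibration.
    """
    joints_cam = joints_kinect @ R.T + t       # rigid transform into camera frame
    uvw = joints_cam @ K.T                     # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]            # perspective divide -> pixel coords

# Hypothetical values: 500 px focal length, principal point at (320, 240),
# and a second camera that shares the Kinect's pose.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
joints = np.array([[0.0, 0.0, 2.0],
                   [0.2, -0.1, 2.0]])
pixels = project_joints(joints, np.eye(3), np.zeros(3), K)
```

An error in `R` or `t` shifts every projected joint, which is consistent with the abstract's remark that projection accuracy depends on a successful calibration.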
3D Scanning System for Automatic High-Resolution Plant Phenotyping
Thin leaves, fine stems, self-occlusion, non-rigid and slowly changing
structures make plants difficult for three-dimensional (3D) scanning and
reconstruction -- two critical steps in automated visual phenotyping. Many
current solutions such as laser scanning, structured light, and multiview
stereo can struggle to acquire usable 3D models because of limitations in
scanning resolution and calibration accuracy. In response, we have developed a
fast, low-cost, 3D scanning platform to image plants on a rotating stage with
two tilting DSLR cameras centred on the plant. This uses new methods of camera
calibration and background removal to achieve high-accuracy 3D reconstruction.
We assessed the system's accuracy using a 3D visual hull reconstruction
algorithm applied to 2 plastic models of dicotyledonous plants, 2 sorghum
plants and 2 wheat plants across different sets of tilt angles. Scan times
ranged from 3 minutes (to capture 72 images using 2 tilt angles), to 30 minutes
(to capture 360 images using 10 tilt angles). The leaf lengths, widths, areas
and perimeters of the plastic models were measured manually and compared to
measurements from the scanning system: results were within 3-4% of each other.
The 3D reconstructions obtained with the scanning system show excellent
geometric agreement with all six plant specimens, even plants with thin leaves
and fine stems. Comment: 8 pages, DICTA 201
Data Fusion of Objects Using Techniques Such as Laser Scanning, Structured Light and Photogrammetry for Cultural Heritage Applications
In this paper we present a semi-automatic 2D-3D local registration pipeline
capable of coloring 3D models obtained from 3D scanners by using uncalibrated
images. The proposed pipeline exploits the Structure from Motion (SfM)
technique in order to reconstruct a sparse representation of the 3D object and
obtain the camera parameters from image feature matches. We then coarsely
register the reconstructed 3D model to the scanned one through the Scale
Iterative Closest Point (SICP) algorithm. SICP provides the global scale,
rotation and translation parameters, using minimal manual user intervention. In
the final processing stage, a local registration refinement algorithm optimizes
the color projection of the aligned photos on the 3D object removing the
blurring/ghosting artefacts introduced due to small inaccuracies during the
registration. The proposed pipeline is capable of handling real world cases
with a range of characteristics from objects with low level geometric features
to complex ones.
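The scale, rotation and translation parameters mentioned in this abstract can be illustrated with a closed-form similarity alignment for points whose correspondences are already known. This is a minimal sketch of the kind of update a scaled ICP loop could repeat after each nearest-neighbour matching step, using the well-known SVD-based least-squares solution; it is not the paper's SICP implementation, and the function name `similarity_transform` is an assumption.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares scale s, rotation R and translation t such that
    dst ~= s * R @ src + t, for corresponding N x 3 point sets."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d            # centre both point sets
    cov = xd.T @ xs / len(src)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    # Guard against a reflection: force det(R) = +1.
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)         # variance of the source set
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In a full pipeline the correspondences would be re-estimated after every update, and the loop would stop once the alignment error no longer decreases.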
Pedestrian detection and tracking using stereo vision techniques
Automated pedestrian detection, counting and tracking have received significant attention from the computer vision community of late. Many of the person detection techniques described so far in the literature work well in controlled environments, such as laboratory settings with a small number of people. This allows various assumptions to be made that simplify this complex problem. The performance of these techniques, however, tends to deteriorate when presented with unconstrained environments where pedestrian appearances, numbers, orientations, movements, occlusions and lighting conditions violate these convenient assumptions. Recently, 3D stereo information has been proposed as a technique to overcome some of these issues and to guide pedestrian detection. This thesis presents such an approach, whereby after obtaining robust 3D information via a novel disparity estimation technique, pedestrian detection is performed via a 3D point clustering process within a region-growing framework. This clustering process avoids using hard thresholds by using biometrically inspired constraints and a number of plan view statistics. This pedestrian detection technique requires no external training and is able to robustly handle challenging real-world unconstrained environments from various camera positions and orientations. In addition, this thesis presents a continuous detect-and-track approach, with additional kinematic constraints and explicit occlusion analysis, to obtain robust temporal tracking of pedestrians over time. These approaches are experimentally validated using challenging datasets consisting of both synthetic data and real-world sequences gathered from a number of environments. In each case, the techniques are evaluated using both 2D and 3D ground-truth methodologies.
Camera Planning and Fusion in a Heterogeneous Camera Network
Wide-area camera networks are becoming more and more common. They have a wide range of commercial and military applications, from video surveillance to smart homes and from traffic monitoring to anti-terrorism. The design of such a camera network is a challenging problem due to the complexity of the environment, self and mutual occlusion of moving objects, diverse sensor properties and a myriad of performance metrics for different applications. In this dissertation, we consider two such challenges: camera planning and camera fusion. Camera planning is to determine the optimal number and placement of cameras for a target cost function. Camera fusion describes the task of combining images collected by heterogeneous cameras in the network to extract information pertinent to a target application.
I tackle the camera planning problem by developing a new unified framework based on binary integer programming (BIP) to relate the network design parameters and the performance goals of a variety of camera network tasks. Most of the BIP formulations are NP-hard problems, and various approximate algorithms have been proposed in the literature. In this dissertation, I develop a comprehensive framework for comparing the entire spectrum of approximation algorithms, from greedy and Markov Chain Monte Carlo (MCMC) methods to various relaxation techniques. The key contribution is to provide not only a generic formulation of the camera planning problem but also novel approaches to adapt the formulation to powerful approximation schemes, including Simulated Annealing (SA) and Semi-Definite Programming (SDP). The accuracy, efficiency and scalability of each technique are analyzed and compared in depth. Extensive experimental results are provided to illustrate the strengths and weaknesses of each method.
The second problem, heterogeneous camera fusion, is very complex. Information can be fused at different levels, from pixel or voxel to semantic objects, with large variation in accuracy, communication and computation costs. My focus is on the geometric transformation of shapes between objects observed at different camera planes. This so-called geometric fusion approach usually provides the most reliable fusion, at the expense of high computation and communication costs. To tackle the complexity, a hierarchy of camera models with different levels of complexity is proposed to balance the effectiveness and efficiency of the camera network operation. Different calibration and registration methods are then proposed for each camera model. Finally, I provide two specific examples to demonstrate the effectiveness of the model: 1) a fusion system to improve the segmentation of the human body in a camera network consisting of thermal and regular visible light cameras, and 2) a view-dependent rendering system that combines the information from depth and regular cameras to collect the scene information and generate new views in real time.
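The greedy end of the approximation spectrum named in this abstract can be sketched on a simplified coverage model of camera planning. The model is an assumption made for illustration, not the dissertation's BIP formulation: each candidate camera position covers a known set of target points, and at most `budget` cameras may be placed.

```python
def greedy_camera_selection(coverage, budget):
    """Pick up to `budget` candidate cameras, each time taking the one
    that covers the most target points not yet covered.

    coverage: dict mapping a candidate camera id to the set of target
    points it sees (hypothetical input, e.g. from a visibility test).
    """
    chosen, covered = [], set()
    for _ in range(budget):
        # Candidate with the largest marginal gain; ties go to the
        # candidate that appears first in the dict.
        best = max(coverage, key=lambda c: len(coverage[c] - covered))
        if not coverage[best] - covered:
            break                      # no remaining candidate adds coverage
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

# Hypothetical candidates A, B, C observing target points 1..7.
candidates = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}
cameras, seen = greedy_camera_selection(candidates, budget=2)
```

Greedy selection is fast but offers no optimality guarantee for a general cost function, which is one reason to compare it against the sampling and relaxation methods the abstract lists.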
Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is however not
trivial if the appearance of objects produced by each sensor differs
substantially. This problem is further complicated when parallax effects cannot
be ignored, as with close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online. Comment: Preprint accepted for publication in IJCV (December 2018)
Mesh-based 3D Textured Urban Mapping
In the era of autonomous driving, urban mapping represents a core step to let
vehicles interact with the urban context. Successful mapping algorithms have
been proposed in the last decade, building the map from the data of a
single sensor. The focus of the system presented in this paper is twofold: the
joint estimation of a 3D map from lidar data and images, based on a 3D mesh,
and its texturing. Indeed, even if most surveying vehicles for mapping are
equipped with cameras and lidar, existing mapping algorithms usually rely on
either images or lidar data; moreover both image-based and lidar-based systems
often represent the map as a point cloud, while a continuous textured mesh
representation would be useful for visualization and navigation purposes. In
the proposed framework, we combine the accuracy of the 3D lidar data with the
dense information and appearance carried by the images, estimating a
visibility-consistent map from the lidar measurements and refining it
photometrically through the acquired images. We evaluate the proposed framework
on the KITTI dataset and show a performance improvement with respect
to two state-of-the-art urban mapping algorithms and two widely used surface
reconstruction algorithms in Computer Graphics. Comment: accepted at IROS 201