733 research outputs found
Automated Complexity-Sensitive Image Fusion
To construct a complete representation of a scene despite environmental obstacles such as fog, smoke, darkness, or textural homogeneity, multisensor video streams captured in different modalities are considered. A computational method for automatically fusing multimodal image streams into a highly informative and unified stream is proposed. The method consists of the following steps: 1. Image registration is performed to align video frames in the visible band over time, adapting to the nonplanarity of the scene by automatically subdividing the image domain into regions approximating planar patches
2. Wavelet coefficients are computed for each of the input frames in each modality
3. Corresponding regions and points are compared using spatial and temporal information across various scales
4. Decision rules based on the results of multimodal image analysis are used to combine the wavelet coefficients from different modalities
5. The combined wavelet coefficients are inverted to produce an output frame containing useful information gathered from the available modalities
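The fusion pipeline above can be sketched with a single-level Haar transform standing in for the full multi-scale wavelet machinery. This is an illustrative toy, not the paper's method: the decision rule here is simple per-coefficient max-magnitude selection rather than the multimodal analysis the abstract describes, and all function names are invented for this sketch.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar transform: returns (LL, LH, HL, HH) subbands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    LL = (a + b + c + d) / 2.0   # approximation
    LH = (a + b - c - d) / 2.0   # horizontal detail
    HL = (a - b + c - d) / 2.0   # vertical detail
    HH = (a - b - c + d) / 2.0   # diagonal detail
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    h, w = LL.shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2.0
    out[0::2, 1::2] = (LL + LH - HL - HH) / 2.0
    out[1::2, 0::2] = (LL - LH + HL - HH) / 2.0
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2.0
    return out

def fuse(visible, infrared):
    """Fuse two registered single-channel frames: average the approximation
    band, keep the larger-magnitude detail coefficient at each location."""
    vis = haar_dwt2(visible)
    ir = haar_dwt2(infrared)
    LL = (vis[0] + ir[0]) / 2.0
    details = [np.where(np.abs(v) >= np.abs(i), v, i)
               for v, i in zip(vis[1:], ir[1:])]
    return haar_idwt2(LL, *details)
```

A real system would recurse over several decomposition levels and apply the registration and spatio-temporal comparison steps before combining coefficients.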
Experiments show that the proposed system is capable of producing fused output containing the characteristics of color visible-spectrum imagery while adding information exclusive to infrared imagery, with attractive visual and informational properties
Visual analysis and synthesis with physically grounded constraints
The past decade has witnessed remarkable progress in image-based, data-driven vision and graphics. However, existing approaches often treat the images as pure 2D signals and not as a 2D projection of the physical 3D world. As a result, a lot of training examples are required to cover sufficiently diverse appearances and inevitably suffer from limited generalization capability. In this thesis, I propose "inference-by-composition" approaches to overcome these limitations by modeling and interpreting visual signals in terms of physical surface, object, and scene. I show how we can incorporate physically grounded constraints such as scene-specific geometry in a non-parametric optimization framework for (1) revealing the missing parts of an image due to removal of a foreground or background element, (2) recovering high spatial frequency details that are not resolvable in low-resolution observations. I then extend the framework from 2D images to handle spatio-temporal visual data (videos). I demonstrate that we can convincingly fill spatio-temporal holes in a temporally coherent fashion by jointly reconstructing the appearance and motion. Compared to existing approaches, our technique can synthesize physically plausible contents even in challenging videos. For visual analysis, I apply stereo camera constraints for discovering multiple approximately linear structures in extremely noisy videos with an ecological application to bird migration monitoring at night. The resulting algorithms are simple and intuitive while achieving state-of-the-art performance without the need of training on an exhaustive set of visual examples
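The idea of discovering approximately linear structures in extremely noisy detections (as in the bird-migration application) can be illustrated with a plain RANSAC line fit. This is a generic sketch under simplifying assumptions, not the thesis's stereo-constrained algorithm, and the function name and parameters are invented for illustration.

```python
import numpy as np

def ransac_line(points, n_iters=200, tol=1.0, seed=None):
    """Fit one dominant line to 2D points by RANSAC.

    Repeatedly samples two points, forms the candidate line through them,
    and keeps the hypothesis with the most inliers (perpendicular distance
    below `tol`). Returns (point_on_line, unit_direction, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask, best_model = None, None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        d = q - p
        norm = np.linalg.norm(d)
        if norm < 1e-9:
            continue
        d = d / norm
        # perpendicular distance of every point to the candidate line
        r = points - p
        dist = np.abs(r[:, 0] * d[1] - r[:, 1] * d[0])
        mask = dist < tol
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (p, d)
    return best_model[0], best_model[1], best_mask
```

Running the consensus step on detection coordinates per frame (or in space-time) separates collinear tracks from clutter; recovering multiple structures would repeat the fit on the remaining outliers.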
Intelligent Automatic Interpretation of Active Marine Sonar
This dissertation explores the problems raised by the design and construction of a real-time sonar interpreter operating in a three-dimensional marine context, and then focusses on two major research issues inherent in sonar interpretation: the treatment of observer and object motion, and the efficient exploitation of the specularity of acoustic reflection. The theoretical results derived in these areas have been tested where appropriate by computer simulation.
In the context of mobile marine robotics, the registration of sensory data obtained from differing viewpoints is of paramount importance. Small marine vehicles of the type considered here do not carry sophisticated navigational equipment, and cannot be held stationary in the water for any length of time.
The viewpoint registration problem is defined and analysed in terms of the new problem of motion resolution: the task of resolving the apparent motion of objects into that part due to the movement of the observer and that due to the objects' proper motion. Two solutions to this underconstrained problem are presented. The first presupposes that the observer orientation is known a priori, so that only the translational observer motion must be determined. It is applicable to two- and three-dimensional situations. The second solution determines both the translational and the rotational motion of the observer, but is restricted to a two-dimensional situation. Both solutions are based on target tracking techniques, and have been extensively tested in two dimensions by computer simulation. The necessary extensions to deal with full three-dimensional motion are also discussed.
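The first solution's setting (orientation known a priori, only translation unknown) admits a compact illustration: after rotating ranged targets into a common world frame, every stationary target's apparent shift equals minus the observer translation, so a robust average of the shifts recovers the translation and the residuals expose proper motion. The sketch below is a 2D toy built on that observation, not the dissertation's tracking-based algorithm; all names and the threshold are invented for illustration.

```python
import numpy as np

def resolve_motion(prev, curr, heading_prev, heading_curr, thresh=0.5):
    """Toy 2D motion resolution with observer orientation known a priori.

    `prev`/`curr` are (N, 2) arrays of the same targets' positions in the
    vehicle frame at two instants; headings are the known orientations
    (radians). Returns the estimated observer translation and a boolean
    mask of targets showing proper motion."""
    def rot(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s], [s, c]])
    world_prev = prev @ rot(heading_prev).T   # rotate rows into world frame
    world_curr = curr @ rot(heading_curr).T
    shifts = world_curr - world_prev          # = -translation + proper motion
    translation = -np.median(shifts, axis=0)  # robust if most targets static
    proper = shifts + translation             # residual per-target motion
    moving = np.linalg.norm(proper, axis=1) > thresh
    return translation, moving
```

With a majority of stationary targets the median rejects the movers, which is the same intuition that makes the problem solvable despite being underconstrained per target.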
The second major research issue addressed in this thesis is the efficient use of specularity. Specular echoes have a high intrinsic information content because of the alignment conditions necessary for their generation. In the marine acoustic context they provide a significant proportion of the information available from an acoustic ranger. I suggest a new method that uses directly the information present in specular reflections and the history of the vehicle motion to classify the specular echo sources and infer the local structure of the objects bearing them. The method builds on the output of a motion resolution system. Six distinct types of specular echo source are described and three properties useful for their discrimination are discussed. A suitable inference system for the analysis and classification of specular echo sources is also proposed
Online Structured Learning for Real-Time Computer Vision Gaming Applications
In recent years computer vision has played an increasingly important role in the development of computer games, and it now features as one of the core technologies for many gaming platforms. The work in this thesis addresses three problems in real-time computer vision, all of which are motivated by their potential application to computer games.
We first present an approach for real-time 2D tracking of arbitrary objects. In common with recent research in this area we incorporate online learning to provide an appearance model which is able to adapt to the target object and its surrounding background during tracking. However, our approach moves beyond the standard framework of tracking using binary classification and instead integrates tracking and learning in a more principled way through the use of structured learning. As well as providing a more powerful framework for adaptive visual object tracking, our approach also outperforms state-of-the-art tracking algorithms on standard datasets. Next we consider the task of keypoint-based object tracking. We take the traditional pipeline of matching keypoints followed by geometric verification and show how this can be embedded into a structured learning framework in order to provide principled adaptivity to a given environment. We also propose an approximation method allowing us to take advantage of recently developed binary image descriptors, meaning our approach is suitable for real-time application even on low-powered portable devices. Experimentally, we clearly see the benefit that online adaptation using structured learning can bring to this problem. Finally, we present an approach for approximately recovering the dense 3D structure of a scene which has been mapped by a simultaneous localisation and mapping system. Our approach is guided by the constraints of the low-powered portable hardware we are targeting, and we develop a system which coarsely models the scene using a small number of planes. To achieve this, we frame the task as a structured prediction problem and introduce online learning into our approach to provide adaptivity to a given scene. This allows us to use relatively simple multi-view information coupled with online learning of appearance to efficiently produce coarse reconstructions of a scene
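The contrast between binary classification and structured learning for tracking can be made concrete with a minimal structured-perceptron update over candidate target positions: instead of labelling patches positive/negative, the model learns to score the *correct* position above every alternative. This is a heavily simplified stand-in for the structured SVM used in the thesis (no margin rescaling, no budget maintenance), and the function name and learning rate are invented for illustration.

```python
import numpy as np

def structured_update(w, feats, true_idx, lr=0.1):
    """One structured-perceptron step for candidate-position selection.

    `feats` is an (n_candidates, d) matrix of appearance features, one row
    per candidate target position; `true_idx` marks the correct position.
    Candidates are scored by w . feat, and the weights are nudged whenever
    a wrong candidate outscores the true one."""
    scores = feats @ w
    pred = int(np.argmax(scores))
    if pred != true_idx:
        # move weights toward the true candidate, away from the impostor
        w = w + lr * (feats[true_idx] - feats[pred])
    return w, pred
```

Run at every frame on features sampled around the previous target location, this kind of update gives the tracker its online adaptivity; the structured SVM formulation additionally enforces a margin proportional to the spatial overlap loss between candidates.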
A photogrammetric approach for real-time 3D localization and tracking of pedestrians in monocular infrared imagery
Target tracking within conventional video imagery poses a significant challenge that is increasingly being addressed via complex algorithmic solutions. The complexity of this problem can be fundamentally attributed to the ambiguity associated with the actual 3D scene position of a given tracked object in relation to its observed position in 2D image space. We propose an approach that challenges the current trend in complex tracking solutions by addressing this fundamental ambiguity head-on. In contrast to prior work in the field, we leverage the key advantages of thermal-band infrared (IR) imagery for pedestrian localization to show that the robust localization and foreground target separation afforded by such imagery facilitates accurate 3D position estimation to within the error bounds of conventional Global Positioning System (GPS) positioning. This work investigates the accuracy of classical photogrammetry, within the context of current target detection and classification techniques, as a means of recovering the true 3D position of pedestrian targets within the scene. Based on photogrammetric estimation of target position, we then illustrate the efficiency of regular Kalman filter based tracking operating on actual 3D pedestrian scene trajectories. We present both a statistical and experimental analysis of the associated errors of this approach in addition to real-time 3D pedestrian tracking using monocular infrared (IR) imagery from a thermal-band camera. © (2014) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE)
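The core photogrammetric step, recovering a pedestrian's 3D position from a single image, can be sketched under the usual flat-ground assumption: back-project the detected foot-point pixel through a pinhole camera of known height and pitch, and intersect the ray with the ground plane. This is a minimal illustration of the geometry, not the paper's calibrated pipeline; the function name and conventions (pixel offsets from the principal point, u right, v down, y-down world frame) are assumptions of this sketch.

```python
import numpy as np

def ground_position(u, v, f, cam_height, tilt):
    """Back-project a foot-point pixel to 3D on a flat ground plane.

    (u, v): pixel offsets from the principal point; f: focal length in
    pixels; cam_height: camera height above the ground (metres); tilt:
    downward camera pitch (radians). Returns (X, Z): lateral offset and
    ground distance from the camera in metres."""
    c, s = np.cos(tilt), np.sin(tilt)
    # rotate the camera-frame ray (u, v, f) into the world frame (y down)
    y_w = c * v + s * f
    z_w = -s * v + c * f
    if y_w <= 0:
        raise ValueError("ray does not intersect the ground plane")
    t = cam_height / y_w          # scale at which the ray reaches the ground
    return u * t, z_w * t
```

The strong foreground separation of thermal imagery matters here because the estimate is only as good as the detected foot-point; once per-frame (X, Z) estimates are available, a standard constant-velocity Kalman filter can smooth the resulting 3D trajectory.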