Image based visual servoing using bitangent points applied to planar shape alignment
We present visual servoing strategies based on bitangents for aligning planar shapes. To acquire bitangents, we use the convex hull of a curve. Bitangent points are employed to construct a feature vector for visual control. Experimental results obtained on a 7-DOF Mitsubishi PA10 robot verify the proposed method.
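The abstract only says that bitangents are obtained from the convex hull. One plausible way to realise this is shown below as a sketch, not the authors' code: for a closed, sampled curve, any hull edge whose two endpoints are not neighbours on the curve bridges a concavity, and both endpoints touch the curve, so they are bitangent points. The function names and the `min_depth` threshold are hypothetical.

```python
import numpy as np


def convex_hull_indices(points):
    """Andrew's monotone chain. Returns hull vertex indices in counter-clockwise
    order; collinear boundary points are kept so straight hull edges stay tight."""
    pts = np.asarray(points, dtype=float)
    order = sorted(range(len(pts)), key=lambda i: (pts[i, 0], pts[i, 1]))

    def cross(o, a, b):
        return ((pts[a, 0] - pts[o, 0]) * (pts[b, 1] - pts[o, 1])
                - (pts[a, 1] - pts[o, 1]) * (pts[b, 0] - pts[o, 0]))

    lower, upper = [], []
    for i in order:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], i) < 0:
            lower.pop()
        lower.append(i)
    for i in reversed(order):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], i) < 0:
            upper.pop()
        upper.append(i)
    return lower[:-1] + upper[:-1]


def bitangent_points(curve, min_depth=1e-6):
    """Return index pairs (a, b) of bitangent points of a closed, sampled curve.

    A hull edge whose endpoints are not neighbours on the curve bridges a
    concavity. min_depth (hypothetical parameter) ignores pockets shallower than this.
    """
    c = np.asarray(curve, dtype=float)
    n = len(c)
    # Walk the curve in the same rotational sense as the counter-clockwise hull.
    x, y = c[:, 0], c[:, 1]
    signed_area = 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)
    step = 1 if signed_area > 0 else -1

    hull = convex_hull_indices(c)
    pairs = []
    for a, b in zip(hull, hull[1:] + hull[:1]):
        # Curve samples passed over when going from hull vertex a to hull vertex b.
        skipped = []
        k = (a + step) % n
        while k != b:
            skipped.append(k)
            k = (k + step) % n
        if not skipped:
            continue  # hull edge coincides with a curve segment
        chord = c[b] - c[a]
        rel = c[skipped] - c[a]
        depth = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / np.linalg.norm(chord)
        if depth.max() > min_depth:
            pairs.append((a, b))
    return pairs


def feature_vector(curve, pairs):
    """Stack bitangent point coordinates into s = [x_a, y_a, x_b, y_b, ...]."""
    c = np.asarray(curve, dtype=float)
    return np.concatenate([np.concatenate([c[a], c[b]]) for a, b in pairs])
```

For a square with a notch cut into its top edge, the single bitangent joins the two corners of the notch. In a classical image-based servo loop, such a vector s would be compared with the target shape's vector s*; how the paper builds its interaction matrix is not described in this abstract.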
Large Margin Image Set Representation and Classification
In this paper, we propose a novel image set representation and classification
method that maximizes the margin of image sets. The margin of an image set is
defined as the difference between the distance to its nearest image set from a
different class and the distance to its nearest image set of the same class.
By modeling the image sets using both their image samples and their affine
hull models, and maximizing the margins of the image sets, the image set
representation parameter learning problem is formulated as a minimization
problem, which is further optimized by an expectation-maximization (EM)
strategy with accelerated proximal gradient (APG) optimization in an iterative
algorithm. To classify a given test image set, we assign it to the class that
provides the largest margin. Experiments on two applications of
video-sequence-based face recognition demonstrate that the proposed method
significantly outperforms state-of-the-art image set classification methods in
terms of both effectiveness and efficiency.
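The margin definition and the classification rule can be illustrated with a simplified sketch. It assumes plain affine hulls (sample mean plus a fixed number of principal directions, `n_components=2` being an arbitrary choice) and omits the learned representation parameters, the EM strategy, and the APG optimizer entirely, so it is not the proposed method, only the margin-based decision rule on top of affine hull distances. All function names are hypothetical.

```python
import numpy as np


def affine_hull(X, n_components=2):
    """Affine hull model of an image set X (n_samples x dim): mean + orthonormal basis.
    n_components is an assumed, fixed hull dimension."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components].T


def hull_distance(h1, h2):
    """min over a, b of ||(mu1 + U1 a) - (mu2 + U2 b)||, solved as linear least squares."""
    (mu1, U1), (mu2, U2) = h1, h2
    A = np.hstack([U1, -U2])
    rhs = mu2 - mu1
    coef, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return float(np.linalg.norm(A @ coef - rhs))


def margin(h, label, gallery):
    """Distance to the nearest other-class set minus distance to the nearest same-class set."""
    same = [hull_distance(h, g) for g, y in gallery if y == label]
    other = [hull_distance(h, g) for g, y in gallery if y != label]
    return min(other) - min(same)


def classify(test_X, gallery):
    """Assign the test set to the class that would give it the largest margin.
    gallery is a list of (affine hull, class label) pairs with at least two classes."""
    h = affine_hull(np.asarray(test_X, dtype=float))
    labels = sorted({y for _, y in gallery})
    return max(labels, key=lambda c: margin(h, c, gallery))
```

Hypothesizing each class in turn and keeping the one with the largest margin mirrors the rule stated in the abstract; the paper additionally learns how the samples and hulls are weighted, which this sketch does not attempt.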
Markerless deformation capture of hoverfly wings using multiple calibrated cameras
This thesis introduces an algorithm for the automated deformation capture of hoverfly
wings from multiple camera image sequences. The algorithm is capable of extracting
dense surface measurements, without the aid of fiducial markers, over an arbitrary number
of wingbeats of hovering flight and requires only limited manual initialisation. A novel motion
prediction method, called the "normalised stroke model", makes use of the similarity of adjacent
wing strokes to predict wing keypoint locations, which are then iteratively refined in
a stereo image registration procedure. Outlier removal, wing fitting and further refinement
using independently reconstructed boundary points complete the algorithm. It was tested
on two hovering data sets, as well as a challenging flight manoeuvre. By comparing the
3-d positions of keypoints extracted from these surfaces with those resulting from manual
identification, the accuracy of the algorithm is shown to approach that of a fully manual
approach. In particular, half of the algorithm-extracted keypoints were within 0.17 mm of
manually identified keypoints, approximately equal to the error of the manual identification
process. This algorithm is unique among purely image-based flapping flight studies in the
level of automation it achieves, and its generality would make it applicable to wing tracking
of other insects.
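The thesis does not spell out the normalised stroke model here. The sketch below only illustrates the stated idea that adjacent strokes are similar once time is normalised: it maps the previous stroke onto a phase in [0, 1] and reads off keypoint positions in the next stroke at the same phase. The function names, and the assumption that the next stroke's start time and duration are already known, are hypothetical.

```python
import numpy as np


def to_phase(times):
    """Normalise one stroke's timestamps to a phase in [0, 1]."""
    t = np.asarray(times, dtype=float)
    return (t - t[0]) / (t[-1] - t[0])


def predict_next_stroke(prev_times, prev_positions, next_start, next_duration, query_times):
    """Predict 3-D keypoint positions in the next stroke by reading the previous
    stroke's trajectory at the same normalised phase (np.interp clamps outside [0, 1])."""
    phase = to_phase(prev_times)
    pos = np.asarray(prev_positions, dtype=float)          # shape (frames, 3)
    q = (np.asarray(query_times, dtype=float) - next_start) / next_duration
    return np.column_stack([np.interp(q, phase, pos[:, d]) for d in range(pos.shape[1])])
```

In the described pipeline such predictions would only be starting points, subsequently refined by stereo image registration.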
Mesh-based video coding for low bit-rate communications
In this paper, a new method for low bit-rate content-adaptive mesh-based video coding is proposed. For intra-frame coding, the method extracts a feature map and distributes nodes at specific threshold levels, placing initial nodes densely in regions that contain high-frequency features and sparsely in smooth regions. A subsequent node elimination scheme removes most insignificant nodes. The Hilbert scan is then applied before quantization and entropy coding to reduce the amount of transmitted information. For moving images, the node position and color parameters of only a subset of nodes may change from frame to frame, so it is sufficient to transmit only these changed parameters. The proposed method is well suited to video coding at very low bit rates: processing results demonstrate that it provides good subjective and objective image quality with fewer required bits.
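To show why a Hilbert scan helps before entropy coding, here is a sketch assuming node positions lie on an n x n integer grid with n a power of two. The index computation is the standard iterative Hilbert-curve mapping; ordering nodes by it keeps spatial neighbours close in the sequence, so the differences between successive indices tend to be small. The delta-coding step and the function names are illustrative assumptions, not details taken from the paper.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_scan(nodes, n):
    """Order mesh nodes along the Hilbert curve and delta-code their curve indices."""
    if not nodes:
        return [], []
    ordered = sorted(nodes, key=lambda p: hilbert_index(n, p[0], p[1]))
    keys = [hilbert_index(n, x, y) for x, y in ordered]
    deltas = [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]
    return ordered, deltas
```

The small deltas (rather than raw coordinates) are what a subsequent quantization and entropy-coding stage could exploit.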
Efficient Human Activity Recognition in Large Image and Video Databases
Vision-based human action recognition has attracted considerable interest in recent research because of its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions, such as the use of motion cues to determine video descriptors, the application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as the efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g., indoor surveillance), unconstrained videos (e.g., YouTube), depth or skeletal data (e.g., captured by Kinect), and person images (e.g., Flickr). In particular, we are interested in answering questions such as: (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of a high-dimension, low-sample-size (HDLSS) nature, how can human actions be recognized efficiently in such data? (c) considering the rich 3D motion information available from depth or motion capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used to automatically determine salient regions for recognizing actions in still images?
Sparse Bayesian information filters for localization and mapping
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, February 2008.
This thesis formulates an estimation framework for Simultaneous Localization and
Mapping (SLAM) that addresses the problem of scalability in large environments.
We describe an estimation-theoretic algorithm that achieves significant gains in computational
efficiency while maintaining consistent estimates for the vehicle pose and
the map of the environment.
We specifically address the feature-based SLAM problem in which the robot represents
the environment as a collection of landmarks. The thesis takes a Bayesian
approach whereby we maintain a joint posterior over the vehicle pose and feature
states, conditioned upon measurement data. We model the distribution as Gaussian
and parametrize the posterior in the canonical form, in terms of the information
(inverse covariance) matrix. When sparse, this representation is amenable to computationally
efficient Bayesian SLAM filtering. However, while a large majority of the
elements within the normalized information matrix are very small in magnitude, it is
fully populated nonetheless. Recent feature-based SLAM filters achieve the scalability
benefits of a sparse parametrization by explicitly pruning these weak links in an effort
to enforce sparsity. We analyze one such algorithm, the Sparse Extended Information
Filter (SEIF), which has laid much of the groundwork concerning the computational
benefits of the sparse canonical form. The thesis performs a detailed analysis of the
process by which the SEIF approximates the sparsity of the information matrix and
reveals key insights into the consequences of different sparsification strategies. We
demonstrate that the SEIF yields a sparse approximation to the posterior that is inconsistent,
suffering from exaggerated confidence estimates. This overconfidence has
detrimental effects on important aspects of the SLAM process and affects the higher
level goal of producing accurate maps for subsequent localization and path planning.
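The canonical (information) form discussed above can be made concrete with a toy 1-D linear example: one robot pose and two landmarks. It is a sketch under simplifying assumptions (linear models, scalar states, hypothetical function names), not the thesis's filter or the SEIF. It shows two standard facts: a measurement update is additive and only touches the entries of the information matrix for the states it involves, while motion prediction, which marginalizes out the previous pose via a Schur complement, creates links between landmarks. Repeated prediction is what fills in the matrix with many small entries.

```python
import numpy as np


def to_canonical(mu, Sigma):
    """Moments (mu, Sigma) -> canonical form (eta, Lam) with Lam = Sigma^-1, eta = Lam mu."""
    Lam = np.linalg.inv(Sigma)
    return Lam @ mu, Lam


def to_moments(eta, Lam):
    """Canonical form -> moments; this full inversion is what a scalable filter avoids."""
    Sigma = np.linalg.inv(Lam)
    return Sigma @ eta, Sigma


def measurement_update(eta, Lam, H, R, z):
    """Linear observation z = H x + v, v ~ N(0, R): an additive update that only
    changes entries of Lam for the states H involves."""
    Rinv = np.linalg.inv(R)
    return eta + H.T @ Rinv @ z, Lam + H.T @ Rinv @ H


def marginalize(eta, Lam, keep, drop):
    """Marginalize out the states in `drop` via the Schur complement."""
    Lkd = Lam[np.ix_(keep, drop)]
    Ldd_inv = np.linalg.inv(Lam[np.ix_(drop, drop)])
    Lam_m = Lam[np.ix_(keep, keep)] - Lkd @ Ldd_inv @ Lkd.T
    eta_m = eta[keep] - Lkd @ Ldd_inv @ eta[drop]
    return eta_m, Lam_m


def predict(eta, Lam, u, q):
    """Toy motion x' = x + u + w, w ~ N(0, q), with the pose stored at index 0.
    Augment with x', then marginalize the old pose; result order is [landmarks..., x']."""
    n = len(eta)
    G = np.zeros((1, n + 1))
    G[0, 0], G[0, n] = -1.0, 1.0          # residual x' - x - u
    Lam_aug = np.zeros((n + 1, n + 1))
    Lam_aug[:n, :n] = Lam
    Lam_aug += G.T @ G / q
    eta_aug = np.append(eta, 0.0) + G[0] * u / q
    return marginalize(eta_aug, Lam_aug, keep=list(range(1, n + 1)), drop=[0])
```

After observing both landmarks from the same pose, the landmark-landmark entry of the information matrix is still zero; after one prediction step it becomes nonzero, because both landmarks were linked to the pose that was marginalized out.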
This thesis proposes an alternative scalable filter that maintains sparsity while
preserving the consistency of the distribution. We leverage insights into the natural
structure of the feature-based canonical parametrization and derive a method that
actively maintains an exactly sparse posterior. Our algorithm exploits the structure
of the parametrization to achieve gains in efficiency, with a computational cost that
scales linearly with the size of the map. Unlike similar techniques that sacrifice
consistency for improved scalability, our algorithm performs inference over a posterior
that is conservative relative to the nominal Gaussian distribution. Consequently, we
preserve the consistency of the pose and map estimates and avoid the effects of an
overconfident posterior.
We demonstrate our filter alongside the SEIF and the standard EKF, both in simulation
and on two real-world datasets. While we maintain the computational
advantages of an exactly sparse representation, the results show convincingly that
our method yields conservative estimates for the robot pose and map that are nearly
identical to those of the original Gaussian distribution as produced by the EKF, but
at much less computational expense.
The thesis concludes with an extension of our SLAM filter to a complex underwater
environment. We describe a systems-level framework for localization and mapping
relative to a ship hull with an Autonomous Underwater Vehicle (AUV) equipped
with a forward-looking sonar. The approach utilizes our filter to fuse measurements
of vehicle attitude and motion from onboard sensors with data from sonar images of
the hull. We employ the system to perform three-dimensional, 6-DOF SLAM on a
ship hull.
Geometrical-based lip-reading using template probabilistic multi-dimension dynamic time warping
By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when they operate in noisy environments. In this paper, we present a geometrical-based automatic lip-reading system that extracts the lip region from images using conventional techniques, but extracts the lip contour itself using a novel combination of border following and convex hull approaches. Classification is carried out using an enhanced dynamic time warping technique that can operate in multiple dimensions, together with a template probability technique that compensates for differences in the way words are uttered in the training set. The performance of the new system was assessed on recognition of the English digits 0 to 9 from the CUAVE database. The new approach compared favorably with existing lip-reading approaches, achieving a word recognition accuracy of up to 71% using visual information obtained from estimates of lip height, width, and their ratio.
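The multi-dimensional matching step can be sketched with classic dynamic time warping over per-frame feature vectors (lip height, width, and their ratio, as listed in the abstract) and a nearest-template decision. This omits the paper's enhancements and its template probability technique, which are not described here; the function names are hypothetical.

```python
import numpy as np


def lip_features(heights, widths):
    """Per-frame feature vectors (height, width, height/width)."""
    h = np.asarray(heights, dtype=float)
    w = np.asarray(widths, dtype=float)
    return np.column_stack([h, w, h / w])


def md_dtw(a, b):
    """Classic DTW over multi-dimensional frames with a Euclidean local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def classify(seq, templates):
    """Nearest-template rule; templates maps a word label to a list of feature sequences."""
    return min(templates, key=lambda w: min(md_dtw(seq, t) for t in templates[w]))
```

Because DTW can align one frame with several, a slower utterance of the same word (a repeated frame, say) still matches its template at zero cost, which is the property that makes warping useful for speaker-to-speaker timing differences.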