Pose-invariant, model-based object recognition, using linear combination of views and Bayesian statistics
This thesis presents an in-depth study on the problem of object recognition, and in particular the detection
of 3-D objects in 2-D intensity images which may be viewed from a variety of angles. A solution to this
problem remains elusive to this day, since it involves dealing with variations in geometry, photometry
and viewing angle, noise, occlusions and incomplete data. This work restricts its scope to a particular
kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object
is seen.
A technique is proposed and developed to address this problem, which falls into the category of
view-based approaches, that is, a method in which an object is represented as a collection of a small
number of 2-D views, as opposed to a generation of a full 3-D model. This technique is based on the
theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid
transformations and scaling may, under most imaging conditions, be represented by a linear combination
of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an
object given at least two existing and dissimilar views of the object, and a set of linear coefficients that
determine how these views are to be combined in order to synthesise the new image.
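As a minimal sketch of the synthesis step described above, assume orthographic projection and landmark points already placed in correspondence between two stored views. Under those assumptions, one formulation from the LCV literature writes each coordinate of the novel view as a linear combination of a constant and the coordinates x1, y1 and x2 of the basis views. The function name `synthesise_view`, the coefficient layout and the sample points are illustrative assumptions, not details taken from the thesis.

```python
def synthesise_view(view1, view2, a, b):
    """Synthesise landmark positions of a novel view from two basis views.

    view1, view2: lists of (x, y) landmarks, in corresponding order.
    a, b: four coefficients each (hypothetical layout: constant, x1, y1, x2).
    Assumed model:  x = a0 + a1*x1 + a2*y1 + a3*x2
                    y = b0 + b1*x1 + b2*y1 + b3*x2
    """
    novel = []
    for (x1, y1), (x2, _y2) in zip(view1, view2):
        basis = (1.0, x1, y1, x2)
        x = sum(c * v for c, v in zip(a, basis))
        y = sum(c * v for c, v in zip(b, basis))
        novel.append((x, y))
    return novel


# Made-up landmarks: the second view stretches the object horizontally.
view1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
view2 = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]

# Coefficients that average the two x coordinates and keep y from view 1.
halfway = synthesise_view(view1, view2,
                          a=(0.0, 0.5, 0.0, 0.5),
                          b=(0.0, 0.0, 1.0, 0.0))
```

Only the landmark geometry is combined here; producing an actual image would additionally require warping pixel values to the synthesised landmark positions, which this sketch leaves out.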
The method works in conjunction with a powerful optimisation algorithm that searches for the
linear combination coefficients that synthesise a novel image as similar as possible
to the target scene view. If the similarity between the synthesised and the target images is above some
threshold, then an object is determined to be present in the scene and its location and pose are defined,
in part, by the coefficients. The key benefit of this technique is that, because it works directly
with pixel values, it avoids the need for problematic, low-level feature extraction and solution of the
correspondence problem. As a result, a linear combination of views (LCV) model is easy to construct
and use, since it requires only a small number of stored 2-D views of the object in question and the
selection of a few landmark points on the object, a process that is easily carried out during the offline
model-building stage. In addition, this method is general enough to be applied across a variety of
recognition problems and different types of objects.
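The idea of recovering coefficients and thresholding the match can be illustrated with a deliberately simplified sketch. It is not the pixel-based search described above: it assumes known landmark correspondences in the target view, orthographic projection, and an ordinary least-squares fit in place of the optimisation algorithm, and it thresholds a landmark error rather than an image similarity. All function names, the cube test object and the threshold value are hypothetical.

```python
import math


def project(points3d, yaw, pitch):
    """Orthographic projection after rotating about the Y axis, then the X axis."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    out = []
    for X, Y, Z in points3d:
        x = cy * X + sy * Z
        z = -sy * X + cy * Z
        out.append((x, cp * Y - sp * z))
    return out


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def basis_rows(view1, view2):
    """Rows (1, x1, y1, x2) of the assumed four-term LCV model."""
    return [[1.0, x1, y1, x2] for (x1, y1), (x2, _y2) in zip(view1, view2)]


def fit_axis(rows, targets):
    """Least-squares coefficients for one image axis via the normal equations."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve(ata, atb)


def lcv_fit_error(view1, view2, target):
    """RMS landmark distance between the best LCV synthesis and the target."""
    rows = basis_rows(view1, view2)
    a = fit_axis(rows, [x for x, _ in target])
    b = fit_axis(rows, [y for _, y in target])
    total = 0.0
    for row, (tx, ty) in zip(rows, target):
        sx = sum(c * v for c, v in zip(a, row))
        sy = sum(c * v for c, v in zip(b, row))
        total += (sx - tx) ** 2 + (sy - ty) ** 2
    return math.sqrt(total / len(target))


def is_present(view1, view2, target, threshold=1e-3):
    """Declare the object present when the fit error is below an arbitrary threshold."""
    return lcv_fit_error(view1, view2, target) < threshold


# Hypothetical rigid object: the eight corners of a cube.
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
view1 = project(cube, 0.0, 0.0)   # first stored view
view2 = project(cube, 0.5, 0.0)   # second, dissimilar stored view
novel = project(cube, 0.2, 0.1)   # unseen pose of the same object
found = is_present(view1, view2, novel)
```

In this toy setting an unseen pose of the same rigid object is fitted almost exactly, whereas a target whose landmarks do not come from a rigid transformation of the object leaves a residual that the threshold can reject.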
The development and application of this method are first explored on two-dimensional
problems, and the same principles are then extended to 3-D. Additionally, the method is evaluated across
synthetic and real-image datasets containing variations in the objects’ identity and pose. Possible
extensions that incorporate a foreground/background model and lighting variations of the pixels
are examined as future work.
Integrated shape and pose modelling
Flexible Shape Models (FSMs) have been widely used for modelling shape variations of deformable objects [4]. A major limitation of this approach is that it assumes a near fronto-parallel view. Recently, representing images as a Linear Combination of Views (LCV) has become a popular approach for modelling 3D pose variations in a 2D image context (e.g. [7]). This technique, however, only works for images of rigid objects. Apart from explicit 3D model approaches, most previous models that can cope with both shape and pose variations have either used a relationship between pose and shape parameters (e.g. [3]) or modelled the pose variations as variations in shape (e.g. FSMs [4]). However, variations of an object’s pose and shape are extrinsic and intrinsic degrees of freedom, respectively. The two should therefore not be confounded, since they are independent and, in general, not correlated. We have therefore developed an integrated approach that uses a coupled-view FSM [3] to represent the shape of the entire face as seen from two different viewpoints, and uses the LCV technique to deal with pose variations. A preliminary comparison is made with the conventional FSM [4].