
Pose-invariant, model-based object recognition, using linear combination of views and Bayesian statistics

Author: Zografos V.
Publication venue: UCL (University College London)
Publication date: 01/01/2009

This thesis presents an in-depth study of the problem of object recognition, and in particular the detection of 3-D objects in 2-D intensity images, where the objects may be viewed from a variety of angles. A solution to this problem remains elusive, since it involves dealing with variations in geometry, photometry and viewing angle, as well as noise, occlusions and incomplete data. This work restricts its scope to one kind of extrinsic variation: variation of the image due to changes in the viewpoint from which the object is seen. The technique proposed and developed to address this problem falls into the category of view-based approaches, that is, methods in which an object is represented by a small number of 2-D views rather than by a full 3-D model. It rests on the theoretical observation that the geometry of the set of possible images of an object undergoing 3-D rigid transformations and scaling may, under most imaging conditions, be represented by a linear combination of a small number of 2-D views of that object. It is therefore possible to synthesise a novel image of an object given at least two existing, dissimilar views of the object and a set of linear coefficients that determine how these views are combined to produce the new image. The method works in conjunction with an optimisation algorithm that searches for and recovers the linear combination coefficients that synthesise a novel image as similar as possible to the target scene view. If the similarity between the synthesised and target images is above some threshold, the object is deemed present in the scene, and its location and pose are defined, in part, by the coefficients. The key benefit of this technique is that, because it works directly with pixel values, it avoids the need for problematic low-level feature extraction and for solving the correspondence problem.
As a result, a linear combination of views (LCV) model is easy to construct and use, since it only requires a small number of stored 2-D views of the object in question and the selection of a few landmark points on the object, a process easily carried out during the offline model-building stage. In addition, the method is general enough to be applied across a variety of recognition problems and types of objects. Its development and application are first explored on two-dimensional problems, and the same principles are then extended to 3-D. The method is evaluated on synthetic and real-image datasets containing variations in object identity and pose. Finally, possible future extensions to incorporate a foreground/background model and lighting variations of the pixels are examined.
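
The linear relationship this abstract relies on can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the thesis's implementation: it uses one common form of the LCV equations, x'' = a0 + a1·x + a2·y + a3·x' and y'' = b0 + b1·x + b2·y + b3·x', where (x, y) and (x', y') are corresponding landmarks in the two basis views; it generates synthetic 3-D landmarks and scaled orthographic projections; and it recovers the coefficients by least squares from known correspondences purely to show that the novel view is reproduced. The thesis itself avoids explicit correspondences and instead searches for the coefficients by optimising pixel-level similarity. All function names are hypothetical.

```python
import numpy as np


def rotation_y(theta):
    """Rotation about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rotation_x(theta):
    """Rotation about the horizontal (x) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def project(points, rot, scale=1.0, shift=(0.0, 0.0)):
    """Scaled orthographic projection: rotate, drop z, scale, translate."""
    return scale * (points @ rot.T)[:, :2] + np.asarray(shift)


def lcv_basis(view1, view2):
    """Design matrix [1, x, y, x'] built from the two basis views (assumed LCV form)."""
    n = view1.shape[0]
    return np.column_stack([np.ones(n), view1[:, 0], view1[:, 1], view2[:, 0]])


def lcv_synthesise(view1, view2, a, b):
    """Synthesise landmark positions in a novel view from coefficients a and b."""
    basis = lcv_basis(view1, view2)
    return np.column_stack([basis @ a, basis @ b])


# Hypothetical synthetic object: 20 random 3-D landmarks.
rng = np.random.default_rng(0)
landmarks3d = rng.normal(size=(20, 3))

# Two dissimilar basis views and one novel target view with rotation, scale and shift.
view1 = project(landmarks3d, np.eye(3))
view2 = project(landmarks3d, rotation_y(0.4))
target = project(landmarks3d, rotation_x(0.3) @ rotation_y(-0.2),
                 scale=1.3, shift=(5.0, -2.0))

# Least squares on known correspondences, used here only to verify linearity.
basis = lcv_basis(view1, view2)
a, *_ = np.linalg.lstsq(basis, target[:, 0], rcond=None)
b, *_ = np.linalg.lstsq(basis, target[:, 1], rcond=None)

synthesised = lcv_synthesise(view1, view2, a, b)
max_error = np.abs(synthesised - target).max()
print(f"max landmark error: {max_error:.2e}")
```

Under scaled orthographic projection the maximum error should be at floating-point round-off level. The second basis view is rotated about the vertical axis so that x' carries depth information; for a rotation about the horizontal axis, x' would simply duplicate x and y' would have to be used in its place.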

Integrated shape and pose modelling

Authors: Bernard F. Buxton, M. Benjamin Dias
Publication date: 01/01/2002

Flexible Shape Models (FSMs) have been widely used for modelling shape variations of deformable objects [4]. A major limitation of this approach is that it assumes a near fronto-parallel view. Recently, representing images as a Linear Combination of Views (LCV) has become a popular approach for modelling 3D pose variations in a 2D image context (e.g. [7]). This technique, however, only works for images of rigid objects. Apart from explicit 3D model approaches, most previous models that can cope with both shape and pose variations have either used a relationship between pose and shape parameters (e.g. [3]) or modelled pose variations as variations in shape (e.g. FSMs [4]). However, an object's pose and shape are extrinsic and intrinsic degrees of freedom, respectively. The two should therefore not be confounded, since they are independent and, in general, uncorrelated. We have therefore developed an integrated approach that uses a coupled-view FSM [3] to represent the shape of the entire face as seen from two different viewpoints, and uses the LCV technique to deal with pose variations. A preliminary comparison is made with the conventional FSM [4].
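
The separation of intrinsic (shape) and extrinsic (pose) degrees of freedom can be sketched as two stages applied in sequence. This is a minimal illustration under assumptions, not the authors' implementation: the coupled-view FSM is modelled as PCA over concatenated landmark vectors from two fixed viewpoints, the training data are synthetic with a single linear deformation mode, and the pose stage uses the LCV form x'' = a0 + a1·x + a2·y + a3·x', y'' = b0 + b1·x + b2·y + b3·x'. Function names and the data layout are hypothetical.

```python
import numpy as np


def rotation_y(theta):
    """Rotation about the vertical (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(points, rot):
    """Orthographic projection after rotation (drop the z coordinate)."""
    return (points @ rot.T)[:, :2]


def fit_coupled_view_fsm(pairs, n_modes):
    """PCA on concatenated two-view landmark vectors.

    pairs: array of shape (M, 2, N, 2) = (example, view, landmark, xy).
    Returns the mean vector and the leading n_modes principal modes (as rows).
    """
    data = pairs.reshape(len(pairs), -1)
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_modes]


def shape_to_views(mean, modes, shape_params, n_landmarks):
    """Intrinsic (shape) stage: generate the coupled pair of views."""
    views = (mean + shape_params @ modes).reshape(2, n_landmarks, 2)
    return views[0], views[1]


def apply_lcv(view1, view2, a, b):
    """Extrinsic (pose) stage: linear combination of the two generated views."""
    basis = np.column_stack(
        [np.ones(len(view1)), view1[:, 0], view1[:, 1], view2[:, 0]])
    return np.column_stack([basis @ a, basis @ b])


rng = np.random.default_rng(1)
n_landmarks = 15
base_shape = rng.normal(size=(n_landmarks, 3))
deformation = rng.normal(size=(n_landmarks, 3))

# Hypothetical training set: one shape mode, each shape seen from two fixed viewpoints.
shapes = [base_shape + t * deformation for t in np.linspace(-1.0, 1.0, 9)]
pairs = np.array([[project(s, np.eye(3)), project(s, rotation_y(0.4))]
                  for s in shapes])

mean, modes = fit_coupled_view_fsm(pairs, n_modes=1)

# Shape parameters for one training example, then reconstruct its two views.
example = pairs[2].reshape(-1)
shape_params = (example - mean) @ modes.T
v1, v2 = shape_to_views(mean, modes, shape_params, n_landmarks)
shape_error = np.abs(np.stack([v1, v2]) - pairs[2]).max()

# Identity pose coefficients reproduce view 1; other values would change pose only.
same_pose = apply_lcv(v1, v2, np.array([0.0, 1.0, 0.0, 0.0]),
                      np.array([0.0, 0.0, 1.0, 0.0]))
pose_error = np.abs(same_pose - v1).max()
print(f"shape error: {shape_error:.2e}, identity-pose error: {pose_error:.2e}")
```

Shape parameters only change the coupled pair of views, while the LCV coefficients only change pose, so the two sets of parameters stay separate rather than being folded into one set of shape modes. With a = (0, 1, 0, 0) and b = (0, 0, 1, 0) the LCV stage returns the first generated view unchanged.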
