20 research outputs found
Recommended from our members
Automatic Multilevel Feature Abstraction in Adaptable Machine Vision Systems
Vision is a complex task which can be accomplished with apparent ease by biological systems, but for which the design of artificial systems is difficult. Although machine vision systems can be successfully designed for a specific task, under certain conditions, they are likely to fail if circumstances change. This was the motivation for the research into ways in which systems can be self-designing and adaptable to new visual tasks. The research was conducted in three vital areas of concern for machine vision systems.
The first area is finding a suitable architecture for forming an appropriate representation for the current task. The research investigated the application of Hypernetworks theory to building a multilevel, generally-applicable representation, through repeated application of a fundamental 'self-similarity' principle, that parts of objects assembled under a particular relation at one level, form whole objects at the next. Results show that this is potentially a powerful approach for autonomously generating an adaptable system-architecture suitable for multiple visual tasks.
The second area is the autonomous extraction of suitable low-level features, which the research investigated through random generation of minimally-constrained pixel-configurations and algorithmic generation of homogeneous and heterogeneous polygons. The results suggest that, despite the simplicity of the features making them vulnerable to image transformations, these are promising approaches worth developing further.
The third area is automatic feature selection. The research explored management of 'dimensionality' and of 'combinatorial explosion', as well as how to locate relevant features at multiple representation levels, in the context of 'emergence' of structure. Results indicate that this approach can find useful 'intermediate-level' constructs through analysis of the connectivity of the simplices representing objects at higher levels.
The research concludes that the proposed novel approaches to tackling the above issues, in particular the application of hypernetworks to the formation of multilevel representations and the resulting emergence of higher-level structure, are fruitful
3D Object Recognition Based On Constrained 2D Views
The aim of the present work was to build a novel 3D object recognition system capable of classifying
man-made and natural objects based on single 2D views. The approach to this problem
has been one motivated by recent theories on biological vision and multiresolution analysis. The
project's objectives were the implementation of a system that is able to deal with simple 3D
scenes and constitutes an engineering solution to the problem of 3D object recognition, allowing
the proposed recognition system to operate in a practically acceptable time frame.
The developed system takes further the work on automatic classification of marine phytoplankton,
carried out at the Centre for Intelligent Systems, University of Plymouth. The thesis discusses
the main theoretical issues that prompted the fundamental system design options. The
principles and the implementation of the coarse data channels used in the system are described.
A new multiresolution representation of 2D views is presented, which provides the classifier
module of the system with coarse-coded descriptions of the scale-space distribution of potentially
interesting features. A multiresolution analysis-based mechanism is proposed, which directs
the system's attention towards potentially salient features. Unsupervised similarity-based
feature grouping is introduced, which is used in coarse data channels to yield feature signatures
that are not spatially coherent and provide the classifier module with salient descriptions of object
views. A simple texture descriptor is described, which is based on properties of a special wavelet
transform.
The system has been tested on computer-generated and natural image data sets, in conditions
where the inter-object similarity was monitored and quantitatively assessed by human subjects,
or the analysed objects were very similar and their discrimination constituted a difficult task even
for human experts. The validity of the above described approaches has been proven. The studies
conducted with various statistical and artificial neural network-based classifiers have shown that
the system is able to perform well in all of the above mentioned situations. These investigations
also made it possible to take further and generalise a number of important conclusions drawn during
previous work carried out in the field of 2D shape (plankton) recognition, regarding the behaviour
of multiple coarse data channels-based pattern recognition systems and various classifier
architectures.
The system possesses the ability of dealing with difficult field-collected images of objects and
the techniques employed by its component modules make possible its extension to the domain
of complex multiple-object 3D scene recognition. The system is expected to find immediate applicability
in the field of marine biota classification
Human inspired pattern recognition via local invariant features
Vision is increasingly becoming a vital element in the manufacturing industry. As complex as it already is, vision is becoming even more difficult to implement in a pattern recognition environment as it converges toward the level of what humans visualize. Relevant brain work technologies are allowing vision systems to add capability and tasks that were long reserved for humans. The ability to recognize patterns like humans do is a good goal in terms of performance metrics for manufacturing activities. To achieve this goal, we created a neural network that achieves pattern recognition analogous to the human visual cortex using high quality keypoints by optimizing the scale space and pairing keypoints with edges as input into the model. This research uses the Taguchi Design of Experiments approach to find optimal values for the SIFT parameters with respect to finding correct matches between images that vary in rotation and scale. The approach used the Taguchi L18 matrix to determine the optimal parameter set. The performance obtained from SIFT using the optimal solution was compared with the performance from the original SIFT algorithm parameters. It is shown that the number of correct matches between an original image and a scaled, rotated, or scaled and rotated version of that image improves by 17% using the optimal values of the SIFT parameters. A human inspired approach was used to create a CMAC based neural network capable of pattern recognition. Scenes of 3, 30, and 50 objects were compared using edge and optimized SIFT based features as inputs, and the results scaled from 3 to 50 objects in terms of classification performance. The classification results show a high level of pattern recognition, ranging from 96.1% to 100% for the objects under consideration.
The result is a pattern recognition model capable of locally based classification based on invariant information without the need for sets of information that include input sensory data that is not necessarily invariant (background data, raw pixel data, viewpoint angles) that global models rely on in pattern recognition
The Role of Knowledge in Visual Shape Representation
This report shows how knowledge about the visual world can be built into a shape representation in the form of a descriptive vocabulary making explicit the important geometrical relationships comprising objects' shapes. Two computational tools are offered: (1) Shape tokens are placed on a Scale-Space Blackboard, (2) Dimensionality-reduction captures deformation classes in configurations of tokens. Knowledge lies in the token types and deformation classes tailored to the constraints and regularities of particular shape worlds. A hierarchical shape vocabulary has been implemented supporting several later visual tasks in the two-dimensional shape domain of the dorsal fins of fishes
Recognition by directed attention to recursively partitioned images
A learning/recognition model (and instantiating program) is described which recursively combines the learning paradigms of conceptual clustering (Michalski, 1980) and learning-from-examples to resolve the ambiguities of real-world recognition. The model is based on neuropsychological and psychological evidence that the visual system is analytic, hierarchical, and composed of a parallel/serial dichotomy (many, see conclusions by Crick, 1984). Emulating the experimental evidence, parallel processes in the model decompose the image into components and cluster the constituents in much the same way as the image processing technique known as moment analysis (Alt, 1962). Serial, attentive mechanisms then reassemble the decompositions by investigating spatial relationships between components. The use of attentive mechanisms extends the moment analysis technique to handle alterations in structure and solves the contention problem created by combining the two learning paradigms. The contention results from a disagreement between the teacher and the model on what constitutes the salient features at the highest level of the symbol. There are four cases ZBT must handle, two of which result from the disagreement with the teacher. The parallel/serial dichotomy represents a vertical/horizontal tradeoff between the invariant and variant features of a domain. The resultant learned hierarchy allows ZBT to recognize structural differences while avoiding problems of exponential growth
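The abstract compares the model's parallel decomposition stage to moment analysis. As a rough illustration of that general technique only (not of the ZBT program, whose details are not given here), the following sketch computes the centroid and second-order central moments of a small binary image. The function names and the toy image are assumptions made for this example.

```python
def raw_moment(img, p, q):
    """Raw image moment M_pq = sum over pixels of x^p * y^q * intensity."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def central_moments(img):
    """Return centroid (cx, cy) and second-order central moments.

    Central moments are taken about the centroid, so they do not
    change when the shape is translated within the image.
    """
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += dx * dx * v
            mu02 += dy * dy * v
            mu11 += dx * dy * v
    return (cx, cy), (mu20, mu02, mu11)


# Hypothetical 4x4 binary image containing a 2x2 square.
square = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
centroid, second_order = central_moments(square)
```

Components with similar moment descriptors could then be grouped, which is the sense in which moments can support clustering of image constituents.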
Image categorisation using parallel network constructs: an emulation of early human colour processing and context evaluation
PhD Thesis
Traditional geometric scene analysis cannot attempt to address the understanding of human vision.
Instead it adopts an algorithmic approach, concentrating on geometric model fitting. Human vision,
however, is both quick and accurate but very little is known about how the recognition of objects is
performed with such speed and efficiency. It is thought that there must be some process both for coding
and storage which can account for these characteristics. In this thesis a more strict emulation of human
vision, based on work derived from medical psychology and other fields, is proposed. Human beings
must store perceptual information from which to make comparisons, derive structures and classify
objects. It is widely thought by cognitive psychologists that some form of symbolic representation
is inherent in this storage. Here a mathematical syntax is defined to perform this kind of symbolic
description. The symbolic structures must be capable of manipulation and a set of operators is defined
for this purpose. The early visual cortex and geniculate body are both inherently parallel in operation
and simple in structure. A broadly connectionist emulation of this kind of structure is described,
using independent computing elements, which can perform segmentation, re-colouring and generation
of the base elements of the description syntax. Primal colour information is then collected by a second
network which forms the visual topology, colouring and position information of areas in the image as
well as a full description of the scene in terms of a more complex symbolic set. The idea of different
visual contexts is introduced and a model is proposed for the accumulation of context rules. This
model is then applied to a database of natural images.
EPSRC CASE award: Neural Computer Sciences, Southampton
From surfaces to objects : Recognizing objects using surface information and object models.
This thesis describes research on recognizing partially obscured objects using
surface information like Marr's 2½D sketch ([MAR82]) and surface-based geometrical
object models. The goal of the recognition process is to produce fully
instantiated object hypotheses, with either image evidence for each feature or
explanations for their absence, in terms of self or external occlusion.
The central point of the thesis is that using surface information should be
an important part of the image understanding process. This is because surfaces
are the features that directly link perception to the objects perceived (for
normal "camera-like" sensing) and because surfaces make explicit information
needed to understand and cope with some visual problems (e.g. obscured features).
Further, because surfaces are both the data and model primitive, detailed
recognition can be made both simpler and more complete.
Recognition input is a surface image, which represents surface orientation and
absolute depth. Segmentation criteria are proposed for forming surface patches
with constant curvature character, based on surface shape discontinuities which
become labeled segmentation boundaries.
Partially obscured object surfaces are reconstructed using stronger surface based
constraints. Surfaces are grouped to form surface clusters, which are 3D
identity-independent solids that often correspond to model primitives. These are
used here as a context within which to select models and find all object features.
True three-dimensional properties of image boundaries, surfaces and surface
clusters are directly estimated using the surface data.
Models are invoked using a network formulation, where individual nodes
represent potential identities for image structures. The links between nodes are
defined by generic and structural relationships. They define indirect evidence relationships
for an identity. Direct evidence for the identities comes from the data
properties. A plausibility computation is defined according to the constraints inherent
in the evidence types. When a node acquires sufficient plausibility, the
model is invoked for the corresponding image structure.
Objects are primarily represented using a surface-based geometrical model.
Assemblies are formed from subassemblies and surface primitives, which are
defined using surface shape and boundaries. Variable affixments between assemblies
allow flexibly connected objects.
The initial object reference frame is estimated from model-data surface relationships,
using correspondences suggested by invocation. With the reference
frame, back-facing, tangential, partially self-obscured, totally self-obscured and
fully visible image features are deduced. From these, the oriented model is used
for finding evidence for missing visible model features. If no evidence is found,
the program attempts to find evidence to justify the features obscured by an unrelated
object. Structured objects are constructed using a hierarchical synthesis
process.
Fully completed hypotheses are verified using both existence and identity
constraints based on surface evidence.
Each of these processes is defined by its computational constraints and is
demonstrated on two test images. These test scenes are interesting because they
contain partially and fully obscured object features, a variety of surface and solid
types and flexibly connected objects. All modeled objects were fully identified
and analyzed to the level represented in their models and were also acceptably
spatially located.
Portions of this work have been reported elsewhere ([FIS83], [FIS85a], [FIS85b],
[FIS86]) by the author
A study of spatial data models and their application to selecting information from pictorial databases
People have always used visual techniques to locate information in the space
surrounding them. However with the advent of powerful computer systems and
user-friendly interfaces it has become possible to extend such techniques to stored
pictorial information. Pictorial database systems have in the past primarily used
mathematical or textual search techniques to locate specific pictures contained
within such databases. However these techniques have largely relied upon complex
combinations of numeric and textual queries in order to find the required
pictures. Such techniques restrict users of pictorial databases to expressing what is
in essence a visual query in a numeric or character based form. What is required
is the ability to express such queries in a form that more closely matches the user's
visual memory or perception of the picture required. It is suggested in this thesis
that spatial techniques of search are important and that two of the most important
attributes of a picture are the spatial positions and the spatial relationships of
objects contained within such pictures. It is further suggested that a database
management system which allows users to indicate the nature of their query by
visually placing iconic representations of objects on an interface in spatially
appropriate positions, is a feasible method by which pictures might be found from
a pictorial database. This thesis undertakes a detailed study of spatial techniques
using a combination of historical evidence, psychological conclusions and practical
examples to demonstrate that the spatial metaphor is an important concept and that
pictures can be readily found by visually specifying the spatial positions and
relationships between objects contained within them
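The idea of querying by placing icons in spatially appropriate positions can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described in the thesis: objects are reduced to named points, only two relations ("left-of" and "above") are considered, y grows downward as in image coordinates, and all names are hypothetical.

```python
from itertools import combinations


def relations(objects):
    """Derive pairwise spatial relations from a dict of name -> (x, y)."""
    rels = set()
    for a, b in combinations(sorted(objects), 2):
        (ax, ay), (bx, by) = objects[a], objects[b]
        if ax != bx:
            rels.add((a, "left-of", b) if ax < bx else (b, "left-of", a))
        if ay != by:
            # Smaller y means higher up in the picture.
            rels.add((a, "above", b) if ay < by else (b, "above", a))
    return rels


def matches(query, picture):
    """True if the picture contains every queried object and every
    relation implied by the query layout also holds in the picture."""
    if not set(query) <= set(picture):
        return False
    restricted = {name: picture[name] for name in query}
    return relations(query) <= relations(restricted)


# Hypothetical query: a tree placed to the left of a house, same height.
query = {"tree": (0, 0), "house": (5, 0)}
picture_a = {"tree": (10, 40), "house": (80, 42), "sun": (50, 5)}
picture_b = {"tree": (90, 40), "house": (20, 42)}
result_a = matches(query, picture_a)
result_b = matches(query, picture_b)
```

Comparing relations rather than absolute coordinates means the user's icon layout only has to be roughly right, which fits the abstract's emphasis on visual memory over exact numeric queries.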
Visual scene recognition with biologically relevant generative models
This research focuses on developing visual object categorization methodologies that are based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability in visual patterns, in the space of features extracted from them by an appropriate low level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we have examined in detail two recent probabilistic models of vision: a simple multivariate Gaussian model as suggested by (Karklin & Lewicki, 2009) and a restricted Boltzmann machine (RBM) proposed by (Hinton, 2002). Both models have been widely used for visual object classification and scene analysis tasks before. This research highlights that these models on their own are not plausible enough to perform the classification task, and suggests the Fisher kernel as a means of inducing discrimination into these models for classification power. Our empirical results on standard benchmark data sets reveal that the classification performance of these generative models could be significantly boosted near to the state of the art performance, by drawing a Fisher kernel from compact generative models that computes the data labels in a fraction of total computation time. We compare the proposed technique with other distance based and kernel based classifiers to show how computationally efficient the Fisher kernels are. To the best of our knowledge, the Fisher kernel has not been drawn from the RBM before, so the work presented in the thesis is novel in terms of its idea and its application to vision problems
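To make the notion of drawing a Fisher kernel from a generative model concrete, here is a minimal sketch for the simplest case: a Gaussian with diagonal covariance. This is an assumed stand-in chosen because its Fisher score and Fisher information have closed forms; it is not the thesis's Gaussian or RBM model, and the function names are hypothetical.

```python
def fisher_score(x, mu, var):
    """Gradient of log N(x; mu, diag(var)) with respect to (mu, var)."""
    d_mu = [(xi - m) / v for xi, m, v in zip(x, mu, var)]
    d_var = [((xi - m) ** 2 - v) / (2.0 * v * v)
             for xi, m, v in zip(x, mu, var)]
    return d_mu + d_var


def fisher_information_diag(var):
    """Closed-form Fisher information for a diagonal Gaussian:
    1/var for each mean and 1/(2 var^2) for each variance."""
    return [1.0 / v for v in var] + [1.0 / (2.0 * v * v) for v in var]


def fisher_kernel(x, y, mu, var):
    """K(x, y) = U_x^T I^{-1} U_y, where U is the Fisher score and
    I is the (here diagonal) Fisher information matrix."""
    ux = fisher_score(x, mu, var)
    uy = fisher_score(y, mu, var)
    info = fisher_information_diag(var)
    return sum(a * b / i for a, b, i in zip(ux, uy, info))


# Assumed toy model parameters and data points.
mu, var = [0.0, 0.0], [1.0, 1.0]
k_xx = fisher_kernel([1.0, 0.0], [1.0, 0.0], mu, var)
k_xy = fisher_kernel([1.0, 0.0], [0.0, 2.0], mu, var)
```

The resulting kernel values can be passed to any kernel-based classifier, which is how a generative model is given discriminative power in this family of methods.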