Recognition by directed attention to recursively partitioned images
A learning/recognition model (and the program that instantiates it) is described that recursively combines the learning paradigms of conceptual clustering (Michalski, 1980) and learning from examples to resolve the ambiguities of real-world recognition. The model is based on neuropsychological and psychological evidence that the visual system is analytic, hierarchical, and composed of a parallel/serial dichotomy (see, among many others, the conclusions of Crick, 1984). Emulating this evidence, parallel processes in the model decompose the image into components and cluster the constituents in much the same way as the image-processing technique known as moment analysis (Alt, 1962). Serial, attentive mechanisms then reassemble the decompositions by investigating the spatial relationships between components. The use of attentive mechanisms extends the moment-analysis technique to handle alterations in structure and resolves the contention problem created by combining the two learning paradigms. The contention results from a disagreement between the teacher and the model about what constitutes the salient features at the highest level of the symbol. ZBT must handle four cases, two of which result from disagreement with the teacher. The parallel/serial dichotomy represents a vertical/horizontal trade-off between the invariant and variant features of a domain. The resulting learned hierarchy allows ZBT to recognize structural differences while avoiding problems of exponential growth.
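The moment analysis that the parallel stage is compared with can be illustrated with the standard raw, central and normalized moments of a binary image. The following is a minimal Python sketch. It assumes a small 0/1 image stored as a list of rows. The function names are hypothetical, and this is not code from ZBT. Central moments are taken about the centroid, so they do not change when the shape is translated.

```python
# Hypothetical helpers: an illustrative sketch of image moments, not ZBT code.
# The image is a list of rows of 0/1 values; x indexes columns, y indexes rows.

def raw_moment(img, p, q):
    # m_pq = sum over pixels of x^p * y^q * I(x, y)
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def central_moment(img, p, q):
    # mu_pq is computed about the centroid (xc, yc), which makes it
    # invariant to translation of the shape within the image.
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    return sum(((x - xc) ** p) * ((y - yc) ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def normalized_moment(img, p, q):
    # eta_pq = mu_pq / mu_00^(1 + (p + q) / 2) also compensates for scale.
    mu00 = central_moment(img, 0, 0)
    return central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2)
```

On a discrete pixel grid, the scale compensation of the normalized moments is only approximate. Translation invariance of the central moments holds exactly, up to floating-point rounding.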
Time-frequency shift-tolerance and counterpropagation network with applications to phoneme recognition
Human speech signals are inherently multi-component, non-stationary signals. Recognition schemes for classifying non-stationary signals generally require some form of temporal alignment. Techniques used for this include hidden Markov models and dynamic time warping. Attempts to incorporate temporal alignment into artificial neural networks have resulted in time-delay neural networks. The non-stationary nature of speech requires a time-dependent signal representation. Time-frequency signal analysis extends conventional time-domain and frequency-domain analysis methods, and researchers have reported that time-frequency representations are effective at revealing the time-varying nature of speech. In this thesis, a recognition scheme is developed for the temporal-spectral alignment of non-stationary signals by preprocessing the time-frequency distributions of speech phonemes. The resulting representation is independent of the amount of time-frequency shift, that is, it is time-frequency shift-tolerant (TFST). The proposed scheme does not require time alignment of the signals. It has the additional merit of providing spectral alignment, which may be important when recognizing speech from different speakers. A modification to the counterpropagation network is proposed that is suitable for phoneme recognition. The modified network retains the simplicity and competitive mechanism of the counterpropagation network, with the additional benefits of fast learning and good modelling accuracy. The temporal-spectral alignment scheme and the modified counterpropagation network are applied to the recognition of speech phonemes. Simulations show that the proposed scheme has potential for classifying speech phonemes that have not been aligned in time.
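The abstract does not say how the TFST preprocessing is done. One well-known way to obtain a representation that ignores time and frequency shifts is to take the magnitude of the 2-D discrete Fourier transform of the time-frequency distribution. This works because a circular shift changes only the phase of each coefficient. The Python sketch below illustrates that property under this assumption. `dft2_magnitude` and `circular_shift` are hypothetical helpers. They use a direct O(N²M²) DFT for clarity rather than speed.

```python
import cmath

# Hypothetical helpers illustrating shift tolerance; not the thesis's method.
# A time-frequency matrix is a list of rows: rows = time bins, cols = frequency bins.

def dft2_magnitude(tfd):
    """Magnitude of the 2-D DFT of a time-frequency matrix.

    A circular shift in time or frequency multiplies every DFT coefficient
    by a unit-magnitude phase factor, so these magnitudes do not change.
    """
    n_t, n_f = len(tfd), len(tfd[0])
    out = []
    for u in range(n_t):
        row = []
        for v in range(n_f):
            s = 0j
            for t in range(n_t):
                for f in range(n_f):
                    s += tfd[t][f] * cmath.exp(
                        -2j * cmath.pi * (u * t / n_t + v * f / n_f))
            row.append(abs(s))
        out.append(row)
    return out

def circular_shift(tfd, dt, df):
    # Shift by dt time bins and df frequency bins, wrapping at the edges.
    n_t, n_f = len(tfd), len(tfd[0])
    return [[tfd[(t - dt) % n_t][(f - df) % n_f] for f in range(n_f)]
            for t in range(n_t)]
```

Real shifts are not circular. However, if the pattern is surrounded by enough zero padding that it stays inside the frame, a linear shift coincides with a circular one.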
To facilitate the research, an environment was developed for time-frequency signal analysis and for recognition using artificial neural networks. The environment provides tools for time-frequency signal analysis and for simulating the counterpropagation network.
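The thesis starts from the unmodified counterpropagation network, which pairs a winner-take-all Kohonen layer with a Grossberg outstar layer. The minimal Python sketch below shows that textbook forward-only form with two-phase training, for reference. The class name, learning rates and prototype-based initialization are assumptions for illustration. The thesis's modifications are not reproduced.

```python
# Hypothetical class: a textbook forward-only counterpropagation network,
# not the modified network proposed in the thesis.

class CounterPropagation:
    """Kohonen (competitive) hidden layer plus Grossberg (outstar) output layer."""

    def __init__(self, prototypes, n_out):
        # Initialise each Kohonen unit from a sample input, a common heuristic
        # that avoids units that never win the competition.
        self.w = [list(p) for p in prototypes]
        self.v = [[0.0] * n_out for _ in prototypes]

    def winner(self, x):
        # Winner-take-all: the unit whose weight vector is nearest to x wins.
        return min(range(len(self.w)),
                   key=lambda j: sum((wj - xj) ** 2
                                     for wj, xj in zip(self.w[j], x)))

    def train(self, data, epochs=50, alpha=0.3, beta=0.3):
        # Phase 1: unsupervised Kohonen learning moves the winner towards x.
        for _ in range(epochs):
            for x, _y in data:
                j = self.winner(x)
                self.w[j] = [wj + alpha * (xj - wj)
                             for wj, xj in zip(self.w[j], x)]
        # Phase 2: with the Kohonen layer fixed, the winner's Grossberg
        # weights move towards the target output y.
        for _ in range(epochs):
            for x, y in data:
                j = self.winner(x)
                self.v[j] = [vj + beta * (yj - vj)
                             for vj, yj in zip(self.v[j], y)]

    def predict(self, x):
        # Recall: the output is the winning unit's Grossberg weight vector.
        return list(self.v[self.winner(x)])
```

Only the winning unit is updated, and only its output weights are read at recall time. As a result, each training step reduces to a simple running-average update.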
Geometric Transformation Techniques for Digital Images: A Survey
This survey presents a wide collection of algorithms for the geometric transformation of digital images. Efficient image transformation algorithms are critically important to the remote sensing, medical imaging, computer vision, and computer graphics communities. We review the growth of this field and compare all the described algorithms. Because the subject is interdisciplinary, emphasis is placed on unifying the terminology, motivation, and contributions of each technique into a single coherent framework. This paper attempts to serve a dual role as a survey and a tutorial. It is comprehensive in scope and detailed in style. The primary focus is on the three components that make up all geometric transformations: spatial transformations, resampling, and antialiasing. In addition, considerable attention is given to the dramatic progress made in the development of separable algorithms. The text is supplemented with numerous examples and an extensive bibliography.
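Two of the three components named above, spatial transformation and resampling, can be sketched together as inverse mapping. Each output pixel is mapped back into the input image, and the input is interpolated at that point. This avoids the holes that forward mapping can leave. The Python sketch below assumes an affine inverse mapping and bilinear interpolation. The function names and the `(a, b, c, d, e, f)` parameter layout are hypothetical. Antialiasing (prefiltering before minification) and separable multi-pass methods are omitted.

```python
import math

# Hypothetical helpers: an inverse-mapping warp with bilinear resampling.
# Images are lists of rows; (x, y) = (column, row).

def bilinear_sample(img, x, y, fill=0.0):
    # Interpolate img at real-valued (x, y); pixels outside the image read as fill.
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xi, yi):
        return img[yi][xi] if 0 <= xi < w and 0 <= yi < h else fill

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom

def warp_affine(img, inv, out_w, out_h):
    """Map each output pixel (u, v) back through `inv` and resample the input.

    inv = (a, b, c, d, e, f) maps output (u, v) to input
    (a*u + b*v + c, d*u + e*v + f).
    """
    a, b, c, d, e, f = inv
    return [[bilinear_sample(img, a * u + b * v + c, d * u + e * v + f)
             for u in range(out_w)]
            for v in range(out_h)]
```

When this sketch shrinks an image substantially, it aliases. That is the problem the antialiasing stage is meant to address.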
Neural network techniques for position and scale invariant image classification
This research is concerned with applying neural network techniques to the problem of classifying images invariantly to changes in position and scale. In addition to achieving invariant classification, the network has to classify objects hierarchically, constructing complex features from simpler ones, and it must use unsupervised learning. The resulting hierarchical structure should be able to classify the image through an internal representation that models the structure of the image.
After existing neural network techniques were found unsuitable, a new type of neural network was developed that differs from the conventional multi-layer perceptron architecture. This network was constructed from neurons grouped into feature detectors. These neurons were trained in an unsupervised manner using a technique based on Kohonen learning. A number of novel techniques were developed to improve the learning and classification performance of the network.
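The abstract does not specify how its technique departs from Kohonen learning. As a point of reference only, the standard Kohonen self-organizing map update moves the best-matching unit and its lattice neighbours towards each input, with a shrinking learning rate and neighbourhood. The Python sketch below shows that standard rule on a 1-D lattice. The schedules, defaults and function names are assumptions, and the grouping of neurons into feature detectors is not modelled.

```python
import math
import random

# Hypothetical helpers: the standard Kohonen SOM rule on a 1-D lattice,
# not the thesis's modified learning technique.

def best_matching_unit(weights, x):
    # Index of the unit whose weight vector is nearest (squared Euclidean) to x.
    return min(range(len(weights)),
               key=lambda j: sum((wj - xj) ** 2 for wj, xj in zip(weights[j], x)))

def som_step(weights, x, lr, radius):
    """One Kohonen update; returns a new list of weight vectors."""
    bmu = best_matching_unit(weights, x)
    new = []
    for j, w in enumerate(weights):
        # Gaussian neighbourhood: 1 at the BMU, decaying with lattice distance.
        h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
        new.append([wj + lr * h * (xj - wj) for wj, xj in zip(w, x)])
    return new

def train_som(samples, n_units, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train with a linearly decaying learning rate and neighbourhood radius."""
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in samples:
            frac = step / total
            weights = som_step(weights, x,
                               lr=lr0 * (1 - frac),
                               radius=max(radius0 * (1 - frac), 0.5))
            step += 1
    return weights
```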
The network was able to retain the spatial relationship of the classified features; this inherent property resulted in the capability for position and scale invariant classification. As a consequence, an additional invariance filter was not required. In addition to achieving the invariance property, the developed techniques enabled multiple objects in an image to be classified.
Once the network had learned the spatial relationships between the lower-level features, names could be assigned to the identified features. As part of the classification process, the system was able to identify the positions of the classified features in all layers of the network.
A software model of an artificial retina was used to test the grey scale classification performance of the network and to assess the response of the retina to changes in brightness.
Like the Neocognitron, the resulting network was developed solely for image classification. Although the Neocognitron is not designed for scale or position invariance, it was chosen for comparison because it is structurally similar and can accommodate changes in lighting in the image.
This type of network could be used as the basis for a 2D scene-analysis neural network, in which the inherent parallelism of the neural network would allow the objects in the image to be classified simultaneously.
Directional edge and texture representations for image processing
An efficient representation of natural images is of fundamental importance in image processing and analysis. Commonly used separable transforms such as wavelets are not well suited to images because they cannot exploit directional regularities such as edges and oriented textural patterns. Meanwhile, most recently proposed directional schemes cannot represent these two types of feature in a unified transform. This thesis focuses on developing directional representations for images that capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Building on a previous MFT-based linear feature model, the work extends the extraction method to the case in which the image is corrupted by noise. The problem is tackled by combining a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms, the multiscale polar cosine transforms (MPCT), is also proposed to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying to the local MFT coefficients a Gaussian frequency filter that matches the dispersion of the magnitude spectrum. This is particularly effective for denoising natural images, because it preserves both types of feature.
Further improvements can be made by using the information from the linear feature extraction process to configure the filter. The denoising results compare favourably with other state-of-the-art directional representations.
Automatic Multilevel Feature Abstraction in Adaptable Machine Vision Systems
Vision is a complex task which can be accomplished with apparent ease by biological systems, but for which the design of artificial systems is difficult. Although machine vision systems can be successfully designed for a specific task, under certain conditions, they are likely to fail if circumstances change. This was the motivation for the research into ways in which systems can be self-designing and adaptable to new visual tasks. The research was conducted in three vital areas of concern for machine vision systems.
The first area is finding a suitable architecture for forming an appropriate representation for the current task. The research investigated the application of Hypernetworks theory to building a multilevel, generally applicable representation through repeated application of a fundamental 'self-similarity' principle: parts of objects assembled under a particular relation at one level form whole objects at the next. Results show that this is potentially a powerful approach for autonomously generating an adaptable system architecture suitable for multiple visual tasks.
The second area is the autonomous extraction of suitable low-level features. The research investigated this through random generation of minimally constrained pixel configurations and algorithmic generation of homogeneous and heterogeneous polygons. The results suggest that these are promising approaches worth developing further, although the simplicity of the features makes them vulnerable to image transformations.
The third area is automatic feature selection. The research explored management of 'dimensionality' and of 'combinatorial explosion', as well as how to locate relevant features at multiple representation levels, in the context of 'emergence' of structure. Results indicate that this approach can find useful 'intermediate-level' constructs through analysis of the connectivity of the simplices representing objects at higher levels.
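The connectivity analysis of simplices mentioned above can be illustrated with q-connectivity in the style of Atkin's Q-analysis, on which hypernetwork theory draws. Two simplices are q-near if they share at least q + 1 vertices. They are q-connected if a chain of q-near simplices links them. The Python sketch below is an assumption about the kind of analysis meant, not the thesis's code. Simplices are given as sets of vertex labels, and `q_components` is a hypothetical helper.

```python
from itertools import combinations

# Hypothetical helpers: Atkin-style q-connectivity between simplices,
# an illustrative assumption rather than the thesis's implementation.

def shared_face_dim(a, b):
    # Two simplices sharing k vertices share a face of dimension k - 1.
    return len(set(a) & set(b)) - 1

def q_components(simplices, q):
    """Group simplices of dimension >= q into q-connected classes.

    simplices: dict mapping a name to an iterable of vertex labels.
    Returns a sorted list of sorted name lists.
    """
    names = [n for n, verts in simplices.items() if len(set(verts)) - 1 >= q]
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # q-near pairs are merged; the classes are the transitive closure.
    for a, b in combinations(names, 2):
        if shared_face_dim(simplices[a], simplices[b]) >= q:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())
```

Raising q splits the structure into smaller groups whose members share larger faces. This is one possible way to read intermediate-level constructs off the connectivity.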
The research concludes that the proposed novel approaches to these issues are fruitful, in particular the application of hypernetworks to forming multilevel representations and the resulting emergence of higher-level structure.
On the capture and representation of fonts
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The commercial need to capture, process and represent the shape and form of an outline has led to the development of a number of spline routines. These use a mathematical curve format that approximates the contours of a given shape. The modelled outline lends itself to a variety of purposes, including graphic screens, laser printers and numerically controlled machines. The latter can be employed for cutting foil, metal, plastic and stone. One of the most widely used software design packages has been the IKARUS system. Developed by URW of Hamburg (Germany), it employs a number of mathematical descriptions that facilitate both the modelling and the representation of font characters. It uses a variety of curve formats, including Bezier cubics, general conics and parabolics. The work reported in this dissertation focuses on developing improved techniques, primarily for the IKARUS system. This includes two algorithms for a Bezier cubic description, two for a general conic representation and a further two for the parabolic case. In addition, a number of algorithms are presented that convert between these mathematical forms, for example from Bezier cubics to a general conic form. Furthermore, algorithms are developed to assist in rasterising both cubic and quadratic arcs. This study was partly funded by the Science and Engineering Research Council (SERC).
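Two of the operations mentioned, conversion between curve forms and preparing arcs for rasterising, can be illustrated with Bezier curves. This sketch assumes the parabolic arcs are stored as quadratic Bezier segments, which is possible because every parabolic arc can be written this way. It evaluates curves with de Casteljau's algorithm, elevates a quadratic exactly to a cubic, and flattens a curve into a polyline as a first step towards rasterising. The function names are hypothetical. Conversion to general (rational) conics and the IKARUS formats themselves are not covered.

```python
# Hypothetical helpers: Bezier evaluation, exact quadratic-to-cubic degree
# elevation and polyline flattening. Points are tuples of floats.

def lerp(p, q, t):
    # Linear interpolation between points p and q.
    return tuple(a + (b - a) * t for a, b in zip(p, q))

def de_casteljau(ctrl, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]."""
    pts = [tuple(p) for p in ctrl]
    while len(pts) > 1:
        # Repeatedly interpolate neighbouring control points.
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

def quadratic_to_cubic(p0, p1, p2):
    """Exact degree elevation of a quadratic (parabolic) Bezier arc to a cubic."""
    c1 = tuple(a + 2 / 3 * (b - a) for a, b in zip(p0, p1))
    c2 = tuple(c + 2 / 3 * (b - c) for c, b in zip(p2, p1))
    return [tuple(p0), c1, c2, tuple(p2)]

def flatten(ctrl, segments=8):
    """Approximate the curve by a polyline of `segments` straight pieces."""
    return [de_casteljau(ctrl, i / segments) for i in range(segments + 1)]
```

The elevated cubic traces the same curve as the quadratic for every parameter value, so this conversion introduces no approximation error. Conversions in the other direction, such as cubic to conic, generally do introduce error.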