22 research outputs found

    Maximum weighted likelihood via rival penalized EM for density mixture clustering with automatic model selection


    Configuring the radial basis function neural network

    The most important factor in configuring an optimum radial basis function (RBF) network is the training of the neural units in the hidden layer. Many algorithms, e.g., competitive learning (CL), have been proposed to train the hidden units, but CL suffers from producing dead units. The other major factor, which was ignored in the past, is the appropriate selection of the number of neural units in the hidden layer. The frequency sensitive competitive learning (FSCL) algorithm was proposed to alleviate the dead-unit problem, but it does not address the latter one. The rival penalized competitive learning (RPCL) algorithm, an improved version of FSCL, does solve the latter problem provided that a sufficiently large number of initial neural units is assigned; it is, however, very sensitive to the learning rate. This thesis proposes a new algorithm called the scattering-based clustering (SBC) algorithm, in which the FSCL algorithm is first applied to let the neural units converge. Scatter matrices of the clustered data are then used to compute a sphericity measure for each k, where k is the number of clusters, from which the optimum number of neural units for the hidden layer is obtained. The properties of the scatter matrices and sphericity are analytically discussed. A comparative study among different learning algorithms for training the RBF network shows that the SBC algorithm outperforms the others.
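
    As a rough illustration of the selection step, the sketch below computes within- and between-cluster scatter matrices for each candidate k and a trace-ratio "sphericity" score. The thesis's exact sphericity definition is not reproduced here, and scikit-learn's KMeans stands in for the converged FSCL units, so both are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in for the converged FSCL units

def scatter_matrices(X, labels):
    """Within-cluster (Sw) and between-cluster (Sb) scatter matrices."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def sphericity(X, labels):
    # Hypothetical criterion: ratio of between- to within-cluster scatter
    # traces; the thesis's actual sphericity definition may differ.
    Sw, Sb = scatter_matrices(X, labels)
    return np.trace(Sb) / np.trace(Sw)

def score_candidates(X, k_range=range(2, 11)):
    """Score each candidate k; a knee in the curve would then suggest
    the number of hidden units (selection rule assumed, not the thesis's)."""
    return {k: sphericity(X, KMeans(n_clusters=k, n_init=10).fit_predict(X))
            for k in k_range}
```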

    A Batch Rival Penalized Expectation-Maximization Algorithm for Gaussian Mixture Clustering with Automatic Model Selection

    Within the learning framework of maximum weighted likelihood (MWL) proposed by Cheung (2004, 2005), this paper develops a batch Rival Penalized Expectation-Maximization (RPEM) algorithm for density mixture clustering, applicable when all observations are available before the learning process. Compared to the adaptive RPEM algorithm of Cheung (2004, 2005), this batch RPEM does not require a learning rate, analogous to the Expectation-Maximization (EM) algorithm (Dempster et al., 1977), but still preserves the capability of automatic model selection. Further, the batch RPEM generally converges faster than both the EM and the adaptive RPEM. Experiments show the superior performance of the proposed algorithm on synthetic data and color image segmentation.
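
    The sketch below conveys the flavor of such a batch rival-penalized iteration for a Gaussian mixture: the winning component's posterior is amplified and rivals receive negative weights, so redundant components' mixing weights fade out. The specific weight design (the (1+eps)/-eps split), the clipping of negative weights in the covariance update, and the pruning tolerance are assumptions for illustration, not the paper's exact update equations.

```python
import numpy as np
from scipy.stats import multivariate_normal

def batch_rpem(X, k, eps=0.5, n_iter=100, prune_tol=1e-3, seed=0):
    """Illustrative batch rival-penalized EM for a Gaussian mixture."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, k, replace=False)].astype(float)
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    alpha = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posteriors h(j|x) under the current mixture
        dens = np.stack([alpha[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                         for j in range(k)], axis=1)
        h = dens / dens.sum(axis=1, keepdims=True)
        # Rival-penalized weights: amplify each point's winner, penalize rivals
        win = h.argmax(axis=1)
        g = -eps * h
        g[np.arange(n), win] += 1.0 + eps
        # M-step with (possibly negative) weights; faded components are skipped
        alpha = np.clip(g.sum(axis=0), 1e-8, None)
        alpha /= alpha.sum()
        for j in range(k):
            w = np.clip(g[:, j], 0.0, None)  # keep covariances positive
            if w.sum() < prune_tol:          # component has faded out
                continue
            mu[j] = (w[:, None] * X).sum(0) / w.sum()
            diff = X - mu[j]
            cov[j] = (w[:, None] * diff).T @ diff / w.sum() + 1e-6 * np.eye(d)
    return alpha, mu, cov
```

    Started with a deliberately oversized k, components that lose the competition see their mixing weights shrink toward zero, which is the automatic model selection behavior the abstract describes.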

    Investigations on number selection for finite mixture models and clustering analysis.

    by Yiu Ming Cheung. Thesis (M.Phil.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (leaves 92-99).
    Contents:
    Abstract
    Acknowledgement
    Chapter 1  Introduction
      1.1  Background
        1.1.1  Bayesian YING-YANG Learning Theory and Number Selection Criterion
      1.2  General Motivation
      1.3  Contributions of the Thesis
      1.4  Other Related Contributions
        1.4.1  A Fast Number Detection Approach
        1.4.2  Application of RPCL to Prediction Models for Time Series Forecasting
        1.4.3  Publications
      1.5  Outline of the Thesis
    Chapter 2  Open Problem: How Many Clusters?
    Chapter 3  Bayesian YING-YANG Learning Theory: Review and Experiments
      3.1  Brief Review of Bayesian YING-YANG Learning Theory
      3.2  Number Selection Criterion
      3.3  Experiments
        3.3.1  Experimental Purposes and Data Sets
        3.3.2  Experimental Results
    Chapter 4  Conditions of Number Selection Criterion
      4.1  Alternative Condition of Number Selection Criterion
      4.2  Conditions of Special Hard-cut Criterion
        4.2.1  Criterion Conditions in Two-Gaussian Case
        4.2.2  Criterion Conditions in k*-Gaussian Case
      4.3  Experimental Results
        4.3.1  Purpose and Data Sets
        4.3.2  Experimental Results
      4.4  Discussion
    Chapter 5  Application of Number Selection Criterion to Data Classification
      5.1  Unsupervised Classification
        5.1.1  Experiments
      5.2  Supervised Classification
        5.2.1  RBF Network
        5.2.2  Experiments
    Chapter 6  Conclusion and Future Work
      6.1  Conclusion
      6.2  Future Work
    Bibliography
    Appendix A  A Number Detection Approach for Equal-and-Isotropic Variance Clusters
      A.1  Number Detection Approach
      A.2  Demonstration Experiments
      A.3  Remarks
    Appendix B  RBF Network with RPCL Approach
      B.1  Introduction
      B.2  Normalized RBF Net and Extended Normalized RBF Net
      B.3  Demonstration
      B.4  Remarks
    Appendix C  Adaptive RPCL-CLP Model for Financial Forecasting
      C.1  Introduction
      C.2  Extraction of Input Patterns and Outputs
      C.3  RPCL-CLP Model
        C.3.1  RPCL-CLP Architecture
        C.3.2  Training Stage of RPCL-CLP
        C.3.3  Prediction Stage of RPCL-CLP
      C.4  Adaptive RPCL-CLP Model
        C.4.1  Data Pre- and Post-Processing
        C.4.2  Architecture and Implementation
      C.5  Computer Experiments
        C.5.1  Data Sets and Experimental Purpose
        C.5.2  Experimental Results
      C.6  Conclusion
    Appendix D  Publication List
      D.1  Publication List

    A perceptual learning model to discover the hierarchical latent structure of image collections

    Biology has been an unparalleled source of inspiration for researchers in several scientific and engineering fields, including computer vision. The starting point of this thesis is the neurophysiological properties of the human early visual system, in particular the cortical mechanism that mediates learning by exploiting information about stimulus repetition. Repetition has long been considered a fundamental correlate of skill acquisition and memory formation in biological as well as computational learning models. However, recent studies have shown that biological neural networks have different ways of exploiting repetition in forming memory maps. The thesis focuses on a perceptual learning mechanism called repetition suppression, which exploits the temporal distribution of neural activations to drive an efficient neural allocation for a set of stimuli. It explores the neurophysiological hypothesis that repetition suppression serves as an unsupervised perceptual learning mechanism that can drive efficient memory formation by reducing the overall size of the stimuli representation while strengthening the responses of the most selective neurons. This interpretation differs from repetition's traditional role in computational learning models, where it is mainly used to induce convergence and reach training stability without providing focus for the neural representations of the data. The first part of the thesis introduces a novel computational model with repetition suppression, which forms an unsupervised competitive system termed CoRe, for Competitive Repetition-suppression learning. The model is applied to general problems in the fields of computational intelligence and machine learning, with particular emphasis on validating it as an effective tool for the unsupervised exploration of bio-medical data. In particular, it is shown that the repetition suppression mechanism efficiently addresses the issues of automatically estimating the number of clusters within the data and of filtering noise and irrelevant input components in highly dimensional data, e.g. gene expression levels from DNA microarrays. The CoRe model produces relevance estimates for each covariate, which is useful, for instance, to discover the best discriminating bio-markers. The description of the model includes a theoretical analysis using Huber's robust statistics to show that the model is robust to outliers and noise in the data. The convergence properties of the model are also studied: it is shown that, besides its biological underpinning, the CoRe model has useful properties in terms of asymptotic behavior. By exploiting a kernel-based formulation of the CoRe learning error, a theoretically sound motivation is provided for the model's ability to avoid local minima of its loss function. To this end, a necessary and sufficient condition for global error minimization in vector quantization is generalized to distance metrics in generic Hilbert spaces. This leads to the derivation of a family of kernel-based algorithms that address the local-minima issue of unsupervised vector quantization in a principled way. The experimental results show that the algorithm achieves a consistent performance gain over state-of-the-art learning vector quantizers while retaining lower computational complexity (linear with respect to the dataset size).
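
    To make the mechanism concrete, below is a toy competitive-learning loop with a repetition-suppression term: each unit's response is attenuated in proportion to how often it has already won, so over-allocated units fade while selective units persist. This only mimics the spirit of CoRe; the model's actual loss, suppression function, and update rules are those derived in the thesis, and everything here (the exponential response, the beta attenuation) is an illustrative assumption.

```python
import numpy as np

def repetition_suppression_cl(X, k, lr=0.05, beta=0.1, n_epochs=20, seed=0):
    """Toy competitive learning with win-frequency-based suppression."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)].astype(float)  # unit prototypes
    wins = np.zeros(k)                                         # repetition history
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            dist = np.linalg.norm(W - x, axis=1)
            # Suppress each unit's response in proportion to its repetitions
            response = np.exp(-dist) / (1.0 + beta * wins)
            j = response.argmax()                              # winning unit
            wins[j] += 1
            W[j] += lr * (x - W[j])                            # move winner
    return W, wins
```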
    Bridging the gap between the low-level representation of visual content and the underlying high-level semantics is a major research issue of current interest. The second part of the thesis addresses this problem by introducing a hierarchical, multi-resolution approach to visual content understanding. On the spatial level, CoRe learning is used to pool local visual patches together, organizing them into perceptually meaningful intermediate structures. On the semantic level, it provides an extension of the probabilistic Latent Semantic Analysis (pLSA) model that allows discovery and organization of the visual topics into a hierarchy of aspects. The proposed hierarchical pLSA model is shown to effectively address the unsupervised discovery of relevant visual classes from pictorial collections while simultaneously learning to segment the image regions containing the discovered classes. Furthermore, by drawing on a recent pLSA-based image annotation system, the hierarchical pLSA model is extended to process and represent multi-modal collections comprising textual and visual data. The experimental evaluation shows that the proposed model learns to attach textual labels (available only at the level of the whole image) to the discovered image regions, while increasing the precision/recall performance with respect to a flat pLSA annotation model.
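
    The hierarchical model builds on standard pLSA, whose EM updates are sketched below for a document-word count matrix (in the visual setting, "documents" are images and "words" are quantized visual patches). Only the flat base model is shown; the hierarchy over aspects introduced in the thesis is not.

```python
import numpy as np

def plsa(N, n_topics, n_iter=100, seed=0):
    """Standard pLSA via EM on a document-word count matrix N (D x W)."""
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_w_z = rng.random((n_topics, W)); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = rng.random((D, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (D, W, Z)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / np.clip(joint.sum(2, keepdims=True), 1e-12, None)
        # M-step: re-estimate P(w|z) and P(z|d) from expected counts
        nz = N[:, :, None] * post
        p_w_z = nz.sum(0).T
        p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = nz.sum(1)
        p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_w_z, p_z_d
```

    The dense (D, W, Z) responsibility array keeps the sketch short; a practical implementation would iterate over the sparse nonzero counts instead.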

    A combined method based on CNN architecture for variation-resistant facial recognition

    Identifying individuals from a facial image is a computer vision technique used in various fields such as security, digital biometrics, smartphones, and banking. It can prove difficult, however, due to the complexity of facial structure and the presence of variations that can affect the results. To overcome this difficulty, this paper proposes a combined approach that aims to improve the accuracy and robustness of facial recognition in the presence of variations. To this end, two datasets (ORL and UMIST) are used to train the model. The pipeline begins with an image pre-processing phase, which applies histogram equalization to adjust the gray levels over the entire image surface, improving quality and enhancing the detection of features in each image. Next, the least important features are eliminated from the images using the Principal Component Analysis (PCA) method. Finally, the pre-processed images are fed to a convolutional neural network (CNN) architecture consisting of multiple convolutional layers and fully connected layers. Simulation results show high performance, with accuracy rates of up to 99.50% on the ORL dataset and 100% on the UMIST dataset.
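
    A minimal sketch of such a pipeline follows: per-image histogram equalization, PCA reconstruction to discard the least-important components (one plausible reading of the paper's PCA step), and a small convolutional network. The layer sizes, the 95% retained-variance threshold, and the use of OpenCV and Keras are illustrative assumptions, not the paper's stated configuration.

```python
import numpy as np
import cv2
from sklearn.decomposition import PCA
from tensorflow import keras

def equalize(images):
    """Histogram-equalize each 8-bit grayscale face image."""
    return np.stack([cv2.equalizeHist(img.astype(np.uint8)) for img in images])

def pca_filter(images, retained_variance=0.95):
    """Suppress the least-important components via PCA projection and
    reconstruction, keeping the images 2-D for the CNN (threshold assumed)."""
    flat = images.reshape(len(images), -1).astype(np.float32) / 255.0
    pca = PCA(n_components=retained_variance).fit(flat)
    return pca.inverse_transform(pca.transform(flat)).reshape(images.shape)

def build_cnn(input_shape, n_classes):
    """Small stack of convolutional and fully connected layers."""
    return keras.Sequential([
        keras.layers.Input(shape=input_shape),          # e.g. (112, 92, 1)
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(64, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Sketch of use:
#   X = pca_filter(equalize(faces))[..., None]         # add channel axis
#   model = build_cnn(X.shape[1:], n_classes=40)       # ORL has 40 subjects
#   model.compile("adam", "sparse_categorical_crossentropy", ["accuracy"])
```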

    Fuzzy clustering for content-based indexing in multimedia databases.

    by Yue Ho-Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 129-137). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgement
    Chapter 1  Introduction
      1.1  Problem Definition
      1.2  Contributions
      1.3  Thesis Organization
    Chapter 2  Literature Review
      2.1  Content-based Retrieval, Background and Indexing Problem
        2.1.1  Feature Extraction
        2.1.2  Nearest-neighbor Search
        2.1.3  Content-based Indexing Methods
      2.2  Indexing Problems
      2.3  Data Clustering Methods for Indexing
        2.3.1  Probabilistic Clustering
        2.3.2  Possibilistic Clustering
    Chapter 3  Fuzzy Clustering Algorithms
      3.1  Fuzzy Competitive Clustering
      3.2  Sequential Fuzzy Competitive Clustering
      3.3  Experiments
        3.3.1  Experiment 1: Data sets with different numbers of samples
        3.3.2  Experiment 2: Data sets of different dimensionality
        3.3.3  Experiment 3: Data sets with different numbers of natural clusters
        3.3.4  Experiment 4: Data sets with different noise levels
        3.3.5  Experiment 5: Clusters of different geometric size
        3.3.6  Experiment 6: Clusters with different numbers of data instances
        3.3.7  Experiment 7: Performance on a real data set
      3.4  Discussion
        3.4.1  Differences Between FCC, SFCC, and Other Clustering Algorithms
        3.4.2  Variations on SFCC
        3.4.3  Why SFCC?
    Chapter 4  Hierarchical Indexing Based on Natural Cluster Information
      4.1  The Hierarchical Approach
      4.2  The Sequential Fuzzy Competitive Clustering Binary Tree (SFCC-b-tree)
        4.2.1  Data Structure of the SFCC-b-tree
        4.2.2  Tree Building of the SFCC-b-tree
        4.2.3  Insertion into the SFCC-b-tree
        4.2.4  Deletion from the SFCC-b-tree
        4.2.5  Searching in the SFCC-b-tree
      4.3  Experiments
        4.3.1  Experimental Setting
        4.3.2  Experiment 8: Test for different leaf node sizes
        4.3.3  Experiment 9: Test for different dimensionality
        4.3.4  Experiment 10: Test for different sizes of data sets
        4.3.5  Experiment 11: Test for different data distributions
      4.4  Summary
    Chapter 5  A Case Study on the SFCC-b-tree
      5.1  Introduction
      5.2  Data Collection
      5.3  Data Pre-processing
      5.4  Experimental Results
      5.5  Summary
    Chapter 6  Conclusion
      6.1  An Efficiency Formula
        6.1.1  Motivation
        6.1.2  Regression Model
        6.1.3  Discussion
      6.2  Future Directions
      6.3  Conclusion
    Bibliography

    Unsupervised Selection and Estimation of Non-Gaussian Mixtures for High Dimensional Data Analysis

    Lately, the enormous generation of databases in almost every aspect of life has created a great demand for new, powerful tools for turning data into useful information. Researchers have therefore been encouraged to explore and develop new machine learning ideas and methods. Mixture models are one of the machine learning techniques receiving considerable attention due to their ability to handle multidimensional data efficiently and effectively. Generally, four critical issues have to be addressed when adopting mixture models in high-dimensional spaces: (1) the choice of the probability density functions, (2) the estimation of the mixture parameters, (3) the automatic determination of the number of components M in the mixture, and (4) the determination of which features best discriminate among the different components. The main goal of this thesis is to address all of these challenging, interrelated problems in one unified model. In most applications, the Gaussian density is used in mixture modeling of data. Although a Gaussian mixture may provide a reasonable approximation to many real-world distributions, it is certainly not always the best approximation, especially in computer vision and image processing applications where we often deal with non-Gaussian data. We therefore propose to use three highly flexible distributions: the generalized Gaussian distribution (GGD), the asymmetric Gaussian distribution (AGD), and the asymmetric generalized Gaussian distribution (AGGD). We are motivated by the fact that these distributions can fit many distributional shapes and thus form a useful class of flexible models for problems and applications involving measurements and features with well-known, marked deviations from the Gaussian shape. Recently, research has shown that model selection and parameter learning are highly dependent and should be performed simultaneously. Many approaches have been suggested for this purpose; from a computational point of view, the vast majority fall into two classes: deterministic and stochastic methods. Deterministic methods estimate the model parameters for a set of candidate models using the Expectation-Maximization (EM) framework and then choose the model that maximizes a model selection criterion. Stochastic methods such as Markov chain Monte Carlo (MCMC) can be used to sample from the full posterior distribution with M considered unknown. Hence, in this thesis, we propose three learning techniques capable of automatically determining model complexity while learning the model parameters. First, we incorporate a Minimum Message Length (MML) penalty in the model learning step performed with the EM algorithm. Our second approach employs the Rival Penalized EM (RPEM) algorithm, which selects an appropriate number of densities by fading out the redundant densities from a density mixture. Last but not least, we incorporate the nonparametric aspect of mixture models by assuming a countably infinite number of components and using MCMC simulations to estimate the posterior distributions; the difficulty of choosing the appropriate number of clusters is thereby sidestepped. Another essential issue in statistical modeling in general, and finite mixtures in particular, is feature selection (i.e. the identification of the relevant or discriminative features describing the data), especially in the case of high-dimensional data. Indeed, feature selection has been shown to be a crucial step in several image processing, computer vision, and pattern recognition applications, not only because it speeds up learning but also because it improves model accuracy and generalization. Moreover, the learning of the mixture parameters (i.e. both model selection and parameter estimation) is greatly affected by the quality of the features used. Hence, in this thesis, we solve the feature selection problem in unsupervised learning by casting it as an estimation problem, thus avoiding any combinatorial search. Finally, the effectiveness of our approaches is evaluated by applying them to different computer vision and image processing applications.
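
    As one concrete instance of a deterministic approach with simultaneous model selection, the sketch below runs EM for a Gaussian mixture with an MML-style penalty on the mixing weights, in the spirit of Figueiredo and Jain (2002): each M-step subtracts half the per-component parameter count from the expected counts, annihilating weakly supported components. The thesis applies this kind of idea to GGD, AGD, and AGGD mixtures; the Gaussian case, the initial component count, and the numerical safeguards here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mml_em_gmm(X, k_init=15, n_iter=200, seed=0):
    """EM for a Gaussian mixture with MML-style component annihilation."""
    n, d = X.shape
    npar = d + d * (d + 1) / 2            # mean + covariance parameters
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, k_init, replace=False)].astype(float)
    cov = [np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k_init)]
    alpha = np.full(k_init, 1.0 / k_init)
    for _ in range(n_iter):
        # E-step: posteriors over the surviving components
        dens = np.stack([a * multivariate_normal.pdf(X, m, c)
                         for a, m, c in zip(alpha, mu, cov)], axis=1)
        h = dens / dens.sum(axis=1, keepdims=True)
        # MML-penalized mixing weights: weak components are annihilated
        counts = h.sum(axis=0)
        alpha = np.maximum(counts - npar / 2.0, 0.0)
        keep = alpha > 0
        alpha = alpha[keep] / alpha[keep].sum()
        mu = mu[keep]
        cov = [c for c, k in zip(cov, keep) if k]
        h = h[:, keep]
        # M-step for the survivors
        for j in range(len(alpha)):
            w = h[:, j]
            mu[j] = (w[:, None] * X).sum(0) / w.sum()
            diff = X - mu[j]
            cov[j] = (w[:, None] * diff).T @ diff / w.sum() + 1e-6 * np.eye(d)
    return alpha, mu, cov
```

    Started from a deliberately large k_init, the surviving component count typically settles near the number of generating clusters on well-separated synthetic data, which is the "learning while selecting" behavior the paragraph above describes.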