    A Novel Hybrid Dimensionality Reduction Method using Support Vector Machines and Independent Component Analysis

    Driven by growing demand for high-dimensional data analysis in applications such as electrocardiogram signal analysis and gene expression analysis for cancer detection, dimensionality reduction has become a practical way to extract essential information from data, so that high-dimensional data can be represented in a more condensed, much lower-dimensional form that both improves classification accuracy and reduces computational complexity. Conventional dimensionality reduction methods can be categorized into stand-alone and hybrid approaches. A stand-alone method uses a single criterion, either supervised or unsupervised, whereas a hybrid method integrates both. Compared with the variety of stand-alone methods, the hybrid approach is promising because it simultaneously exploits the supervised criterion for better classification accuracy and the unsupervised criterion for better data representation. However, several issues challenge the efficiency of the hybrid approach: (1) the difficulty of finding a subspace that seamlessly integrates both criteria in a single hybrid framework, (2) robustness to noisy data, and (3) nonlinear data representation capability. This dissertation presents a new hybrid dimensionality reduction method that seeks a projection by optimizing both structural risk (the supervised criterion), from the Support Vector Machine (SVM), and data independence (the unsupervised criterion), from Independent Component Analysis (ICA). The projection from SVM contributes directly to classification performance from a supervised perspective, whereas the projection constructed by ICA, which maximizes independence among features, improves classification accuracy indirectly through better intrinsic data representation from an unsupervised perspective. For the linear dimensionality reduction model, I introduce orthogonality to interrelate the projections from SVM and ICA, while a redundancy removal process eliminates part of the projection vectors from SVM, leading to more effective dimensionality reduction. The orthogonality-based linear hybrid method is then extended to an uncorrelatedness-based algorithm with nonlinear data representation capability. In this approach, SVM and ICA are integrated into a single framework through an uncorrelated subspace based on a kernel implementation. Experimental results show that the proposed approaches give higher classification performance and better robustness in relatively lower dimensions than conventional methods on high-dimensional datasets.
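
The role of orthogonality in combining two sets of projection vectors can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it only shows plain Gram-Schmidt orthonormalisation, in which a supervised direction is kept first and later directions are stripped of any component along it. The function name `gram_schmidt` and the vectors in the usage note are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalise `vectors` in order (modified Gram-Schmidt).

    Earlier vectors take priority; a vector whose residual norm falls
    below `tol` is treated as redundant and dropped.
    """
    basis = []
    for v in vectors:
        residual = list(v)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * bi for r, bi in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:
            basis.append([r / norm for r in residual])
    return basis


def project(x, basis):
    """Coordinates of sample `x` in the subspace spanned by `basis`."""
    return [dot(x, b) for b in basis]
```

As a usage example under the same assumptions, `gram_schmidt([w_svm] + ica_dirs)` with `w_svm = [2.0, 0.0, 0.0]` and `ica_dirs = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]` keeps the direction of `w_svm` and returns three mutually orthogonal unit vectors; `project(x, basis)` then gives the reduced representation of a sample.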

    Monitoring Nonlinear and Non-Gaussian Processes Using Gaussian Mixture Model-Based Weighted Kernel Independent Component Analysis

    Advances in variational Bayesian nonlinear blind source separation

    Linear data analysis methods such as factor analysis (FA), independent component analysis (ICA) and blind source separation (BSS), as well as state-space models such as the Kalman filter model, are used in a wide range of applications. In many of these, linearity is just a convenient approximation while the underlying effect is nonlinear. It would therefore be more appropriate to use nonlinear methods. In this work, nonlinear generalisations of FA and ICA/BSS are presented. The methods are based on a generative model, with a multilayer perceptron (MLP) network modelling the nonlinearity from the latent variables to the observations. The model is estimated using variational Bayesian learning, which is well suited to nonlinear data analysis problems. The approach is also theoretically interesting, as essentially the same method is used in several different fields and can be derived from several different starting points, including statistical physics, information theory, Bayesian statistics, and information geometry. These complementary views can help in interpreting the operation of the learning method and its results. Much of the work presented in this thesis consists of improvements that make the nonlinear factor analysis and blind source separation methods faster and more stable, while being applicable to other learning problems as well. The improvements include methods to accelerate the convergence of alternating optimisation algorithms such as the EM algorithm, and an improved approximation of the moments of a nonlinear transform of a multivariate probability distribution. These improvements can easily be applied to other models besides FA and ICA/BSS, such as nonlinear state-space models. A specialised version of the nonlinear factor analysis method for post-nonlinear mixtures is presented as well.
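
For orientation, the cost minimised in variational Bayesian learning can be written in its standard generic form. The notation is assumed here and not taken from the thesis: $X$ denotes the observations, $\theta$ collects all latent variables and parameters, and $q(\theta)$ is the approximating distribution.

```latex
\mathcal{C}(q)
  = \mathbb{E}_{q(\theta)}\!\left[ \log \frac{q(\theta)}{p(X, \theta)} \right]
  = D_{\mathrm{KL}}\bigl( q(\theta) \,\|\, p(\theta \mid X) \bigr) - \log p(X)
  \;\ge\; -\log p(X)
```

Because the Kullback-Leibler term is non-negative, minimising $\mathcal{C}(q)$ both tightens the bound on the log evidence and pulls $q(\theta)$ towards the true posterior.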

    Low-Density Cluster Separators for Large, High-Dimensional, Mixed and Non-Linearly Separable Data.

    The location of groups of similar observations (clusters) in data is a well-studied problem with many practical applications. There is a wide range of approaches to clustering, which rely on different definitions of similarity and are appropriate for datasets with different characteristics. Despite a rich literature, a number of open problems in clustering remain, as do limitations of existing algorithms. This thesis develops methodology for clustering high-dimensional, mixed datasets with complex clustering structures, using low-density cluster separators that bi-partition datasets with cluster boundaries passing through regions of minimal density and thereby separating the regions of high probability density associated with clusters. The bi-partitions arising from a succession of minimum density cluster separators are combined using divisive hierarchical and partitional algorithms to locate a complete clustering while estimating the number of clusters. The proposed algorithms locate cluster separators using one-dimensional, arbitrarily oriented subspaces, circumventing the challenges associated with clustering in high-dimensional spaces. This requires continuous observations; thus, to extend the applicability of the proposed algorithms to mixed datasets, methods for producing an appropriate continuous representation of datasets containing non-continuous features are investigated. The exact evaluation of the density intersected by a cluster boundary is restricted to linear separators. This limitation is lifted by a non-linear mapping of the original observations into a feature space, in which a linear separator permits the correct identification of non-linearly separable clusters in the original dataset. In large, high-dimensional datasets, searching for one-dimensional subspaces that result in a minimum density separator is computationally expensive. Therefore, a computationally efficient approach to low-density cluster separation using approximately optimal projection directions is proposed, which searches over a collection of one-dimensional random projections for an appropriate subspace for cluster identification. The proposed approaches produce high-quality partitions that are competitive with well-established and state-of-the-art algorithms.
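
The idea of searching random one-dimensional projections for a low-density split can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: the density is a fixed-bandwidth Gaussian kernel estimate, candidate splits are restricted to a grid between the 10th and 90th percentiles of the projected data (a stand-in for whatever constraint keeps the split away from the tails), and all function names and default values are hypothetical.

```python
import math
import random


def kde_at(values, x, bandwidth):
    """Gaussian kernel density estimate of 1-D `values`, evaluated at `x`."""
    scale = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)


def min_density_split(values, bandwidth, n_grid=50):
    """Return (density, split) for the lowest-density grid point.

    Assumption: the grid spans the 10th to 90th percentile so that the
    split cannot drift into the empty tails of the projection.
    """
    ordered = sorted(values)
    lo = ordered[int(0.1 * (len(ordered) - 1))]
    hi = ordered[int(0.9 * (len(ordered) - 1))]
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    return min((kde_at(values, g, bandwidth), g) for g in grid)


def random_unit_vector(dim, rng):
    """Direction drawn uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def best_random_separator(data, n_directions=50, bandwidth=0.5, seed=0):
    """Try random directions; keep the one whose best split has lowest density."""
    rng = random.Random(seed)
    dim = len(data[0])
    best = None
    for _ in range(n_directions):
        direction = random_unit_vector(dim, rng)
        projected = [sum(a * b for a, b in zip(x, direction)) for x in data]
        density, split = min_density_split(projected, bandwidth)
        if best is None or density < best[0]:
            best = (density, direction, split)
    return best
```

The returned triple `(density, direction, split)` defines a hyperplane: an observation `x` falls on one side or the other depending on whether its projection onto `direction` exceeds `split`. Applying the function recursively to each side would give a divisive hierarchy.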

    Kernel methods for measuring independence

    We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. Both quantities are based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that, near independence, the kernel mutual information is an upper bound on the Parzen window estimate of the mutual information. Analogous results apply to two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
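
As a reading aid, the constrained covariance is commonly stated as the largest covariance attainable by bounded functions of the two variables. The notation below is assumed for illustration and may differ from the paper's: $F$ and $G$ are the unit balls of the two RKHSs, and $P_{x,y}$ is the joint distribution.

```latex
\mathrm{COCO}(P_{x,y}; F, G)
  = \sup_{f \in F,\; g \in G}
    \Bigl( \mathbb{E}_{x,y}\bigl[ f(x)\, g(y) \bigr]
         - \mathbb{E}_{x}\bigl[ f(x) \bigr]\, \mathbb{E}_{y}\bigl[ g(y) \bigr] \Bigr)
```

With universal kernels the function classes are rich enough that this supremum vanishes only when no pair of functions of $x$ and $y$ is correlated, which is what makes it usable as an independence measure.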

    Recognizing Faces -- An Approach Based on Gabor Wavelets

    Although it has been a hot research topic for the last 25 years, face recognition still seems to be a difficult and largely unsolved problem. Distortions caused by variations in illumination, expression and pose are the main challenges facing researchers in this field. Efficient recognition algorithms that are robust against such distortions are the main motivation of this research. Based on a detailed review of the background and wide applications of the Gabor wavelet, this powerful and biologically motivated mathematical tool is adopted to extract features for face recognition. The features contain important local frequency information and have been proven to be robust against commonly encountered distortions. To reduce the computation and memory cost caused by the large feature dimension, a novel boosting-based algorithm is proposed and successfully applied to eliminate redundant features. The selected features are further enhanced by kernel subspace methods to handle nonlinear face variations. The efficiency and robustness of the proposed algorithm are extensively tested using the ORL, FERET and BANCA databases. To normalize the scale and orientation of face images, an algorithm based on a generalized symmetry measure is proposed for automatic eye location. Requiring no training process, the method is simple and fast, and has been fully tested using thousands of images from the BioID and BANCA databases. An automatic user identification system, consisting of detection, recognition and user management modules, has been developed. The system can effectively detect faces in real video streams, identify them and retrieve the corresponding user information from the application database. Different detection and recognition algorithms can also be easily integrated into the framework.
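
To make the feature extraction step more concrete, the sketch below builds the real part of a two-dimensional Gabor kernel in its common textbook form (a Gaussian envelope multiplied by a cosine carrier). It is only an illustration: the parameterisation, the function names and the bank layout are assumptions and are not taken from the thesis.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor kernel as a list of rows.

    size       -- odd kernel width/height in pixels
    wavelength -- wavelength of the cosine carrier
    theta      -- orientation of the carrier in radians
    sigma      -- standard deviation of the Gaussian envelope
    gamma      -- spatial aspect ratio of the envelope
    psi        -- phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for row in range(size):
        y = row - half
        values = []
        for col in range(size):
            x = col - half
            # Rotate coordinates so the carrier runs along x_rot.
            x_rot = x * cos_t + y * sin_t
            y_rot = -x * sin_t + y * cos_t
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * x_rot / wavelength + psi)
            values.append(envelope * carrier)
        kernel.append(values)
    return kernel


def gabor_bank(size, wavelengths, n_orientations, sigma):
    """Kernels for every wavelength and evenly spaced orientation."""
    return [
        gabor_kernel(size, w, k * math.pi / n_orientations, sigma)
        for w in wavelengths
        for k in range(n_orientations)
    ]


def response(patch, kernel):
    """Filter response of an image patch with the same shape as the kernel."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
```

Evaluating `response` for every kernel in a bank at a set of image locations yields one feature per (location, wavelength, orientation) triple, which is why the raw feature dimension grows quickly and a selection step becomes attractive.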

    Cluster-Based Supervised Classification

    Functional-input metamodeling: an application to coastal flood early warning

    Floods affect more people than any other hazard. In the last decade of the 20th century alone, more than 1.5 billion people were affected. To mitigate the impact of this type of hazard, substantial scientific effort has been devoted to building computer codes that can be used as risk management tools. Available computer models can now properly model coastal flooding events at a fairly high resolution. Unfortunately, their use is prohibitive for early warning: simulating a few hours of maritime dynamics takes several hours to days of processing time, even on multi-processor clusters. This thesis is part of the ANR RISCOPE project, which aims to address this limitation through surrogate modeling of the hydrodynamic computer codes. As a particular requirement of this application, the metamodel must be able to handle functional inputs corresponding to time-varying maritime conditions. To this end, we focused on Gaussian process metamodels, originally developed for scalar inputs but now also available for functional inputs. The nature of the inputs raised a number of questions about the proper way to represent them in the metamodel: (i) which functional inputs are worth keeping as predictors, (ii) which dimension reduction method (e.g., B-splines, PCA, PLS) is ideal, (iii) what a suitable projection dimension is, and, given our choice to work with Gaussian process metamodels, (iv) what a convenient distance is for measuring similarities between functional input points within the kernel function. Some of these characteristics of the model, hereafter called structural parameters, and others such as the kernel family (e.g., Gaussian, Matérn 5/2) are often chosen arbitrarily a priori. 
Sometimes they are selected based on other studies. As one might expect, and as we have shown through experiments, these decisions can have a strong impact on the prediction capability of the resulting model. Thus, without losing sight of our final goal of contributing to the improvement of coastal flooding early warning, we undertook the construction of an efficient methodology to set the structural parameters of the model. As a first solution, we proposed an exploration approach based on the Response Surface Methodology. It was used effectively to tune the metamodel for an analytic toy function, as well as for a simplified version of the code studied in RISCOPE. While relatively simple, the proposed methodology found metamodel configurations of high prediction capability with savings of up to 76.7% and 38.7% of the time spent by an exhaustive search approach in the analytic case and the coastal flooding case, respectively. The solution found by our methodology was optimal in most cases. We later developed a second prototype based on Ant Colony Optimization (ACO). This new approach is more powerful in terms of solution time and flexibility in the model features that can be explored. The ACO-based method smartly samples the solution space and progressively converges towards the optimal configuration. The collection of statistical tools used for metamodeling in this thesis motivated the development of the funGp R package, which is now available on GitHub and about to be submitted to CRAN. In an independent work, we studied the estimation of the covariance parameters of a Transformed Gaussian Process by Maximum Likelihood (ML) and Cross Validation. We showed that both estimators are consistent and asymptotically normal. In the case of ML, these results can be interpreted as a proof of robustness of Gaussian ML in the case of non-Gaussian processes.
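
The interplay between the structural parameters (projection method, projection dimension, distance, kernel family) can be illustrated with a small sketch of a kernel evaluated on two functional inputs. It is not the funGp implementation and does not use its API: a crude block-average projection stands in for B-splines, PCA or PLS, the distance is plain Euclidean distance between coefficients, and all function names are hypothetical. Only the Matérn 5/2 formula is the standard one.

```python
import math


def matern52(distance, length_scale=1.0, variance=1.0):
    """Standard Matern 5/2 covariance as a function of distance."""
    r = math.sqrt(5.0) * distance / length_scale
    return variance * (1.0 + r + r * r / 3.0) * math.exp(-r)


def block_means(curve, n_coeffs):
    """Reduce a sampled curve to `n_coeffs` contiguous block averages.

    Assumption: this stands in for a proper basis projection and
    requires n_coeffs <= len(curve).
    """
    n = len(curve)
    if not 1 <= n_coeffs <= n:
        raise ValueError("n_coeffs must be between 1 and len(curve)")
    bounds = [i * n // n_coeffs for i in range(n_coeffs + 1)]
    return [
        sum(curve[a:b]) / (b - a)
        for a, b in zip(bounds[:-1], bounds[1:])
    ]


def functional_kernel(f, g, n_coeffs, length_scale=1.0):
    """Matern 5/2 covariance between two curves after projection."""
    cf = block_means(f, n_coeffs)
    cg = block_means(g, n_coeffs)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(cf, cg)))
    return matern52(distance, length_scale)
```

In this sketch, `n_coeffs` plays the role of the projection dimension and `matern52` that of the kernel family; swapping either changes the covariance between the same pair of curves, which is the kind of sensitivity a search over structural parameters is meant to handle.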