708 research outputs found

    Data-Adaptive Kernel Support Vector Machine

    Get PDF
In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function. This two-stage approach to kernel scaling can enhance the accuracy of an SVM, especially when the data are imbalanced. After the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to the data based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and used as the classifier. This process enlarges the magnification effect directly on the Riemannian manifold within the feature space rather than the input space. The proposed data-adaptive kernel SVM is applied to binary classification and extended to multi-class settings where imbalance is a main concern. We conduct extensive simulation studies to assess the performance of the proposed methods, and a prostate cancer imaging study is employed as an illustration. The data-adaptive kernel is further applied to feature selection. We propose the data-adaptive kernel-penalized SVM, a new method for simultaneous feature selection and classification that penalizes data-adaptive kernels in SVMs. Instead of penalizing the standard SVM cost function in the usual way, the penalty is added directly to the dual objective function that contains the data-adaptive kernel, so that classification results and a sparse set of selected features are obtained simultaneously. Different penalty terms in the data-adaptive kernel-penalized SVM are compared, and the oracle property of the estimator is examined. We conduct extensive simulation studies to assess the performance of all the proposed methods, and employ the method on a breast cancer data set as an illustration.
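
    To make the two-stage construction concrete, here is a minimal Python sketch using scikit-learn. The conformal scaling factor c(x) = exp(-f(x)^2 / (2 tau^2)) below is the classical Amari-Wu magnification rule built from the first-stage decision function; it stands in for, and is not necessarily identical to, the thesis' skewness-based adaptation.

```python
# Two-stage conformally rescaled kernel SVM, sketched with scikit-learn.
# The scaling rule below is an illustrative (Amari-Wu style) choice, not
# necessarily the thesis' skewness-based rule.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def fit_adaptive_kernel_svm(X, y, gamma=1.0, tau=1.0, C=1.0):
    # Stage 1: standard RBF-kernel SVM gives a pilot decision function f.
    pilot = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    f = pilot.decision_function(X)
    # Conformal factor: largest near the estimated boundary (|f| small),
    # which magnifies the Riemannian metric of the feature-space manifold.
    c = np.exp(-f ** 2 / (2.0 * tau ** 2))
    # Stage 2: data-adaptive kernel K~(x, x') = c(x) c(x') K(x, x').
    K = rbf_kernel(X, X, gamma=gamma) * np.outer(c, c)
    clf = SVC(kernel="precomputed", C=C).fit(K, y)
    return pilot, clf, c

def predict_adaptive(pilot, clf, c_train, X_train, X_new, gamma=1.0, tau=1.0):
    f_new = pilot.decision_function(X_new)
    c_new = np.exp(-f_new ** 2 / (2.0 * tau ** 2))
    K_new = rbf_kernel(X_new, X_train, gamma=gamma) * np.outer(c_new, c_train)
    return clf.predict(K_new)
```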

    A Feature Selection Method for Multivariate Performance Measures

    Full text link
Feature selection with specific multivariate performance measures is key to the success of many applications, such as image retrieval and text classification. Existing feature selection methods are usually designed for classification error. In this paper, we propose a generalized sparse regularizer. Based on the proposed regularizer, we present a unified feature selection framework for general loss functions. In particular, we study the novel feature selection paradigm of optimizing multivariate performance measures. The resulting formulation is a challenging problem for high-dimensional data, so a two-layer cutting plane algorithm is proposed to solve it, and its convergence is presented. In addition, we adapt the proposed method to optimize multivariate measures for multiple instance learning problems. Comparisons with state-of-the-art feature selection methods show that the proposed method is superior. Extensive experiments on large-scale and high-dimensional real-world datasets show that the proposed method outperforms l1-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly improved performance over SVMperf in terms of F1-score.
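
    The paper's two-layer cutting plane algorithm is not reproduced here; as a small-scale illustration of what it means to select features against a multivariate measure rather than classification error, the hedged sketch below greedily adds the feature that most improves cross-validated F1 for a binary target. All function names are illustrative.

```python
# Greedy wrapper that scores candidate features by cross-validated F1
# (binary y). NOT the paper's method; it only makes the objective
# "select features for a multivariate measure" concrete on a small scale.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def greedy_f1_selection(X, y, max_features=10, cv=3):
    remaining = list(range(X.shape[1]))
    selected, best_f1 = [], -np.inf
    while remaining and len(selected) < max_features:
        # Score every candidate feature added to the current subset.
        trials = []
        for j in remaining:
            cols = selected + [j]
            f1 = cross_val_score(LinearSVC(max_iter=5000), X[:, cols], y,
                                 scoring="f1", cv=cv).mean()
            trials.append((f1, j))
        f1, j = max(trials)
        if f1 <= best_f1:      # stop once F1 no longer improves
            break
        best_f1 = f1
        selected.append(j)
        remaining.remove(j)
    return selected, best_f1
```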

Estimation Algorithms for Sparse Classification

    Get PDF
This thesis deals with the development of estimation algorithms with embedded feature selection in the context of high-dimensional data, in both the supervised and unsupervised frameworks. The contributions of this work are materialized by two algorithms: GLOSS for the supervised domain (discrimination) and Mix-GLOSS for its unsupervised counterpart (clustering). Both algorithms are based on solving an optimal scoring regression regularized with a quadratic formulation of the group-Lasso penalty, which encourages the removal of uninformative features. The theoretical foundations proving that a group-Lasso penalized optimal scoring regression can be used to solve a linear discriminant analysis problem are developed here for the first time. The theory that adapts this technique to the unsupervised domain by means of the EM algorithm is not new, but it has never been clearly exposed for a sparsity-inducing penalty. This thesis solidly demonstrates that using a group-Lasso penalized optimal scoring regression inside an EM loop is possible. Our algorithms have been tested on real and artificial high-dimensional databases, with convincing results in terms of parsimony and without compromising the classifier's performance.
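
    A rough sketch of the alternating structure behind group-Lasso penalized optimal scoring follows. Scikit-learn's MultiTaskLasso supplies the row-wise (per-feature) group penalty, and the score matrix Theta is updated under the constraint Theta' D Theta = I via a whitened Procrustes step; this illustrates the idea, not the GLOSS solver itself.

```python
# Alternating sketch of group-Lasso penalized optimal scoring.
# Illustrative only; not the thesis' algorithm.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

def sparse_optimal_scoring(X, y, n_scores=None, alpha=0.1, n_iter=10):
    n, p = X.shape
    classes = np.unique(y)
    K = len(classes)
    q = n_scores or K - 1
    Y = (y[:, None] == classes[None, :]).astype(float)  # n x K indicators
    Dinv_sqrt = np.diag(np.sqrt(n / Y.sum(axis=0)))     # D = diag(class freq)
    Theta = np.eye(K)[:, :q]                            # initial scores
    for _ in range(n_iter):
        # Regression step: group-sparse fit of the scored class indicators;
        # MultiTaskLasso zeroes out whole rows (features) of B.
        reg = MultiTaskLasso(alpha=alpha, fit_intercept=False).fit(X, Y @ Theta)
        B = reg.coef_.T                                 # p x q coefficients
        # Scoring step: best Theta with Theta' D Theta = I (Procrustes).
        M = Dinv_sqrt @ (Y.T @ (X @ B)) / n
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        Theta = Dinv_sqrt @ (U @ Vt)
    kept = np.flatnonzero(np.linalg.norm(B, axis=1) > 1e-8)
    return B, Theta, kept   # kept = indices of informative features
```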

    Flexible Graph-based Learning with Applications to Genetic Data Analysis

    Get PDF
With the abundance of increasingly complex and high-dimensional data in many scientific disciplines, graphical models have become an extremely useful statistical tool for exploring data structures. In this dissertation, we study graphical models from two perspectives: i) using them to enhance supervised learning, classification in particular, and ii) estimating graphical models for specific data types. For classification, the optimal classifier is often connected with the feature structure within each class. In the first project, starting from the Gaussian population scenario, we aim to find an approach that utilizes the graphical structure of the features in classification. With respect to graphical models, many existing estimation methods have been proposed under a homogeneous Gaussian population. Due to the Gaussian assumption, these methods may not be suitable for many typical genetic data. For instance, gene expression data may come from individuals of multiple populations with possibly distinct graphical structures. Another instance is single-cell RNA-sequencing data, which feature substantial sample dependence and zero-inflation. In the second and third projects, we propose multiple graphical model estimation methods for these scenarios respectively. In particular, two dependent count-data graphical models are introduced for the latter case. Both numerical and theoretical studies are performed to demonstrate the effectiveness of these methods.
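
    As a minimal sketch of perspective i), the snippet below estimates one shared sparse precision matrix with the graphical lasso and plugs it into an LDA-style discriminant rule; the dissertation's own proposals (heterogeneous populations, dependent count data) are not reproduced here.

```python
# Sparse precision matrix from the graphical lasso, plugged into an
# LDA-style rule: an illustration of using feature graph structure
# in classification, not the dissertation's specific estimators.
import numpy as np
from sklearn.covariance import GraphicalLasso

def fit_graph_lda(X, y, alpha=0.05):
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Pool the within-class centered data, then estimate one sparse
    # precision matrix (the conditional-independence graph of the features).
    Xc = np.vstack([X[y == c] - means[c] for c in classes])
    Omega = GraphicalLasso(alpha=alpha).fit(Xc).precision_
    priors = {c: float(np.mean(y == c)) for c in classes}
    return classes, means, Omega, priors

def predict_graph_lda(model, X_new):
    classes, means, Omega, priors = model
    # Linear discriminant scores with the graph-regularized precision.
    scores = np.column_stack([
        X_new @ (Omega @ means[c])
        - 0.5 * means[c] @ Omega @ means[c] + np.log(priors[c])
        for c in classes])
    return classes[np.argmax(scores, axis=1)]
```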

    A New Regularized Matrix Discriminant Analysis (R-MDA) Enabled Human-Centered EEG Monitoring Systems

    Get PDF
The wider use of wearable devices for electroencephalogram (EEG) data capture provides a very useful way to monitor and self-manage human health. However, the large volumes of high-dimensional data cause computational complexity in EEG data processing and pose a great challenge to the use of wearable EEG devices in healthcare. This paper proposes a new approach to extract the structural information of EEG data and tackle the curse of dimensionality. A set of dimensionality reduction (DR) methods, such as linear discriminant analysis (LDA) and its improved variants, has been developed for EEG processing in the literature. However, the existing LDA-related methods suffer from the singularity problem or expensive computational cost, and none of them takes into consideration the structure of the projection matrix, which is crucial for extracting the structural information of the EEG data. In this paper, a new method called regularized matrix discriminant analysis (R-MDA) is proposed for EEG feature representation and DR. In the R-MDA, the EEG data are represented as a data matrix, and the projection vectors are reshaped into a set of projection matrices stacked together. By reformulating LDA as a least-squares problem and imposing a specified constraint on each projection matrix, R-MDA effectively reduces EEG dimensionality while capturing the structural information of the EEG data. Experimental results demonstrate that R-MDA outperforms existing LDA-related methods, achieving improved accuracy with significant DR of the EEG data. This offers an effective way to make wearable EEG devices applicable in human-centered health monitoring.
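
    The sketch below illustrates the least-squares view of discriminant analysis with a structural penalty on the reshaped projection: EEG trials are channels-by-time matrices, the projection vector is reshaped to that matrix, and a nuclear-norm (low-rank) proximal step stands in for the paper's specified constraint, which is not reproduced here.

```python
# Least-squares binary discriminant analysis with a structural penalty on
# the reshaped projection. The nuclear-norm step is an illustrative
# stand-in for the paper's constraint.
import numpy as np

def prox_nuclear(W, t):
    # Singular-value soft-thresholding: the prox of t * ||W||_* .
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def structured_lsda(X, y, mat_shape, lam=0.1, step=1e-3, n_iter=300):
    # X: n x (channels*time) vectorized EEG trials; y in {0, 1};
    # mat_shape = (channels, time). Regressing on the signed targets
    # below recovers the LDA direction (up to scale).
    n = len(y)
    t = np.where(y == 1, n / y.sum(), -n / (n - y.sum()))
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - t) / n           # least-squares gradient
        W = (w - step * grad).reshape(mat_shape)
        w = prox_nuclear(W, step * lam).ravel()  # enforce matrix structure
    return w
```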

    Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection

    Full text link
High-dimensional feature selection arises in many areas of modern science. For example, in genomic research we want to find the genes that can be used to separate tissues of different classes (e.g. cancer and normal) from tens of thousands of genes that are active (expressed) in certain tissue cells. To this end, we wish to fit regression and classification models with a large number of features (also called variables or predictors). In the past decade, penalized likelihood methods for fitting regression models based on hyper-LASSO penalization have received increasing attention in the literature. However, fully Bayesian methods that use Markov chain Monte Carlo (MCMC) remain underdeveloped. In this paper we introduce a fully Bayesian MCMC method for learning the severely multi-modal posteriors of logistic regression models based on hyper-LASSO priors (non-convex penalties). Our MCMC algorithm uses Hamiltonian Monte Carlo in a restricted Gibbs sampling framework; we call our method Bayesian logistic regression with hyper-LASSO (BLRHL) priors. We use simulation studies and real data analysis to demonstrate the superior performance of hyper-LASSO priors, and to investigate the issues of choosing the heaviness and scale of hyper-LASSO priors.
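
    A minimal sketch of the sampler's structure, assuming a t-type hierarchy beta_j | v_j ~ N(0, v_j), v_j ~ Inv-Gamma(a, b) as the heavy-tailed prior: the variances get conjugate Gibbs updates and the coefficients get a basic HMC move. The paper's restricted-Gibbs blocking and tuning details are not reproduced.

```python
# Gibbs-within-HMC sketch for Bayesian logistic regression with a
# heavy-tailed (t-type) prior. Illustrative structure only.
import numpy as np

rng = np.random.default_rng(0)

def neg_log_post(beta, X, y, v):
    # Logistic negative log-likelihood plus the N(0, v) prior term.
    z = X @ beta
    return np.logaddexp(0.0, -(2 * y - 1) * z).sum() + 0.5 * np.sum(beta**2 / v)

def grad_neg_log_post(beta, X, y, v):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (p - y) + beta / v

def hmc_step(beta, X, y, v, eps=0.01, n_leap=20):
    mom0 = rng.standard_normal(beta.size)
    b, m = beta.copy(), mom0.copy()
    m -= 0.5 * eps * grad_neg_log_post(b, X, y, v)      # leapfrog integrator
    for _ in range(n_leap - 1):
        b += eps * m
        m -= eps * grad_neg_log_post(b, X, y, v)
    b += eps * m
    m -= 0.5 * eps * grad_neg_log_post(b, X, y, v)
    dH = (neg_log_post(b, X, y, v) + 0.5 * m @ m) \
       - (neg_log_post(beta, X, y, v) + 0.5 * mom0 @ mom0)
    return b if np.log(rng.uniform()) < -dH else beta   # Metropolis accept

def sample_blrhl_like(X, y, n_samples=1000, a=1.0, b=1.0):
    beta, v = np.zeros(X.shape[1]), np.ones(X.shape[1])
    draws = []
    for _ in range(n_samples):
        beta = hmc_step(beta, X, y, v)
        # Conjugate step: v_j | beta_j ~ Inv-Gamma(a + 1/2, b + beta_j^2 / 2)
        v = 1.0 / rng.gamma(a + 0.5, 1.0 / (b + 0.5 * beta**2))
        draws.append(beta.copy())
    return np.array(draws)
```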

    REC: Fast sparse regression-based multicategory classification

    Get PDF
Recent advances in technology enable researchers to gather and store enormous data sets of ultra-high dimensionality. In bioinformatics, microarray and next-generation sequencing technologies can produce data with tens of thousands of biomarker predictors. On the other hand, the corresponding sample sizes are often limited. For classification problems, to predict new observations with high accuracy, and to better understand the effect of predictors on classification, it is desirable, and often necessary, to train the classifier with variable selection. In the literature, sparse regularized classification techniques have been popular due to their ability to perform simultaneous classification and variable selection. Despite this success, such sparse penalized methods may be computationally slow when the dimension of the problem is ultra high. To overcome this challenge, we propose a new sparse REgression based multicategory Classifier (REC). Our method uses a simplex to represent the different categories of the classification problem. A major advantage of REC is that the optimization can be decoupled into smaller independent sparse penalized regression problems, which can hence be solved with parallel computing. Consequently, REC enjoys extraordinarily fast computation. Moreover, REC is able to provide class conditional probability estimation. Simulated examples and applications to microarray and next-generation sequencing data suggest that REC is very competitive with several existing methods.
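
    The simplex coding and decoupling that give REC its speed can be sketched directly: map the K classes to the vertices of a regular simplex in R^(K-1), fit one lasso per coordinate independently (trivially parallelizable), and classify by the nearest vertex. REC's particular loss and its probability estimates are omitted from this sketch.

```python
# Simplex-coded multicategory classification via decoupled sparse
# regressions; a sketch of the coding idea behind REC, not REC itself.
import numpy as np
from sklearn.linear_model import Lasso

def simplex_vertices(K):
    # Unit-norm vertices with pairwise inner product -1/(K-1).
    U = np.zeros((K, K - 1))
    U[0] = 1.0 / np.sqrt(K - 1)
    for k in range(1, K):
        U[k] = -(1.0 + np.sqrt(K)) / (K - 1) ** 1.5
        U[k, k - 1] += np.sqrt(K / (K - 1))
    return U

def fit_rec_like(X, y, alpha=0.01):
    classes, idx = np.unique(y, return_inverse=True)
    U = simplex_vertices(len(classes))
    T = U[idx]                          # n x (K-1) vertex-coded targets
    # Decoupled problems: each coordinate is its own sparse regression,
    # so this loop could run in parallel.
    models = [Lasso(alpha=alpha).fit(X, T[:, q]) for q in range(T.shape[1])]
    return classes, U, models

def predict_rec_like(model, X_new):
    classes, U, models = model
    F = np.column_stack([m.predict(X_new) for m in models])
    dist2 = ((F[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist2, axis=1)]  # nearest simplex vertex
```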