
    Knowledge-constrained projection of high-dimensional data

    Projection of high-dimensional data is usually done by reducing the dimensionality of the data and transforming it into a latent space. In this work we address data visualization through regularized singular value decomposition (SVD) for biclustering, using L0-norm and L1-norm penalties. Additional knowledge is introduced into the model through regularization with two prior adjacency matrices. We created synthetic data that simulate real gene-expression datasets and tested the methods on both synthetic and real data. We show that L0-norm SVD and L1-norm SVD give better results than standard SVD.
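The L1-penalized variant described above can be sketched as a soft-thresholded power iteration for one sparse singular-vector pair. This is an illustrative reconstruction, not the authors' implementation: the penalty levels `lam_u` and `lam_v` are assumed tuning parameters, and the prior-adjacency-matrix regularization is omitted for brevity.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator: the proximal map of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def l1_rank1_svd(X, lam_u=0.5, lam_v=0.5, n_iter=100, seed=0):
    """Rank-1 SVD with L1 penalties on both singular vectors,
    computed by alternating soft-thresholded power iterations."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = soft_threshold(X @ v, lam_u)     # sparse left update
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
        v = soft_threshold(X.T @ u, lam_v)   # sparse right update
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
    d = float(u @ X @ v)  # singular value for the sparse pair
    return u, d, v
```

The nonzero entries of `u` and `v` jointly select a block of rows and columns, which is how a sparse rank-1 layer acts as a bicluster.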

    Statistical methods for learning sparse features

    With the fast development of networking, data storage, and data-collection capacity, big data are rapidly expanding in all science and engineering domains. When dealing with such data, it is appealing to extract their hidden sparse structure, since sparse structures allow us to understand and interpret the information better. The aim of this thesis is to develop algorithms that extract such hidden sparse structures in the context of both supervised and unsupervised learning.

    Chapter 1 examines a limitation of classical Fisher Discriminant Analysis (FDA), a supervised dimension-reduction algorithm for multi-class classification problems. This limitation was discussed by Cui (2012), who proposed a new objective function named Complementary Dimension Analysis (CDA), so called because each sequentially added dimension boosts the discriminative power of the reduced space. Two extensions of CDA are discussed in this thesis: sparse CDA (sCDA), in which the reduced subspace involves only a small fraction of the features, and Local CDA (LCDA), which handles multimodal data more appropriately by taking the local structure of the data into consideration. A combination of sCDA and LCDA is shown to work well on real examples and can return sparse directions from data with subtle local structures.

    Chapter 2 considers the problem of matrix decomposition that arises in many real applications such as gene repressive identification and context mining. The goal is to retrieve a multi-layer low-rank sparse decomposition from a high-dimensional data matrix. Existing algorithms are all sequential: the first layer is estimated, and then the remaining layers are estimated one by one, each conditioning on the previous layers. As discussed in this thesis, such sequential approaches have limitations, and a new algorithm is proposed that solves all layers simultaneously instead of sequentially.

    The algorithm of Chapter 2 assumes a complete data matrix, but in many real applications and in cross-validation procedures one must work with a data matrix containing missing values. How to run the proposed matrix decomposition when values are missing is the main focus of Chapter 3; the proposed solution differs slightly from existing work such as penalized matrix decomposition (PMD).

    Chapter 4 considers a Bayesian approach to sparse principal component analysis (PCA). An efficient algorithm based on a hybrid of Expectation-Maximization (EM) and Variational Bayes (VB) is proposed and shown to achieve selection consistency when both p and n go to infinity. Empirical studies demonstrate the competitive performance of the proposed algorithm.
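The sequential (deflation-based) strategy that the thesis argues against can be sketched as follows. This is an illustrative sketch, not the thesis's algorithm: the sparsity penalty is omitted to keep it short, so each layer is an ordinary best rank-1 fit of the residual left by earlier layers — which is exactly the greedy, layer-by-layer conditioning the proposed simultaneous algorithm avoids.

```python
import numpy as np

def sequential_layers(X, n_layers=3):
    """Sequential (deflation-based) multi-layer decomposition:
    fit the best rank-1 layer, subtract it, repeat on the residual.
    Each layer is conditioned on the layers already removed."""
    R = X.copy()
    layers = []
    for _ in range(n_layers):
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        d, u, v = s[0], U[:, 0], Vt[0]   # best rank-1 fit of residual
        layers.append((d, u, v))
        R = R - d * np.outer(u, v)       # deflate before the next layer
    return layers, R
```

Because `u` and `v` are unit vectors and `d = u.T @ R @ v`, each deflation step reduces the squared Frobenius norm of the residual by exactly `d**2`; the limitation is that early layers are never revisited once later layers are known.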

    Singular Value Decomposition for High Dimensional Data

    Singular value decomposition is a widely used tool for dimension reduction in multivariate analysis. However, when used for statistical estimation in high-dimensional low-rank matrix models, the singular vectors of the noise-corrupted matrix are inconsistent estimators of their counterparts in the true mean matrix. We suppose the true singular vectors have sparse representations in a certain basis, and we propose an iterative thresholding algorithm that estimates the subspaces spanned by the leading left and right singular vectors, as well as the true mean matrix, optimally under a Gaussian assumption. We further turn the algorithm into a practical methodology that is fast, data-driven, and robust to heavy-tailed noise. Simulations and a real-data example show its competitive performance. The dissertation contains two chapters: for ease of exposition, Chapter 1 is dedicated to the description and study of the practical methodology, and Chapter 2 states and proves the theoretical properties of the algorithm under Gaussian noise.
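An iterative thresholding scheme of the kind described can be sketched as a power iteration with entrywise hard thresholding of each singular vector. This is a hedged reconstruction, not the dissertation's algorithm: the thresholds `t_u` and `t_v` are assumed fixed here, whereas a practical, data-driven method would choose them from the noise level.

```python
import numpy as np

def hard_threshold(x, t):
    """Keep entries with magnitude above t; zero out the rest."""
    return np.where(np.abs(x) > t, x, 0.0)

def iterative_thresholding_svd(Y, t_u, t_v, n_iter=50):
    """Sketch of iterative thresholding for sparse leading singular
    vectors: alternate power-iteration updates with hard thresholding."""
    # initialize from the ordinary leading right singular vector
    v = np.linalg.svd(Y, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = hard_threshold(Y @ v, t_u)
        n = np.linalg.norm(u)
        if n == 0:
            break
        u /= n
        v = hard_threshold(Y.T @ u, t_v)
        n = np.linalg.norm(v)
        if n == 0:
            break
        v /= n
    return u, v
```

On a planted sparse rank-1 signal plus small noise, thresholding zeroes the noise-only coordinates that make the plain SVD estimate inconsistent in high dimensions, while the power-iteration step keeps the signal coordinates aligned with the truth.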