
    Joint Dimensionality Reduction for Two Feature Vectors

    Many machine learning problems, especially multi-modal learning problems, have two sets of distinct features (e.g., image and text features in news story classification, or neuroimaging data and neurocognitive data in cognitive science research). This paper addresses the joint dimensionality reduction of two feature vectors in supervised learning problems. In particular, we assume a discriminative model where low-dimensional linear embeddings of the two feature vectors are sufficient statistics for predicting a dependent variable. We show that a simple algorithm involving singular value decomposition can accurately estimate the embeddings provided that certain sample complexities are satisfied, without specifying the nonlinear link function (regressor or classifier). The main results establish sample complexities under multiple settings. Sample complexities for different link functions only differ by constant factors. Comment: 19 pages, 3 figure
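
    To make the SVD idea concrete, here is a minimal sketch in Python (not the paper's exact estimator): it assumes two feature matrices X and Z, a scalar response y, and a hypothetical target rank r, and estimates the embedding directions from the top singular subspaces of a response-weighted cross moment.

```python
import numpy as np

def estimate_joint_embeddings(X, Z, y, r):
    """Illustrative SVD-based estimator (not the paper's exact procedure):
    build a response-weighted cross moment between the two views and take
    its top-r singular subspaces as the estimated embedding directions."""
    n = len(y)
    M = (X * y[:, None]).T @ Z / n          # (d_x, d_z) response-weighted cross moment
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], Vt[:r, :].T            # embeddings for X and for Z

# Tiny synthetic check: y depends on one direction from each view.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10))
Z = rng.standard_normal((2000, 8))
y = np.tanh(X[:, 0]) * Z[:, 1] + 0.1 * rng.standard_normal(2000)
A, B = estimate_joint_embeddings(X, Z, y, r=1)
print(np.round(A[:, 0], 2))                 # should concentrate on coordinate 0 of X
print(np.round(B[:, 0], 2))                 # should concentrate on coordinate 1 of Z
```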

    Angle-Based Joint and Individual Variation Explained

    Integrative analysis of disparate data blocks measured on a common set of experimental subjects is a major challenge in modern data analysis. This data structure naturally motivates the simultaneous exploration of the joint and individual variation within each data block, resulting in new insights. For instance, there is a strong desire to integrate the multiple genomic data sets in The Cancer Genome Atlas to characterize the common and also the unique aspects of cancer genetics and cell biology for each source. In this paper, we introduce Angle-Based Joint and Individual Variation Explained, capturing both joint and individual variation within each data block. This is a major improvement over earlier approaches to this challenge in terms of a new conceptual understanding, much better adaptation to data heterogeneity, and a fast linear algebra computation. Important mathematical contributions are the use of score subspaces as the principal descriptors of variation structure and the use of perturbation theory as the guide for variation segmentation. This leads to an exploratory data analysis method which is insensitive to the heterogeneity among data blocks and does not require separate normalization. An application to cancer data reveals different behaviors of each type of signal in characterizing tumor subtypes. An application to a mortality data set reveals interesting historical lessons. Software and data are available on GitHub. Comment: arXiv admin note: text overlap with arXiv:1512.0406
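
    A rough sketch of the angle-based joint/individual split, under simplifying assumptions (hand-picked per-block ranks and a hand-picked joint rank, in place of the paper's perturbation-theory thresholds):

```python
import numpy as np

def ajive_sketch(blocks, ranks, joint_rank):
    """Toy joint/individual decomposition in the spirit of the method:
    (1) take a per-block PCA score basis, (2) stack the score bases and use
    an SVD to find subject-space directions shared across blocks, and
    (3) split each block into its joint projection and the remainder."""
    score_bases = []
    for X, r in zip(blocks, ranks):
        Xc = X - X.mean(axis=0)
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        score_bases.append(U[:, :r])            # subject-space (score) basis
    W, _, _ = np.linalg.svd(np.hstack(score_bases), full_matrices=False)
    J = W[:, :joint_rank]                       # directions shared across blocks
    joint, individual = [], []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        Jx = J @ (J.T @ Xc)                     # joint variation within this block
        joint.append(Jx)
        individual.append(Xc - Jx)              # individual variation plus noise
    return joint, individual

rng = np.random.default_rng(1)
shared = rng.standard_normal((100, 1))                                # common signal
X1 = shared @ rng.standard_normal((1, 20)) + 0.3 * rng.standard_normal((100, 20))
X2 = shared @ rng.standard_normal((1, 15)) + 0.3 * rng.standard_normal((100, 15))
joint, individual = ajive_sketch([X1, X2], ranks=[3, 3], joint_rank=1)
print([round(np.linalg.norm(j), 1) for j in joint])
```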

    Learning relevant features for statistical inference

    Given two views of data, we consider the problem of finding the features of one view which can be most faithfully inferred from the other. We find that these are also the most correlated variables in the sense of deep canonical correlation analysis (DCCA). Moreover, we show that these variables can be used to construct a non-parametric representation of the implied joint probability distribution, which can be thought of as a classical version of the Schmidt decomposition of quantum states. This representation can be used to compute the expectations of functions over one view of data conditioned on the other, such as Bayesian estimators and their standard deviations. We test the approach using inference on occluded MNIST images, and show that our representation contains multiple modes. Surprisingly, when applied to supervised learning (one dataset consists of labels), this approach automatically provides regularization and faster convergence compared to the cross-entropy objective. We also explore using this approach to discover salient independent variables of a single dataset. Comment: Changes resulting from ICLR2020 submission and review. The presentation now accounts for the close connection to previous work on deep CC
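
    As a hedged illustration of the underlying correlation machinery, the sketch below uses plain linear CCA in place of the deep CCA discussed in the abstract; the regularization constant and toy data sizes are made-up choices.

```python
import numpy as np

def linear_cca(X, Y, k, reg=1e-3):
    """Plain linear CCA (a simplified stand-in for deep CCA): returns
    projections onto the top-k canonical directions of each view, plus the
    corresponding canonical correlations."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, corr, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)  # SVD of the whitened cross-covariance
    return Wx.T @ U[:, :k], Wy.T @ Vt[:k, :].T, corr[:k]

rng = np.random.default_rng(2)
latent = rng.standard_normal((1000, 2))                                 # shared signal
X = latent @ rng.standard_normal((2, 6)) + 0.5 * rng.standard_normal((1000, 6))
Y = latent @ rng.standard_normal((2, 4)) + 0.5 * rng.standard_normal((1000, 4))
A, B, corr = linear_cca(X, Y, k=2)
print("top canonical correlations:", np.round(corr, 3))
```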

    Disturbance Grassmann Kernels for Subspace-Based Learning

    In this paper, we focus on subspace-based learning problems, where data elements are linear subspaces instead of vectors. To handle this kind of data, Grassmann kernels were proposed to measure the space structure and used with classifiers, e.g., Support Vector Machines (SVMs). However, the existing discriminative algorithms mostly ignore the instability of subspaces, which can leave the classifiers misled by disturbed instances. Thus we propose considering all potential disturbances of subspaces in the learning process to obtain more robust classifiers. Firstly, we derive the dual optimization of linear classifiers with disturbance subject to a known distribution, resulting in a new kernel, the Disturbance Grassmann (DG) kernel. Secondly, we investigate two kinds of disturbance, related to the subspace matrix and the singular values of the bases, with which we extend the Projection kernel on Grassmann manifolds to two new kernels. Experiments on action data indicate that the proposed kernels perform better than state-of-the-art subspace-based methods, even in worse environments. Comment: This paper includes 3 figures, 10 pages, and has been accepted to SIGKDD'1
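
    For reference, the standard Projection kernel on the Grassmann manifold (the kernel the DG construction extends) can be computed as below; the disturbance-averaging extension proposed in the paper is not reproduced here.

```python
import numpy as np

def orthonormal_basis(A, k):
    """Orthonormal basis for the k-dimensional column space of A
    (each data element in subspace-based learning is such a basis)."""
    Q, _ = np.linalg.qr(A)
    return Q[:, :k]

def projection_kernel(U, V):
    """Standard Projection kernel on the Grassmann manifold:
    k(U, V) = ||U^T V||_F^2, the sum of squared cosines of the principal
    angles between the two subspaces."""
    return np.linalg.norm(U.T @ V, "fro") ** 2

rng = np.random.default_rng(3)
U = orthonormal_basis(rng.standard_normal((50, 5)), k=3)
V = orthonormal_basis(rng.standard_normal((50, 5)), k=3)
print(projection_kernel(U, U))   # self-similarity equals the subspace dimension (3.0)
print(projection_kernel(U, V))   # smaller for a randomly oriented subspace
```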

    Online Low-Rank Subspace Learning from Incomplete Data: A Bayesian View

    Extracting the underlying low-dimensional space where high-dimensional signals often reside has long been at the center of numerous algorithms in the signal processing and machine learning literature during the past few decades. At the same time, working with incomplete (partly observed) large-scale datasets has recently become commonplace for diverse reasons. This so-called big data era we are currently living in calls for devising online subspace learning algorithms that can suitably handle incomplete data. Their envisaged objective is to recursively estimate the unknown subspace by processing streaming data sequentially, thus reducing computational complexity, while obviating the need for storing the whole dataset in memory. In this paper, an online variational Bayes subspace learning algorithm from partial observations is presented. To account for the unawareness of the true rank of the subspace, commonly met in practice, low-rankness is explicitly imposed on the sought subspace data matrix by exploiting sparse Bayesian learning principles. Moreover, sparsity, simultaneously with low-rankness, is favored on the subspace matrix by the sophisticated hierarchical Bayesian scheme that is adopted. In doing so, the proposed algorithm becomes adept at dealing with applications whereby the underlying subspace may also be sparse, as, e.g., in sparse dictionary learning problems. As shown, the new subspace tracking scheme outperforms its state-of-the-art counterparts in terms of estimation accuracy, in a variety of experiments conducted on simulated and real data.
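
    As a much-simplified illustration of online subspace learning from partial observations, here is a plain stochastic-gradient tracker (no variational Bayes, no rank learning, no sparsity prior); the step size and dimensions are arbitrary choices for the toy example.

```python
import numpy as np

def track_subspace(stream, masks, r, lr=0.1, seed=0):
    """Bare-bones online subspace tracker from partial observations: for each
    incoming vector, fit coefficients on the observed entries only, then take
    a gradient step on the basis and re-orthonormalize. Unlike the Bayesian
    scheme in the abstract, the rank r must be supplied by hand."""
    rng = np.random.default_rng(seed)
    d = stream.shape[1]
    U = np.linalg.qr(rng.standard_normal((d, r)))[0]
    for x, m in zip(stream, masks):
        Uo, xo = U[m], x[m]                             # observed rows only
        w, *_ = np.linalg.lstsq(Uo, xo, rcond=None)     # coefficients from observed entries
        U[m] += lr * np.outer(xo - Uo @ w, w)           # partial gradient step
        U = np.linalg.qr(U)[0]                          # keep the basis orthonormal
    return U

rng = np.random.default_rng(4)
true_U = np.linalg.qr(rng.standard_normal((30, 2)))[0]
stream = (true_U @ rng.standard_normal((2, 500))).T + 0.01 * rng.standard_normal((500, 30))
masks = rng.random((500, 30)) < 0.5                     # about half the entries observed
U_hat = track_subspace(stream, masks, r=2)
# Cosines of the principal angles between truth and estimate (close to 1 when recovered).
print(np.round(np.linalg.svd(true_U.T @ U_hat, compute_uv=False), 3))
```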

    The Role of Principal Angles in Subspace Classification

    Subspace models play an important role in a wide range of signal processing tasks, and this paper explores how the pairwise geometry of subspaces influences the probability of misclassification. When the mismatch between the signal and the model is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. The transform presented here (TRAIT) preserves some specific characteristic of each individual class, and this approach is shown to be complementary to a previously developed transform (LRT) that enlarges inter-class distance while suppressing intra-class dispersion. Theoretical results are supported by demonstration of superior classification accuracy on synthetic and measured data even in the presence of significant model mismatch.
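
    The principal angles themselves, and the two summaries the abstract relates to misclassification probability (product of sines and sum of squared sines), are straightforward to compute from an SVD:

```python
import numpy as np

def principal_angles(U, V):
    """Principal angles between the subspaces spanned by the orthonormal
    bases U and V; the singular values of U^T V are the cosines."""
    cosines = np.clip(np.linalg.svd(U.T @ V, compute_uv=False), 0.0, 1.0)
    return np.arccos(cosines)

def angle_summaries(U, V):
    """The two quantities the abstract ties to misclassification probability:
    product of sines (small mismatch) and sum of squared sines (larger mismatch)."""
    theta = principal_angles(U, V)
    return np.prod(np.sin(theta)), np.sum(np.sin(theta) ** 2)

rng = np.random.default_rng(5)
U = np.linalg.qr(rng.standard_normal((40, 3)))[0]
V = np.linalg.qr(rng.standard_normal((40, 3)))[0]
print(angle_summaries(U, V))
```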

    Scaling Gaussian Process Regression with Derivatives

    Gaussian processes (GPs) with derivatives are useful in many applications, including Bayesian optimization, implicit surface reconstruction, and terrain reconstruction. Fitting a GP to function values and derivatives at n points in d dimensions requires linear solves and log determinants with an n(d+1) × n(d+1) positive definite matrix -- leading to prohibitive O(n^3 d^3) computations for standard direct methods. We propose iterative solvers using fast O(nd) matrix-vector multiplications (MVMs), together with pivoted Cholesky preconditioning that cuts the iterations to convergence by several orders of magnitude, allowing for fast kernel learning and prediction. Our approaches, together with dimensionality reduction, enable Bayesian optimization with derivatives to scale to high-dimensional problems and large evaluation budgets. Comment: Appears at Advances in Neural Information Processing Systems 32 (NIPS), 201
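
    A toy version of the structured linear system described above, for a 1-D RBF kernel with derivative observations, solved with conjugate gradients (no pivoted Cholesky preconditioner); the lengthscale, noise level, and test function are arbitrary.

```python
import numpy as np
from scipy.sparse.linalg import cg

def rbf_kernel_with_grads(x, ell=0.5):
    """Joint covariance of [f(x); f'(x)] for a 1-D RBF GP: the n(d+1) x n(d+1)
    matrix from the abstract with d = 1. Built densely here; the paper's point
    is to avoid this and work through fast matrix-vector products instead."""
    D = x[:, None] - x[None, :]
    K = np.exp(-D**2 / (2 * ell**2))
    Kfd = K * D / ell**2                          # Cov(f(x_i), f'(x_j))
    Kdd = K * (1 / ell**2 - D**2 / ell**4)        # Cov(f'(x_i), f'(x_j))
    return np.block([[K, Kfd], [-Kfd, Kdd]])

x = np.linspace(0.0, 3.0, 25)
y = np.concatenate([np.sin(3 * x), 3 * np.cos(3 * x)])      # values and derivatives
A = rbf_kernel_with_grads(x) + 1e-2 * np.eye(2 * len(x))    # noise term keeps A well conditioned
alpha, info = cg(A, y, maxiter=2000)                        # iterative (MVM-based) solve
print("converged:", info == 0, "residual:", np.linalg.norm(A @ alpha - y))
```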

    Sweep Distortion Removal from THz Images via Blind Demodulation

    Heavy sweep distortion induced by alignments and inter-reflections of layers of a sample is a major burden in recovering 2D and 3D information in time-resolved spectral imaging. This problem cannot be addressed by conventional denoising and signal processing techniques as it heavily depends on the physics of the acquisition. Here we propose and implement an algorithmic framework based on low-rank matrix recovery and alternating minimization that exploits the forward model for THz acquisition. The method allows recovering the original signal despite the presence of temporal-spatial distortions. We address a blind-demodulation problem, where based on several observations of the sample texture modulated by an undesired sweep pattern, the two classes of signals are separated. The performance of the method is examined on both synthetic and experimental data, and successful reconstructions are demonstrated. The proposed general scheme can be implemented to advance inspection and imaging applications in THz and other time-resolved sensing modalities.
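
    The following is a heavily simplified alternating-minimization sketch of blind demodulation, assuming a hypothetical model Y ≈ diag(s) L with a low-rank texture L and a per-row sweep modulation s; the actual THz forward model in the paper is richer than this.

```python
import numpy as np

def demodulate(Y, rank, iters=50):
    """Toy alternating minimization for Y ≈ diag(s) @ L with L low rank:
    alternately re-estimate the low-rank texture L (truncated SVD of the
    demodulated data) and the per-row modulation s (closed-form least squares)."""
    s = np.ones(Y.shape[0])
    for _ in range(iters):
        U, sv, Vt = np.linalg.svd(Y / s[:, None], full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]                  # low-rank texture estimate
        s = np.sum(Y * L, axis=1) / np.maximum(np.sum(L * L, axis=1), 1e-12)
        s /= np.linalg.norm(s) / np.sqrt(len(s))                   # fix the scale ambiguity
    return s, L

rng = np.random.default_rng(6)
L_true = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 60))   # sample texture
s_true = 1.0 + 0.5 * np.sin(np.linspace(0, 6, 80))                     # sweep modulation
Y = s_true[:, None] * L_true + 0.01 * rng.standard_normal((80, 60))
s_hat, L_hat = demodulate(Y, rank=3)
print("modulation correlation:", np.round(np.corrcoef(s_hat, s_true)[0, 1], 4))
```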

    Learning Latent Variable Gaussian Graphical Models

    Gaussian graphical models (GGM) have been widely used in many high-dimensional applications ranging from biological and financial data to recommender systems. Sparsity in GGM plays a central role both statistically and computationally. Unfortunately, real-world data often do not fit sparse graphical models well. In this paper, we focus on a family of latent variable Gaussian graphical models (LVGGM), where the model is conditionally sparse given latent variables, but marginally non-sparse. In LVGGM, the inverse covariance matrix has a low-rank plus sparse structure, and can be learned in a regularized maximum likelihood framework. We derive novel parameter estimation error bounds for LVGGM under mild conditions in the high-dimensional setting. These results complement the existing theory on structural learning, and open up new possibilities of using LVGGM for statistical inference. Comment: To appear in The 31st International Conference on Machine Learning (ICML 2014
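
    The low-rank plus sparse structure of the marginal precision can be verified numerically via the Schur complement; the sketch below uses a made-up conditionally sparse model with two latent variables.

```python
import numpy as np

# Joint precision over p observed and h latent variables: conditionally sparse.
rng = np.random.default_rng(7)
p, h = 20, 2
S_oo = np.eye(p) + np.diag(0.3 * np.ones(p - 1), 1) + np.diag(0.3 * np.ones(p - 1), -1)
S_oh = 0.05 * rng.standard_normal((p, h))        # weak couplings to the latent variables
S_hh = np.eye(h)
joint = np.block([[S_oo, S_oh], [S_oh.T, S_hh]])
assert np.all(np.linalg.eigvalsh(joint) > 0)     # a valid (positive definite) precision

# Marginalizing the latents: the observed-block precision is the Schur complement,
# i.e. the sparse conditional precision minus a rank-h (low-rank) correction.
low_rank = S_oh @ np.linalg.inv(S_hh) @ S_oh.T
marginal_precision = S_oo - low_rank
marginal_cov = np.linalg.inv(joint)[:p, :p]      # covariance of the observed block
print(np.allclose(np.linalg.inv(marginal_cov), marginal_precision))   # True
```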

    Low-Rank Modeling and Its Applications in Image Analysis

    Low-rank modeling generally refers to a class of methods that solve problems by representing variables of interest as low-rank matrices. It has achieved great success in various fields including computer vision, data mining, signal processing, and bioinformatics. Recently, much progress has been made in the theories, algorithms, and applications of low-rank modeling, such as exact low-rank matrix recovery via convex programming and matrix completion applied to collaborative filtering. These advances have brought more and more attention to this topic. In this paper, we review the recent advances in low-rank modeling, the state-of-the-art algorithms, and related applications in image analysis. We first give an overview of the concept of low-rank modeling and the challenging problems in this area. Then, we summarize the models and algorithms for low-rank matrix recovery and illustrate their advantages and limitations with numerical experiments. Next, we introduce a few applications of low-rank modeling in the context of image analysis. Finally, we conclude this paper with some discussions. Comment: To appear in ACM Computing Survey
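
    As one concrete example of the algorithms such a survey covers, here is a short singular value thresholding (SVT) routine for matrix completion; the threshold, step size, and iteration count follow common rules of thumb rather than any tuned setting.

```python
import numpy as np

def svt_complete(Y, mask, tau, step=1.8, iters=500):
    """Singular value thresholding (SVT) for matrix completion. Y holds the
    observed entries (zeros elsewhere) and mask marks which entries are observed."""
    Z = np.zeros_like(Y)
    X = Z
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt        # soft-threshold the singular values
        Z = Z + step * mask * (Y - X)                  # dual update on observed entries only
    return X

rng = np.random.default_rng(8)
M = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))   # rank-2 ground truth
mask = rng.random(M.shape) < 0.5                                  # observe about half the entries
X_hat = svt_complete(M * mask, mask, tau=5 * np.sqrt(M.size))
print("relative error:", np.linalg.norm(X_hat - M) / np.linalg.norm(M))
```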