Joint Dimensionality Reduction for Two Feature Vectors
Many machine learning problems, especially multi-modal learning problems,
have two sets of distinct features (e.g., image and text features in news story
classification, or neuroimaging data and neurocognitive data in cognitive
science research). This paper addresses the joint dimensionality reduction of
two feature vectors in supervised learning problems. In particular, we assume a
discriminative model where low-dimensional linear embeddings of the two feature
vectors are sufficient statistics for predicting a dependent variable. We show
that a simple algorithm involving a singular value decomposition can
accurately estimate the embeddings, provided that certain sample complexity
conditions are satisfied, without specifying the nonlinear link function
(regressor or classifier). The main results establish sample complexities
under multiple settings; the sample complexities for different link functions
differ only by constant factors.

Comment: 19 pages, 3 figures
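As a minimal sketch of the SVD-based idea described above (not the paper's exact estimator; the rank-one link below and all variable names are illustrative assumptions), the leading singular vectors of the empirical cross-moment matrix M = E[y x1 x2^T] recover the two embedding directions:

import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 5000, 30, 20
u = np.linalg.qr(rng.normal(size=(d1, 1)))[0]          # true embedding for x1
v = np.linalg.qr(rng.normal(size=(d2, 1)))[0]          # true embedding for x2
X1, X2 = rng.normal(size=(n, d1)), rng.normal(size=(n, d2))
# unknown nonlinear link acting only through the two 1-d projections
y = np.tanh(X1 @ u).ravel() * np.sign(X2 @ v).ravel() + 0.1 * rng.normal(size=n)
# SVD of the empirical cross-moment matrix M = E[y * x1 x2^T]
M = (X1 * y[:, None]).T @ X2 / n
U, s, Vt = np.linalg.svd(M)
u_hat, v_hat = U[:, :1], Vt[:1].T
print(abs(u_hat.T @ u), abs(v_hat.T @ v))              # both close to 1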
Angle-Based Joint and Individual Variation Explained
Integrative analysis of disparate data blocks measured on a common set of
experimental subjects is a major challenge in modern data analysis. This data
structure naturally motivates the simultaneous exploration of the joint and
individual variation within each data block resulting in new insights. For
instance, there is a strong desire to integrate the multiple genomic data sets
in The Cancer Genome Atlas to characterize the common and also the unique
aspects of cancer genetics and cell biology for each source. In this paper we
introduce Angle-Based Joint and Individual Variation Explained (AJIVE),
capturing both joint and individual variation within each data block. This is
a major improvement over earlier approaches to this challenge, offering a new
conceptual understanding, much better adaptation to data heterogeneity, and
fast linear algebra computation. Important mathematical contributions are the use of
score subspaces as the principal descriptors of variation structure and the use
of perturbation theory as the guide for variation segmentation. This leads to
an exploratory data analysis method which is insensitive to the heterogeneity
among data blocks and does not require separate normalization. An application
to cancer data reveals different behaviors of each type of signal in
characterizing tumor subtypes. An application to a mortality data set reveals
interesting historical lessons. Software and data are available on GitHub.

Comment: arXiv admin note: text overlap with arXiv:1512.0406
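A bare-bones sketch of the score-subspace step (Python/numpy; the crude singular-value cutoff below stands in for the paper's perturbation-theory segmentation): each block contributes an orthonormal basis of its score subspace, and stacked-basis singular values near sqrt(#blocks) flag joint directions.

import numpy as np

def joint_score_directions(blocks, ranks, tol=0.9):
    # blocks: list of (features_k x subjects) arrays on common subjects
    bases = []
    for X, r in zip(blocks, ranks):
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        bases.append(Vt[:r])                 # orthonormal score-subspace basis
    _, s, Vt = np.linalg.svd(np.vstack(bases), full_matrices=False)
    # singular values near sqrt(#blocks) mean small principal angles
    # between all the score subspaces, i.e. shared (joint) variation
    return Vt[s / np.sqrt(len(blocks)) > tol]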
Learning relevant features for statistical inference
Given two views of data, we consider the problem of finding the features of
one view which can be most faithfully inferred from the other. We find that
these are also the most correlated variables in the sense of deep canonical
correlation analysis (DCCA). Moreover, we show that these variables can be used
to construct a non-parametric representation of the implied joint probability
distribution, which can be thought of as a classical version of the Schmidt
decomposition of quantum states. This representation can be used to compute the
expectations of functions over one view of data conditioned on the other, such
as Bayesian estimators and their standard deviations. We test the approach
using inference on occluded MNIST images, and show that our representation
contains multiple modes. Surprisingly, when applied to supervised learning
(where one dataset consists of labels), this approach automatically provides
regularization and faster convergence compared to the cross-entropy objective.
We also explore using this approach to discover salient independent variables
of a single dataset.

Comment: Changes resulting from the ICLR 2020 submission and review. The
presentation now accounts for the close connection to previous work on deep
CCA.
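For intuition, plain linear CCA (the classical ancestor of the deep variant above) takes a few lines of numpy; the canonical correlations play the role of the Schmidt coefficients in the joint-distribution expansion. This is a sketch, not the authors' implementation:

import numpy as np

def linear_cca(X, Y, k, eps=1e-6):
    X, Y = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # whiten each view, then SVD the whitened cross-covariance
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly.T)
    U, rho, Vt = np.linalg.svd(T)
    A = np.linalg.solve(Lx.T, U[:, :k])      # directions for view X
    B = np.linalg.solve(Ly.T, Vt[:k].T)      # directions for view Y
    return A, B, rho[:k]                     # corr(X @ A[:, i], Y @ B[:, i]) = rho[i]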
Disturbance Grassmann Kernels for Subspace-Based Learning
In this paper, we focus on subspace-based learning problems, where data
elements are linear subspaces instead of vectors. To handle this kind of data,
Grassmann kernels were proposed to measure the space structure and used with
classifiers, e.g., Support Vector Machines (SVMs). However, the existing
discriminative algorithms mostly ignore the instability of subspaces, which
can cause classifiers to be misled by disturbed instances. We therefore
propose accounting for all potential disturbances of subspaces during
learning to obtain more robust classifiers. First, we derive the dual
optimization of linear classifiers with disturbance subject to a known
distribution, resulting in a new kernel, the Disturbance Grassmann (DG)
kernel. Second, we investigate
two kinds of disturbance, relevant to the subspace matrix and singular values
of bases, with which we extend the Projection kernel on Grassmann manifolds to
two new kernels. Experiments on action data indicate that the proposed
kernels outperform state-of-the-art subspace-based methods, even in degraded
environments.

Comment: This paper includes 3 figures and 10 pages, and has been accepted to
SIGKDD'18.
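For reference, the baseline Projection kernel that the DG kernels extend is, for orthonormal bases X and Y, k(X, Y) = ||X^T Y||_F^2. A minimal numpy sketch (the scikit-learn usage in the comments is an assumption, not part of the paper):

import numpy as np

def projection_kernel(bases_a, bases_b):
    # k(X, Y) = ||X^T Y||_F^2 for orthonormal subspace bases X, Y
    K = np.empty((len(bases_a), len(bases_b)))
    for i, X in enumerate(bases_a):
        for j, Y in enumerate(bases_b):
            K[i, j] = np.sum((X.T @ Y) ** 2)
    return K

# hypothetical usage with a precomputed-kernel SVM:
# from sklearn.svm import SVC
# clf = SVC(kernel="precomputed").fit(projection_kernel(train, train), labels)

The DG kernels replace this deterministic evaluation with an expectation over a disturbance distribution on the subspaces.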
Online Low-Rank Subspace Learning from Incomplete Data: A Bayesian View
Extracting the underlying low-dimensional space where high-dimensional
signals often reside has long been at the center of numerous algorithms in the
signal processing and machine learning literature during the past few decades.
At the same time, working with incomplete (partly observed) large-scale
datasets has recently become commonplace for diverse reasons. This so-called
{\it big data era} we are currently living in calls for devising online
subspace learning algorithms that can suitably handle incomplete data. Their
envisaged
objective is to {\it recursively} estimate the unknown subspace by processing
streaming data sequentially, thus reducing computational complexity, while
obviating the need for storing the whole dataset in memory. In this paper, an
online variational Bayes subspace learning algorithm from partial observations
is presented. Since the true rank of the subspace is commonly unknown in
practice, low-rankness is explicitly imposed on the sought subspace data
matrix by exploiting sparse Bayesian learning principles.
Moreover, sparsity is favored on the subspace matrix {\it simultaneously}
with low-rankness by the sophisticated hierarchical Bayesian scheme that is
adopted. In doing so, the proposed algorithm becomes adept at dealing with
applications in which the underlying subspace may also be sparse, as, e.g., in
sparse dictionary learning problems. As shown in a variety of experiments on
simulated and real data, the new subspace tracking scheme outperforms its
state-of-the-art counterparts in terms of estimation accuracy.
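To make the streaming setting concrete, here is a bare-bones point-estimate update from one partially observed vector (a simplified gradient step with re-orthonormalization, loosely in the style of Grassmannian trackers such as GROUSE; the paper's variational Bayes scheme additionally infers the rank and favors sparsity):

import numpy as np

def subspace_step(U, x, mask, step=0.1):
    # U: current d x r orthonormal basis; x: new vector, observed where mask is True
    Uo = U[mask]
    w = np.linalg.lstsq(Uo, x[mask], rcond=None)[0]  # coefficients from observed rows
    r = np.zeros(len(U))
    r[mask] = x[mask] - Uo @ w                       # residual on observed rows
    Q, _ = np.linalg.qr(U + step * np.outer(r, w))   # rank-one correction, re-orthonormalize
    return Q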
The Role of Principal Angles in Subspace Classification
Subspace models play an important role in a wide range of signal processing
tasks, and this paper explores how the pairwise geometry of subspaces
influences the probability of misclassification. When the mismatch between the
signal and the model is vanishingly small, the probability of misclassification
is determined by the product of the sines of the principal angles between
subspaces. When the mismatch is more significant, the probability of
misclassification is determined by the sum of the squares of the sines of the
principal angles. Reliability of classification is derived in terms of the
distribution of signal energy across principal vectors. Larger principal angles
lead to smaller classification error, motivating a linear transform that
optimizes principal angles. The transform presented here (TRAIT) preserves some
specific characteristic of each individual class, and this approach is shown to
be complementary to a previously developed transform (LRT) that enlarges
inter-class distance while suppressing intra-class dispersion. Theoretical
results are supported by demonstration of superior classification accuracy on
synthetic and measured data, even in the presence of significant model
mismatch.
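The principal angles themselves are immediate to compute: the singular values of U^T V, for orthonormal bases U and V, are the cosines of the angles. A short numpy sketch, with the two geometric summaries that govern the two error regimes above:

import numpy as np

def principal_angles(U, V):
    s = np.linalg.svd(U.T @ V, compute_uv=False)   # cosines of the angles
    return np.arccos(np.clip(s, -1.0, 1.0))

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.normal(size=(50, 3)))[0]      # random 3-dim subspace of R^50
V = np.linalg.qr(rng.normal(size=(50, 3)))[0]
theta = principal_angles(U, V)
print(np.prod(np.sin(theta)))                      # small-mismatch regime
print(np.sum(np.sin(theta) ** 2))                  # larger-mismatch regime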
Scaling Gaussian Process Regression with Derivatives
Gaussian processes (GPs) with derivatives are useful in many applications,
including Bayesian optimization, implicit surface reconstruction, and terrain
reconstruction. Fitting a GP to function values and derivatives at $n$ points
in $d$ dimensions requires linear solves and log determinants with an
$n(d+1) \times n(d+1)$ positive definite matrix -- leading to prohibitive
$\mathcal{O}(n^3 d^3)$ computations for standard direct methods. We propose
iterative solvers using fast $\mathcal{O}(nd)$ matrix-vector multiplications
(MVMs), together with pivoted Cholesky preconditioning that cuts the iterations
to convergence by several orders of magnitude, allowing for fast kernel
learning and prediction. Our approaches, together with dimensionality
reduction, enable Bayesian optimization with derivatives to scale to
high-dimensional problems and large evaluation budgets.

Comment: Appears at Advances in Neural Information Processing Systems 32
(NIPS), 2018.
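To show the structure being solved, here is a dense 1-D toy build of the joint value-and-derivative RBF kernel, solved matrix-free with conjugate gradients (numpy/scipy assumed). The paper's point is to never form this matrix and to precondition CG with a pivoted Cholesky factor, which this sketch omits:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def rbf_with_grads(x, ell=1.0):
    # 2n x 2n joint kernel over f(x_i) and f'(x_i), 1-D RBF with lengthscale ell
    D = x[:, None] - x[None, :]
    K = np.exp(-D**2 / (2 * ell**2))
    Kfg = D / ell**2 * K                        # cov(f(x_i), f'(x_j))
    Kgg = (1 / ell**2 - D**2 / ell**4) * K      # cov(f'(x_i), f'(x_j))
    return np.block([[K, Kfg], [-Kfg, Kgg]])

n = 200
x = np.linspace(0.0, 5.0, n)
K = rbf_with_grads(x) + 1e-6 * np.eye(2 * n)    # jitter for conditioning
y = np.concatenate([np.sin(x), np.cos(x)])      # values and derivatives of sin
A = LinearOperator(K.shape, matvec=lambda v: K @ v)  # stand-in for a fast MVM
alpha, info = cg(A, y, maxiter=2000)            # info == 0 on convergence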
Sweep Distortion Removal from THz Images via Blind Demodulation
Heavy sweep distortion induced by alignments and inter-reflections of layers
of a sample is a major burden in recovering 2D and 3D information in time
resolved spectral imaging. This problem cannot be addressed by conventional
denoising and signal processing techniques as it heavily depends on the physics
of the acquisition. Here we propose and implement an algorithmic framework
based on low-rank matrix recovery and alternating minimization that exploits
the forward model for THz acquisition. The method allows recovering the
original signal in spite of the presence of temporal-spatial distortions. We
address a blind-demodulation problem, where based on several observations of
the sample texture modulated by an undesired sweep pattern, the two classes of
signals are separated. The performance of the method is examined on both
synthetic and experimental data, and successful reconstructions are
demonstrated. The proposed general scheme can be implemented to advance
inspection and imaging applications in THz and other time-resolved sensing
modalities.
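In the same spirit, a stripped-down alternation between a low-rank texture update and a modulation update can be sketched as below (Python/numpy; the per-column gain g is a deliberately crude stand-in for the actual THz sweep forward model, and g and L are identifiable only up to scale):

import numpy as np

def alt_min_demod(Y, rank, iters=50):
    # toy model: Y[:, t] ~= g[t] * L[:, t], L low-rank texture, g modulation gains
    g = np.ones(Y.shape[1])
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y / g, full_matrices=False)   # demodulate, then
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]               # truncate to low rank
        g = np.einsum('it,it->t', Y, L) / (np.einsum('it,it->t', L, L) + 1e-12)
    return L, g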
Learning Latent Variable Gaussian Graphical Models
Gaussian graphical models (GGM) have been widely used in many
high-dimensional applications ranging from biological and financial data to
recommender systems. Sparsity in GGM plays a central role both statistically
and computationally. Unfortunately, real-world data often does not fit well to
sparse graphical models. In this paper, we focus on a family of latent variable
Gaussian graphical models (LVGGM), where the model is conditionally sparse
given latent variables, but marginally non-sparse. In LVGGM, the inverse
covariance matrix has a low-rank plus sparse structure, and can be learned in a
regularized maximum likelihood framework. We derive novel parameter estimation
error bounds for LVGGM under mild conditions in the high-dimensional setting.
These results complement the existing theory on structural learning and open
up new possibilities of using LVGGM for statistical inference.

Comment: To appear in the 31st International Conference on Machine Learning
(ICML 2014).
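One standard way to write the regularized maximum-likelihood estimator for this low-rank-plus-sparse structure (a common formulation in the latent-variable graphical model literature; the paper's exact penalties and conditions may differ) is, with $\hat{\Sigma}$ the sample covariance and precision matrix $\Theta = S - L$:

\[
(\hat{S}, \hat{L}) \;=\; \operatorname*{arg\,min}_{S - L \succ 0,\; L \succeq 0}\;
\operatorname{tr}\big(\hat{\Sigma}(S - L)\big) - \log\det(S - L)
+ \lambda \lVert S \rVert_1 + \gamma\, \operatorname{tr}(L),
\]

where the sparse term $S$ encodes the conditional independence structure among the observed variables and the positive semidefinite low-rank term $L$ absorbs the effect of marginalizing out the latent variables.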
Low-Rank Modeling and Its Applications in Image Analysis
Low-rank modeling generally refers to a class of methods that solve problems
by representing variables of interest as low-rank matrices. It has achieved
great success in various fields including computer vision, data mining, signal
processing and bioinformatics. Recently, much progress has been made in
theories, algorithms and applications of low-rank modeling, such as exact
low-rank matrix recovery via convex programming and matrix completion applied
to collaborative filtering. These advances have brought increasing attention
to this topic. In this paper, we review recent advances in low-rank modeling,
the state-of-the-art algorithms, and related applications in image analysis.
We first give an overview of the concept of low-rank modeling
and challenging problems in this area. Then, we summarize the models and
algorithms for low-rank matrix recovery and illustrate their advantages and
limitations with numerical experiments. Next, we introduce a few applications
of low-rank modeling in the context of image analysis. Finally, we conclude
this paper with some discussions.

Comment: To appear in ACM Computing Surveys.
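As one concrete example of the convex-programming recovery algorithms covered by such surveys, singular value thresholding (SVT) for matrix completion fits in a dozen lines of numpy (a sketch; the threshold, step size, and stopping rule all need tuning in practice):

import numpy as np

def svt_complete(Y, mask, tau=5.0, step=1.2, iters=200):
    # Y: observed entries (zeros elsewhere); mask: 1 where observed, 0 otherwise
    Z = np.zeros_like(Y)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt    # shrink singular values
        Z += step * mask * (Y - X)                 # correct on the observed set
    return X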