Flexible sampling of discrete data correlations without the marginal distributions
Learning the joint dependence of discrete variables is a fundamental problem
in machine learning, with many applications including prediction, clustering
and dimensionality reduction. More recently, the framework of copula modeling
has gained popularity due to its modular parametrization of joint
distributions. Among other properties, copulas provide a recipe for combining
flexible models for univariate marginal distributions with parametric families
suitable for potentially high dimensional dependence structures. More
radically, the extended rank likelihood approach of Hoff (2007) bypasses
learning marginal models completely when such information is ancillary to the
learning task at hand as in, e.g., standard dimensionality reduction problems
or copula parameter estimation. The main idea is to represent data by their
observable rank statistics, ignoring any other information from the marginals.
Inference is typically done in a Bayesian framework with Gaussian copulas, and
it is complicated by the fact that this implies sampling within a space where the
number of constraints increases quadratically with the number of data points.
The result is slow mixing when using off-the-shelf Gibbs sampling. We present
an efficient algorithm based on recent advances on constrained Hamiltonian
Markov chain Monte Carlo that is simple to implement and does not require
paying for a quadratic cost in sample size.
Comment: An overhauled version of the experimental section moved to the main
paper. Old experimental section moved to supplementary material.
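To make the rank constraints concrete, here is a minimal sketch of the order restrictions used by the extended rank likelihood: each latent value must stay between the latent values of observations that rank strictly below and strictly above it. The function name `rank_bounds` and the toy data are assumptions made for illustration. This is the naive per-point scan, not the constrained Hamiltonian sampler the abstract describes.

```python
import numpy as np

def rank_bounds(y, z):
    """Return (lower, upper) so that lower[i] < z[i] < upper[i] keeps the
    ordering of the latent values z consistent with the observed values y."""
    y = np.asarray(y)
    z = np.asarray(z, dtype=float)
    lower = np.full(z.shape, -np.inf)
    upper = np.full(z.shape, np.inf)
    for i in range(len(y)):
        below = z[y < y[i]]   # latent values of points ranked strictly lower
        above = z[y > y[i]]   # latent values of points ranked strictly higher
        if below.size:
            lower[i] = below.max()
        if above.size:
            upper[i] = above.min()
    return lower, upper

# Hypothetical toy data: discrete observations (with a tie) and latent values
# whose ordering agrees with them.
y = np.array([3, 1, 2, 2])
z = np.array([1.5, -0.7, 0.2, 0.4])
lower, upper = rank_bounds(y, z)
```

Each call to the loop body scans every other observation, so one full sweep over n points costs on the order of n^2 comparisons per variable. That is the kind of cost the abstract says its method avoids.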
Residual Component Analysis
Probabilistic principal component analysis (PPCA) seeks a low dimensional
representation of a data set in the presence of independent spherical Gaussian
noise, Sigma = (sigma^2)*I. The maximum likelihood solution for the model is an
eigenvalue problem on the sample covariance matrix. In this paper we consider
the situation where the data variance is already partially explained by other
factors, e.g. covariates of interest, or temporal correlations leaving some
residual variance. We decompose the residual variance into its components
through a generalized eigenvalue problem, which we call residual component
analysis (RCA). We show that canonical covariates analysis (CCA) is a special
case of our algorithm and explore a range of new algorithms that arise from the
framework. We illustrate the ideas on a gene expression time series data set
and the recovery of human pose from silhouette.
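As a rough illustration of the generalized eigenvalue problem mentioned above, the sketch below solves S v = lambda * Sigma v, where S is a sample covariance and Sigma is a positive definite matrix standing in for the variance already explained by other factors. The function name `generalized_eig`, the random data, and the choice of Sigma are assumptions; the abstract does not give the exact RCA update, so this shows only the linear algebra step.

```python
import numpy as np

def generalized_eig(S, Sigma):
    """Solve S v = lam * Sigma v for symmetric S and positive definite Sigma.
    Returns eigenvalues in descending order and the matching eigenvectors."""
    L = np.linalg.cholesky(Sigma)        # Sigma = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ S @ Linv.T                # whitened, ordinary symmetric problem
    lam, U = np.linalg.eigh(M)           # eigenvalues in ascending order
    V = Linv.T @ U                       # map back: v = L^{-T} u
    order = np.argsort(lam)[::-1]
    return lam[order], V[:, order]

# Hypothetical data: 200 samples in 5 dimensions.
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 5))
S = np.cov(Y, rowvar=False)

# Hypothetical "already explained" covariance, made positive definite.
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)

lam, V = generalized_eig(S, Sigma)
```

With Sigma set to the identity matrix, the same function reduces to an ordinary eigendecomposition of S, which matches the PPCA case described in the abstract.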
Probabilistic Super-Resolution of Solar Magnetograms: Generating Many Explanations and Measuring Uncertainties
Machine learning techniques have been successfully applied to super-resolution tasks on natural images, where visually pleasing results are sufficient. However, in many scientific domains this is not adequate, and estimates of errors and uncertainties are crucial. To address this issue, we propose a Bayesian framework that decomposes uncertainties into epistemic and aleatoric components. We test the validity of our approach by super-resolving images of the Sun's magnetic field and by generating maps measuring the range of possible high resolution explanations compatible with a given low resolution magnetogram.
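The abstract does not say how the decomposition is computed. One common recipe, shown here purely as an assumption, applies the law of total variance to several stochastic predictions of the same pixel: the average predicted variance is treated as aleatoric, and the spread of the predicted means as epistemic. The function name `decompose_uncertainty` and the toy numbers are hypothetical.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split predictive variance using the law of total variance.

    means, variances: arrays of shape (n_samples, ...) holding the predicted
    mean and predicted variance from each stochastic forward pass.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)   # average noise predicted by the model
    epistemic = means.var(axis=0)        # disagreement between the passes
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical toy case: two forward passes over two pixels.
means = np.array([[1.0, 2.0],
                  [3.0, 2.0]])
variances = np.array([[0.5, 0.1],
                      [0.5, 0.3]])
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
```

In this toy case the passes disagree on the first pixel and agree on the second, so only the first pixel carries epistemic uncertainty.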