Tensors, Learning, and 'Kolmogorov Extension' for Finite-alphabet Random Vectors
Estimating the joint probability mass function (PMF) of a set of random
variables lies at the heart of statistical learning and signal processing.
Without structural assumptions, such as modeling the variables as a Markov
chain, tree, or other graphical model, joint PMF estimation is often considered
mission impossible - the number of unknowns grows exponentially with the number
of variables. But who gives us the structural model? Is there a generic,
`non-parametric' way to control joint PMF complexity without relying on a
priori structural assumptions regarding the underlying probability model? Is it
possible to discover the operational structure without biasing the analysis up
front? What if we only observe random subsets of the variables, can we still
reliably estimate the joint PMF of all? This paper shows, perhaps surprisingly,
that if the joint PMF of any three variables can be estimated, then the joint
PMF of all the variables can be provably recovered under relatively mild
conditions. The result is reminiscent of Kolmogorov's extension theorem -
consistent specification of lower-dimensional distributions induces a unique
probability measure for the entire process. The difference is that for
processes of limited complexity (rank of the high-dimensional PMF) it is
possible to obtain complete characterization from only three-dimensional
distributions. In fact, not all three-dimensional PMFs are needed; under more
stringent conditions, even two-dimensional ones will do. Exploiting multilinear
algebra, this paper proves that such higher-dimensional PMF completion can be
guaranteed - several pertinent identifiability results are derived. It also
provides a practical and efficient algorithm to carry out the recovery task.
Judiciously designed simulations and real-data experiments on movie
recommendation and data classification are presented to showcase the
effectiveness of the approach.
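The complexity measure at work here is the rank of the joint PMF viewed as an N-way tensor. As a hedged sketch (notation mine, not necessarily the paper's), the low-rank canonical polyadic form and the three-way marginals it induces can be written as:

```latex
% Joint PMF of N finite-alphabet variables as a rank-F tensor (naive Bayes form):
P(i_1,\dots,i_N) \;=\; \sum_{f=1}^{F} \lambda_f \prod_{n=1}^{N} A_n(i_n, f),
\qquad \lambda_f \ge 0,\quad \sum_{f=1}^{F} \lambda_f = 1 .

% Every three-way marginal inherits the same latent factors, so identifying the
% factors from low-rank third-order tensors recovers the full joint PMF:
P(i_j, i_k, i_\ell) \;=\; \sum_{f=1}^{F} \lambda_f\, A_j(i_j, f)\, A_k(i_k, f)\, A_\ell(i_\ell, f).
```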
Nonparametric empirical Bayes and maximum likelihood estimation for high-dimensional data analysis
Nonparametric empirical Bayes methods provide a flexible and attractive
approach to high-dimensional data analysis. One particularly elegant empirical
Bayes methodology, involving the Kiefer-Wolfowitz nonparametric maximum
likelihood estimator (NPMLE) for mixture models, has been known for decades.
However, implementation and theoretical analysis of the Kiefer-Wolfowitz NPMLE
are notoriously difficult. A fast algorithm was recently proposed that makes
NPMLE-based procedures feasible for use in large-scale problems, but the
algorithm calculates only an approximation to the NPMLE. In this paper we make
two contributions. First, we provide upper bounds on the convergence rate of
the approximate NPMLE's statistical error, which have the same order as the
best known bounds for the true NPMLE. This suggests that the approximate NPMLE
is just as effective as the true NPMLE for statistical applications. Second, we
illustrate the promise of NPMLE procedures in a high-dimensional binary
classification problem. We propose a new procedure and show that it vastly
outperforms existing methods in experiments with simulated data. In real data
analyses involving cancer survival and gene expression data, we show that it is
very competitive with several recently proposed methods for regularized linear
discriminant analysis, another popular approach to high-dimensional
classification.
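For intuition, once a discrete estimate of the mixing distribution is in hand (atoms and weights from a Kiefer-Wolfowitz-type fit), the empirical Bayes step is a simple posterior averaging over those atoms. The sketch below assumes a Gaussian location mixture and given atoms/weights; it illustrates the generic denoising rule, not the paper's specific classification procedure.

```python
import numpy as np

def eb_posterior_mean(z, atoms, weights, sigma=1.0):
    """Empirical Bayes posterior means under a discrete mixing distribution.

    z       : observations, assumed z_i ~ N(theta_i, sigma^2)
    atoms   : grid of candidate theta values (support of the estimated prior)
    weights : estimated NPMLE prior probabilities on the atoms
    """
    # likelihood of each observation under each atom: shape (n, m)
    lik = np.exp(-0.5 * ((z[:, None] - atoms[None, :]) / sigma) ** 2)
    post = lik * weights[None, :]            # unnormalized posterior over atoms
    post /= post.sum(axis=1, keepdims=True)  # normalize per observation
    return post @ atoms                      # posterior mean E[theta_i | z_i]
```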
Horseshoe Regularization for Machine Learning in Complex and Deep Models
Since the advent of the horseshoe priors for regularization, global-local
shrinkage methods have proved to be a fertile ground for the development of
Bayesian methodology in machine learning, specifically for high-dimensional
regression and classification problems. They have achieved remarkable success
in computation, and enjoy strong theoretical support. Most of the existing
literature has focused on the linear Gaussian case; see Bhadra et al. (2019b)
for a systematic survey. The purpose of the current article is to demonstrate
that the horseshoe regularization is useful far more broadly, by reviewing both
methodological and computational developments in complex models that are more
relevant to machine learning applications. Specifically, we focus on
methodological challenges in horseshoe regularization in nonlinear and
non-Gaussian models; multivariate models; and deep neural networks. We also
outline the recent computational developments in horseshoe shrinkage for
complex models along with a list of available software implementations that
allows one to venture out beyond the comfort zone of the canonical linear
regression problems.
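For readers new to global-local shrinkage, the canonical horseshoe hierarchy in the linear Gaussian case (the baseline from which the article's nonlinear, multivariate, and deep extensions depart) can be sketched as:

```latex
% Horseshoe prior on regression coefficients beta_1, ..., beta_p:
% a global scale tau and local scales lambda_j, both standard half-Cauchy.
\beta_j \mid \lambda_j, \tau \;\sim\; \mathcal{N}\!\left(0,\; \lambda_j^2 \tau^2\right), \qquad
\lambda_j \;\sim\; \mathrm{C}^{+}(0, 1), \qquad
\tau \;\sim\; \mathrm{C}^{+}(0, 1), \qquad j = 1, \dots, p .
```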
Approximate nonparametric maximum likelihood inference for mixture models via convex optimization
Nonparametric maximum likelihood (NPML) for mixture models is a technique for
estimating mixing distributions that has a long and rich history in statistics
going back to the 1950s, and is closely related to empirical Bayes methods.
Historically, NPML-based methods have been considered to be relatively
impractical because of computational and theoretical obstacles. However, recent
work focusing on approximate NPML methods suggests that these methods may have
great promise for a variety of modern applications. Building on this recent
work, a class of flexible, scalable, and easy to implement approximate NPML
methods is studied for problems with multivariate mixing distributions.
Concrete guidance on implementing these methods is provided, with theoretical
and empirical support; topics covered include identifying the support set of
the mixing distribution, and comparing algorithms (across a variety of metrics)
for solving the simple convex optimization problem at the core of the
approximate NPML problem. Additionally, three diverse real data applications
are studied to illustrate the methods' performance: (i) A baseball data
analysis (a classical example for empirical Bayes methods), (ii)
high-dimensional microarray classification, and (iii) online prediction of
blood-glucose density for diabetes patients. Among other things, the empirical
results demonstrate the relative effectiveness of using multivariate (as
opposed to univariate) mixing distributions for NPML-based approaches.
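A minimal sketch of the convex problem at the core of these approximate NPML methods: fix a grid of candidate atoms and maximize the mixture log-likelihood over the weights on the probability simplex. The Gaussian location mixture and the EM fixed-point solver below are illustrative assumptions; the paper compares several algorithms for this step.

```python
import numpy as np
from scipy.stats import norm

def grid_npmle_weights(x, atoms, sigma=1.0, n_iter=500):
    """Approximate NPMLE of a mixing distribution on a fixed grid of atoms.

    Maximizes sum_i log( sum_j w_j * N(x_i; atom_j, sigma^2) ) over the
    probability simplex; the objective is concave in w, and the EM
    fixed-point update below is one simple ascent scheme.
    """
    lik = norm.pdf(x[:, None], loc=atoms[None, :], scale=sigma)  # (n, m)
    w = np.full(atoms.size, 1.0 / atoms.size)
    for _ in range(n_iter):
        resp = lik * w[None, :]
        resp /= resp.sum(axis=1, keepdims=True)  # posterior over atoms per x_i
        w = resp.mean(axis=0)                    # update of the mixture weights
    return w
```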
Estimating mutual information in high dimensions via classification error
Multivariate pattern analysis approaches in neuroimaging are fundamentally
concerned with investigating the quantity and type of information processed by
various regions of the human brain; typically, estimates of classification
accuracy are used to quantify information. While an extensive and powerful
library of methods can be applied to train and assess classifiers, it is not
always clear how to use the resulting measures of classification performance to
draw scientific conclusions: e.g. for the purpose of evaluating redundancy
between brain regions. An additional confound for interpreting classification
performance is the dependence of the error rate on the number and choice of
distinct classes obtained for the classification task. In contrast, mutual
information is a quantity defined independently of the experimental design, and
has ideal properties for comparative analyses. Unfortunately, estimating the
mutual information based on observations becomes statistically infeasible in
high dimensions without some kind of assumption or prior.
In this paper, we construct a novel classification-based estimator of mutual
information based on high-dimensional asymptotics. We show that in a particular
limiting regime, the mutual information is an invertible function of the
expected multi-class Bayes error. While the theory is based on a large-sample,
high-dimensional limit, we demonstrate through simulations that our proposed
estimator has superior performance to the alternatives in problems of moderate
dimensionality.
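To illustrate how a classification error rate translates into an information quantity, the sketch below uses Fano's inequality with uniform class priors. This is only a crude lower bound, named plainly as a substitute; the paper's estimator relies on a different, exact asymptotic relation in its limiting regime.

```python
import numpy as np

def fano_mi_lower_bound(error_rate, n_classes):
    """Lower bound on mutual information (nats) from a K-class Bayes error,
    via Fano's inequality with uniform class priors.

    Illustration of the error-to-information mapping only; not the
    asymptotic inversion used by the proposed estimator.
    """
    K = n_classes
    pe = np.clip(error_rate, 1e-12, 1 - 1e-12)
    h_pe = -pe * np.log(pe) - (1 - pe) * np.log(1 - pe)  # binary entropy
    return max(0.0, np.log(K) - h_pe - pe * np.log(K - 1))
```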
Learning from a lot: Empirical Bayes in high-dimensional prediction settings
Empirical Bayes is a versatile approach to `learn from a lot' in two ways:
first, from a large number of variables and second, from a potentially large
amount of prior information, e.g. stored in public repositories. We review
applications of a variety of empirical Bayes methods to several well-known
model-based prediction methods including penalized regression, linear
discriminant analysis, and Bayesian models with sparse or dense priors. We
discuss `formal' empirical Bayes methods which maximize the marginal
likelihood, but also more informal approaches based on other data summaries. We
contrast empirical Bayes to cross-validation and full Bayes, and discuss hybrid
approaches. To study the relation between the quality of an empirical Bayes
estimator and the number of variables, we consider a simple empirical
Bayes estimator in a linear model setting.
We argue that empirical Bayes is particularly useful when the prior contains
multiple parameters which model a priori information on variables, termed
`co-data'. In particular, we present two novel examples that allow for co-data.
First, a Bayesian spike-and-slab setting that facilitates inclusion of multiple
co-data sources and types; second, a hybrid empirical Bayes-full Bayes ridge
regression approach for estimation of the posterior predictive interval.
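A minimal sketch of "formal" empirical Bayes in the linear model setting mentioned above: the ridge prior variance is chosen by maximizing the marginal likelihood of the data. The model, grid, and function name below are illustrative assumptions, not the paper's co-data estimators.

```python
import numpy as np

def eb_ridge_variance(X, y, sigma2=1.0, tau2_grid=np.logspace(-4, 2, 100)):
    """Formal empirical Bayes for a ridge prior in a linear model.

    Model: y = X beta + eps, beta ~ N(0, tau^2 I), eps ~ N(0, sigma^2 I),
    so marginally y ~ N(0, tau^2 X X^T + sigma^2 I).  The prior variance
    tau^2 is set by maximizing the log marginal likelihood over a grid.
    """
    n = len(y)
    G = X @ X.T
    best_tau2, best_ll = None, -np.inf
    for tau2 in tau2_grid:
        C = tau2 * G + sigma2 * np.eye(n)
        _, logdet = np.linalg.slogdet(C)
        ll = -0.5 * (logdet + y @ np.linalg.solve(C, y) + n * np.log(2 * np.pi))
        if ll > best_ll:
            best_tau2, best_ll = tau2, ll
    return best_tau2
```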
Integrative genetic risk prediction using nonparametric empirical Bayes classification
Genetic risk prediction is an important component of individualized medicine,
but prediction accuracies remain low for many complex diseases. A fundamental
limitation is the sample sizes of the studies on which the prediction
algorithms are trained. One way to increase the effective sample size is to
integrate information from previously existing studies. However, it can be
difficult to find existing data that examine the target disease of interest,
especially if that disease is rare or poorly studied. Furthermore,
individual-level genotype data from these auxiliary studies are typically
difficult to obtain. This paper proposes a new approach to integrative genetic
risk prediction of complex diseases with binary phenotypes. It accommodates
possible heterogeneity in the genetic etiologies of the target and auxiliary
diseases using a tuning parameter-free nonparametric empirical Bayes procedure,
and can be trained using only auxiliary summary statistics. Simulation studies
show that the proposed method can provide superior predictive accuracy relative
to non-integrative as well as integrative classifiers. The method is applied to
a recent study of pediatric autoimmune diseases, where it substantially reduces
prediction error for certain target/auxiliary disease combinations. The
proposed method is implemented in the R package ssa.
Kernel Mean Embedding of Distributions: A Review and Beyond
A Hilbert space embedding of a distribution---in short, a kernel mean
embedding---has recently emerged as a powerful tool for machine learning and
inference. The basic idea behind this framework is to map distributions into a
reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel
methods can be extended to probability measures. It can be viewed as a
generalization of the original "feature map" common to support vector machines
(SVMs) and other kernel methods. While initially closely associated with the
latter, it has meanwhile found application in fields ranging from kernel
machines and probabilistic modeling to statistical inference, causal discovery,
and deep learning. The goal of this survey is to give a comprehensive review of
existing work and recent advances in this research area, and to discuss the
most challenging issues and open problems that could lead to new research
directions. The survey begins with a brief introduction to the RKHS and
positive definite kernels, which form the backbone of this survey, followed by
a thorough discussion of the Hilbert space embedding of marginal distributions,
theoretical guarantees, and a review of its applications. The embedding of
distributions enables us to apply RKHS methods to probability measures which
prompts a wide range of applications such as kernel two-sample testing,
independence testing, and learning on distributional data. Next, we discuss the
Hilbert space embedding for conditional distributions, give theoretical
insights, and review some applications. The conditional mean embedding enables
us to perform sum, product, and Bayes' rules---which are ubiquitous in
graphical models, probabilistic inference, and reinforcement learning---in a
non-parametric way. We then discuss relationships between this framework and
other related areas. Lastly, we give some suggestions on future research
directions.
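As a concrete taste of the framework, the flagship application of marginal mean embeddings is the kernel two-sample test: the maximum mean discrepancy (MMD) is the RKHS distance between the embeddings of two distributions, estimated by averaging kernel evaluations. The RBF kernel and bandwidth below are illustrative choices, not a prescribed API.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2_biased(X, Y, gamma=1.0):
    """Biased estimate of the squared MMD, || mu_P - mu_Q ||_H^2,
    between samples X ~ P and Y ~ Q."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())
```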
Bayes factor consistency
Good large sample performance is typically a minimum requirement of any model
selection criterion. This article focuses on the consistency property of the
Bayes factor, a commonly used model comparison tool, which has experienced a
recent surge of attention in the literature. We thoroughly review existing
results. As there exists such a wide variety of settings to be considered, e.g.
parametric vs. nonparametric, nested vs. non-nested, etc., we adopt the view
that a unified framework has didactic value. Using the basic marginal
likelihood identity of Chib (1995), we study Bayes factor asymptotics by
decomposing the natural logarithm of the ratio of marginal likelihoods into
three components. These are, respectively, log ratios of likelihoods, prior
densities, and posterior densities. This yields an interpretation of the log
ratio of posteriors as a penalty term, and emphasizes that to understand Bayes
factor consistency, the prior support conditions driving posterior consistency
in each respective model under comparison should be contrasted in terms of the
rates of posterior contraction they imply.
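The decomposition described above follows from Chib's identity; in notation of my own choosing (a sketch of the stated three-component split, not the article's exact display):

```latex
% Chib's (1995) basic marginal likelihood identity, valid at any theta in the support:
m(y) \;=\; \frac{p(y \mid \theta)\,\pi(\theta)}{\pi(\theta \mid y)} .

% Applied to two models M_1, M_2 at points theta_1^*, theta_2^*, the log Bayes factor
% splits into log ratios of likelihoods, priors, and posteriors (the latter acting as a penalty):
\log \mathrm{BF}_{12}
= \underbrace{\log \frac{p_1(y \mid \theta_1^*)}{p_2(y \mid \theta_2^*)}}_{\text{likelihoods}}
+ \underbrace{\log \frac{\pi_1(\theta_1^*)}{\pi_2(\theta_2^*)}}_{\text{priors}}
- \underbrace{\log \frac{\pi_1(\theta_1^* \mid y)}{\pi_2(\theta_2^* \mid y)}}_{\text{posteriors}} .
```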
TULIP: A Toolbox for Linear Discriminant Analysis with Penalties
Linear discriminant analysis (LDA) is a powerful tool in building classifiers
with easy computation and interpretation. Recent advances in science and
technology have led to the popularity of datasets with high dimensions, high
orders, and complicated structure. Such datasets motivate the generalization of
LDA in various research directions. The R package TULIP integrates several
popular high-dimensional LDA-based methods and provides a comprehensive and
user-friendly toolbox for linear, semi-parametric and tensor-variate
classification. Functions are included for model fitting, cross validation and
prediction. In addition, motivated by datasets with diverse sources of
predictors, we further include functions for covariate adjustment. Our package
is carefully tailored for low storage and high computational efficiency.
Moreover, our package is the first R package for many of these methods,
providing great convenience to researchers in this area.
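For context, the classical decision rule that the package's penalized, semi-parametric, and tensor-variate methods extend is sketched below. This is a plain Python illustration of two-class LDA with a pooled covariance, not the TULIP R interface.

```python
import numpy as np

def lda_fit_predict(X_train, y_train, X_test):
    """Minimal LDA: class means, pooled covariance, linear discriminant scores.

    Illustrative only; not the TULIP R API.
    """
    classes = np.unique(y_train)
    mus = [X_train[y_train == c].mean(axis=0) for c in classes]
    priors = [np.mean(y_train == c) for c in classes]
    # pooled within-class covariance
    centered = np.vstack([X_train[y_train == c] - m for c, m in zip(classes, mus)])
    Sigma = centered.T @ centered / (len(y_train) - len(classes))
    Sigma_inv = np.linalg.pinv(Sigma)
    # linear discriminant score per class; predict the argmax
    scores = np.stack([
        X_test @ Sigma_inv @ m - 0.5 * m @ Sigma_inv @ m + np.log(p)
        for m, p in zip(mus, priors)
    ], axis=1)
    return classes[np.argmax(scores, axis=1)]
```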