Bayesian group latent factor analysis with structured sparsity
Latent factor models are the canonical statistical tool for exploratory
analyses of low-dimensional linear structure for an observation matrix with p
features across n samples. We develop a structured Bayesian group factor
analysis model that extends the factor model to multiple coupled observation
matrices; in the case of two observations, this reduces to a Bayesian model of
canonical correlation analysis. The main contribution of this work is to
carefully define a structured Bayesian prior that encourages both element-wise
and column-wise shrinkage and leads to desirable behavior on high-dimensional
data. In particular, our model puts a structured prior on the joint factor
loading matrix, regularizing at three levels, which enables element-wise
sparsity and unsupervised recovery of latent factors corresponding to
structured variance across arbitrary subsets of the observations. In addition,
our structured prior allows for both dense and sparse latent factors so that
covariation among either all features or only a subset of features can both be
recovered. We use fast parameter-expanded expectation-maximization for
parameter estimation in this model. We validate our method on both simulated
data with substantial structure and real data, comparing against a number of
state-of-the-art approaches. These results illustrate useful properties of our
model, including i) recovering sparse signal in the presence of dense effects;
ii) the ability to scale naturally to large numbers of observations; iii)
flexible observation- and factor-specific regularization to recover factors
with a wide variety of sparsity levels and percentage of variance explained;
and iv) tractable inference that scales to modern genomic and document data sizes.
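As a rough illustration of the generative structure described above, the following NumPy sketch draws two coupled observation matrices from a shared set of latent factors, with both column-wise (factor-to-view) and element-wise sparsity in the loadings. The dimensions, sparsity levels, and noise scale are illustrative placeholders, not the paper's parameterization; the structured prior would learn such masks rather than fix them.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p1, p2, k = 100, 30, 20, 5        # samples, features per view, factors

    # Column-wise structure: each factor is active in view 1, view 2, or both.
    active = rng.random((2, k)) < 0.7    # view-by-factor activity pattern
    F = rng.standard_normal((k, n))      # shared latent factors

    loadings = []
    for m, p in enumerate((p1, p2)):
        L = rng.standard_normal((p, k)) * active[m]   # column-wise shrinkage
        L *= rng.random((p, k)) < 0.5                 # element-wise sparsity
        loadings.append(L)

    # Coupled observation matrices explained by the same factors F.
    Y1 = loadings[0] @ F + 0.1 * rng.standard_normal((p1, n))
    Y2 = loadings[1] @ F + 0.1 * rng.standard_normal((p2, n))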
VARX-L: Structured Regularization for Large Vector Autoregressions with Exogenous Variables
The vector autoregression (VAR) has long proven to be an effective method for
modeling the joint dynamics of macroeconomic time series as well as
forecasting. A major shortcoming of the VAR that has hindered its applicability
is its heavy parameterization: the parameter space grows quadratically with the
number of series included, quickly exhausting the available degrees of freedom.
Consequently, forecasting using VARs is intractable for low-frequency,
high-dimensional macroeconomic data. However, empirical evidence suggests that
VARs that incorporate more component series tend to result in more accurate
forecasts. Conventional methods that allow for the estimation of large VARs
either tend to require ad hoc subjective specifications or are computationally
infeasible. Moreover, as global economies become more intricately intertwined,
there has been substantial interest in incorporating the impact of stochastic,
unmodeled exogenous variables. Vector autoregression with exogenous variables
(VARX) extends the VAR to allow for the inclusion of unmodeled variables, but
it similarly faces dimensionality challenges.
We introduce the VARX-L framework, a structured family of VARX models, and
provide methodology that allows for both efficient estimation and accurate
forecasting in high-dimensional analysis. VARX-L adapts several prominent
scalar regression regularization techniques to a vector time series context in
order to greatly reduce the parameter space of VAR and VARX models. We also
highlight a compelling extension that allows for shrinking toward reference
models, such as a vector random walk. We demonstrate the efficacy of VARX-L in
both low- and high-dimensional macroeconomic forecasting applications and
simulated data examples. Our methodology is easily reproducible in a publicly
available R package.
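For intuition, the simplest member of such a family, an L1 penalty applied equation by equation, can be sketched with scikit-learn; the group-structured penalties of the framework would replace the plain lasso with a group lasso solver. The function name, lag order, and penalty strength below are illustrative choices, not part of the package.

    import numpy as np
    from sklearn.linear_model import Lasso

    def lasso_var(Y, p=2, alpha=0.1):
        # Fit a VAR(p) to a T x k series Y with an L1 penalty per equation.
        T, k = Y.shape
        # Lagged design matrix: columns are Y_{t-1}, ..., Y_{t-p}.
        X = np.hstack([Y[p - i - 1:T - i - 1] for i in range(p)])
        targets = Y[p:]
        coefs = np.empty((k, k * p))
        for j in range(k):               # equation-by-equation estimation
            model = Lasso(alpha=alpha).fit(X, targets[:, j])
            coefs[j] = model.coef_
        return coefs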
Information Projection and Approximate Inference for Structured Sparse Variables
Approximate inference via information projection has been recently introduced
as a general-purpose approach for efficient probabilistic inference given
sparse variables. This manuscript goes beyond classical sparsity by proposing
efficient algorithms for approximate inference via information projection that
are applicable to any structure on the set of variables that admits enumeration
using a matroid. We show that the resulting information projection can
be reduced to combinatorial submodular optimization subject to matroid
constraints. Further, leveraging recent advances in submodular optimization, we
provide an efficient greedy algorithm with strong optimization-theoretic
guarantees. The class of probabilistic models that can be expressed in this way
is quite broad and, as we show, includes group sparse regression, group sparse
principal components analysis and sparse canonical correlation analysis, among
others. Moreover, empirical results on simulated data and high dimensional
neuroimaging data highlight the superior performance of the information
projection approach as compared to established baselines for a range of
probabilistic models.
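The greedy scheme can be made concrete for the special case of a partition matroid, where at most a fixed number of variables may be selected from each group; the toy coverage objective below stands in for the information-projection objective, and all names are ours.

    # Greedy maximization of a monotone submodular set function f subject to
    # a partition matroid: at most caps[g] elements chosen from each group g.
    def greedy_matroid(f, elements, group_of, caps):
        S, counts = set(), dict.fromkeys(caps, 0)
        while True:
            base = f(S)
            best, best_gain = None, 0.0
            for e in elements - S:
                if counts[group_of[e]] >= caps[group_of[e]]:
                    continue                  # would violate the matroid
                gain = f(S | {e}) - base
                if gain > best_gain:
                    best, best_gain = e, gain
            if best is None:
                return S
            S.add(best)
            counts[group_of[best]] += 1

    # Toy weighted-coverage objective (monotone submodular).
    sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {4, 5}}
    f = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
    print(greedy_matroid(f, set(sets), {0: 'a', 1: 'a', 2: 'b', 3: 'b'},
                         {'a': 1, 'b': 1}))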
Bayesian Optimal Approximate Message Passing to Recover Structured Sparse Signals
We present a novel compressed sensing recovery algorithm - termed Bayesian
Optimal Structured Signal Approximate Message Passing (BOSSAMP) - that jointly
exploits the prior distribution and the structured sparsity of a signal to be
recovered from noisy linear measurements. Structured sparsity is
inherent to group sparse and jointly sparse signals. Our algorithm is based on
approximate message passing, a low-complexity recovery scheme whose Bayesian
optimal version allows a prior distribution to be specified for each signal
component. We utilize this feature to establish an iteration-wise extrinsic
group update step, in which likelihood ratios of neighboring group elements
provide soft information about a specific group element. Doing so drastically
improves the recovery of structured signals.
We derive the extrinsic group update step for a sparse binary and a sparse
Gaussian signal prior, where the nonzero entries are either one or Gaussian
distributed, respectively. We also explain how BOSSAMP is applicable to
arbitrary sparse signals. Simulations demonstrate that our approach exhibits
superior performance compared to the current state of the art, while it retains
a simple iterative implementation with low computational complexity.Comment: 13 pages, 9 figures, 1 table. Submitted to IEEE Transactions on
Signal Processin
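For reference, a generic AMP iteration with a simple soft-threshold denoiser is sketched below; BOSSAMP replaces this generic shrinkage with Bayesian-optimal, group-aware updates driven by the extrinsic likelihood ratios described above, so the sketch shows only the shared scaffolding (the threshold rule and names are illustrative).

    import numpy as np

    def amp(A, y, iters=30, lam=0.1):
        m, n = A.shape
        x, z = np.zeros(n), y.copy()
        soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
        for _ in range(iters):
            tau = lam * np.linalg.norm(z) / np.sqrt(m)  # residual-based threshold
            x_new = soft(x + A.T @ z, tau)              # denoise the pseudo-data
            # Onsager correction keeps the effective noise approximately Gaussian.
            b = np.count_nonzero(x_new) / m
            z = y - A @ x_new + b * z
            x = x_new
        return x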
Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning
An ensemble of neural networks is known to be more robust and accurate than an
individual network, but usually at a linearly increased cost in both training
and testing. In this work, we propose a two-stage method to learn
Sparse Structured Ensembles (SSEs) for neural networks. In the first stage, we
run SG-MCMC with group sparse priors to draw an ensemble of samples from the
posterior distribution of network parameters. In the second stage, we apply
weight-pruning to each sampled network and then perform retraining over the
remaining connections. In this way of learning SSEs with SG-MCMC and pruning, we
not only achieve high prediction accuracy since SG-MCMC enhances exploration of
the model-parameter space, but also reduce memory and computation cost
significantly in both training and testing of NN ensembles. This is thoroughly
evaluated in the experiments of learning SSE ensembles of both FNNs and LSTMs.
For example, in LSTM-based language modeling (LM), we obtain a 21% relative
reduction in LM perplexity by learning an SSE of 4 large LSTM models, which has
only 30% of model parameters and 70% of computations in total, as compared to
the baseline large LSTM LM. To the best of our knowledge, this work represents
the first methodology and empirical study of integrating SG-MCMC, group sparse
prior and network pruning together for learning NN ensembles.
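The two stages can be caricatured in a few lines of NumPy. Here grad_log_post is a hypothetical callable that would include both the likelihood gradient and the group sparse prior, and the retraining pass over the surviving connections is omitted.

    import numpy as np

    def sgld_step(w, grad_log_post, lr, rng):
        # One SGLD update: a gradient step plus Gaussian noise scaled to the
        # step size, so the iterates are approximate posterior samples.
        noise = np.sqrt(lr) * rng.standard_normal(w.shape)
        return w + 0.5 * lr * grad_log_post(w) + noise

    def magnitude_prune(w, keep_frac=0.3):
        # Stage two: keep the largest-magnitude fraction of weights, zero the rest.
        thresh = np.quantile(np.abs(w), 1.0 - keep_frac)
        return np.where(np.abs(w) >= thresh, w, 0.0)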
Dependent relevance determination for smooth and structured sparse regression
In many problem settings, parameter vectors are not merely sparse but
dependent in such a way that non-zero coefficients tend to cluster together. We
refer to this form of dependency as "region sparsity." Classical sparse
regression methods, such as the lasso and automatic relevance determination
(ARD), model parameters as independent a priori and therefore do not
exploit such dependencies. Here we introduce a hierarchical model for smooth,
region-sparse weight vectors and tensors in a linear regression setting. Our
approach represents a hierarchical extension of the relevance determination
framework, where we add a transformed Gaussian process to model the
dependencies between the prior variances of regression weights. We combine this
with a structured model of the prior variances of Fourier coefficients, which
eliminates unnecessary high frequencies. The resulting prior encourages weights
to be region-sparse in two different bases simultaneously. We develop Laplace
approximation and Markov chain Monte Carlo (MCMC) sampling to provide efficient
posterior inference. Furthermore, a two-stage convex relaxation of the Laplace
approximation approach is also provided to handle the non-convexity of the
optimization. We finally show substantial improvements
over comparable methods for both simulated and real datasets from brain imaging.
Comment: 42 pages, 15 figures, submitted to JMLR.
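A minimal sketch of drawing one region-sparse weight vector under this kind of hierarchy: a Gaussian process sample is passed through a softplus link to produce smoothly varying prior variances, so relevance clusters along the coefficient index. The link, mean level, and length scale are illustrative, not the paper's exact transformation.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 200
    idx = np.arange(d)

    # Smooth latent relevance process: squared-exponential GP over indices.
    K = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / 10.0) ** 2)
    u = rng.multivariate_normal(np.full(d, -3.0), K + 1e-8 * np.eye(d))

    prior_var = np.log1p(np.exp(u))      # softplus keeps variances positive
    w = rng.standard_normal(d) * np.sqrt(prior_var)   # region-sparse weights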
Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset
Recent research on problem formulations based on decomposition into low-rank
plus sparse matrices shows a suitable framework to separate moving objects from
the background. The most representative problem formulation is the Robust
Principal Component Analysis (RPCA) solved via Principal Component Pursuit
(PCP), which decomposes a data matrix into a low-rank matrix and a sparse matrix.
However, similar robust implicit or explicit decompositions can be made in the
following problem formulations: Robust Non-negative Matrix Factorization
(RNMF), Robust Matrix Completion (RMC), Robust Subspace Recovery (RSR), Robust
Subspace Tracking (RST) and Robust Low-Rank Minimization (RLRM). The main goal
of these similar problem formulations is to obtain explicitly or implicitly a
decomposition into a low-rank matrix plus additive matrices. In this context,
this work aims to initiate a rigorous and comprehensive review of the similar
problem formulations in robust subspace learning and tracking based on
decomposition into low-rank plus additive matrices for testing and ranking
existing algorithms for background/foreground separation. For this, we first
provide a preliminary review of the recent developments in the different
problem formulations, which allows us to define a unified view that we call
Decomposition into Low-rank plus Additive Matrices (DLAM). Then, we carefully
examine each method in each robust subspace learning/tracking framework,
covering its decomposition, loss function, optimization problem, and solver.
Furthermore, we investigate whether incremental algorithms and real-time
implementations can be achieved for background/foreground separation. Finally,
experimental results on a large-scale dataset called Background Models
Challenge (BMC 2012) show the comparative performance of 32 different robust
subspace learning/tracking methods.
Comment: 121 pages, 5 figures, submitted to Computer Science Review. arXiv admin note: text overlap with arXiv:1312.7167, arXiv:1109.6297, arXiv:1207.3438, arXiv:1105.2126, arXiv:1404.7592, arXiv:1210.0805, arXiv:1403.8067 by other authors. Computer Science Review, November 201
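The canonical member of this family, RPCA solved via PCP, admits a compact inexact-ALM-style sketch that alternates singular value thresholding for the low-rank part and entrywise soft thresholding for the sparse part; the parameter defaults below follow common practice rather than any single method in the review.

    import numpy as np

    def rpca_pcp(M, lam=None, mu=1.0, iters=100):
        m, n = M.shape
        lam = lam or 1.0 / np.sqrt(max(m, n))
        L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
        soft = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
        for _ in range(iters):
            U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
            L = (U * soft(sig, 1.0 / mu)) @ Vt   # singular value thresholding
            S = soft(M - L + Y / mu, lam / mu)   # entrywise shrinkage
            Y = Y + mu * (M - L - S)             # dual ascent on M = L + S
        return L, S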
Disentangled VAE Representations for Multi-Aspect and Missing Data
Many problems in machine learning and related application areas are
fundamentally variants of conditional modeling and sampling across multi-aspect
data, either multi-view, multi-modal, or simply multi-group. Consider, for example,
sampling from the distribution of English sentences conditioned on a given
French sentence or sampling audio waveforms conditioned on a given piece of
text. Central to many of these problems is the issue of missing data: we can
observe many English, French, or German sentences individually but only
occasionally do we have data for a sentence pair. Motivated by these
applications and inspired by recent progress in variational autoencoders for
grouped data, we develop factVAE, a deep generative model capable of handling
multi-aspect data, robust to missing observations, and with a prior that
encourages disentanglement between the groups and the latent dimensions. The
effectiveness of factVAE is demonstrated on a variety of rich real-world
datasets, including motion capture poses and pictures of faces captured from
varying poses and perspectives.
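The abstract does not spell out how factVAE aggregates evidence across views, but one common mechanism in multi-view variational autoencoders, a product of Gaussian experts in which unobserved views simply drop out, can be sketched as follows (the function and its interface are our illustration, not the paper's API).

    import numpy as np

    def poe_posterior(mus, vars_, observed):
        # Combine per-view Gaussian posteriors q(z | x_m) with a N(0, I) prior
        # via a product of experts; missing views contribute nothing.
        prec = np.ones_like(mus[0])              # prior precision
        mean_acc = np.zeros_like(mus[0])
        for mu, var, obs in zip(mus, vars_, observed):
            if obs:
                prec += 1.0 / var
                mean_acc += mu / var
        return mean_acc / prec, 1.0 / prec       # posterior mean and variance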
Bayesian Variable Selection and Estimation for Group Lasso
The paper revisits the Bayesian group lasso and uses spike and slab priors
for group variable selection. In the process, the connection of our model with
penalized regression is demonstrated, and the role of posterior median for
thresholding is pointed out. We show that the posterior median estimator has
the oracle property for group variable selection and estimation under
orthogonal designs, while the group lasso has suboptimal asymptotic estimation
rate when variable selection consistency is achieved. Next, we consider the
bi-level selection problem and propose Bayesian sparse group selection, again with
spike and slab priors to select variables both at the group level and also
within a group. We demonstrate via simulation that the posterior median
estimator of our spike and slab models has excellent performance for both
variable selection and estimation.
Comment: Published at http://dx.doi.org/10.1214/14-BA929 in Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/).
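The group-level spike-and-slab prior at the heart of such models takes the standard form (the notation here is ours and may differ from the paper's):

    \beta_g \mid \pi, \tau^2 \sim (1 - \pi)\,\delta_0(\beta_g) + \pi\,\mathcal{N}(0, \tau^2 I_{m_g}), \qquad g = 1, \dots, G,

where \beta_g is the m_g-dimensional coefficient block of group g, \delta_0 is a point mass at zero, and \pi is the prior inclusion probability; bi-level selection places an analogous spike and slab on the individual coefficients within each active group.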
Classification of weak multi-view signals by sharing factors in a mixture of Bayesian group factor analyzers
We propose a novel classification model for weak signal data, building upon a
recent model for Bayesian multi-view learning, Group Factor Analysis (GFA).
Instead of assuming all data to come from a single GFA model, we allow latent
clusters, each having a different GFA model and producing a different class
distribution. We show that sharing information across the clusters, by sharing
factors, increases the classification accuracy considerably; the shared factors
essentially form a flexible noise model that explains away the part of data not
related to classification. Motivation for the setting comes from single-trial
functional brain imaging data, which has a very low signal-to-noise ratio and a
natural multi-view structure, with different sensors, measurement modalities
(EEG, MEG, fMRI) and possible auxiliary information as views. We demonstrate
our model on a MEG dataset.
Comment: Presented at the MLINI-2015 workshop, 2015 (arXiv:1605.04435).
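For context, the standard GFA generative model that each latent cluster instantiates can be written as (standard notation, not necessarily the authors'):

    x^{(m)} = W^{(m)} z + \varepsilon^{(m)}, \qquad z \sim \mathcal{N}(0, I_K), \quad \varepsilon^{(m)} \sim \mathcal{N}(0, \sigma_m^2 I),

where group sparsity on the columns of the loading matrices W^{(m)} determines which views each factor explains; the proposed mixture gives each cluster its own loadings while shared factors are tied across clusters.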