9,246 research outputs found

    Bayesian group latent factor analysis with structured sparsity

    Full text link
    Latent factor models are the canonical statistical tool for exploratory analyses of low-dimensional linear structure for an observation matrix with p features across n samples. We develop a structured Bayesian group factor analysis model that extends the factor model to multiple coupled observation matrices; in the case of two observations, this reduces to a Bayesian model of canonical correlation analysis. The main contribution of this work is to carefully define a structured Bayesian prior that encourages both element-wise and column-wise shrinkage and leads to desirable behavior on high-dimensional data. In particular, our model puts a structured prior on the joint factor loading matrix, regularizing at three levels, which enables element-wise sparsity and unsupervised recovery of latent factors corresponding to structured variance across arbitrary subsets of the observations. In addition, our structured prior allows for both dense and sparse latent factors so that covariation among either all features or only a subset of features can both be recovered. We use fast parameter-expanded expectation-maximization for parameter estimation in this model. We validate our method on both simulated data with substantial structure and real data, comparing against a number of state-of-the-art approaches. These results illustrate useful properties of our model, including i) recovering sparse signal in the presence of dense effects; ii) the ability to scale naturally to large numbers of observations; iii) flexible observation- and factor-specific regularization to recover factors with a wide variety of sparsity levels and percentage of variance explained; and iv) tractable inference that scales to modern genomic and document data sizes

    VARX-L: Structured Regularization for Large Vector Autoregressions with Exogenous Variables

    Full text link
    The vector autoregression (VAR) has long proven to be an effective method for modeling the joint dynamics of macroeconomic time series as well as forecasting. A major shortcoming of the VAR that has hindered its applicability is its heavy parameterization: the parameter space grows quadratically with the number of series included, quickly exhausting the available degrees of freedom. Consequently, forecasting using VARs is intractable for low-frequency, high-dimensional macroeconomic data. However, empirical evidence suggests that VARs that incorporate more component series tend to result in more accurate forecasts. Conventional methods that allow for the estimation of large VARs either tend to require ad hoc subjective specifications or are computationally infeasible. Moreover, as global economies become more intricately intertwined, there has been substantial interest in incorporating the impact of stochastic, unmodeled exogenous variables. Vector autoregression with exogenous variables (VARX) extends the VAR to allow for the inclusion of unmodeled variables, but it similarly faces dimensionality challenges. We introduce the VARX-L framework, a structured family of VARX models, and provide methodology that allows for both efficient estimation and accurate forecasting in high-dimensional analysis. VARX-L adapts several prominent scalar regression regularization techniques to a vector time series context in order to greatly reduce the parameter space of VAR and VARX models. We also highlight a compelling extension that allows for shrinking toward reference models, such as a vector random walk. We demonstrate the efficacy of VARX-L in both low- and high-dimensional macroeconomic forecasting applications and simulated data examples. Our methodology is easily reproducible in a publicly available R package

    Information Projection and Approximate Inference for Structured Sparse Variables

    Full text link
    Approximate inference via information projection has been recently introduced as a general-purpose approach for efficient probabilistic inference given sparse variables. This manuscript goes beyond classical sparsity by proposing efficient algorithms for approximate inference via information projection that are applicable to any structure on the set of variables that admits enumeration using a \emph{matroid}. We show that the resulting information projection can be reduced to combinatorial submodular optimization subject to matroid constraints. Further, leveraging recent advances in submodular optimization, we provide an efficient greedy algorithm with strong optimization-theoretic guarantees. The class of probabilistic models that can be expressed in this way is quite broad and, as we show, includes group sparse regression, group sparse principal components analysis and sparse canonical correlation analysis, among others. Moreover, empirical results on simulated data and high dimensional neuroimaging data highlight the superior performance of the information projection approach as compared to established baselines for a range of probabilistic models

    Bayesian Optimal Approximate Message Passing to Recover Structured Sparse Signals

    Full text link
    We present a novel compressed sensing recovery algorithm - termed Bayesian Optimal Structured Signal Approximate Message Passing (BOSSAMP) - that jointly exploits the prior distribution and the structured sparsity of a signal that shall be recovered from noisy linear measurements. Structured sparsity is inherent to group sparse and jointly sparse signals. Our algorithm is based on approximate message passing that poses a low complexity recovery algorithm whose Bayesian optimal version allows to specify a prior distribution for each signal component. We utilize this feature in order to establish an iteration-wise extrinsic group update step, in which likelihood ratios of neighboring group elements provide soft information about a specific group element. Doing so, the recovery of structured signals is drastically improved. We derive the extrinsic group update step for a sparse binary and a sparse Gaussian signal prior, where the nonzero entries are either one or Gaussian distributed, respectively. We also explain how BOSSAMP is applicable to arbitrary sparse signals. Simulations demonstrate that our approach exhibits superior performance compared to the current state of the art, while it retains a simple iterative implementation with low computational complexity.Comment: 13 pages, 9 figures, 1 table. Submitted to IEEE Transactions on Signal Processin

    Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning

    Full text link
    An ensemble of neural networks is known to be more robust and accurate than an individual network, however usually with linearly-increased cost in both training and testing. In this work, we propose a two-stage method to learn Sparse Structured Ensembles (SSEs) for neural networks. In the first stage, we run SG-MCMC with group sparse priors to draw an ensemble of samples from the posterior distribution of network parameters. In the second stage, we apply weight-pruning to each sampled network and then perform retraining over the remained connections. In this way of learning SSEs with SG-MCMC and pruning, we not only achieve high prediction accuracy since SG-MCMC enhances exploration of the model-parameter space, but also reduce memory and computation cost significantly in both training and testing of NN ensembles. This is thoroughly evaluated in the experiments of learning SSE ensembles of both FNNs and LSTMs. For example, in LSTM based language modeling (LM), we obtain 21% relative reduction in LM perplexity by learning a SSE of 4 large LSTM models, which has only 30% of model parameters and 70% of computations in total, as compared to the baseline large LSTM LM. To the best of our knowledge, this work represents the first methodology and empirical study of integrating SG-MCMC, group sparse prior and network pruning together for learning NN ensembles

    Dependent relevance determination for smooth and structured sparse regression

    Full text link
    In many problem settings, parameter vectors are not merely sparse but dependent in such a way that non-zero coefficients tend to cluster together. We refer to this form of dependency as "region sparsity." Classical sparse regression methods, such as the lasso and automatic relevance determination (ARD), which model parameters as independent a priori, and therefore do not exploit such dependencies. Here we introduce a hierarchical model for smooth, region-sparse weight vectors and tensors in a linear regression setting. Our approach represents a hierarchical extension of the relevance determination framework, where we add a transformed Gaussian process to model the dependencies between the prior variances of regression weights. We combine this with a structured model of the prior variances of Fourier coefficients, which eliminates unnecessary high frequencies. The resulting prior encourages weights to be region-sparse in two different bases simultaneously. We develop Laplace approximation and Monte Carlo Markov Chain (MCMC) sampling to provide efficient inference for the posterior. Furthermore, a two-stage convex relaxation of the Laplace approximation approach is also provided to relax the inevitable non-convexity during the optimization. We finally show substantial improvements over comparable methods for both simulated and real datasets from brain imaging.Comment: 42 pages, 15 figures, submitted to JML

    Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset

    Full text link
    Recent research on problem formulations based on decomposition into low-rank plus sparse matrices shows a suitable framework to separate moving objects from the background. The most representative problem formulation is the Robust Principal Component Analysis (RPCA) solved via Principal Component Pursuit (PCP) which decomposes a data matrix in a low-rank matrix and a sparse matrix. However, similar robust implicit or explicit decompositions can be made in the following problem formulations: Robust Non-negative Matrix Factorization (RNMF), Robust Matrix Completion (RMC), Robust Subspace Recovery (RSR), Robust Subspace Tracking (RST) and Robust Low-Rank Minimization (RLRM). The main goal of these similar problem formulations is to obtain explicitly or implicitly a decomposition into low-rank matrix plus additive matrices. In this context, this work aims to initiate a rigorous and comprehensive review of the similar problem formulations in robust subspace learning and tracking based on decomposition into low-rank plus additive matrices for testing and ranking existing algorithms for background/foreground separation. For this, we first provide a preliminary review of the recent developments in the different problem formulations which allows us to define a unified view that we called Decomposition into Low-rank plus Additive Matrices (DLAM). Then, we examine carefully each method in each robust subspace learning/tracking frameworks with their decomposition, their loss functions, their optimization problem and their solvers. Furthermore, we investigate if incremental algorithms and real-time implementations can be achieved for background/foreground separation. Finally, experimental results on a large-scale dataset called Background Models Challenge (BMC 2012) show the comparative performance of 32 different robust subspace learning/tracking methods.Comment: 121 pages, 5 figures, submitted to Computer Science Review. arXiv admin note: text overlap with arXiv:1312.7167, arXiv:1109.6297, arXiv:1207.3438, arXiv:1105.2126, arXiv:1404.7592, arXiv:1210.0805, arXiv:1403.8067 by other authors, Computer Science Review, November 201

    Disentangled VAE Representations for Multi-Aspect and Missing Data

    Full text link
    Many problems in machine learning and related application areas are fundamentally variants of conditional modeling and sampling across multi-aspect data, either multi-view, multi-modal, or simply multi-group. For example, sampling from the distribution of English sentences conditioned on a given French sentence or sampling audio waveforms conditioned on a given piece of text. Central to many of these problems is the issue of missing data: we can observe many English, French, or German sentences individually but only occasionally do we have data for a sentence pair. Motivated by these applications and inspired by recent progress in variational autoencoders for grouped data, we develop factVAE, a deep generative model capable of handling multi-aspect data, robust to missing observations, and with a prior that encourages disentanglement between the groups and the latent dimensions. The effectiveness of factVAE is demonstrated on a variety of rich real-world datasets, including motion capture poses and pictures of faces captured from varying poses and perspectives

    Bayesian Variable Selection and Estimation for Group Lasso

    Full text link
    The paper revisits the Bayesian group lasso and uses spike and slab priors for group variable selection. In the process, the connection of our model with penalized regression is demonstrated, and the role of posterior median for thresholding is pointed out. We show that the posterior median estimator has the oracle property for group variable selection and estimation under orthogonal designs, while the group lasso has suboptimal asymptotic estimation rate when variable selection consistency is achieved. Next we consider bi-level selection problem and propose the Bayesian sparse group selection again with spike and slab priors to select variables both at the group level and also within a group. We demonstrate via simulation that the posterior median estimator of our spike and slab models has excellent performance for both variable selection and estimation.Comment: Published at http://dx.doi.org/10.1214/14-BA929 in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/

    Classification of weak multi-view signals by sharing factors in a mixture of Bayesian group factor analyzers

    Full text link
    We propose a novel classification model for weak signal data, building upon a recent model for Bayesian multi-view learning, Group Factor Analysis (GFA). Instead of assuming all data to come from a single GFA model, we allow latent clusters, each having a different GFA model and producing a different class distribution. We show that sharing information across the clusters, by sharing factors, increases the classification accuracy considerably; the shared factors essentially form a flexible noise model that explains away the part of data not related to classification. Motivation for the setting comes from single-trial functional brain imaging data, having a very low signal-to-noise ratio and a natural multi-view setting, with the different sensors, measurement modalities (EEG, MEG, fMRI) and possible auxiliary information as views. We demonstrate our model on a MEG dataset.Comment: Presented at MLINI-2015 workshop, 2015 (arXiv:1605.04435
    • …
    corecore