
    Time-varying Autoregression with Low Rank Tensors

    We present a windowed technique to learn parsimonious time-varying autoregressive models from multivariate time series. This unsupervised method uncovers interpretable spatiotemporal structure in data via non-smooth and non-convex optimization. In each time window, we assume the data follow a linear model parameterized by a system matrix, and we model this stack of potentially different system matrices as a low rank tensor. Because of its structure, the model is scalable to high-dimensional data and can easily incorporate priors such as smoothness over time. We find the components of the tensor using alternating minimization and prove that any stationary point of this algorithm is a local minimum. We demonstrate on a synthetic example that our method identifies the true rank of a switching linear system in the presence of noise. We illustrate our model's utility and superior scalability over extant methods on several synthetic and real-world examples: two types of time-varying linear systems, worm behavior, sea surface temperature, and monkey brain datasets.
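    The following is a minimal, illustrative sketch of the general approach described above (not the authors' code): least-squares system matrices are fit in each time window, stacked into a 3-way tensor, and that tensor is factored with a rank-R CP decomposition computed by alternating least squares. The window length, rank, and synthetic switching system are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic switching linear system: x_{t+1} = A_s x_t + noise (illustrative)
n, T, win = 5, 600, 50
A1 = 0.95 * np.linalg.qr(rng.standard_normal((n, n)))[0]
A2 = 0.95 * np.linalg.qr(rng.standard_normal((n, n)))[0]
x = np.zeros((T, n))
x[0] = rng.standard_normal(n)
for t in range(T - 1):
    A = A1 if t < T // 2 else A2
    x[t + 1] = A @ x[t] + 0.05 * rng.standard_normal(n)

# per-window least-squares system matrices, stacked into a 3-way tensor
mats = []
for s in range(0, T - 1, win):
    X0, X1 = x[s:s + win - 1], x[s + 1:s + win]
    W, *_ = np.linalg.lstsq(X0, X1, rcond=None)   # X1 ~ X0 @ W, so A_hat = W.T
    mats.append(W.T)
tensor = np.stack(mats)                           # shape (windows, n, n)

def cp_als(X, R, iters=200):
    """Rank-R CP decomposition X ~ sum_r outer(a_r, b_r, c_r) via alternating least squares."""
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(iters):
        M = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)
        A = X.reshape(I, J * K) @ M @ np.linalg.pinv(M.T @ M)
        M = np.einsum('ir,kr->ikr', A, C).reshape(I * K, R)
        B = X.transpose(1, 0, 2).reshape(J, I * K) @ M @ np.linalg.pinv(M.T @ M)
        M = np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)
        C = X.transpose(2, 0, 1).reshape(K, I * J) @ M @ np.linalg.pinv(M.T @ M)
    return A, B, C

A, B, C = cp_als(tensor, R=2)
approx = np.einsum('ir,jr,kr->ijk', A, B, C)
print('relative CP reconstruction error:', np.linalg.norm(tensor - approx) / np.linalg.norm(tensor))
```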

    The unified maximum a posteriori (MAP) framework for neuronal system identification

    The functional relationship between an input and a sensory neuron's response can be described by the neuron's stimulus-response mapping function. A general approach for characterizing the stimulus-response mapping function is called system identification. Many different names have been used for the stimulus-response mapping function: kernel or transfer function, transducer, spatiotemporal receptive field. Many algorithms have been developed to estimate a neuron's mapping function from an ensemble of stimulus-response pairs. These include the spike-triggered average, normalized reverse correlation, linearized reverse correlation, ridge regression, local spectral reverse correlation, spike-triggered covariance, artificial neural networks, maximally informative dimensions, kernel regression, boosting, and models based on leaky integrate-and-fire neurons. Because many of these system identification algorithms were developed in other disciplines, they appear superficially very different and seem to bear little relationship to each other. Each algorithm makes different assumptions about the neuron and how the data are generated. Without a unified framework it is difficult to select the most suitable algorithm for estimating the neuron's mapping function. In this review, we present a unified framework for describing these algorithms: maximum a posteriori (MAP) estimation. In the MAP framework, the implicit assumptions built into any system identification algorithm are made explicit in three MAP constituents: model class, noise distributions, and priors. Understanding the interplay between these three MAP constituents will simplify the task of selecting the most appropriate algorithm for a given data set. The MAP framework can also facilitate the development of novel system identification algorithms by incorporating biophysically plausible assumptions and mechanisms into the MAP constituents.
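    As a concrete illustration of the MAP viewpoint (a standard textbook example, not taken from the review): with a linear model class, Gaussian noise, and a zero-mean Gaussian prior on the kernel, the MAP estimate of the stimulus-response mapping reduces to ridge regression, while dropping the prior and using white stimuli recovers a spike-triggered-average-style estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

D, N = 20, 500                                   # stimulus dimension, number of trials
k_true = np.exp(-np.arange(D) / 5.0)             # hypothetical "receptive field" used to simulate data
S = rng.standard_normal((N, D))                  # stimulus ensemble
r = S @ k_true + 0.5 * rng.standard_normal(N)    # noisy responses

sigma2 = 0.25    # assumed Gaussian noise variance (the noise-distribution constituent)
tau2 = 1.0       # assumed Gaussian prior variance (the prior constituent)
lam = sigma2 / tau2

# MAP estimate under a linear model class:
#   argmax_k log p(r | S, k) + log p(k) = (S'S + lam I)^{-1} S'r, i.e., ridge regression
k_map = np.linalg.solve(S.T @ S + lam * np.eye(D), S.T @ r)

# with a flat prior and white stimuli, the estimate collapses to a spike-triggered-average-style cross-correlation
k_sta = S.T @ r / N

print('MAP (ridge) error:', np.linalg.norm(k_map - k_true))
print('STA-style error  :', np.linalg.norm(k_sta - k_true))
```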

    Shallow Updates for Deep Reinforcement Learning

    Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We test LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double DQN. We also investigate the reasons for the superior performance of our method. Interestingly, we find that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
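    A sketch of the core last-layer update, under stated assumptions (the feature/target construction and hyperparameters are illustrative, and centering the prior on the current last-layer weights is one natural reading of the regularization described above):

```python
import numpy as np

def ls_update(features, targets, w_drl, lam=1.0):
    """Regularized batch least-squares refit of the last layer.

    features : (N, d) activations of the last hidden layer over a replay batch
    targets  : (N,)   regression targets (e.g., bootstrapped Q-value targets)
    w_drl    : (d,)   current last-layer weights, used as the prior mean
    lam      : float  strength of the regularization toward w_drl
    """
    d = features.shape[1]
    A = features.T @ features + lam * np.eye(d)
    b = features.T @ targets + lam * w_drl
    return np.linalg.solve(A, b)

# toy usage with random arrays standing in for replay-buffer features and targets
rng = np.random.default_rng(2)
phi = rng.standard_normal((1000, 64))
y = rng.standard_normal(1000)
w0 = rng.standard_normal(64)
w_new = ls_update(phi, y, w0, lam=10.0)
print(w_new.shape)
```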

    Bayesian Extensions of Kernel Least Mean Squares

    The kernel least mean squares (KLMS) algorithm is a computationally efficient nonlinear adaptive filtering method that "kernelizes" the celebrated (linear) least mean squares algorithm. We demonstrate that the least mean squares algorithm is closely related to Kalman filtering, and thus the KLMS can be interpreted as an approximate Bayesian filtering method. This allows us to systematically develop extensions of the KLMS by modifying the underlying state-space and observation models. The resulting extensions introduce many desirable properties such as "forgetting" and the ability to learn from discrete data, while retaining the computational simplicity and time complexity of the original algorithm.
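    For reference, a minimal KLMS sketch (illustrative, not the paper's Bayesian extensions): the filter keeps a growing dictionary of past inputs and coefficients, predicts with a kernel expansion, and performs an LMS-style stochastic-gradient update on each new sample. The optional forgetting factor, which shrinks old coefficients, is a simple stand-in for the "forgetting" behaviour that the Bayesian state-space view motivates.

```python
import numpy as np

class KLMS:
    def __init__(self, step=0.5, bandwidth=1.0, forget=1.0):
        self.step = step
        self.gamma = 1.0 / (2 * bandwidth ** 2)
        self.forget = forget
        self.centers, self.coeffs = [], []

    def _kernel(self, x, c):
        return np.exp(-self.gamma * np.sum((x - c) ** 2))   # Gaussian kernel

    def predict(self, x):
        return sum(a * self._kernel(x, c) for a, c in zip(self.coeffs, self.centers))

    def update(self, x, y):
        err = y - self.predict(x)                            # instantaneous error
        self.coeffs = [self.forget * a for a in self.coeffs] # optional forgetting of old terms
        self.centers.append(np.asarray(x, dtype=float))      # grow the dictionary
        self.coeffs.append(self.step * err)                  # LMS-style coefficient
        return err

# toy usage: learn y = sin(x) online from noisy samples
rng = np.random.default_rng(3)
f = KLMS(step=0.5, bandwidth=0.5, forget=0.999)
for _ in range(500):
    x = rng.uniform(-3, 3, size=1)
    f.update(x, np.sin(x[0]) + 0.05 * rng.standard_normal())
print('prediction at 1.0:', f.predict(np.array([1.0])), ' true:', np.sin(1.0))
```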

    Regularizing Bayesian Predictive Regressions

    We show that regularizing Bayesian predictive regressions provides a framework for prior sensitivity analysis. We develop a procedure that jointly regularizes expectations and variance-covariance matrices using a pair of shrinkage priors. Our methodology applies directly to vector autoregressions (VAR) and seemingly unrelated regressions (SUR). The regularization path provides a prior sensitivity diagnostic. By exploiting a duality between regularization penalties and predictive prior distributions, we reinterpret two classic Bayesian analyses in macro-finance: equity premium predictability and forecasting macroeconomic growth rates. We find that there exist plausible prior specifications for predictability in excess S&P 500 index returns using book-to-market ratios, CAY (consumption, wealth, income ratio), and T-bill rates. We evaluate the forecasts using a market-timing strategy and show that the optimally regularized solution outperforms a buy-and-hold approach. A second empirical application involves forecasting industrial production, inflation, and consumption growth rates, and demonstrates the feasibility of our approach.
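    The penalty/prior duality can be illustrated with a toy predictive regression (synthetic data, not the paper's procedure or datasets): a ridge penalty lambda on the slope corresponds to a tighter Gaussian prior around "no predictability", so tracing the regularization path doubles as a prior sensitivity diagnostic.

```python
import numpy as np

rng = np.random.default_rng(4)

T = 240                                          # months of synthetic data
bm = rng.standard_normal(T)                      # stand-in predictor (e.g., a book-to-market-like ratio)
ret = 0.05 * bm + rng.standard_normal(T)         # weakly predictable excess returns
X = np.column_stack([np.ones(T), bm])

print(' lambda   intercept    slope')
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    # penalize only the slope; lambda corresponds to a Gaussian prior with variance sigma^2 / lambda
    beta = np.linalg.solve(X.T @ X + np.diag([0.0, lam]), X.T @ ret)
    print(f'{lam:7.1f}   {beta[0]: .4f}    {beta[1]: .4f}')
# The slope shrinks toward zero as lambda grows, i.e., as the implied prior tightens
# around "no predictability" -- the regularization path read as prior sensitivity.
```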

    Bayesian Fused Lasso regression for dynamic binary networks

    We propose a multinomial logistic regression model for link prediction in a time series of directed binary networks. To account for the dynamic nature of the data, we employ a dynamic model for the model parameters that is strongly connected with the fused lasso penalty. In addition to promoting sparseness, this prior allows us to explore the presence of change points in the structure of the network. We introduce fast computational algorithms for estimation and prediction using both optimization and Bayesian approaches. The performance of the model is illustrated using simulated data and data from a financial trading network in the NYMEX natural gas futures market. Supplementary material containing the trading network data set and code to implement the algorithms is available online.
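    The kind of objective such a fused-lasso prior corresponds to can be sketched as follows (binary logistic links and illustrative penalty weights, not the paper's exact multinomial model): time-varying coefficients are penalized both for their magnitude (sparsity) and for changes between consecutive time points (change points).

```python
import numpy as np

def fused_lasso_objective(beta, X_list, y_list, lam_sparse=1.0, lam_fuse=5.0):
    """Penalized negative log-likelihood for time-varying logistic link models.

    beta      : (T, p) coefficients, one row per time point
    X_list[t] : (n_t, p) edge covariates at time t
    y_list[t] : (n_t,)  binary link indicators at time t
    """
    nll = 0.0
    for X, y in zip(X_list, y_list):
        eta = X @ beta[len(X_list) - len(X_list)] if False else X @ beta[X_list.index(X)]
        nll += np.sum(np.log1p(np.exp(eta)) - y * eta)       # logistic negative log-likelihood
    sparsity = lam_sparse * np.abs(beta).sum()                # encourages sparse effects
    fusion = lam_fuse * np.abs(np.diff(beta, axis=0)).sum()   # encourages piecewise-constant paths (change points)
    return nll + sparsity + fusion

# toy usage on random data
rng = np.random.default_rng(5)
T, p, n = 6, 3, 40
X_list = [rng.standard_normal((n, p)) for _ in range(T)]
y_list = [rng.integers(0, 2, size=n) for _ in range(T)]
print(fused_lasso_objective(np.zeros((T, p)), X_list, y_list))
```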

    Regularized brain reading with shrinkage and smoothing

    Functional neuroimaging measures how the brain responds to complex stimuli. However, sample sizes are modest, noise is substantial, and stimuli are high dimensional. Hence, direct estimates are inherently imprecise and call for regularization. We compare a suite of approaches which regularize via shrinkage: ridge regression, the elastic net (a generalization of ridge regression and the lasso), and a hierarchical Bayesian model based on small area estimation (SAE). We contrast regularization with spatial smoothing and combinations of smoothing and shrinkage. All methods are tested on functional magnetic resonance imaging (fMRI) data from multiple subjects participating in two different experiments related to reading, for both predicting neural response to stimuli and decoding stimuli from responses. Interestingly, when the regularization parameters are chosen by cross-validation independently for every voxel, low (high) regularization is chosen in voxels where the classification accuracy is high (low), indicating that the regularization intensity is a useful tool for identifying voxels relevant to the cognitive task. Surprisingly, all the regularization methods work about equally well, suggesting that beating basic smoothing and shrinkage will take not only clever methods, but also careful modeling. (Published in the Annals of Applied Statistics, http://dx.doi.org/10.1214/15-AOAS837, by the Institute of Mathematical Statistics.)
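    A sketch of the per-voxel shrinkage setup on synthetic data (not the study's pipeline): each voxel's response is ridge-regressed on the stimulus features, with the penalty chosen by cross-validation separately for that voxel, so every voxel gets its own regularization strength.

```python
import numpy as np

rng = np.random.default_rng(6)
n_trials, n_features, n_voxels = 120, 50, 8
X = rng.standard_normal((n_trials, n_features))                     # stimulus features
W = rng.standard_normal((n_features, n_voxels)) * (rng.random((n_features, n_voxels)) < 0.2)
Y = X @ W + rng.standard_normal((n_trials, n_voxels))               # noisy voxel responses

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_lambda(X, y, lambdas, k=5):
    """Pick the penalty with the smallest k-fold cross-validated squared error."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for lam in lambdas:
        err = 0.0
        for f in folds:
            train = np.setdiff1d(np.arange(len(y)), f)
            w = ridge_fit(X[train], y[train], lam)
            err += np.sum((y[f] - X[f] @ w) ** 2)
        errs.append(err)
    return lambdas[int(np.argmin(errs))]

lambdas = np.logspace(-2, 3, 12)
for v in range(n_voxels):
    lam = cv_lambda(X, Y[:, v], lambdas)
    w = ridge_fit(X, Y[:, v], lam)
    print(f'voxel {v}: chosen lambda = {lam:8.3f}')
```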

    Global sensitivity analysis for statistical model parameters

    Global sensitivity analysis (GSA) is frequently used to analyze the influence of uncertain parameters in mathematical models and simulations. In principle, tools from GSA may be extended to analyze the influence of parameters in statistical models. Such analyses may enable reduced or parsimonious modeling and greater predictive capability. However, difficulties such as parameter correlation, model stochasticity, multivariate model output, and unknown parameter distributions prohibit a direct application of GSA tools to statistical models. By leveraging a loss function associated with the statistical model, we introduce a novel framework to address these difficulties and enable efficient GSA for statistical model parameters. Theoretical and computational properties are considered and illustrated on a synthetic example. The framework is applied to a Gaussian process model from the literature, which depends on 95 parameters. Non-influential parameters are discovered through GSA, and a reduced model with equal or stronger predictive capability is constructed by using only 79 parameters.
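    A deliberately simplified illustration of loss-based sensitivity analysis (far cruder than the framework in the paper): sample the parameters from assumed ranges, evaluate the loss at each sample, and estimate a rough main-effect index per parameter by comparing between-bin to total variance of the loss; parameters with near-zero indices are candidates for fixing or removal.

```python
import numpy as np

rng = np.random.default_rng(7)

def loss(theta):
    # toy loss: parameters 0 and 1 are influential, parameters 2-4 are nearly inert
    return (theta[:, 0] - 1.0) ** 2 + np.sin(2.0 * theta[:, 1]) ** 2 \
        + 1e-3 * theta[:, 2:].sum(axis=1) ** 2

n_samples, n_params, n_bins = 20000, 5, 20
theta = rng.uniform(-2, 2, size=(n_samples, n_params))
L = loss(theta)
total_var = L.var()

for j in range(n_params):
    # crude first-order index: variance of the loss explained by binning on parameter j
    edges = np.linspace(-2, 2, n_bins + 1)[1:-1]
    bins = np.digitize(theta[:, j], edges)
    bin_means = np.array([L[bins == b].mean() for b in range(n_bins)])
    bin_counts = np.array([(bins == b).sum() for b in range(n_bins)])
    main_effect = np.sum(bin_counts * (bin_means - L.mean()) ** 2) / n_samples
    print(f'parameter {j}: main-effect index ~ {main_effect / total_var:.3f}')
```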

    Expectation Propagation for Nonlinear Inverse Problems -- with an Application to Electrical Impedance Tomography

    In this paper, we study a fast approximate inference method based on expectation propagation for exploring the posterior probability distribution arising from the Bayesian formulation of nonlinear inverse problems. It is capable of efficiently delivering reliable estimates of the posterior mean and covariance, thereby providing an inverse solution together with quantified uncertainties. Some theoretical properties of the iterative algorithm are discussed, and an efficient implementation for an important class of problems of projection type is described. The method is illustrated on one typical nonlinear inverse problem, electrical impedance tomography with the complete electrode model, under sparsity constraints. Numerical results for real experimental data are presented and compared with those obtained by Markov chain Monte Carlo. The results indicate that the method is accurate and computationally very efficient. (Journal of Computational Physics, to appear.)
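    A textbook-style expectation propagation loop for a scalar toy problem (not the paper's algorithm for electrical impedance tomography): a Gaussian prior, a few nonlinear-Gaussian observations treated as sites, and the usual cycle of forming a cavity, moment-matching the tilted distribution (here by brute-force quadrature), and refreshing each site's Gaussian approximation.

```python
import numpy as np

rng = np.random.default_rng(8)

g = lambda x: x + 0.3 * x ** 3                   # nonlinear forward map (illustrative)
noise_var = 0.2 ** 2
x_true = 0.8
y = g(x_true) + np.sqrt(noise_var) * rng.standard_normal(3)   # three noisy observations

def tilted_moments(m_cav, v_cav, y_i):
    """Mean/variance of cavity * likelihood for one site, via brute-force quadrature."""
    xs = np.linspace(m_cav - 8 * np.sqrt(v_cav), m_cav + 8 * np.sqrt(v_cav), 4001)
    dens = np.exp(-0.5 * (xs - m_cav) ** 2 / v_cav - 0.5 * (y_i - g(xs)) ** 2 / noise_var)
    dx = xs[1] - xs[0]
    Z = dens.sum() * dx
    mean = (xs * dens).sum() * dx / Z
    var = ((xs - mean) ** 2 * dens).sum() * dx / Z
    return mean, var

tau, nu = 1.0, 0.0                               # global approx starts at the prior N(0, 1)
tau_sites = np.zeros(len(y))
nu_sites = np.zeros(len(y))

for sweep in range(10):
    for i in range(len(y)):
        tau_cav, nu_cav = tau - tau_sites[i], nu - nu_sites[i]   # remove site i (cavity)
        m_cav, v_cav = nu_cav / tau_cav, 1.0 / tau_cav
        m_tilt, v_tilt = tilted_moments(m_cav, v_cav, y[i])      # moment matching
        tau_sites[i] = 1.0 / v_tilt - tau_cav                    # refreshed Gaussian site
        nu_sites[i] = m_tilt / v_tilt - nu_cav
        tau, nu = 1.0 / v_tilt, m_tilt / v_tilt                  # updated global posterior

print('EP posterior mean/std:', nu / tau, np.sqrt(1.0 / tau), ' true x:', x_true)
```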

    Statistical modeling of rates and trends in Holocene relative sea level

    Characterizing the spatio-temporal variability of relative sea level (RSL) and estimating local, regional, and global RSL trends requires statistical analysis of RSL data. Formal statistical treatments, needed to account for the spatially and temporally sparse distribution of data and for geochronological and elevational uncertainties, have advanced considerably over the last decade. Time-series models have adopted more flexible and physically informed specifications with more rigorous quantification of uncertainties. Spatio-temporal models have evolved from simple regional averaging to frameworks that more richly represent the correlation structure of RSL across space and time. More complex statistical approaches enable rigorous quantification of spatial and temporal variability, the combination of geographically disparate data, and the separation of the RSL field into various components associated with different driving processes. We review the range of statistical modeling and analysis choices used in the literature, reformulating them for ease of comparison in a common hierarchical statistical framework. The hierarchical framework separates each model into different levels, clearly partitioning measurement and inferential uncertainty from process variability. Placing models in a hierarchical framework enables us to highlight both the similarities and differences among modeling and analysis choices. We illustrate the implications of some modeling and analysis choices currently used in the literature by comparing the results of their application to common datasets within a hierarchical framework. In light of the complex patterns of spatial and temporal variability exhibited by RSL, we recommend non-parametric approaches for modeling temporal and spatio-temporal RSL.
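    A minimal sketch of the kind of non-parametric temporal model recommended above (kernel, hyperparameters, and data are illustrative; real RSL analyses also model age errors and spatial structure): a Gaussian-process regression of RSL on age, with observation-specific elevation uncertainties entering the data level of the hierarchy.

```python
import numpy as np

rng = np.random.default_rng(9)

ages = np.sort(rng.uniform(0, 10, 40))                   # ka BP (synthetic)
true_rsl = -0.8 * ages + 0.5 * np.sin(ages)              # synthetic process level
elev_sd = rng.uniform(0.1, 0.5, ages.size)               # per-sample elevation uncertainty
rsl_obs = true_rsl + elev_sd * rng.standard_normal(ages.size)   # data level

def sq_exp(a, b, amp=5.0, length=2.0):
    """Squared-exponential covariance between age vectors a and b (illustrative hyperparameters)."""
    return amp ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

grid = np.linspace(0, 10, 200)
K = sq_exp(ages, ages) + np.diag(elev_sd ** 2)           # process covariance + measurement noise
Ks = sq_exp(grid, ages)
post_mean = Ks @ np.linalg.solve(K, rsl_obs)             # posterior mean RSL curve
post_var = sq_exp(grid, grid).diagonal() - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))

rate = np.gradient(post_mean, grid)                      # implied rate of RSL change
print('mean rate over the last 2 ka:', rate[grid <= 2].mean(), 'm/ka')
print('average posterior sd:', np.sqrt(np.clip(post_var, 0.0, None)).mean(), 'm')
```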