Time-varying Autoregression with Low Rank Tensors
We present a windowed technique to learn parsimonious time-varying
autoregressive models from multivariate time series. This unsupervised method
uncovers interpretable spatiotemporal structure in data via non-smooth and
non-convex optimization. In each time window, we assume the data follow a
linear model parameterized by a system matrix, and we model this stack of
potentially different system matrices as a low rank tensor. Because of its
structure, the model is scalable to high-dimensional data and can easily
incorporate priors such as smoothness over time. We find the components of the
tensor using alternating minimization and prove that any stationary point of
this algorithm is a local minimum. We demonstrate on a synthetic example that
our method identifies the true rank of a switching linear system in the
presence of noise. We illustrate our model's utility and superior scalability
over extant methods when applied to several synthetic and real-world examples:
two types of time-varying linear systems, worm behavior, sea surface
temperature, and monkey brain datasets.
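A minimal sketch of the general recipe, not the authors' exact model: fit a
per-window system matrix by least squares, stack the matrices, and approximate
the stack with a low-rank factorization found by alternating minimization. The
window size, rank, and plain alternating-least-squares scheme are illustrative
assumptions.

```python
import numpy as np

def windowed_system_matrices(X, window):
    """X: (T, n) multivariate time series -> (W, n, n) stack of A_w."""
    T, n = X.shape
    mats = []
    for start in range(0, T - window, window):
        Xw = X[start:start + window]
        # Least-squares fit of x[t+1] = A x[t] within the window.
        coeffs, *_ = np.linalg.lstsq(Xw[:-1], Xw[1:], rcond=None)
        mats.append(coeffs.T)
    return np.stack(mats)

def low_rank_als(tensor, rank, iters=50):
    """Rank-r factorization of the (W, n*n) unfolding by alternating LS."""
    W = tensor.shape[0]
    M = tensor.reshape(W, -1)                       # unfold: windows x vec(A_w)
    U = np.random.randn(W, rank)
    for _ in range(iters):
        V, *_ = np.linalg.lstsq(U, M, rcond=None)           # fix U, solve V
        U = np.linalg.lstsq(V.T, M.T, rcond=None)[0].T      # fix V, solve U
    return U @ V, U, V

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4)).cumsum(axis=0)    # toy multivariate series
A_stack = windowed_system_matrices(X, window=50)
approx, U, V = low_rank_als(A_stack, rank=2)
print(np.linalg.norm(approx - A_stack.reshape(len(A_stack), -1)))
```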
The unified maximum a posteriori (MAP) framework for neuronal system identification
The functional relationship between an input and a sensory neuron's response
can be described by the neuron's stimulus-response mapping function. A general
approach for characterizing the stimulus-response mapping function is called
system identification. Many different names have been used for the
stimulus-response mapping function: kernel or transfer function, transducer,
spatiotemporal receptive field. Many algorithms have been developed to estimate
a neuron's mapping function from an ensemble of stimulus-response pairs. These
include the spike-triggered average, normalized reverse correlation, linearized
reverse correlation, ridge regression, local spectral reverse correlation,
spike-triggered covariance, artificial neural networks, maximally informative
dimensions, kernel regression, boosting, and models based on leaky
integrate-and-fire neurons. Because many of these system identification
algorithms were developed in other disciplines, they appear superficially
very different and seem to bear little relation to one another. Each algorithm
makes different assumptions about the neuron and how the data are generated.
Without a unified framework it is difficult to select the most suitable
algorithm for estimating the neuron's mapping function. In this review, we
present a unified framework for describing these algorithms called maximum a
posteriori estimation (MAP). In the MAP framework, the implicit assumptions
built into any system identification algorithm are made explicit in three MAP
constituents: model class, noise distributions, and priors. Understanding the
interplay between these three MAP constituents will simplify the task of
selecting the most appropriate algorithms for a given data set. The MAP
framework can also facilitate the development of novel system identification
algorithms by incorporating biophysically plausible assumptions and mechanisms
into the MAP constituents.
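As a concrete instance of the framework, choosing a linear model class,
Gaussian noise, and a Gaussian prior on the kernel makes the MAP estimate
coincide with ridge regression, one of the algorithms listed above. The sketch
below assumes illustrative stimulus dimensions and variances.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((200, 30))               # stimulus ensemble (trials x dims)
k_true = rng.standard_normal(30)                 # ground-truth kernel
r = S @ k_true + 0.5 * rng.standard_normal(200)  # noisy responses

sigma2, tau2 = 0.25, 1.0        # noise variance and Gaussian prior variance
lam = sigma2 / tau2             # ridge penalty implied by the prior
# argmax_k p(r|S,k) p(k)  =  argmin_k ||r - S k||^2 + lam ||k||^2
k_map = np.linalg.solve(S.T @ S + lam * np.eye(30), S.T @ r)
print(np.corrcoef(k_map, k_true)[0, 1])
```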
Shallow Updates for Deep Reinforcement Learning
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN)
have achieved state-of-the-art results in a variety of challenging,
high-dimensional domains. This success is mainly attributed to the power of
deep neural networks to learn rich domain representations for approximating the
value function or policy. Batch reinforcement learning methods with linear
representations, on the other hand, are more stable and require less
hyperparameter tuning. Yet substantial feature engineering is necessary to achieve
good results. In this work we propose a hybrid approach -- the Least Squares
Deep Q-Network (LS-DQN), which combines rich feature representations learned by
a DRL algorithm with the stability of a linear least squares method. We do this
by periodically re-training the last hidden layer of a DRL network with a batch
least squares update. Key to our approach is a Bayesian regularization term for
the least squares update, which prevents over-fitting to the more recent data.
We test LS-DQN on five Atari games and demonstrate significant improvement
over vanilla DQN and Double-DQN. We also investigate the reasons for the
superior performance of our method. Interestingly, we find that the
performance improvement can be attributed to the large batch size used by the
LS method when optimizing the last layer.
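A minimal sketch of the kind of last-layer update described above, under
illustrative shapes and regularization strength: treat the network's last
hidden activations as fixed features and re-solve the output weights by batch
least squares, with a regularizer centered on the current weights to prevent
over-fitting to recent data.

```python
import numpy as np

def ls_update(features, targets, w_current, lam=1.0):
    """Solve min_w ||features @ w - targets||^2 + lam ||w - w_current||^2."""
    d = features.shape[1]
    A = features.T @ features + lam * np.eye(d)
    b = features.T @ targets + lam * w_current
    return np.linalg.solve(A, b)

rng = np.random.default_rng(2)
phi = rng.standard_normal((5000, 512))   # last-hidden-layer activations
y = rng.standard_normal(5000)            # regression targets (e.g. TD targets)
w0 = 0.01 * rng.standard_normal(512)     # current last-layer weights
w_new = ls_update(phi, y, w0, lam=10.0)
```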
Bayesian Extensions of Kernel Least Mean Squares
The kernel least mean squares (KLMS) algorithm is a computationally efficient
nonlinear adaptive filtering method that "kernelizes" the celebrated (linear)
least mean squares algorithm. We demonstrate that the least mean squares
algorithm is closely related to Kalman filtering, and thus the KLMS can be
interpreted as an approximate Bayesian filtering method. This allows us to
systematically develop extensions of the KLMS by modifying the underlying
state-space and observation models. The resulting extensions introduce many
desirable properties such as "forgetting", and the ability to learn from
discrete data, while retaining the computational simplicity and time complexity
of the original algorithm.
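A minimal KLMS sketch, with an illustrative kernel width, step size, and toy
target: the LMS stochastic update carried out in a reproducing kernel Hilbert
space, so the filter becomes a growing kernel expansion over past inputs.

```python
import numpy as np

def gaussian_kernel(x, y, width=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * width ** 2))

def klms(X, d, step=0.2):
    centers, alphas = [], []
    for x, target in zip(X, d):
        y_hat = sum(a * gaussian_kernel(c, x) for c, a in zip(centers, alphas))
        err = target - y_hat          # instantaneous error, as in linear LMS
        centers.append(x)             # new expansion center at current input
        alphas.append(step * err)     # coefficient = step size * error
    return centers, alphas

rng = np.random.default_rng(3)
X = rng.uniform(-2, 2, size=(300, 1))
d = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(300)
centers, alphas = klms(X, d)
```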
Regularizing Bayesian Predictive Regressions
We show that regularizing Bayesian predictive regressions provides a
framework for prior sensitivity analysis. We develop a procedure that jointly
regularizes expectations and variance-covariance matrices using a pair of
shrinkage priors. Our methodology applies directly to vector autoregressions
(VAR) and seemingly unrelated regressions (SUR). The regularization path
provides a prior sensitivity diagnostic. By exploiting a duality between
regularization penalties and predictive prior distributions, we reinterpret two
classic Bayesian analyses of macro-finance studies: equity premium
predictability and forecasting macroeconomic growth rates. We find there exist
plausible prior specifications for predictability in excess S&P 500 index
returns using book-to-market ratios, CAY (consumption, wealth, income ratio),
and T-bill rates. We evaluate the forecasts using a market-timing strategy, and
we show the optimally regularized solution outperforms a buy-and-hold approach.
A second empirical application involves forecasting industrial production,
inflation, and consumption growth rates, and demonstrates the feasibility of
our approach.
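A sketch of the penalty/prior duality the paper exploits, on synthetic
stand-ins for the predictors: sweeping a ridge penalty traces the same path as
tightening a Gaussian shrinkage prior, so the coefficient path doubles as a
prior sensitivity diagnostic.

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.standard_normal((240, 3))          # 3 predictors over 240 months
beta = np.array([0.3, 0.0, -0.2])
ret = Z @ beta + rng.standard_normal(240)  # synthetic excess returns

lambdas = np.logspace(-2, 3, 20)
path = []
for lam in lambdas:
    # Ridge MAP estimate == posterior mean under a N(0, 1/lam) prior.
    b = np.linalg.solve(Z.T @ Z + lam * np.eye(3), Z.T @ ret)
    path.append(b)
path = np.array(path)                      # (20, 3) coefficient path
```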
Bayesian Fused Lasso regression for dynamic binary networks
We propose a multinomial logistic regression model for link prediction in a
time series of directed binary networks. To account for the dynamic nature of
the data we employ a dynamic model for the model parameters that is strongly
connected with the fused lasso penalty. In addition to promoting sparseness,
this prior allows us to explore the presence of change points in the structure
of the network. We introduce fast computational algorithms for estimation and
prediction using both optimization and Bayesian approaches. The performance of
the model is illustrated using simulated data and data from a financial trading
network in the NYMEX natural gas futures market. Supplementary material
containing the trading network data set and code to implement the algorithms is
available online.
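A sketch of the fused lasso idea in its simplest (Gaussian, scalar) form,
rather than the paper's multinomial logistic network model: penalizing both
coefficient size and changes between consecutive time points yields sparse,
piecewise-constant paths whose jumps mark change points. Plain subgradient
descent is used purely for illustration; the paper's algorithms are faster.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 60
theta_true = np.concatenate([np.zeros(20), np.ones(25), np.zeros(15)])
y = theta_true + 0.3 * rng.standard_normal(T)   # noisy observations

theta = np.zeros(T)
lam1, lam2, step = 0.1, 0.5, 0.05
for _ in range(2000):
    grad = theta - y                     # squared-error data term
    grad += lam1 * np.sign(theta)        # sparsity (lasso) subgradient
    d = np.sign(np.diff(theta))          # fusion (change-point) subgradient
    grad[:-1] += -lam2 * d
    grad[1:] += lam2 * d
    theta -= step * grad
```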
Regularized brain reading with shrinkage and smoothing
Functional neuroimaging measures how the brain responds to complex stimuli.
However, sample sizes are modest, noise is substantial, and stimuli are high
dimensional. Hence, direct estimates are inherently imprecise and call for
regularization. We compare a suite of approaches which regularize via
shrinkage: ridge regression, the elastic net (a generalization of ridge
regression and the lasso), and a hierarchical Bayesian model based on small
area estimation (SAE). We contrast regularization with spatial smoothing and
combinations of smoothing and shrinkage. All methods are tested on functional
magnetic resonance imaging (fMRI) data from multiple subjects participating in
two different experiments related to reading, for both predicting neural
response to stimuli and decoding stimuli from responses. Interestingly, when
the regularization parameters are chosen by cross-validation independently for
every voxel, low/high regularization is chosen in voxels where the
classification accuracy is high/low, indicating that the regularization
intensity is a useful tool for identifying voxels relevant to the
cognitive task. Surprisingly, all the regularization methods work about equally
well, suggesting that beating basic smoothing and shrinkage will take not only
clever methods, but also careful modeling.Comment: Published at http://dx.doi.org/10.1214/15-AOAS837 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
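A sketch of combining the two kinds of regularization compared above, on a
synthetic 1-D voxel layout: spatially smooth each voxel with its neighbors,
then apply ridge shrinkage when regressing responses on stimulus features.
The smoothing kernel and shrinkage intensity are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
stim = rng.standard_normal((120, 10))      # trials x stimulus features
resp = rng.standard_normal((120, 200))     # trials x voxels

# Spatial smoothing: average each voxel with its immediate neighbors.
kernel = np.array([0.25, 0.5, 0.25])
smoothed = np.apply_along_axis(
    lambda v: np.convolve(v, kernel, mode="same"), axis=1, arr=resp)

lam = 5.0                                  # shrinkage intensity
G = stim.T @ stim + lam * np.eye(10)
W = np.linalg.solve(G, stim.T @ smoothed)  # (features x voxels) encoding weights
```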
Global sensitivity analysis for statistical model parameters
Global sensitivity analysis (GSA) is frequently used to analyze the influence
of uncertain parameters in mathematical models and simulations. In principle,
tools from GSA may be extended to analyze the influence of parameters in
statistical models. Such analyses may enable reduced or parsimonious modeling
and greater predictive capability. However, difficulties such as parameter
correlation, model stochasticity, multivariate model output, and unknown
parameter distributions prohibit a direct application of GSA tools to
statistical models. By leveraging a loss function associated with the
statistical model, we introduce a novel framework to address these difficulties
and enable efficient GSA for statistical model parameters. Theoretical and
computational properties are considered and illustrated on a synthetic example.
The framework is applied to a Gaussian process model from the literature, which
depends on 95 parameters. Non-influential parameters are discovered through GSA
and a reduced model with equal or stronger predictive capability is constructed
by using only 79 parameters.
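A crude one-at-a-time stand-in for the loss-based sensitivity idea, not the
paper's variance-based framework: rank parameters by how much perturbing each
one changes the fitted model's loss. The quadratic loss and perturbation size
are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(100)

def loss(theta):
    return np.mean((y - X @ theta) ** 2)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # fitted parameters
eps = 0.1
sensitivity = np.array([
    loss(theta_hat + eps * np.eye(5)[i]) - loss(theta_hat) for i in range(5)])
print(np.argsort(-sensitivity))   # most influential parameters first
```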
Expectation Propagation for Nonlinear Inverse Problems -- with an Application to Electrical Impedance Tomography
In this paper, we study a fast approximate inference method based on
expectation propagation for exploring the posterior probability distribution
arising from the Bayesian formulation of nonlinear inverse problems. It is
capable of efficiently delivering reliable estimates of the posterior mean and
covariance, thereby providing an inverse solution together with quantified
uncertainties. Some theoretical properties of the iterative algorithm are
discussed, and the efficient implementation for an important class of problems
of projection type is described. The method is illustrated with one typical
nonlinear inverse problem, electrical impedance tomography with the complete
electrode model, under sparsity constraints. Numerical results for real
experimental data are presented and compared with those from Markov chain
Monte Carlo. The results indicate that the method is accurate and
computationally very efficient.
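A minimal sketch of the moment-matching projection at the heart of expectation
propagation, using a Laplace factor as an illustrative stand-in for an
inverse-problem likelihood and brute-force quadrature in place of analytic
moments: each site update replaces (cavity times exact factor) with the
Gaussian matching its mean and variance. Real EP iterates this site by site.

```python
import numpy as np

grid = np.linspace(-10, 10, 4001)

def gauss(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

cavity_m, cavity_v = 0.0, 4.0              # cavity distribution
factor = np.exp(-np.abs(grid - 1.0))       # non-Gaussian (Laplace) site

tilted = gauss(grid, cavity_m, cavity_v) * factor
tilted /= np.trapz(tilted, grid)           # normalized tilted distribution
m_new = np.trapz(grid * tilted, grid)      # matched mean
v_new = np.trapz((grid - m_new) ** 2 * tilted, grid)  # matched variance
print(m_new, v_new)
```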
Statistical modeling of rates and trends in Holocene relative sea level
Characterizing the spatio-temporal variability of relative sea level (RSL)
and estimating local, regional, and global RSL trends requires statistical
analysis of RSL data. Formal statistical treatments, needed to account for the
spatially and temporally sparse distribution of data and for geochronological
and elevational uncertainties, have advanced considerably over the last decade.
Time-series models have adopted more flexible and physically-informed
specifications with more rigorous quantification of uncertainties.
Spatio-temporal models have evolved from simple regional averaging to
frameworks that more richly represent the correlation structure of RSL across
space and time. More complex statistical approaches enable rigorous
quantification of spatial and temporal variability, the combination of
geographically disparate data, and the separation of the RSL field into various
components associated with different driving processes. We review the range of
statistical modeling and analysis choices used in the literature, reformulating
them for ease of comparison in a common hierarchical statistical framework. The
hierarchical framework separates each model into different levels, clearly
partitioning measurement and inferential uncertainty from process variability.
Placing models in a hierarchical framework enables us to highlight both the
similarities and differences among modeling and analysis choices. We illustrate
the implications of some modeling and analysis choices currently used in the
literature by comparing the results of their application to common datasets
within a hierarchical framework. In light of the complex patterns of spatial
and temporal variability exhibited by RSL, we recommend non-parametric
approaches for modeling temporal and spatio-temporal RSL.
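A sketch of one common non-parametric temporal choice, Gaussian process
regression of RSL on age with per-observation elevational noise folded into
the likelihood; the kernel, length scale, and synthetic record are
illustrative assumptions.

```python
import numpy as np

def sq_exp(t1, t2, amp=1.0, ell=1500.0):
    """Squared-exponential covariance between two sets of ages."""
    return amp**2 * np.exp(-0.5 * (t1[:, None] - t2[None, :])**2 / ell**2)

rng = np.random.default_rng(8)
age = np.sort(rng.uniform(0, 8000, 40))            # ages (years BP)
rsl = -0.001 * age + 0.5 * np.sin(age / 1000) + 0.2 * rng.standard_normal(40)
noise_var = 0.04 * np.ones(40)                     # elevational error variances

t_star = np.linspace(0, 8000, 200)
K = sq_exp(age, age) + np.diag(noise_var)
Ks = sq_exp(t_star, age)
mean = Ks @ np.linalg.solve(K, rsl)                # posterior mean RSL curve
cov = sq_exp(t_star, t_star) - Ks @ np.linalg.solve(K, Ks.T)
```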