15 research outputs found
PyOED: An Extensible Suite for Data Assimilation and Model-Constrained Optimal Design of Experiments
This paper describes the first version (v1.0) of PyOED, a highly extensible
scientific package that enables developing and testing model-constrained
optimal experimental design (OED) for inverse problems. Specifically, PyOED
aims to be a comprehensive Python toolkit for model-constrained OED. The
package targets scientists and researchers interested in understanding the
details of OED formulations and approaches. It is also meant to enable
researchers to experiment with standard and innovative OED technologies with a
wide range of test problems (e.g., simulation models). Thus, PyOED is
continuously being expanded with Bayesian inversion, data assimilation (DA),
and OED methods, as well as new scientific simulation models, observation
error models, and observation operators. These components are designed so that
they can be combined, enabling OED methods to be tested in settings of varying
complexity. The PyOED core is written entirely in Python and exploits the
language's inherent object-oriented capabilities; however, the current version
of PyOED is meant to be extensible rather than scalable. Specifically, PyOED
is developed to "enable rapid development and benchmarking of OED methods with
minimal coding effort and to maximize code reutilization." This paper provides
a brief description of the PyOED layout and philosophy and presents a set of
exemplary test cases and tutorials to demonstrate how the package can be
utilized.
Comment: 26 pages, 7 figures, 21 code snippets
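The kind of model-constrained OED workflow the package targets can be illustrated with a toy linear-Gaussian inverse problem. The sketch below is a generic greedy D-optimal sensor-selection loop in plain NumPy, not PyOED's actual API; the problem setup, names, and budget are invented for illustration.

```python
import numpy as np

# Toy D-optimal design for a linear-Gaussian Bayesian inverse problem
# (illustrative only, NOT PyOED's API): greedily pick the sensors that
# maximize the log-determinant of the posterior precision matrix.
rng = np.random.default_rng(0)
n_params, n_candidates, noise_var = 5, 12, 0.1

G = rng.standard_normal((n_candidates, n_params))  # candidate observation rows
prior_precision = np.eye(n_params)

def logdet_posterior_precision(sensors):
    """log det of the posterior precision given a set of activated sensors."""
    H = prior_precision.copy()
    for i in sensors:
        H += np.outer(G[i], G[i]) / noise_var      # rank-one information update
    return np.linalg.slogdet(H)[1]

def greedy_d_optimal(budget):
    """Add one sensor at a time, always taking the largest logdet gain."""
    chosen = []
    for _ in range(budget):
        remaining = [i for i in range(n_candidates) if i not in chosen]
        best = max(remaining,
                   key=lambda i: logdet_posterior_precision(chosen + [i]))
        chosen.append(best)
    return chosen

design = greedy_d_optimal(budget=3)
```

Swapping the log-determinant for the trace of the posterior covariance would give the analogous greedy A-optimal design, which is one way such packages let criteria and models be permuted.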
Sensor Clusterization in D-optimal Design in Infinite Dimensional Bayesian Inverse Problems
We investigate the problem of sensor clusterization in optimal experimental
design for infinite-dimensional Bayesian inverse problems. We propose an
analytically tractable model for such designs and explain how it may lead to
sensor clusterization in the case of i.i.d. measurement noise. We also show
that in the case of spatially correlated measurement error, clusterization
does not occur. As part of the analysis, we prove an analog of the matrix
determinant lemma in infinite dimensions, as well as a lemma for calculating
derivatives of operators.
Comment: 19 pages, two figures
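For context, the finite-dimensional matrix determinant lemma, whose infinite-dimensional analog the paper establishes, can be checked numerically; the snippet below verifies the classical rank-one identity det(A + u v^T) = (1 + v^T A^{-1} u) det(A) on a random invertible matrix.

```python
import numpy as np

# Finite-dimensional matrix determinant lemma (rank-one update):
#   det(A + u v^T) = (1 + v^T A^{-1} u) * det(A)   for invertible A.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonal shift keeps A invertible
u = rng.standard_normal(n)
v = rng.standard_normal(n)

lhs = np.linalg.det(A + np.outer(u, v))
rhs = (1.0 + v @ np.linalg.solve(A, u)) * np.linalg.det(A)
```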
Randomized low-rank approximation of monotone matrix functions
This work is concerned with computing low-rank approximations of a matrix
function f(A) for a large symmetric positive semi-definite matrix A, a task
that arises in, e.g., statistical learning and inverse problems. The
application of popular randomized methods, such as the randomized singular
value decomposition or the Nyström approximation, to f(A) requires
multiplying f(A) with a few random vectors. A significant disadvantage of
such an approach is that matrix-vector products with f(A) are considerably more
expensive than matrix-vector products with A, even when carried out only
approximately via, e.g., the Lanczos method. In this work, we present and
analyze funNyström, a simple and inexpensive method that constructs a
low-rank approximation of f(A) directly from a Nyström approximation of
A, completely bypassing the need for matrix-vector products with f(A). It
is sensible to use funNyström whenever f is monotone and satisfies
f(0) = 0. Under the stronger assumption that f is operator monotone, which
includes the matrix square root A^{1/2} and the matrix logarithm log(I + A),
we derive probabilistic bounds for the error in the Frobenius, nuclear, and
operator norms. These bounds confirm the numerical observation that
funNyström tends to return an approximation that compares well with the best
low-rank approximation of f(A). Our method is also of interest when
estimating quantities associated with f(A), such as the trace or the diagonal
entries of f(A). In particular, we propose and analyze funNyström++, a
combination of funNyström with the recently developed Hutch++ method for
trace estimation.
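A minimal sketch of the idea, as we read it from the abstract (not the authors' reference implementation): form a rank-k Nyström approximation of A from the sketch A @ Omega, eigendecompose it, and apply f to the approximate eigenvalues, so that f(A) itself is never multiplied with any vector.

```python
import numpy as np

# funNystrom-style sketch: low-rank approximation of f(A) built only from
# products with A (a small regularization shift is added for numerical safety).
def fun_nystrom(A, k, f, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, k))
    Y = A @ Omega                        # the only matrix products with A
    C = Omega.T @ Y                      # k x k core matrix
    shift = 1e-12 * np.trace(C)
    L = np.linalg.cholesky(C + shift * np.eye(k))
    B = np.linalg.solve(L, Y.T).T        # Nystrom approx of A equals B @ B.T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(f(s**2)) @ U.T    # apply f to the approximate eigenvalues

# Toy check: matrix square root of a PSD matrix with fast spectral decay.
n = 60
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigvals = 2.0 ** -np.arange(n)           # rapidly decaying spectrum
A = (Q * eigvals) @ Q.T
sqrt_exact = (Q * np.sqrt(eigvals)) @ Q.T
sqrt_approx = fun_nystrom(A, k=20, f=np.sqrt)
rel_err = np.linalg.norm(sqrt_exact - sqrt_approx) / np.linalg.norm(sqrt_exact)
```

Because f is applied to a k x k factorization, the extra cost over a plain Nyström approximation is negligible, which is the point of the method.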
Learning to compress and search visual data in large-scale systems
The problem of high-dimensional and large-scale representation of visual data
is addressed from an unsupervised learning perspective. The emphasis is put on
discrete representations, where the description length can be measured in bits
and hence the model capacity can be controlled. The algorithmic infrastructure
is developed based on the synthesis and analysis prior models whose
rate-distortion properties, as well as capacity vs. sample complexity
trade-offs are carefully optimized. These models are then extended to
multi-layers, namely the RRQ and the ML-STC frameworks, where the latter is
further evolved as a powerful deep neural network architecture with fast and
sample-efficient training and discrete representations. For the developed
algorithms, three important applications are developed. First, the problem of
large-scale similarity search in retrieval systems is addressed, where a
double-stage solution is proposed leading to faster query times and shorter
database storage. Second, the problem of learned image compression is targeted,
where the proposed models can capture more redundancies from the training
images than the conventional compression codecs. Finally, the proposed
algorithms are used to solve ill-posed inverse problems. In particular, the
problems of image denoising and compressive sensing are addressed with
promising results.
Comment: PhD thesis dissertation
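Multi-stage residual quantization, the family to which RRQ-style models belong, can be sketched generically: each stage quantizes the residual left by the previous stages, so the description length grows per stage while distortion shrinks. The codebooks below are just random training samples (plus the zero codeword), not the learned, regularized codebooks of the thesis.

```python
import numpy as np

# Generic residual (multi-stage) vector quantization sketch: later stages
# encode what earlier stages failed to capture, so mean squared residual
# energy is non-increasing across stages.
rng = np.random.default_rng(0)
dim, n_train, codebook_size, n_stages = 8, 500, 32, 3

X = rng.standard_normal((n_train, dim))
base_energy = np.mean(np.sum(X ** 2, axis=1))

def nearest(codebook, x):
    """Return the codeword closest to x in Euclidean distance."""
    return codebook[np.argmin(np.sum((codebook - x) ** 2, axis=1))]

codebooks, residual, errors = [], X.copy(), []
for _ in range(n_stages):
    # Codebook = zero vector plus random residual samples (toy stand-in for
    # a learned codebook); zero guarantees quantization never hurts.
    picks = rng.choice(len(residual), codebook_size - 1, replace=False)
    C = np.vstack([np.zeros(dim), residual[picks]])
    codebooks.append(C)
    residual = residual - np.array([nearest(C, x) for x in residual])
    errors.append(np.mean(np.sum(residual ** 2, axis=1)))
```

The stage index sequence of codeword IDs is the discrete representation; its length in bits is n_stages * log2(codebook_size), which is the capacity control the abstract refers to.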
High-dimensional Bayesian methods for interpretable nowcasting and risk estimation
This thesis presents new models for nowcasting and macro risk estimation using frontier Bayesian methods that enable incorporating Big Data into policy-relevant prediction problems. We propose variable selection algorithms motivated by Bayesian
decision theory to make model outcomes interpretable to the policy maker.
In chapter 2, we propose a Bayesian Structural Time Series (BSTS) model for nowcasting GDP growth. This model jointly estimates latent time trends that capture
slow-moving changes in economic conditions alongside a high-dimensional mixed-frequency component extracted from higher-frequency (monthly) cyclical information. We extend previous implementations of the BSTS with priors and
variable selection methods that facilitate selection over latent time trends as well
as mixed-frequency information while remaining tractable to the policy maker. Empirically, we provide a novel nowcast application in which we use a large
set of Internet search terms to gain advance information about supply and demand
sentiment for the US economy before more commonly considered macro information
is available to the nowcaster. We find that our proposed BSTS model offers large
improvements over competing models and that Internet search terms matter for
nowcasts before hard information about the macro economy has been published.
A simulation exercise confirms the good performance of the proposed model.
Chapter 3 presents the T-SV-t-BMIDAS (Bayesian Mixed Data Sampling) model
for nowcasting quarterly GDP growth. The model incorporates a long-run time-varying trend (T) and t-distributed stochastic volatility accounting for outliers (SV-t) into a Bayesian multivariate MIDAS. To address the high-dimensionality of the
model, to account for group-correlation in mixed frequency data, and to make the
model interpretable to the policy maker, we propose a new combination of a group-shrinkage prior with a sparsification algorithm for variable selection. The prior flexibly
accommodates between-group sparsity and within-group correlation and allows
the joint importance of predictors to be communicated over the data release cycle. We
evaluate the model on UK GDP growth nowcasts, also covering the time span of
the Covid-19 recession. The model is competitive prior to the pandemic relative to
various benchmark models, while yielding substantial nowcast improvements during
the pandemic. Contrary to many previous nowcasting approaches, the model reads
in sparse group signals from the data. Simulations show competitive performance
of the variable selection methodology, with particularly good performance to be
expected for highly correlated data as well as dense data-generating-processes.
Chapter 4 presents a new Bayesian Quantile Regression (BQR) model for high dimensional risk estimation. It extends the horseshoe prior to the BQR framework
and provides a fast sampling algorithm for computation that makes it efficient for
high-dimensional problems. A large scale simulation exercise reveals that compared
to alternative shrinkage priors, the proposed methods yield better performance in
coefficient bias and forecast error, especially in sparse data-generating processes
and in estimating extreme quantiles. In a high-dimensional Growth-at-Risk forecasting application, we forecast tail risks as well as complete forecast densities using
a database covering over 200 variables related to the U.S. economy. Quantile-specific and density calibration score functions show that the horseshoe prior provides
the best performance compared to competing Bayesian quantile regression priors,
especially at short- and medium-run horizons.
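The building block behind Bayesian quantile regression is the check (pinball) loss, which an asymmetric-Laplace working likelihood embeds into a Bayesian model. A toy, non-Bayesian illustration of the loss (all names here are ours, not the thesis'): minimizing the average check loss at level tau recovers the sample tau-quantile.

```python
import numpy as np

# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0}).
# Its minimizer over location is the tau-quantile of the data.
def check_loss(u, tau):
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)
tau = 0.9

# Brute-force location search: average check loss over a grid of candidates.
grid = np.linspace(-3.0, 3.0, 1201)
avg_loss = [np.mean(check_loss(y - q, tau)) for q in grid]
q_hat = grid[int(np.argmin(avg_loss))]
```

In the Bayesian version, exp(-sum of check losses) plays the role of the likelihood, and shrinkage priors such as the horseshoe are placed on the regression coefficients.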
Bayesian quantile regression models with continuous shrinkage priors are known to
predict well but are hard to interpret due to the lack of exact posterior sparsity. Chapter 5 bridges this gap by extending the idea of decoupling shrinkage and sparsity.
The proposed procedure follows two steps: First, the quantile regression posterior is
shrunk via state-of-the-art continuous shrinkage priors; then, the posterior is sparsified by taking the Bayes-optimal solution to maximising a policy maker's utility
function with joint preference for predictive accuracy as well as sparsity. For the
sparsification component, we propose a new variant of the signal adaptive variable
selection algorithm that automates the choice of penalization in the integrated utility
through a quantile-specific loss function that works well in high dimensions. Large
scale simulations show that, compared to the un-sparsified regression posterior, the
selection procedure decreases coefficient bias irrespective of the true underlying degree of sparsity in the data, and goodness of variable selection is competitive with
traditional variable selection priors. A high dimensional Growth-at-Risk forecasting
application to the US shows that the method detects varying degrees of sparsity
across the conditional GDP distribution and that the sources of downside risk vary
substantially over time.
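The two-step decoupling idea can be sketched with a soft-thresholding rule in the spirit of the signal adaptive variable selection (SAVS) algorithm; the coefficient vector and the penalty form below are illustrative assumptions, not the thesis' exact quantile-specific variant.

```python
import numpy as np

# SAVS-style sparsification of an (assumed already shrunk) coefficient vector:
# per-coefficient penalty mu_j = 1 / beta_j^2, so small coefficients receive a
# huge penalty and are zeroed, while large signals pass nearly untouched.
def savs_sparsify(beta_hat, X):
    col_norm2 = np.sum(X ** 2, axis=0)             # ||X_j||^2 per column
    mu = 1.0 / beta_hat ** 2                       # signal-adaptive penalty
    return np.sign(beta_hat) * np.maximum(np.abs(beta_hat) - mu / col_norm2, 0.0)

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
# Hypothetical shrunk posterior mean: three clear signals, three near-zero noise terms.
beta_hat = np.array([2.0, -1.5, 0.01, -0.02, 0.005, 1.0])
beta_sparse = savs_sparsify(beta_hat, X)
```

In the full procedure this rule is applied draw by draw to the posterior, so the reported model is exactly sparse while predictions still come from the shrunk posterior.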
Inspired by the work of Giannone et al. (2021) on the “illusion of sparsity” in sparse modelling techniques, chapter 6 investigates whether the recently popularised global-local priors, firstly, are implicitly informative about sparsity and,
secondly, whether they are able to communicate the true degree of sparsity from
the data. We consider two methods of analysis: implicit model size distributions
and sparsification techniques which are tested on a host of economic data sets and
simulations. The findings motivate a new horseshoe-type model to which we add a
prior that makes it a priori agnostic about the degree of sparsity; the model is shown to be
competitive with the spike-and-slab prior of Giannone et al. (2021) for forecasting as well
as sparsity detection.
Chapter 7 concludes with summaries, limitations of the thesis, and directions
for future research.
Scalable Bayesian sparse learning in high-dimensional models
Nowadays, high-dimensional models, where the number of parameters or features can be even larger than the number of observations, are encountered on a fairly regular basis due to advancements in modern computation. For example, in gene expression datasets, we often encounter datasets with observations on the order of at most a few hundred and with predictors from thousands of genes. One of the goals is to identify the genes that are relevant to the expression. Another example is model compression, which aims to alleviate the costs of large model sizes. The former example is the variable or feature selection problem, while the latter is the model selection problem.
In the Bayesian framework, we often specify shrinkage priors that induce sparsity in the model. A sparsity-inducing prior has high concentration around zero to identify the zero coefficients and heavy tails to capture the non-zero elements.
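The horseshoe prior is a standard example of such a sparsity-inducing prior; a quick simulation (our illustration, not taken from the thesis) shows the two features described above, heavy mass near zero and heavy tails, relative to a Gaussian.

```python
import numpy as np

# Horseshoe prior simulation: beta_j = lambda_j * z_j with half-Cauchy local
# scales lambda_j, versus a standard Gaussian for comparison. The horseshoe
# puts far more mass near zero AND far more mass in the extreme tails.
rng = np.random.default_rng(0)
m = 200_000

local_scale = np.abs(rng.standard_cauchy(m))   # lambda_j ~ half-Cauchy(0, 1)
beta = local_scale * rng.standard_normal(m)    # beta_j | lambda_j ~ N(0, lambda_j^2)
gauss = rng.standard_normal(m)

near_zero_hs = np.mean(np.abs(beta) < 0.05)    # spike: mass near zero
near_zero_gauss = np.mean(np.abs(gauss) < 0.05)
tail_hs = np.mean(np.abs(beta) > 10)           # heavy tails: mass far from zero
tail_gauss = np.mean(np.abs(gauss) > 10)
```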
In this thesis, we first provide an overview of the most well-known sparsity-inducing priors. Then we propose to use a shrinkage prior with a partially collapsed Gibbs (PCG) sampler to explore the high-dimensional parameter space in linear regression models, where variable selection is achieved through credible intervals. We also develop a coordinate-wise optimization method for posterior mode search with theoretical guarantees. We then extend the PCG sampler to develop a scalable ordinal regression model with a real application to the study of student evaluation surveys. Next, we move to modern deep learning. A constrained variational Adam (CVA) algorithm is introduced to optimize Bayesian neural networks, and its connection to stochastic gradient Hamiltonian Monte Carlo is discussed. We then generalize our algorithm to constrained variational Adam with expectation maximization (CVA-EM), which incorporates the spike-and-slab prior to capture the sparsity of the neural network. Both nonlinear high-dimensional variable selection and network pruning can be achieved by this algorithm. We further show that the CVA-EM algorithm extends to graph neural networks to produce both sparse graphs and sparse weights. Finally, we discuss the sparse VAE with a sparsity-inducing prior as potential future work.
LIPIcs, Volume 251, ITCS 2023, Complete Volume