πVAE: Encoding stochastic process priors with variational autoencoders
Stochastic processes provide a mathematically elegant way to model complex data.
In theory, they provide flexible priors over function classes that can encode a
wide range of interesting assumptions. In practice, however, efficient
inference by optimisation or marginalisation is difficult, a problem further
exacerbated with big data and high dimensional input spaces. We propose a novel
variational autoencoder (VAE) called the prior encoding variational autoencoder
(πVAE). The πVAE is finitely exchangeable and Kolmogorov consistent,
and thus is a continuous stochastic process. We use πVAE to learn low
dimensional embeddings of function classes. We show that our framework can
accurately learn expressive function classes such as Gaussian processes, but
also properties of functions to enable statistical inference (such as the
integral of a log Gaussian process). For popular tasks, such as spatial
interpolation, πVAE achieves state-of-the-art performance both in terms of
accuracy and computational efficiency. Perhaps most usefully, we demonstrate
that the low dimensional independently distributed latent space representation
learnt provides an elegant and scalable means of performing Bayesian inference
for stochastic processes within probabilistic programming languages such as
Stan.
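For intuition, the following is a heavily simplified sketch of the prior-encoding idea (not the authors' code): a VAE is trained on draws from a GP prior evaluated on a fixed grid, after which decoding z ~ N(0, I) yields approximate prior draws. This omits the feature map over input locations used by the actual method; the kernel, network sizes and training schedule below are arbitrary choices.

```python
# Minimal sketch, assuming a fixed 1D grid and an RBF GP prior (illustrative only).
import numpy as np
import torch
import torch.nn as nn

grid = np.linspace(0, 1, 64)                                     # evaluation points
K = np.exp(-0.5 * (grid[:, None] - grid[None, :]) ** 2 / 0.1 ** 2)
L = np.linalg.cholesky(K + 1e-6 * np.eye(64))
draws = (L @ np.random.randn(64, 5000)).T                        # GP prior draws

class VAE(nn.Module):
    def __init__(self, d=64, z=10, h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, 2 * z))
        self.dec = nn.Sequential(nn.Linear(z, h), nn.ReLU(), nn.Linear(h, d))
    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        zs = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterise
        return self.dec(zs), mu, logvar

model, data = VAE(), torch.tensor(draws, dtype=torch.float32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    recon, mu, logvar = model(data)
    kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp()) / len(data)
    loss = nn.functional.mse_loss(recon, data) + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    f_new = model.dec(torch.randn(1, 10))                        # approximate prior draw
```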
Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference
Epidemic models are powerful tools in understanding infectious disease.
However, as they increase in size and complexity, they can quickly become
computationally intractable. Recent progress in modelling methodology has shown
that surrogate models can be used to emulate complex epidemic models with a
high-dimensional parameter space. We show that deep sequence-to-sequence
(seq2seq) models can serve as accurate surrogates for complex epidemic models
with sequence based model parameters, effectively replicating seasonal and
long-term transmission dynamics. Once trained, our surrogate can predict
scenarios several thousand times faster than the original model, making it
ideal for policy exploration. We demonstrate that replacing a traditional
epidemic model with a learned simulator facilitates robust Bayesian inference.
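As a rough illustration of how such a surrogate can be structured (not the authors' architecture), the sketch below uses a GRU encoder over a sequence of time-varying epidemic parameters and an autoregressive GRU decoder that emits a predicted incidence trajectory. All layer sizes, the zero start token and the MSE training target are assumptions.

```python
# Sketch of a seq2seq surrogate mapping parameter sequences to trajectories.
import torch
import torch.nn as nn

class Seq2SeqSurrogate(nn.Module):
    def __init__(self, p_dim=3, y_dim=1, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(p_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(y_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, y_dim)

    def forward(self, params, horizon):
        # params: (batch, T_in, p_dim) sequence-based model parameters
        _, h = self.encoder(params)                   # summarise the parameter sequence
        y = params.new_zeros(params.size(0), 1, self.head.out_features)  # start token
        outputs = []
        for _ in range(horizon):                      # decode one time step at a time
            out, h = self.decoder(y, h)
            y = self.head(out)
            outputs.append(y)
        return torch.cat(outputs, dim=1)              # (batch, horizon, y_dim)

surrogate = Seq2SeqSurrogate()
params = torch.randn(8, 52, 3)                        # dummy weekly parameter sequences
pred = surrogate(params, horizon=52)                  # predicted trajectories
# Training would regress `pred` onto stored runs of the full epidemic model:
# loss = nn.functional.mse_loss(pred, simulator_runs)   # simulator_runs assumed
```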
PriorVAE: Encoding spatial priors with VAEs for small-area estimation
Gaussian processes (GPs), implemented through multivariate Gaussian
distributions for a finite collection of data, are the most popular approach in
small-area spatial statistical modelling. In this context they are used to
encode correlation structures over space and can generalise well in
interpolation tasks. Despite their flexibility, off-the-shelf GPs present
serious computational challenges which limit their scalability and practical
usefulness in applied settings. Here, we propose a novel, deep generative
modelling approach to tackle this challenge, termed PriorVAE: for a particular
spatial setting, we approximate a class of GP priors through prior sampling and
subsequent fitting of a variational autoencoder (VAE). Given a trained VAE, the
resultant decoder allows spatial inference to become incredibly efficient due
to the low dimensional, independently distributed latent Gaussian space
representation of the VAE. Once trained, inference using the VAE decoder
replaces the GP within a Bayesian sampling framework. This approach provides
tractable and easy-to-implement means of approximately encoding spatial priors
and facilitates efficient statistical inference. We demonstrate the utility of
our two-stage VAE approach on Bayesian small-area estimation tasks.
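To make the step where "inference using the VAE decoder replaces the GP" concrete, below is a minimal sketch using numpyro as one possible Bayesian sampling framework. The dummy decoder weights, Gaussian likelihood and 100-area layout are placeholders, not the trained PriorVAE decoder.

```python
# Minimal sketch of the second stage: a decoder maps a low-dimensional N(0, I)
# latent to the spatial surface inside MCMC, replacing the GP prior.
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
W1, b1 = 0.1 * jax.random.normal(k1, (10, 64)), jnp.zeros(64)    # stand-in weights; a
W2, b2 = 0.1 * jax.random.normal(k2, (64, 100)), jnp.zeros(100)  # trained VAE is used in practice

def decode(z):
    return jnp.tanh(z @ W1 + b1) @ W2 + b2            # latent -> values in 100 areas

def model(obs_idx, y_obs):
    z = numpyro.sample("z", dist.Normal(0, 1).expand([10]).to_event(1))
    f = numpyro.deterministic("f", decode(z))         # replaces the GP prior
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    numpyro.sample("y", dist.Normal(f[obs_idx], sigma), obs=y_obs)

obs_idx = jnp.arange(0, 100, 5)                       # areas with observations (dummy)
y_obs = jax.random.normal(jax.random.PRNGKey(1), (20,))
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=500)
mcmc.run(jax.random.PRNGKey(2), obs_idx, y_obs)       # fast: no GP covariance to invert
```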
Spatial Analysis Made Easy with Linear Regression and Kernels
Kernel methods are a popular technique for extending linear models to handle non-linear spatial problems via a mapping to an implicit, high-dimensional feature space. While kernel methods are computationally cheaper than an explicit feature mapping, they are still subject to cubic cost in the number of points. Given only a few thousand locations, this computational cost rapidly outstrips the currently available computational power. This paper aims to provide an overview of kernel methods from first principles (with a focus on ridge regression) and to progress to a review of random Fourier features (RFF), a method that enables the scaling of kernel methods to big datasets. We show how the RFF method is capable of approximating the full kernel matrix, providing a significant computational speed-up at negligible cost to accuracy, and can be incorporated into many existing spatial methods using only a few lines of code. We give an example of the implementation of RFFs on a simulated spatial data set to illustrate these properties. Lastly, we summarise the main issues with RFFs and highlight some of the advanced techniques aimed at alleviating them. At each stage, the associated R code is provided.
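The core RFF construction is short; the paper itself provides R code, and the Python sketch below only illustrates the idea for an RBF kernel. The feature count D, lengthscale and ridge penalty are arbitrary choices.

```python
# Random Fourier features: a random map z(x) with z(x)·z(x') ≈ k(x, x') for an
# RBF kernel, so kernel ridge regression becomes a plain linear model.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 2))                       # spatial coordinates
y = np.sin(6 * X[:, 0]) + np.cos(6 * X[:, 1]) + 0.1 * rng.normal(size=2000)

D, lengthscale = 300, 0.3
W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], D))   # spectral frequencies
b = rng.uniform(0, 2 * np.pi, size=D)                           # random phases
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)              # explicit random feature map

model = Ridge(alpha=1.0).fit(Z, y)                    # linear model in feature space
```

Solving the ridge regression in the D-dimensional feature space costs O(ND² + D³) rather than the O(N³) of an exact kernel solve, which is the source of the speed-up described above.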
Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees
Phylogenetics is now fundamental in life sciences, providing insights into
the earliest branches of life and the origins and spread of epidemics. However,
finding suitable phylogenies from the vast space of possible trees remains
challenging. To address this problem, for the first time, we perform both tree
exploration and inference in a continuous space where the computation of
gradients is possible. This continuous relaxation allows for major leaps across
tree space in both rooted and unrooted trees, and is less susceptible to
convergence to local minima. Our approach outperforms the current best methods
for inference on unrooted trees and, in simulation, accurately infers the tree
and root in ultrametric cases. The approach remains effective on empirical
data even when only negligible amounts of data are available, which we demonstrate on the phylogeny of
jawed vertebrates. Indeed, only a few genes with an ultrametric signal were
generally sufficient for resolving the major lineages of vertebrates. With
cubic-time complexity and efficient optimisation via automatic differentiation,
our method presents an effective way forwards for exploring the most difficult,
data-deficient phylogenetic questions.
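The abstract does not spell out the continuous parameterisation, so the sketch below only illustrates the general ingredient of gradient-based search in a continuous space: taxa are embedded as points whose pairwise distances are fitted by automatic differentiation to an observed evolutionary distance matrix (essentially a least-squares embedding, not the authors' method); a tree could then be built from the fitted distances, for example by neighbour joining. The embedding dimension, loss and optimiser settings are assumptions.

```python
# Gradient-based fit of a continuous taxon embedding to a distance matrix.
import torch

n_taxa, dim = 8, 3
observed = torch.rand(n_taxa, n_taxa)
observed = (observed + observed.T) / 2                # dummy symmetric distance matrix
observed.fill_diagonal_(0.0)

coords = torch.randn(n_taxa, dim, requires_grad=True)
opt = torch.optim.Adam([coords], lr=0.05)
i, j = torch.triu_indices(n_taxa, n_taxa, offset=1)   # each unordered taxon pair once
for step in range(500):
    fitted = ((coords[i] - coords[j]) ** 2).sum(-1).sqrt()
    loss = ((fitted - observed[i, j]) ** 2).mean()    # least-squares distance fit
    opt.zero_grad(); loss.backward(); opt.step()
```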
The interaction of transmission intensity, mortality, and the economy: a retrospective analysis of the COVID-19 pandemic
The COVID-19 pandemic has caused over 6.4 million registered deaths to date,
and has had a profound impact on economic activity. Here, we study the
interaction of transmission, mortality, and the economy during the SARS-CoV-2
pandemic from January 2020 to December 2022 across 25 European countries. We
adopt a Bayesian vector autoregressive model with both fixed and random
effects. We find that increases in disease transmission intensity decrease
gross domestic product (GDP) and increase daily excess deaths, with a longer
lasting impact on excess deaths in comparison to GDP, which recovers more
rapidly. Broadly, our results reinforce the intuitive phenomenon that
significant economic activity arises from diverse person-to-person
interactions. We report on the effectiveness of non-pharmaceutical
interventions (NPIs) on transmission intensity, excess deaths and changes in
GDP, and resulting implications for policy makers. Our results highlight a
complex cost-benefit trade-off from individual NPIs. For example, banning
international travel increases GDP while reducing excess deaths. We consider
country random effects and their associations with changes in GDP and
excess deaths. For example, more developed countries in Europe typically had
more cautious approaches to the COVID-19 pandemic, prioritising healthcare and
excess deaths over economic performance. Long-term economic impairments, as well
as long-term disease effects (Long Covid), are not fully captured by our model.
Our results highlight that the impact of disease on a country is complex and
multifaceted, and simple heuristic conclusions about how to achieve the best
outcome for both the economy and disease burden are difficult to draw.
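For readers unfamiliar with the model class, here is a minimal Bayesian VAR(1) sketch in numpyro for intuition only: the paper's model additionally includes fixed and random effects across the 25 countries, which are omitted here, and the priors and dummy data are illustrative.

```python
# Minimal Bayesian vector autoregression of order 1.
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def var1(y):
    # y: (T, K) standardised series, e.g. transmission intensity, excess deaths, GDP change
    K = y.shape[1]
    A = numpyro.sample("A", dist.Normal(0, 0.5).expand([K, K]).to_event(2))       # lag-1 matrix
    c = numpyro.sample("c", dist.Normal(0, 1).expand([K]).to_event(1))            # intercepts
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0).expand([K]).to_event(1))
    mean = c + y[:-1] @ A.T                           # one-step-ahead predictions
    numpyro.sample("obs", dist.Normal(mean, sigma).to_event(1), obs=y[1:])

y = jax.random.normal(jax.random.PRNGKey(0), (150, 3))      # dummy weekly data
mcmc = MCMC(NUTS(var1), num_warmup=500, num_samples=500)
mcmc.run(jax.random.PRNGKey(1), y)
```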
Refining the Global Spatial Limits of Dengue Virus Transmission by Evidence-Based Consensus
Background: Dengue is a growing problem both in its geographical spread and in its intensity, and yet its current global distribution remains highly uncertain. Challenges in diagnosis and diagnostic methods as well as highly variable national health systems mean no single data source can reliably estimate the distribution of this disease. As such, there is a lack of agreement on national dengue status among international health organisations. Here we bring together all available information on dengue occurrence using a novel approach to produce an evidence consensus map of the disease range that highlights nations with an uncertain dengue status.
Methods/Principal Findings: A baseline methodology was used to assess a range of evidence for each country. In regions where dengue status was uncertain, additional evidence types were included to either clarify dengue status or confirm that it is unknown at this time. An algorithm was developed that assesses evidence quality and consistency, giving each country an evidence consensus score. Using this approach, we were able to generate a contemporary global map of national-level dengue status that assigns a relative measure of certainty and identifies gaps in the available evidence.
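As a toy illustration of how an evidence-consensus score can be computed (the paper's actual evidence categories, weights and scaling are not reproduced here), each piece of evidence can carry a quality weight and a direction, with the country score given by the weighted, normalised sum; values near zero then flag an uncertain dengue status.

```python
# Illustrative weighted consensus score over evidence items (assumed scheme).
def consensus_score(evidence):
    """evidence: iterable of (quality_weight, direction), direction in {+1, -1}."""
    weighted = sum(w * d for w, d in evidence)        # +1 supports presence, -1 absence
    total = sum(w for w, _ in evidence)
    return weighted / total if total else 0.0         # lies in [-1, 1]

# e.g. two strong sources reporting presence, one weak source reporting absence:
print(consensus_score([(3, +1), (2, +1), (1, -1)]))   # 0.67 -> fair consensus on presence
```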