Donsker theorems for diffusions: Necessary and sufficient conditions
We consider the empirical process G_t of a one-dimensional diffusion with
finite speed measure, indexed by a collection of functions F. By the central
limit theorem for diffusions, the finite-dimensional distributions of G_t
converge weakly to those of a zero-mean Gaussian random process G. We prove
that the weak convergence G_t\Rightarrow G takes place in \ell^{\infty}(F) if
and only if the limit G exists as a tight, Borel measurable map. The proof
relies on majorizing measure techniques for continuous martingales.
Applications include the weak convergence of the local time density estimator
and the empirical distribution function on the full state space.
Comment: Published at http://dx.doi.org/10.1214/009117905000000152 in the
Annals of Probability (http://www.imstat.org/aop/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
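The objects in this abstract can be illustrated numerically. A minimal sketch, taking the Ornstein-Uhlenbeck process as a concrete diffusion with finite speed measure (our choice of example, not the paper's): simulate a path, form the time-average estimator F_t of the stationary distribution function, and compare it with the stationary CDF F around which the empirical process G_t = sqrt(t)(F_t - F) fluctuates.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Euler-Maruyama path of the Ornstein-Uhlenbeck diffusion
# dX_s = -X_s ds + sqrt(2) dW_s, whose stationary law is N(0, 1).
dt, T = 1e-2, 500.0
n = int(T / dt)
x = np.empty(n)
x[0] = 0.0
for i in range(n - 1):
    x[i + 1] = x[i] - x[i] * dt + math.sqrt(2.0 * dt) * rng.standard_normal()

# Time-average estimator F_t(y) = (1/t) \int_0^t 1{X_s <= y} ds,
# approximated on the simulation grid, at a few test points y.
ys = np.array([-1.0, 0.0, 1.0])
F_t = np.array([(x <= y).mean() for y in ys])

# Stationary N(0, 1) CDF; G_t(y) = sqrt(t) (F_t(y) - F(y)) is the
# (finite-t) empirical process whose weak convergence the paper studies.
F = np.array([0.5 * (1.0 + math.erf(y / math.sqrt(2.0))) for y in ys])
G_t = np.sqrt(T) * (F_t - F)
print(F_t, G_t)
```

For long horizons T the estimator F_t tracks F closely, consistent with the central limit theorem for diffusions that underlies the finite-dimensional convergence.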
Effective dynamics using conditional expectations
The question of coarse-graining is ubiquitous in molecular dynamics. In this
article, we are interested in deriving effective properties for the dynamics of
a coarse-grained variable \xi(X_t), where X_t describes the configuration of
the system in a high-dimensional space \mathbb{R}^n, and \xi is a smooth
function with values in \mathbb{R} (typically a reaction coordinate). It is
well known that, given a Boltzmann-Gibbs distribution on \mathbb{R}^n, the
equilibrium properties on \xi(X) are completely determined by the free
energy. On the other hand, the question of the effective dynamics on
\xi(X_t) is much more difficult to address. Starting from an overdamped
Langevin equation on \mathbb{R}^n, we propose an effective dynamics for
\xi(X_t) using conditional
expectations. Using entropy methods, we give sufficient conditions for the time
marginals of the effective dynamics to be close to the original ones. We check
numerically on some toy examples that these sufficient conditions yield an
effective dynamics which accurately reproduces the residence times in the
potential energy wells. We also discuss the accuracy of the effective dynamics
in a pathwise sense, and the relevance of the free energy to build a
coarse-grained dynamics.
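A toy sketch of the comparison described above, under an assumption we introduce for illustration: a separable potential V(x, y) = (x^2 - 1)^2 + 2y^2 with reaction coordinate xi(x, y) = x. Because V separates, the conditional-expectation drift for xi equals -A'(x) with free energy A(x) = (x^2 - 1)^2, so the effective dynamics is known in closed form and its long-time marginal can be checked against the full dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (our assumption, not the paper's): V(x, y) = (x^2-1)^2 + 2 y^2
# on R^2, xi(x, y) = x, inverse temperature beta = 1.  The free energy is
# A(x) = (x^2 - 1)^2 and the conditional-expectation drift is exactly -A'(x).
beta, dt, n = 1.0, 1e-2, 100_000
noise = np.sqrt(2.0 * dt / beta)

# Full overdamped Langevin on R^2; record the coarse-grained variable x.
xs = np.empty(n)
x, y = 1.0, 0.0
for i in range(n):
    x += -4.0 * x * (x * x - 1.0) * dt + noise * rng.standard_normal()
    y += -4.0 * y * dt + noise * rng.standard_normal()
    xs[i] = x

# Effective 1D dynamics dz = -A'(z) dt + sqrt(2/beta) dW.
zs = np.empty(n)
z = 1.0
for i in range(n):
    z += -4.0 * z * (z * z - 1.0) * dt + noise * rng.standard_normal()
    zs[i] = z

# Long-time histograms of xi under both dynamics should agree.
h_full, edges = np.histogram(xs, bins=20, range=(-2.0, 2.0), density=True)
h_eff, _ = np.histogram(zs, bins=edges, density=True)
print(np.max(np.abs(h_full - h_eff)))
```

Both trajectories visit the two potential wells, and the histograms of the coarse-grained variable approximately coincide; residence-time accuracy, as the abstract notes, is the more delicate question.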
Langevin and Hamiltonian based Sequential MCMC for Efficient Bayesian Filtering in High-dimensional Spaces
Nonlinear non-Gaussian state-space models arise in numerous applications in
statistics and signal processing. In this context, one of the most successful
and popular approximation techniques is the Sequential Monte Carlo (SMC)
algorithm, also known as particle filtering. Nevertheless, this method tends to
be inefficient when applied to high dimensional problems. In this paper, we
focus on another class of sequential inference methods, namely the Sequential
Markov Chain Monte Carlo (SMCMC) techniques, which represent a promising
alternative to SMC methods. After providing a unifying framework for the class
of SMCMC approaches, we propose novel efficient strategies based on the
principle of Langevin diffusion and Hamiltonian dynamics in order to cope with
the increasing number of high-dimensional applications. Simulation results show
that the proposed algorithms achieve significantly better performance compared
to existing algorithms.
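The Langevin-diffusion principle mentioned above can be sketched as a Metropolis-adjusted Langevin (MALA) kernel. In an SMCMC filter such a kernel would target the current posterior p(x_{0:t} | y_{1:t}); here, as a simplification of ours, it targets a fixed high-dimensional Gaussian so the mechanics are visible in isolation.

```python
import numpy as np

rng = np.random.default_rng(2)

# MALA kernel: one Euler step of the Langevin diffusion as the proposal
# mean, plus a Metropolis-Hastings correction for the discretization.
d, eps, n_iter = 50, 0.25, 5000

def log_p(x):            # log-density of N(0, I_d), up to a constant
    return -0.5 * x @ x

def grad_log_p(x):
    return -x

x = rng.standard_normal(d)
accepts = 0
samples = np.empty((n_iter, d))
for i in range(n_iter):
    mean_fwd = x + 0.5 * eps**2 * grad_log_p(x)
    y = mean_fwd + eps * rng.standard_normal(d)
    mean_bwd = y + 0.5 * eps**2 * grad_log_p(y)
    # log of p(y) q(x|y) / (p(x) q(y|x)) with the asymmetric Gaussian proposal
    log_a = (log_p(y) - log_p(x)
             - ((x - mean_bwd) @ (x - mean_bwd)) / (2 * eps**2)
             + ((y - mean_fwd) @ (y - mean_fwd)) / (2 * eps**2))
    if np.log(rng.uniform()) < log_a:
        x, accepts = y, accepts + 1
    samples[i] = x

print(accepts / n_iter, samples.mean(), samples.var())
```

Because the gradient steers proposals toward high-density regions, the acceptance rate stays high even in moderately high dimension, which is the property exploited in the paper's sequential setting.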
Quantifying Uncertainty with a Derivative Tracking SDE Model and Application to Wind Power Forecast Data
We develop a data-driven methodology based on parametric Itô Stochastic
Differential Equations (SDEs) to capture the real asymmetric dynamics of
forecast errors. Our SDE framework features time-derivative tracking of the
forecast, a time-varying mean-reversion parameter, and an improved
state-dependent diffusion term. Proofs of the existence, strong uniqueness, and
boundedness of the SDE solutions are shown under a principled condition for the
time-varying mean-reversion parameter. Inference based on approximate
likelihood, constructed through the moment-matching technique both in the
original forecast error space and in the Lamperti space, is performed through
numerical optimization procedures. We propose another contribution based on the
fixed-point likelihood optimization approach in the Lamperti space. All the
procedures are agnostic of the forecasting technology, and they enable
comparisons between different forecast providers. We apply our SDE framework to
model historical Uruguayan normalized wind power production and forecast data
between April and December 2019. Sharp empirical confidence bands of future
wind power production are obtained for the best-selected model.
Comment: 28 pages and 11 figures.
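An illustrative SDE in the spirit of this framework (our hypothetical specification, not the paper's exact model): the normalized power X_t tracks a forecast p(t) through a drift containing the forecast's time derivative plus mean reversion, with a state-dependent diffusion that vanishes at 0 and 1 so paths stay in the physical range. Simulating many paths yields empirical confidence bands.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical derivative-tracking model on [0, 1]:
#   dX_t = [p'(t) + theta (p(t) - X_t)] dt + sigma sqrt(X_t (1 - X_t)) dW_t
theta, sigma, dt, T = 5.0, 0.6, 1e-3, 1.0
t = np.arange(0.0, T, dt)
p = 0.5 + 0.3 * np.sin(2 * np.pi * t)           # toy forecast curve
dp = 0.3 * 2 * np.pi * np.cos(2 * np.pi * t)    # its time derivative

n_paths = 500
X = np.full(n_paths, p[0])
paths = np.empty((len(t), n_paths))
for i in range(len(t)):
    drift = dp[i] + theta * (p[i] - X)
    diff = sigma * np.sqrt(np.clip(X * (1.0 - X), 0.0, None))
    X = X + drift * dt + diff * np.sqrt(dt) * rng.standard_normal(n_paths)
    X = np.clip(X, 0.0, 1.0)    # numerical safeguard at the boundaries
    paths[i] = X

# Empirical confidence band for future production at each time.
lo, hi = np.quantile(paths, [0.05, 0.95], axis=1)
print(float(lo[-1]), float(hi[-1]))
```

The derivative-tracking term makes the path mean follow the forecast exactly in expectation, while the boundary-degenerate diffusion produces the asymmetric, state-dependent uncertainty the abstract describes.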
MCMC methods for functions: modifying old algorithms to make them faster
Many problems arising in applications result in the need to probe a
probability distribution for functions. Examples include Bayesian
nonparametric statistics and conditioned diffusion processes. Standard MCMC
algorithms typically become arbitrarily slow under the mesh refinement
dictated by nonparametric description of the unknown function. We describe an
approach to modifying a whole range of MCMC methods which ensures that their
speed of convergence is robust under mesh refinement. In the applications of
interest the data is often sparse and the prior specification is an essential
part of the overall modeling strategy. The algorithmic approach that we
describe is applicable whenever the desired probability measure has density
with respect to a Gaussian process or Gaussian random field prior, and to
some useful non-Gaussian priors constructed through random truncation.
Applications are shown in density estimation, data assimilation in fluid
mechanics, subsurface geophysics and image registration. The key design
principle is to formulate the MCMC method for functions. This leads to
algorithms which can be implemented via minor modification of existing
algorithms, yet which show enormous speed-up on a wide range of applied
problems.
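A well-known algorithm in this mesh-robust family is the preconditioned Crank-Nicolson (pCN) proposal: v = sqrt(1 - beta^2) u + beta xi with xi drawn from the Gaussian prior N(0, C). The proposal preserves the prior, so the acceptance ratio involves only the likelihood potential Phi and does not collapse as the discretization is refined. A sketch, with a Brownian-motion prior and a toy single-point observation as our assumed example:

```python
import numpy as np

rng = np.random.default_rng(4)

# pCN on a mesh of n points for a measure with density exp(-Phi(u))
# with respect to a Gaussian prior N(0, C).
n, beta, n_iter = 200, 0.2, 4000
s = np.linspace(0.0, 1.0, n)

# Prior: Brownian-motion covariance C(s, t) = min(s, t) (our choice),
# sampled through its Cholesky factor; tiny jitter for positive definiteness.
C = np.minimum.outer(s, s) + 1e-10 * np.eye(n)
L = np.linalg.cholesky(C)

# Toy likelihood (hypothetical): one noisy observation of u at the midpoint.
obs, noise_std, mid = 0.8, 0.1, n // 2
def Phi(u):
    return (u[mid] - obs) ** 2 / (2.0 * noise_std ** 2)

u = L @ rng.standard_normal(n)
accepts = 0
for _ in range(n_iter):
    # Prior-preserving pCN proposal; accept ratio involves Phi only.
    v = np.sqrt(1.0 - beta ** 2) * u + beta * (L @ rng.standard_normal(n))
    if np.log(rng.uniform()) < Phi(u) - Phi(v):
        u, accepts = v, accepts + 1

print(accepts / n_iter, u[mid])
```

Doubling the mesh size n leaves the acceptance rate essentially unchanged, whereas a standard random-walk proposal would need an ever-smaller step size, which is the speed-up the abstract refers to.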
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Optimal recovery by the algorithm is proven analytically for a
nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references.
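The combination of Laplacian eigenspace and finite mixture modeling can be sketched end to end: build a graph Laplacian, embed the nodes with its low eigenvectors, and fit a mixture in that embedding to obtain fuzzy memberships. The two-block random graph and the tiny one-dimensional EM below are our illustration choices, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

# Random graph with two planted communities.
n_per, p_in, p_out = 30, 0.5, 0.02
n = 2 * n_per
A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        same = (i < n_per) == (j < n_per)
        if rng.uniform() < (p_in if same else p_out):
            A[i, j] = A[j, i] = 1.0

deg = A.sum(axis=1)
Lap = np.diag(deg) - A                 # combinatorial graph Laplacian
w, V = np.linalg.eigh(Lap)
fiedler = V[:, 1]                      # eigenvector of 2nd-smallest eigenvalue

# Two-component 1D Gaussian mixture on the embedding, via a few EM steps.
mu = np.array([fiedler.min(), fiedler.max()])
var, pi = np.array([0.01, 0.01]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities = fuzzy memberships.
    d2 = (fiedler[:, None] - mu[None, :]) ** 2
    logr = np.log(pi) - 0.5 * np.log(var) - d2 / (2 * var)
    r = np.exp(logr - logr.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update weights, means, variances.
    nk = r.sum(axis=0)
    pi = nk / n
    mu = (r * fiedler[:, None]).sum(axis=0) / nk
    var = (r * d2).sum(axis=0) / nk + 1e-12

labels = r.argmax(axis=1)
```

The responsibilities r give the probabilistic (fuzzy) memberships; hard labels from argmax recover the planted blocks for this well-separated example.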
Data assimilation in slow-fast systems using homogenized climate models
A deterministic multiscale toy model is studied in which a chaotic fast
subsystem triggers rare transitions between slow regimes, akin to weather or
climate regimes. Using homogenization techniques, a reduced stochastic
parametrization model is derived for the slow dynamics. The reliability of this
reduced climate model in reproducing the statistics of the slow dynamics of the
full deterministic model for finite values of the time scale separation is
numerically established. These statistics are, however, sensitive to
uncertainties in the parameters of the stochastic model. It is investigated
whether the
stochastic climate model can be beneficial as a forecast model in an ensemble
data assimilation setting, in particular in the realistic setting when
observations are only available for the slow variables. The main result is that
reduced stochastic models can indeed improve the analysis skill, when used as
forecast models instead of the perfect full deterministic model. The stochastic
climate model is far superior at detecting transitions between regimes. The
observation intervals for which skill improvement can be obtained are related
to the characteristic time scales involved. The reason stochastic climate
models produce superior skill in an ensemble setting is the finite ensemble
size: ensembles obtained from the perfect deterministic forecast model lack
sufficient spread even for moderate ensemble sizes. Stochastic climate models
provide a natural way to generate sufficient ensemble
spread to detect transitions between regimes. This is corroborated with
numerical simulations. The conclusion is that stochastic parametrizations are
attractive for data assimilation despite their sensitivity to uncertainties in
the parameters.
Comment: Accepted for publication in the Journal of the Atmospheric Sciences.
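The role of ensemble spread can be illustrated with a single ensemble Kalman filter analysis step in the setting the abstract describes: only the slow variable is observed and the forecast model is a reduced stochastic parametrization. The noisy double-well dynamics and all parameter values below are our stand-in choices, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(6)

n_ens, dt, n_steps = 50, 1e-2, 200
sigma, obs_var, truth = 0.5, 0.005, 1.0

def forecast(x):
    # Reduced stochastic model for the slow variable: noisy double well.
    for _ in range(n_steps):
        x = x - 4.0 * x * (x**2 - 1.0) * dt \
              + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

ens = forecast(truth + 0.3 * rng.standard_normal(n_ens))      # forecast ensemble
y = truth + np.sqrt(obs_var) * rng.standard_normal()          # slow-variable obs

# Kalman gain from the forecast-ensemble variance; perturbed observations
# keep the analysis-ensemble spread statistically consistent.
P_f = ens.var(ddof=1)
K = P_f / (P_f + obs_var)
ens_a = ens + K * (y + np.sqrt(obs_var) * rng.standard_normal(n_ens) - ens)

print(ens.std(ddof=1), "->", ens_a.std(ddof=1))
```

The model noise sigma sets the forecast spread P_f and hence the gain K; a deterministic forecast model that underestimates P_f would overweight the forecast and be slow to register regime transitions, which is the mechanism the abstract identifies.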