Why one must use reweighting in Estimation Of Distribution Algorithms
We study the update of the distribution in Estimation of Distribution Algorithms and show that a simple modification, based on a proper reweighting of estimates, leads to unbiased estimates of the optimum and to strongly improved behavior in the face of premature convergence.
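The abstract leaves the estimator implicit; as a rough illustration of the idea (not the authors' algorithm), here is a minimal Python sketch of a Gaussian EDA whose distribution update reweights the selected samples by the inverse of their sampling density, so that points are not over-counted merely because the current search distribution made them likely. The Gaussian model, truncation selection, and all parameter values are our own assumptions.

    import numpy as np

    def sphere(x):
        return np.sum(x ** 2)

    def reweighted_gaussian_eda(f, dim=2, popsize=200, n_sel=50, iters=100, seed=0):
        # Gaussian EDA: sample, select the best points, refit mean/covariance.
        # The refit reweights each selected point by 1/density under the
        # current sampling distribution to reduce selection bias.
        rng = np.random.default_rng(seed)
        mean, cov = np.zeros(dim), np.eye(dim)
        for _ in range(iters):
            pop = rng.multivariate_normal(mean, cov, size=popsize)
            fitness = np.array([f(x) for x in pop])
            sel = pop[np.argsort(fitness)[:n_sel]]        # truncation selection
            diff = sel - mean
            quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
            dens = np.exp(-0.5 * quad)                    # unnormalized Gaussian density
            w = 1.0 / np.maximum(dens, 1e-300)            # reweighting: w ~ 1/density
            w /= w.sum()                                  # normalization cancels constants
            mean = w @ sel                                # weighted mean update
            cov = (sel - mean).T @ ((sel - mean) * w[:, None]) + 1e-12 * np.eye(dim)
        return mean

    print(reweighted_gaussian_eda(sphere))   # should approach the optimum at 0
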
Lattice QCD with open boundary conditions and twisted-mass reweighting
Lattice QCD simulations at small lattice spacings and quark masses close to
their physical values are technically challenging. In particular, the
simulations can get trapped in the topological charge sectors of field space or
may run into instabilities triggered by accidental near-zero modes of the
lattice Dirac operator. As already noted in ref. [1], the first problem is
bypassed if open boundary conditions are imposed in the time direction, while
the second can potentially be overcome through twisted-mass determinant
reweighting [2]. In this paper, we show that twisted-mass reweighting works out
as expected in QCD with open boundary conditions and 2+1 flavours of O(a)
improved Wilson quarks. Further algorithmic improvements are tested as well and
a few physical quantities are computed for illustration.
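The reweighting factors in the paper are ratios of determinants of the (twisted-mass modified) lattice Dirac operator, which we do not reproduce here, but the way such factors enter the analysis is generic. A minimal Python sketch, under the assumption that per-configuration log-weights have already been computed: reweighted ensemble averages follow from <O> = <O W>_mod / <W>_mod, with weights handled in log space to avoid overflow, and the Kish effective sample size serves as a diagnostic for how severely a few configurations dominate.

    import numpy as np

    def reweighted_average(obs, log_w):
        # <O>_target = <O W>_mod / <W>_mod; subtracting max(log_w) rescales
        # the weights by a constant that cancels in the ratio.
        log_w = np.asarray(log_w, dtype=float)
        w = np.exp(log_w - log_w.max())
        return np.sum(w * np.asarray(obs, dtype=float)) / np.sum(w)

    def effective_sample_size(log_w):
        # Kish effective sample size: N_eff = (sum w)^2 / sum(w^2).
        w = np.exp(log_w - np.max(log_w))
        w /= w.sum()
        return 1.0 / np.sum(w ** 2)

    # Toy usage with synthetic stand-ins for log determinant ratios
    # and a measured observable (illustrative only).
    rng = np.random.default_rng(1)
    log_w = rng.normal(0.0, 0.5, size=1000)
    obs = rng.normal(1.0, 0.1, size=1000)
    print(reweighted_average(obs, log_w), effective_sample_size(log_w))
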
A Statistical Perspective on Algorithmic Leveraging
One popular method for dealing with large-scale data sets is sampling. For
example, by using the empirical statistical leverage scores as an importance
sampling distribution, the method of algorithmic leveraging samples and
rescales rows/columns of data matrices to reduce the data size before
performing computations on the subproblem. This method has been successful in
improving computational efficiency of algorithms for matrix problems such as
least-squares approximation, least absolute deviations approximation, and
low-rank matrix approximation. Existing work has focused on algorithmic issues
such as worst-case running times and numerical issues associated with providing
high-quality implementations, but none of it addresses statistical aspects of
this method.
In this paper, we provide a simple yet effective framework to evaluate the
statistical properties of algorithmic leveraging in the context of estimating
parameters in a linear regression model with a fixed number of predictors. We
show that from the statistical perspective of bias and variance, neither
leverage-based sampling nor uniform sampling dominates the other. This result
is particularly striking given the well-known fact that, from the algorithmic
perspective of worst-case analysis, leverage-based sampling is uniformly
superior to uniform sampling. Based on these theoretical results, we propose and analyze
two new leveraging algorithms. A detailed empirical evaluation of existing
leverage-based methods as well as these two new methods is carried out on both
synthetic and real data sets. The empirical results indicate that our theory is
a good predictor of practical performance of existing and new leverage-based
algorithms and that the new algorithms achieve improved performance.
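For readers unfamiliar with the baseline procedure, a minimal Python sketch of plain leverage-score sampling for least squares follows. It computes exact leverage scores from a thin QR factorization, samples rows with probability proportional to leverage, rescales them by 1/sqrt(s * p_i), and solves the subproblem; this is the basic method analyzed in the paper, not the two new algorithms proposed there, and all parameter values are our own.

    import numpy as np

    def leverage_scores(X):
        # Exact leverage scores: squared row norms of an orthonormal
        # basis for the column space of X (diagonal of the hat matrix).
        Q, _ = np.linalg.qr(X)          # thin QR, Q is n x d
        return np.sum(Q ** 2, axis=1)

    def leveraged_least_squares(X, y, s, rng):
        # Importance-sample s rows with p_i proportional to leverage, rescale
        # each sampled row by 1/sqrt(s * p_i), and solve the small problem.
        p = leverage_scores(X)
        p /= p.sum()
        idx = rng.choice(X.shape[0], size=s, replace=True, p=p)
        scale = 1.0 / np.sqrt(s * p[idx])
        beta, *_ = np.linalg.lstsq(X[idx] * scale[:, None], y[idx] * scale, rcond=None)
        return beta

    # Toy comparison against the full-data solution.
    rng = np.random.default_rng(0)
    n, d = 100_000, 10
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.linalg.norm(beta_full - leveraged_least_squares(X, y, 2000, rng)))
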
Approximating multivariate posterior distribution functions from Monte Carlo samples for sequential Bayesian inference
An important feature of Bayesian statistics is the opportunity to do
sequential inference: the posterior distribution obtained after seeing a
dataset can be used as prior for a second inference. However, when Monte Carlo
sampling methods are used for inference, we only have a set of samples from the
posterior distribution. To do sequential inference, we must then either
evaluate the second posterior only at these locations and reweight the samples
accordingly, or estimate a functional description of the posterior
distribution from the samples and use that as the prior for the second
inference. Here, we investigated to what extent we can obtain an accurate joint
posterior from two datasets if the inference is done sequentially rather than
jointly, under the condition that each inference step is done using Monte Carlo
sampling. To test this, we evaluated the accuracy of kernel density estimates,
Gaussian mixtures, vine copulas and Gaussian processes in approximating
posterior distributions, and then tested whether these approximations can be
used in sequential inference. In low dimensionality, Gaussian processes are
more accurate, whereas in higher dimensionality Gaussian mixtures or vine
copulas perform better. In our test cases, posterior approximations are
preferable to direct sample reweighting, although joint inference remains
preferable to sequential inference. Since performance is case-specific,
we provide an R package, mvdens, with a unified interface to the density
approximation methods.
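Since mvdens is an R package, a rough Python analogue may help fix ideas: approximate the first-stage posterior samples with a density model (here a Gaussian mixture, one of the four families compared above) and use its log-density as the prior in a second Monte Carlo run. The observation model in log_lik2 and all parameter values are placeholders of our own, not from the paper.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Stage 1: stand-in for posterior samples from a first Monte Carlo run.
    rng = np.random.default_rng(0)
    samples1 = rng.multivariate_normal([1.0, -0.5],
                                       [[0.2, 0.05], [0.05, 0.1]], size=5000)

    # Approximate the first posterior; its log-density becomes the new prior.
    gm = GaussianMixture(n_components=3, random_state=0).fit(samples1)

    def log_lik2(theta):
        # Placeholder log-likelihood for the second dataset (illustrative
        # Gaussian observation model, not the paper's test cases).
        return -0.5 * np.sum((theta - np.array([0.8, -0.3])) ** 2) / 0.05

    def log_post2(theta):
        return gm.score_samples(theta[None, :])[0] + log_lik2(theta)

    # Stage 2: random-walk Metropolis targeting approximate-prior * likelihood.
    theta = samples1.mean(axis=0)
    lp = log_post2(theta)
    chain = []
    for _ in range(20_000):
        prop = theta + 0.05 * rng.standard_normal(2)
        lp_prop = log_post2(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)
    print(np.mean(chain[5000:], axis=0))   # sequential-posterior mean estimate
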