Multi-dimensional Boltzmann Sampling of Languages
This paper addresses the uniform random generation of words from a
context-free language (over an alphabet of size $k$), while constraining every
letter to a targeted frequency of occurrence. Our approach consists of a
multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show
that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a
word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact frequency in
$\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance
intervals of width in $\Omega(\sqrt{n})$ for the number of occurrences of each
letter, our samplers perform an approximate-size generation of words in
$\mathcal{O}(n)$ expected time. We illustrate these techniques on the
generation of Tetris tessellations with uniform statistics in the different
types of tetrominoes.
Comment: 12 pages
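As a toy illustration of the mechanism (not the paper's construction), consider a Boltzmann sampler for the context-free language of Motzkin words over {f, u, d}, with grammar M = ε + f·M + u·M·d·M, where a weight w attached to the letter f biases its frequency. The fixed-point evaluation of M(z, w), the parameter values, and the absence of the size-rejection loop are all simplifications of this sketch:

```python
import random

def motzkin_gf(z, w, iters=400):
    # Fixed-point evaluation of M(z, w) = 1 + w*z*M + z^2*M^2,
    # the weighted generating function of Motzkin words
    # (valid only for subcritical z, w).
    M = 1.0
    for _ in range(iters):
        M = 1.0 + w * z * M + z * z * M * M
    return M

def sample_motzkin(z, w, rng):
    """Boltzmann sampler: pick each production with probability
    proportional to its contribution to M(z, w)."""
    M = motzkin_gf(z, w)
    out = []
    def gen():
        u = rng.random() * M
        if u < 1.0:
            return                      # M -> empty word
        if u < 1.0 + w * z * M:
            out.append('f'); gen()      # M -> f M
        else:
            out.append('u'); gen()      # M -> u M d M
            out.append('d'); gen()
    gen()
    return ''.join(out)
```

Raising w shifts the expected proportion of f letters upward, which is exactly the kind of letter-frequency control the multidimensional samplers formalise.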
Polynomial tuning of multiparametric combinatorial samplers
Boltzmann samplers and the recursive method are prominent algorithmic
frameworks for the approximate-size and exact-size random generation of large
combinatorial structures, such as maps, tilings, RNA sequences or various
tree-like structures. In their multiparametric variants, these samplers make
it possible to control the profile of expected values corresponding to multiple
combinatorial parameters. One can control, for instance, the number of leaves,
the profile of node degrees in trees, or the number of certain subpatterns in
strings. However, such flexible control requires an additional non-trivial
tuning procedure. In this paper, we propose a tuning algorithm, based on convex
optimisation techniques, that runs in polynomial time with respect to the
number of tuned parameters. Finally, we illustrate the efficiency of our
approach with several applications to rational, algebraic and Pólya structures,
including polyomino tilings with prescribed tile frequencies, planar trees with
a given node degree distribution, and weighted partitions.
Comment: Extended abstract, accepted to ANALCO2018. 20 pages, 6 figures,
colours. Implementation and examples are available at [1]
https://github.com/maciej-bendkowski/boltzmann-brain [2]
https://github.com/maciej-bendkowski/multiparametric-combinatorial-sampler
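With a single tuned parameter, the tuning problem collapses to one-dimensional root finding, which a stdlib-only sketch can illustrate; the Motzkin-word example, the finite-difference expectations and the use of bisection instead of the paper's full convex-optimisation machinery are all choices of this sketch:

```python
import math

def log_gf(z, w, iters=400):
    # M(z, w) = 1 + w*z*M + z^2*M^2: weighted Motzkin words, where w
    # marks occurrences of the letter 'f' and z marks every letter.
    M = 1.0
    for _ in range(iters):
        M = 1.0 + w * z * M + z * z * M * M
    return math.log(M)

def f_frequency(z, w, h=1e-6):
    # Under the Boltzmann distribution, E[#f] = dlogM/dlog w and
    # E[length] = dlogM/dlog z (central finite differences);
    # return the ratio of these expectations.
    e_f = (log_gf(z, w * math.exp(h)) - log_gf(z, w * math.exp(-h))) / (2 * h)
    e_n = (log_gf(z * math.exp(h), w) - log_gf(z * math.exp(-h), w)) / (2 * h)
    return e_f / e_n

def tune_w(z, target, lo=1e-6, hi=7.0, iters=80):
    # Bisection: the expected 'f'-frequency increases with w.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_frequency(z, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The polynomial-time algorithm of the paper solves the analogous problem for many parameters at once, where naive search would be exponential.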
Generalized-ensemble simulations and cluster algorithms
The importance-sampling Monte Carlo algorithm appears to be the universally
optimal solution to the problem of sampling the state space of statistical
mechanical systems according to the relative importance of configurations for
the partition function or thermal averages of interest. While this is true in
terms of its simplicity and universal applicability, the resulting approach
suffers from the presence of temporal correlations of successive samples
naturally implied by the Markov chain underlying the importance-sampling
simulation. In many situations, these autocorrelations are moderate and can be
easily accounted for by an appropriately adapted analysis of simulation data.
They turn out to be a major hurdle, however, in the vicinity of phase
transitions or for systems with complex free-energy landscapes. The critical
slowing down close to continuous transitions is most efficiently reduced by the
application of cluster algorithms, where they are available. For first-order
transitions and disordered systems, on the other hand, macroscopic energy
barriers need to be overcome to prevent dynamic ergodicity breaking. In this
situation, generalized-ensemble techniques such as the multicanonical
simulation method can effect impressive speedups, allowing one to sample the full
free-energy landscape. The Potts model features continuous as well as
first-order phase transitions and is thus a prototypical example for studying
phase transitions and new algorithmic approaches. I discuss the possibilities
of bringing together cluster and generalized-ensemble methods to combine the
benefits of both techniques. The resulting algorithm allows for the efficient
estimation of the random-cluster partition function encoding the information of
all Potts models, even with a non-integer number of states, for all
temperatures in a single simulation run per system size.
Comment: 15 pages, 6 figures, proceedings of the 2009 Workshop of the Center
of Simulational Physics, Athens, G
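A minimal single-cluster (Wolff-type) update for the Ising (q = 2 Potts) model shows the cluster mechanism referred to above; the lattice size, flat-list data layout and pure-Python style are choices of this sketch:

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """Grow one Wolff cluster on an L x L periodic Ising lattice
    (spins: flat list of +/-1) and flip it; returns the cluster size.
    A bond to an aligned neighbour is activated with
    probability 1 - exp(-2*beta)."""
    p_add = 1.0 - math.exp(-2.0 * beta)
    seed = rng.randrange(L * L)
    s = spins[seed]
    cluster = {seed}
    stack = [seed]
    while stack:
        i = stack.pop()
        x, y = i % L, i // L
        # the four periodic neighbours of site (x, y)
        for j in ((x + 1) % L + y * L, (x - 1) % L + y * L,
                  x + ((y + 1) % L) * L, x + ((y - 1) % L) * L):
            if spins[j] == s and j not in cluster and rng.random() < p_add:
                cluster.add(j)
                stack.append(j)
    for i in cluster:
        spins[i] = -s
    return len(cluster)
```

Near the critical coupling these non-local flips update large correlated regions in one move, which is why cluster algorithms suppress the critical slowing down of single-spin dynamics.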
Agnostic cosmology in the CAMEL framework
Cosmological parameter estimation is traditionally performed in the Bayesian
context. By adopting an "agnostic" statistical point of view, we show the
value of confronting the Bayesian results with a frequentist approach based on
profile likelihoods. For this purpose, we have developed the Cosmological
Analysis with a Minuit Exploration of the Likelihood ("CAMEL") software.
Written from scratch in pure C++, it is a clean and carefully-designed project
in which new data and/or cosmological computations can easily be included.
CAMEL incorporates the latest cosmological likelihoods and gives access, from
the very same input file, to several estimation methods: (i) a high-quality
Maximum Likelihood Estimate (a.k.a. "best fit") using MINUIT; (ii) profile
likelihoods; (iii) a new implementation of an Adaptive Metropolis MCMC
algorithm that relieves the burden of reconstructing the proposal distribution.
We present these various statistical techniques and roll out a full
use-case that can then be used as a tutorial. We revisit the ΛCDM
parameter determination with the latest Planck data and give results with both
methodologies. Furthermore, by comparing the Bayesian and frequentist
approaches, we discuss a "likelihood volume effect" that affects the
reionization optical depth when analyzing the high-multipole part of the
Planck data. The software, used in several Planck data analyses, is available
from http://camel.in2p3.fr. Using it does not require advanced C++ skills.
Comment: Typeset in Authorea. Online version available at:
https://www.authorea.com/users/90225/articles/104431/_show_articl
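The idea behind point (iii) can be sketched with a generic textbook (Haario-style) one-dimensional Adaptive Metropolis sampler; this is an illustration of the principle, not CAMEL's C++ implementation. The proposal width is learned from the chain's own running variance instead of being hand-tuned:

```python
import math
import random

def adaptive_metropolis(logpdf, x0, n_steps, rng, adapt_after=200):
    """1-d Adaptive Metropolis: the Gaussian proposal scale is taken
    from the running variance of the chain (with the classic 2.38
    scaling for dimension 1), removing the need to hand-tune it."""
    x = x0
    chain = [x]
    mean, m2 = x, 0.0          # Welford running mean / sum of sq. deviations
    sigma = 1.0                # initial proposal width
    for t in range(1, n_steps + 1):
        if t > adapt_after:
            sigma = 2.38 * math.sqrt(m2 / t) + 1e-3
        prop = x + rng.gauss(0.0, sigma)
        if rng.random() < math.exp(min(0.0, logpdf(prop) - logpdf(x))):
            x = prop
        chain.append(x)
        delta = x - mean       # update running moments with the new state
        mean += delta / (t + 1)
        m2 += delta * (x - mean)
    return chain
```

In the multi-dimensional cosmological setting the same trick adapts a full proposal covariance matrix, which is precisely the burden the abstract mentions.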
On the diversity of pattern distributions in rational language
It is well known that, under some aperiodicity and irreducibility conditions, the number of occurrences of local patterns within a Markov chain (and, more generally, within the languages generated by weighted regular expressions/automata) follows a Gaussian distribution with both variance and mean in Θ(n). By contrast, when these conditions no longer hold, it has been observed that the limiting distribution may follow a whole diversity of distributions, including the uniform, power-law or even multimodal distributions, arising as trade-offs between structural properties of the regular expression and the weights/probabilities associated with its transitions/letters. However, these cases only partially cover the full diversity of behaviors induced within regular expressions, and a characterization of attainable distributions remained to be provided. In this article, we constructively show that the limiting distribution of the simplest foreseeable motif (a single letter!) may already follow an arbitrarily complex continuous distribution (or càdlàg process). We also give applications in random generation (Boltzmann sampling) and bioinformatics (parsimonious segmentation of DNA).
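A concrete instance of such non-Gaussian behaviour: in the rational language a*b*, the number of occurrences of the letter a in a uniform random word of length n is exactly uniform on {0, ..., n}, while in (a|b)* it is binomial, hence asymptotically Gaussian. The brute-force enumeration below is purely illustrative:

```python
import itertools
import re
from collections import Counter

def letter_count_distribution(n, pattern, letter='a'):
    """Distribution of `letter` occurrences in a uniform random word of
    length n drawn from the regular language given by `pattern`."""
    lang = re.compile(pattern)
    words = [''.join(t) for t in itertools.product('ab', repeat=n)
             if lang.fullmatch(''.join(t))]
    counts = Counter(w.count(letter) for w in words)
    return {k: v / len(words) for k, v in sorted(counts.items())}

# a*b* has exactly n+1 words of length n (a^k b^(n-k)), one per value
# of #a, so the count of 'a' is uniform; (a|b)* gives the binomial law.
uniform_case = letter_count_distribution(10, r'a*b*')
binomial_case = letter_count_distribution(10, r'[ab]*')
```

Already this two-state example shows two very different limit shapes for the same motif, a gap the article widens to arbitrarily complex continuous distributions.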
Multidimensional integration through Markovian sampling under steered function morphing: a physical guise from statistical mechanics
We present a computational strategy for the evaluation of multidimensional
integrals on hyper-rectangles based on Markovian stochastic exploration of the
integration domain while the integrand is being morphed, starting from an
appropriate initial profile. Thanks to an abstract reformulation of Jarzynski's
equality applied in stochastic thermodynamics to evaluate the free-energy
profiles along selected reaction coordinates via non-equilibrium
transformations, it is possible to cast the original integral into the
exponential average of the distribution of the pseudo-work (that we may term
"computational work") involved in doing the function morphing, which is
straightforwardly solved. Several tests illustrate the basic implementation of
the idea, and show its performance in terms of computational time, accuracy and
precision. The formulation for integrand functions with zeros and possible sign
changes is also presented. It will be stressed that our usage of Jarzynski's
equality shares similarities with a practice already known in statistics as
Annealed Importance Sampling (AIS), when applied to the computation of the
normalizing constants of distributions. In a sense, here we dress the AIS with
its "physical" counterpart borrowed from statistical mechanics.
Comment: 3 figures; Supplementary Material (pdf file named "JEMDI_SI.pdf") …
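A stdlib-only sketch of the strategy in its AIS guise (the integrand, box, annealing schedule, particle counts and move sizes below are illustrative choices, not the paper's): the flat density on the hyper-rectangle, whose integral is just the volume, is morphed into the integrand through powers f^β, and the accumulated pseudo-work enters an exponential average:

```python
import math
import random

def ais_integral(f, lo, hi, dim, n_particles=500, n_levels=50, seed=0):
    """Estimate the integral of a positive f over [lo, hi]^dim by
    annealed importance sampling: morph the flat density (known
    integral = volume) into f along powers f^beta, accumulating the
    log pseudo-work of each morphing step into an importance weight."""
    rng = random.Random(seed)
    volume = (hi - lo) ** dim
    betas = [k / n_levels for k in range(n_levels + 1)]
    log_ws = []
    for _ in range(n_particles):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        log_w = 0.0
        for k in range(n_levels):
            # pseudo-work of sharpening f^beta_k -> f^beta_{k+1}
            log_w += (betas[k + 1] - betas[k]) * math.log(f(x))
            # a few Metropolis moves targeting f^beta_{k+1} on the box
            for _ in range(2):
                y = [xi + rng.gauss(0.0, 0.5) for xi in x]
                if all(lo <= yi <= hi for yi in y):
                    d = betas[k + 1] * (math.log(f(y)) - math.log(f(x)))
                    if rng.random() < math.exp(min(0.0, d)):
                        x = y
        log_ws.append(log_w)
    m = max(log_ws)   # log-sum-exp for a stable exponential average
    return volume * math.exp(m) * sum(math.exp(lw - m)
                                      for lw in log_ws) / n_particles
```

For f(x, y) = exp(-(x² + y²)) on [-3, 3]², the estimate approaches π, up to the negligible truncation of the Gaussian tails; handling integrands with zeros or sign changes requires the extended formulation the abstract mentions.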