    Multi-dimensional Boltzmann Sampling of Languages

    This paper addresses the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact letter frequencies in $\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance intervals of width in $\Omega(\sqrt{n})$ for the number of occurrences of each letter, our samplers perform an approximate-size generation of words in expected $\mathcal{O}(n)$ time. We illustrate these techniques on the generation of Tetris tessellations with uniform statistics over the different types of tetrominoes.
    Comment: 12p
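    The paper's multidimensional samplers are not reproduced here, but the underlying rejection scheme can be sketched on the simplest class, sequences over a two-letter alphabet. This is an illustrative assumption, not the paper's construction: the function names, the tuning formula for $x$, and the reuse of the size tolerance for the frequency test are all choices made for the sketch.

```python
import random

def boltzmann_seq(x, wa, wb, rng):
    """Boltzmann sampler for SEQ(a+b): P(word w) is proportional to
    x^|w| * wa^#a(w) * wb^#b(w)."""
    p_continue = x * (wa + wb)          # must be < 1 for termination
    word = []
    while rng.random() < p_continue:
        word.append('a' if rng.random() < wa / (wa + wb) else 'b')
    return word

def sample_targeted(n, eps, target_a, rng):
    """Reject until |w| lies in [(1-eps)n, (1+eps)n] and the 'a'-frequency
    is within eps of target_a (the sketch reuses eps for both tolerances)."""
    wa, wb = target_a, 1.0 - target_a   # letter weights bias the frequencies
    p = n / (n + 1)                     # geometric length with mean n
    x = p / (wa + wb)
    while True:
        w = boltzmann_seq(x, wa, wb, rng)
        if w and (1 - eps) * n <= len(w) <= (1 + eps) * n:
            if abs(w.count('a') / len(w) - target_a) <= eps:
                return w
```

    Demanding exact letter counts instead of a tolerance interval would make the rejection much heavier, which is the regime where the paper's $\mathcal{O}(n^{1+k/2})$ bound applies.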

    Polynomial tuning of multiparametric combinatorial samplers

    Boltzmann samplers and the recursive method are prominent algorithmic frameworks for the approximate-size and exact-size random generation of large combinatorial structures, such as maps, tilings, RNA sequences or various tree-like structures. In their multiparametric variants, these samplers allow one to control the profile of expected values corresponding to multiple combinatorial parameters. One can control, for instance, the number of leaves, the profile of node degrees in trees, or the number of certain subpatterns in strings. However, such flexible control requires an additional non-trivial tuning procedure. In this paper, we propose an efficient tuning algorithm, polynomial-time with respect to the number of tuned parameters, based on convex optimisation techniques. Finally, we illustrate the efficiency of our approach with several applications to rational, algebraic and Pólya structures, including polyomino tilings with prescribed tile frequencies, planar trees with a given specific node degree distribution, and weighted partitions.
    Comment: Extended abstract, accepted to ANALCO2018. 20 pages, 6 figures, colours. Implementation and examples are available at [1] https://github.com/maciej-bendkowski/boltzmann-brain [2] https://github.com/maciej-bendkowski/multiparametric-combinatorial-sampler
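    The convex-optimisation tuning problem can be illustrated, in a much-reduced single-parameter setting, by choosing a letter weight so that a fixed-length weighted binary word has a prescribed expected number of 'a's. The free-energy function below and the bisection solver are assumptions of this sketch, not the paper's algorithm (which tunes many parameters jointly).

```python
import math

def tune_weight(n, k, tol=1e-12):
    """Find the letter weight w = e^theta with E[#a] = n*w/(1+w) = k by
    minimising the convex function f(theta) = n*log(1+e^theta) - k*theta,
    i.e. by bisecting on its increasing derivative
    f'(theta) = n/(1+e^(-theta)) - k."""
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n / (1 + math.exp(-mid)) - k > 0:
            hi = mid
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))
```

    For n = 100 and k = 25 the closed-form answer is w = k/(n-k) = 1/3, which the bisection recovers; the multiparametric case replaces the scalar bisection with general convex optimisation.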

    Generalized-ensemble simulations and cluster algorithms

    The importance-sampling Monte Carlo algorithm appears to be the universally optimal solution to the problem of sampling the state space of statistical mechanical systems according to the relative importance of configurations for the partition function or thermal averages of interest. While this is true in terms of its simplicity and universal applicability, the resulting approach suffers from the temporal correlations of successive samples naturally implied by the Markov chain underlying the importance-sampling simulation. In many situations, these autocorrelations are moderate and can be easily accounted for by an appropriately adapted analysis of simulation data. They turn out to be a major hurdle, however, in the vicinity of phase transitions or for systems with complex free-energy landscapes. The critical slowing down close to continuous transitions is most efficiently reduced by the application of cluster algorithms, where they are available. For first-order transitions and disordered systems, on the other hand, macroscopic energy barriers need to be overcome to prevent dynamic ergodicity breaking. In this situation, generalized-ensemble techniques such as the multicanonical simulation method can effect impressive speedups, allowing one to sample the full free-energy landscape. The Potts model features continuous as well as first-order phase transitions and is thus a prototypical example for studying phase transitions and new algorithmic approaches. I discuss the possibilities of bringing together cluster and generalized-ensemble methods to combine the benefits of both techniques. The resulting algorithm allows for the efficient estimation of the random-cluster partition function, encoding the information of all Potts models, even with a non-integer number of states, for all temperatures in a single simulation run per system size.
    Comment: 15 pages, 6 figures, proceedings of the 2009 Workshop of the Center of Simulational Physics, Athens, G
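    As a concrete reference point for the cluster-algorithm discussion, here is a minimal sketch of a single Wolff cluster update for the two-dimensional Ising model (the q = 2 Potts model) on a torus; the bond probability 1 - e^(-2*beta) is the standard Wolff choice. This is a textbook illustration, not the combined cluster/generalized-ensemble algorithm discussed in the text.

```python
import math
import random

def energy(spins):
    """Ising energy E = -sum over nearest-neighbour bonds of s_i*s_j on an
    L x L torus (each bond counted once via right and down neighbours)."""
    L = len(spins)
    E = 0
    for i in range(L):
        for j in range(L):
            E -= spins[i][j] * (spins[(i + 1) % L][j] + spins[i][(j + 1) % L])
    return E

def wolff_step(spins, beta, rng):
    """Grow one Wolff cluster with bond probability 1 - exp(-2*beta) and
    flip it; returns the cluster size."""
    L = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta)
    i0, j0 = rng.randrange(L), rng.randrange(L)
    seed = spins[i0][j0]
    cluster, stack = {(i0, j0)}, [(i0, j0)]
    while stack:
        i, j = stack.pop()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = (i + di) % L, (j + dj) % L
            if (ni, nj) not in cluster and spins[ni][nj] == seed \
                    and rng.random() < p_add:
                cluster.add((ni, nj))
                stack.append((ni, nj))
    for i, j in cluster:
        spins[i][j] = -seed              # flip the whole cluster at once
    return len(cluster)
```

    Near criticality the flipped clusters become large, which is precisely why such nonlocal moves beat single-spin Metropolis updates at continuous transitions.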

    Agnostic cosmology in the CAMEL framework

    Cosmological parameter estimation is traditionally performed in the Bayesian context. By adopting an "agnostic" statistical point of view, we show the interest of confronting the Bayesian results with a frequentist approach based on profile likelihoods. To this purpose, we have developed the Cosmological Analysis with a Minuit Exploration of the Likelihood ("CAMEL") software. Written from scratch in pure C++, it emphasises a clean and carefully designed project where new data and/or cosmological computations can be easily included. CAMEL incorporates the latest cosmological likelihoods and gives access, from the very same input file, to several estimation methods: (i) a high-quality Maximum Likelihood Estimate (a.k.a. "best fit") using MINUIT; (ii) profile likelihoods; (iii) a new implementation of an Adaptive Metropolis MCMC algorithm that relieves the burden of reconstructing the proposal distribution. We present these various statistical techniques and roll out a full use case that can then be used as a tutorial. We revisit the ΛCDM parameter determination with the latest Planck data and give results with both methodologies. Furthermore, by comparing the Bayesian and frequentist approaches, we discuss a "likelihood volume effect" that affects the optical reionization depth when analyzing the high-multipole part of the Planck data. The software, used in several Planck data analyses, is available from http://camel.in2p3.fr. Using it does not require advanced C++ skills.
    Comment: Typeset in Authorea. Online version available at: https://www.authorea.com/users/90225/articles/104431/_show_articl
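    The difference between marginalizing and profiling can be shown on a toy Gaussian model: for each value of the parameter of interest, the nuisance parameter is minimised away rather than integrated out. The model and function names are assumptions of this sketch; CAMEL itself does this with MINUIT over full cosmological likelihoods.

```python
import math

def nll(mu, sigma, data):
    """Gaussian negative log-likelihood (additive constants dropped)."""
    return len(data) * math.log(sigma) + \
        sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)

def profile_nll_mu(mu, data):
    """Profile likelihood for mu: the nuisance sigma is minimised away
    using its closed-form minimiser, the RMS deviation from mu."""
    s2 = sum((x - mu) ** 2 for x in data) / len(data)
    return nll(mu, math.sqrt(s2), data)
```

    Scanning `profile_nll_mu` over a grid of mu values yields the profile-likelihood curve; confidence intervals follow from the usual Δ(2·NLL) thresholds. In higher dimensions the closed-form minimiser is replaced by a numerical minimisation at each scan point, which is where MINUIT comes in.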

    On the diversity of pattern distributions in rational language

    It is well known that, under some aperiodicity and irreducibility conditions, the number of occurrences of local patterns within a Markov chain (and, more generally, within the languages generated by weighted regular expressions/automata) follows a Gaussian distribution with both variance and mean in Θ(n). By contrast, when these conditions no longer hold, it has been observed that the limiting distribution may follow a whole diversity of distributions, including the uniform, power-law, or even multimodal distributions, arising as tradeoffs between structural properties of the regular expression and the weights/probabilities associated with its transitions/letters. However, these cases only partially cover the full diversity of behaviors induced within regular expressions, and a characterization of attainable distributions remained to be provided. In this article, we constructively show that the limiting distribution of the simplest foreseeable motif (a single letter!) may already follow an arbitrarily complex continuous distribution (or càdlàg process). We also give applications in random generation (Boltzmann sampling) and bioinformatics (parsimonious segmentation of DNA).
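    A minimal instance of the non-Gaussian behavior at stake: in the (non-strongly-connected) language a*b*, the number of 'a's in a uniform random word of length n is exactly uniform on {0, …, n}, since each count corresponds to a single word. The sampler below is an illustrative sketch, not taken from the article.

```python
import random

def sample_astar_bstar(n, rng):
    """Uniform word of length n from a*b*: there is exactly one word per
    number k of 'a's, so drawing k uniformly from {0, ..., n} gives a
    uniform word, and the letter count is uniformly distributed."""
    k = rng.randrange(n + 1)
    return 'a' * k + 'b' * (n - k)
```

    Here the letter-count distribution has standard deviation of order n, not sqrt(n), which is exactly the departure from the Gaussian regime that irreducibility normally guarantees.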

    Multidimensional integration through Markovian sampling under steered function morphing: a physical guise from statistical mechanics

    We present a computational strategy for the evaluation of multidimensional integrals on hyper-rectangles, based on Markovian stochastic exploration of the integration domain while the integrand is being morphed, starting from an appropriate initial profile. Thanks to an abstract reformulation of Jarzynski's equality, applied in stochastic thermodynamics to evaluate free-energy profiles along selected reaction coordinates via non-equilibrium transformations, it is possible to cast the original integral into the exponential average of the distribution of the pseudo-work (which we may term "computational work") involved in doing the function morphing, which is straightforwardly solved. Several tests illustrate the basic implementation of the idea and show its performance in terms of computational time, accuracy and precision. The formulation for integrand functions with zeros and possible sign changes is also presented. We stress that our usage of Jarzynski's equality shares similarities with a practice already known in statistics as Annealed Importance Sampling (AIS), when applied to the computation of the normalizing constants of distributions. In a sense, here we dress AIS with its "physical" counterpart borrowed from statistical mechanics.
    Comment: 3 figures; Supplementary Material (pdf file named "JEMDI_SI.pdf"
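    The morphing idea can be mimicked in one dimension with a plain Annealed Importance Sampling pass: interpolate from the uniform density on the box to the integrand, accumulating the Jarzynski-style log-weight along the way. All names, the linear β-schedule and the Metropolis proposal scale are assumptions of this sketch, not the paper's implementation.

```python
import math
import random

def ais_integral(f, a, b, n_chains=400, n_steps=100, rng=None):
    """Estimate the integral of f (f > 0) over [a, b] by annealed importance
    sampling: interpolate from the uniform density (beta = 0) to f (beta = 1),
    accumulating the log-weight sum_t (beta_t - beta_{t-1}) * log f(x_t).
    The exponential average of the weights estimates Z_1/Z_0 = I/(b - a)."""
    rng = rng or random.Random()
    betas = [t / n_steps for t in range(n_steps + 1)]
    total = 0.0
    for _ in range(n_chains):
        x = rng.uniform(a, b)            # exact sample at beta = 0
        logw = 0.0
        for t in range(1, n_steps + 1):
            logw += (betas[t] - betas[t - 1]) * math.log(f(x))
            # one Metropolis move targeting f(x)^beta_t restricted to [a, b]
            y = x + rng.gauss(0.0, (b - a) / 10.0)
            if a <= y <= b and rng.random() < (f(y) / f(x)) ** betas[t]:
                x = y
        total += math.exp(logw)
    return (b - a) * total / n_chains
```

    On a test integrand such as e^x over [0, 1] (true value e - 1) the estimator converges as the number of chains grows; the AIS identity guarantees unbiasedness of the weight average even with imperfect mixing at each step.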