
    Multiresolution analysis in statistical mechanics. I. Using wavelets to calculate thermodynamic properties

    The wavelet transform, a family of orthonormal bases, is introduced as a technique for performing multiresolution analysis in statistical mechanics. The wavelet transform is a hierarchical technique designed to separate data sets into sets representing local averages and local differences. Although one-to-one transformations of data sets are possible, the advantage of the wavelet transform lies in its use as an approximation scheme for the efficient calculation of thermodynamic and ensemble properties. Even under the most drastic of approximations, the resulting errors in the values obtained for average absolute magnetization, free energy, and heat capacity are on the order of 10%, with a corresponding computational efficiency gain of two orders of magnitude for a system such as a 4×4 Ising lattice. In addition, the errors in the results tend toward zero in the neighborhood of fixed points, as determined by renormalization group theory.
    Comment: 13 pages plus 7 figures (PNG)
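    As a concrete illustration of the averages/differences hierarchy the abstract describes, here is a minimal Haar-wavelet sketch. It is a generic one-dimensional decomposition, not the paper's Ising-lattice implementation; the function name haar_step and the sample signal are illustrative.

```python
import numpy as np

def haar_step(x):
    """One level of an (unnormalized) Haar wavelet transform: split a signal
    into local averages (coarse part) and local differences (detail part)."""
    pairs = x.reshape(-1, 2)
    averages = pairs.mean(axis=1)                     # local averages
    differences = (pairs[:, 0] - pairs[:, 1]) / 2.0   # local differences
    return averages, differences

# Hierarchical decomposition: repeatedly coarse-grain the averages.
signal = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 6.0, 4.0])
coarse, details = signal, []
while coarse.size > 1:
    coarse, d = haar_step(coarse)
    details.append(d)

# Approximation scheme: discarding the finest entries of `details` keeps the
# coarse content while shrinking the effective configuration space, which is
# the source of the efficiency gain described in the abstract.
```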

    Maximum Fidelity

    The most fundamental problem in statistics is the inference of an unknown probability distribution from a finite number of samples. For a specific observed data set, answers to the following questions would be desirable: (1) Estimation: which candidate distribution provides the best fit to the observed data? (2) Goodness-of-fit: how concordant is this distribution with the observed data? (3) Uncertainty: how concordant are other candidate distributions with the observed data? A simple unified approach for univariate data, called "maximum fidelity", is presented that addresses these traditionally distinct statistical notions. Maximum fidelity is a strict frequentist approach that is fundamentally based on model concordance with the observed data. The fidelity statistic is a general information measure based on the coordinate-independent cumulative distribution and on critical yet previously neglected symmetry considerations. An approximation for the null distribution of the fidelity allows its direct conversion to absolute model concordance (p value). Fidelity maximization allows identification of the most concordant model distribution, generating a method for parameter estimation, with neighboring, less concordant distributions providing the "uncertainty" in this estimate. Maximum fidelity provides an optimal approach for parameter estimation (superior to maximum likelihood) and a generally optimal approach for goodness-of-fit assessment of arbitrary models applied to univariate data. Extensions to binary data, binned data, multidimensional data, and classical parametric and nonparametric statistical tests are described. Maximum fidelity provides a philosophically consistent, robust, and seemingly optimal foundation for statistical inference. All findings are presented in an elementary way so as to be immediately accessible to all researchers utilizing statistical analysis.
    Comment: 66 pages, 32 figures, 7 tables, submitted
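    The abstract does not reproduce the fidelity statistic itself, so the sketch below only mirrors the workflow it describes: map the sample through a candidate model's CDF, score concordance with uniformity, and keep the parameter whose score is best, with neighboring scores indicating uncertainty. The concordance function here is a stand-in, not the paper's fidelity measure.

```python
import numpy as np
from scipy import stats

def concordance(data, cdf):
    """Stand-in CDF-based concordance score (NOT the paper's fidelity
    statistic): log-likelihood of the spacings of the probability-integral
    transform, relative to perfectly equal spacing. Larger = more concordant,
    with a maximum of 0 when the model CDF maps the sample exactly uniformly."""
    u = np.sort(cdf(data))
    spacings = np.diff(np.concatenate(([0.0], u, [1.0])))
    n = len(u)
    return np.sum(np.log(np.maximum(spacings, 1e-12))) + (n + 1) * np.log(n + 1)

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=100)

# Fidelity-maximization analogue: scan candidate means, keep the most
# concordant; nearby, less concordant values indicate the uncertainty.
grid = np.linspace(1.0, 3.0, 201)
scores = [concordance(sample, stats.norm(mu, 1.0).cdf) for mu in grid]
best_mu = grid[int(np.argmax(scores))]
```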

    Efficient Cosmological Parameter Estimation from Microwave Background Anisotropies

    We revisit the issue of cosmological parameter estimation in light of current and upcoming high-precision measurements of the cosmic microwave background power spectrum. Physical quantities which determine the power spectrum are reviewed, and their connection to familiar cosmological parameters is explicated. We present a set of physical parameters, analytic functions of the usual cosmological parameters, upon which the microwave background power spectrum depends linearly (or with some other simple dependence) over a wide range of parameter values. With such a set of parameters, microwave background power spectra can be estimated with high accuracy and negligible computational effort, vastly increasing the efficiency of cosmological parameter error determination. The techniques presented here allow calculation of microwave background power spectra 10^5 times faster than comparably accurate direct codes (after precomputing a handful of power spectra). We discuss various issues of parameter estimation, including parameter degeneracies, numerical precision, mapping between physical and cosmological parameters, and systematic errors, and illustrate these considerations with an idealized model of the MAP experiment.
    Comment: 22 pages, 12 figures
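    A hedged sketch of the speedup mechanism: precompute a fiducial spectrum and its derivatives with respect to the physical parameters once (with a slow Boltzmann code), then evaluate new spectra by a first-order expansion in those nearly linear parameters. All arrays, values, and parameter names below are placeholders, not the paper's actual parameter set.

```python
import numpy as np

# Hypothetical precomputed inputs: a fiducial spectrum C_l and its partial
# derivatives with respect to each "physical" parameter, obtained once from
# a slow direct code. Placeholder values throughout.
ells = np.arange(2, 1500)
fiducial_params = np.array([0.02, 0.12, 0.7])   # illustrative parameter values
cl_fiducial = 1.0 / ells**2                     # placeholder spectrum
dcl_dparam = np.outer(np.ones(3), -1.0 / ells)  # placeholder derivatives, (3, n_ell)

def fast_spectrum(params):
    """First-order expansion in parameters chosen so the spectrum depends
    (nearly) linearly on them: cheap to evaluate, no Boltzmann solve needed."""
    delta = np.asarray(params) - fiducial_params
    return cl_fiducial + delta @ dcl_dparam

cl = fast_spectrum([0.021, 0.125, 0.68])  # near-instant vs. a full direct code
```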

    Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

    Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
    Comment: 25 pages, 2 figures, 1 table. V2: typos fixed and new references added
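    To make the Fisher-information point concrete, here is a toy sketch for an independent-sites occupancy model; the paper's hard-rod formalism and HMM machinery are not reproduced, and the scores and the parameter mu are illustrative.

```python
import numpy as np

def occupancy(scores, mu):
    """Probability that a site with binding score s is occupied, in a toy
    independent-sites model (NOT the paper's interacting hard-rod model)."""
    return 1.0 / (1.0 + np.exp(mu - scores))

def fisher_information(scores, mu):
    """Fisher information about the chemical potential mu from per-site
    Bernoulli occupancy observations: I(mu) = sum_i p_i (1 - p_i)."""
    p = occupancy(scores, mu)
    return np.sum(p * (1.0 - p))

rng = np.random.default_rng(1)
scores = rng.normal(-4.0, 2.0, size=10_000)  # placeholder per-site scores
I = fisher_information(scores, mu=0.0)

# In the low-density limit most p_i are small, so I ~ sum(p_i): confidence in
# the learned parameter grows with the (rare) number of bound sites, echoing
# the abstract's scaling of required training data with TF specificity.
```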

    Bayesian Cluster Enumeration Criterion for Unsupervised Learning

    We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed variables. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the number of clusters is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed two-step algorithm is tested using synthetic and real data sets.
    Comment: 14 pages, 7 figures
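    A minimal sketch of the two-step enumeration scheme, using scikit-learn's Gaussian mixtures. Note that sklearn's built-in bic() is the classical criterion, standing in for the paper's proposed BIC, whose penalty term differs.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data with three well-separated clusters (illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in (-3.0, 0.0, 3.0)])

candidates = range(1, 8)
bics = []
for k in candidates:
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)  # step 1: partition
    bics.append(gm.bic(X))                                       # step 2: score the model

# sklearn's bic() is a negated criterion: minimizing it plays the role of
# maximizing the posterior-probability BIC described in the abstract.
best_k = list(candidates)[int(np.argmin(bics))]
```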