
    Large deviations of the sample mean in general vector spaces

    Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random vectors taking values in a space $V$, let $\bar{X}_n = (X_1 + \cdots + X_n)/n$, and for $J \subset V$ let $a_n(J) = n^{-1}\log P(\bar{X}_n \in J)$. A powerful theory concerning the existence and value of $\lim_{n\to\infty} a_n(J)$ has been developed by Lanford for the case when $V$ is finite-dimensional and $X_1$ is bounded. The present paper is both an exposition of Lanford's theory and an extension of it to the general case. A number of examples are considered; these include the cases when $X_1$ is a Brownian motion or Brownian bridge on the real line, and the case when $\bar{X}_n$ is the empirical distribution function based on the first $n$ values in an i.i.d. sequence of random variables (the Sanov problem).
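    For orientation, in the classical finite-dimensional, bounded setting attributed to Lanford above, the limit takes the familiar Cramér form; the display below is the standard background statement, not quoted from the paper. With $\Lambda(\lambda) = \log E\, e^{\langle \lambda, X_1 \rangle}$ and rate function $I(x) = \sup_{\lambda} \bigl( \langle \lambda, x \rangle - \Lambda(\lambda) \bigr)$, one has, for sufficiently regular $J$,

    $$\lim_{n\to\infty} a_n(J) \;=\; -\inf_{x \in J} I(x).$$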

    Dynamics on expanding spaces: modeling the emergence of novelties

    Novelties are part of our daily lives. We constantly adopt new technologies, conceive new ideas, meet new people, and experiment with new situations. Occasionally, we as individuals, in a complicated cognitive and sometimes fortuitous process, come up with something that is not only new to us but new to our entire society, so that a personal novelty can turn into an innovation at a global level. Innovations occur throughout social, biological, and technological systems and, though we perceive them as a very natural ingredient of our human experience, little is known about the processes determining their emergence. Still, the statistical occurrence of innovations shows striking regularities that represent a starting point for gaining deeper insight into the whole phenomenology. This paper is a small step in that direction, reviewing the scientific attempts to effectively model the emergence of the new and its regularities, with an emphasis on more recent contributions: from Simon's model, dating back to the 1950s, to the newest models based on Polya's urn with the triggering of one novelty by another (a sketch of this urn scheme follows below). What seems to be key in the successful modelling schemes proposed so far is the idea of looking at evolution as a path in a complex space (physical, conceptual, biological, or technological) whose structure and topology are continuously reshaped and expanded by the occurrence of the new. Mathematically it is very interesting to look at the consequences of the interplay between the "actual" and the "possible", and this is the aim of this short review. Comment: 25 pages, 10 figures
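    To make the urn scheme concrete, here is a minimal sketch in the spirit of the Polya urn with triggering; the parameters rho (reinforcement) and nu (triggered novelties) and all function names are our own illustrative choices, not code from the review:

    ```python
    import random

    def urn_with_triggering(steps, rho=4, nu=3, seed=0):
        """Simulate a Polya urn with triggering (illustrative sketch).

        Each draw reinforces the drawn color with `rho` extra copies;
        the first draw of a color (a novelty) triggers `nu + 1` balls
        of brand-new colors, expanding the space of the possible.
        """
        rng = random.Random(seed)
        urn = [0]          # start with a single color, labeled 0
        next_color = 1     # next unused color label
        seen = set()       # colors drawn at least once ("the actual")
        novelties = []     # number of distinct colors seen over time

        for _ in range(steps):
            color = rng.choice(urn)
            urn.extend([color] * rho)      # reinforcement of the drawn color
            if color not in seen:          # a novelty occurs...
                seen.add(color)
                new = list(range(next_color, next_color + nu + 1))
                next_color += nu + 1
                urn.extend(new)            # ...and triggers new possibilities
            novelties.append(len(seen))
        return novelties

    if __name__ == "__main__":
        traj = urn_with_triggering(10_000)
        # Heaps'-law-like sublinear growth of distinct colors is expected:
        print(traj[99], traj[999], traj[9999])
    ```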

    Inferential models: A framework for prior-free posterior probabilistic inference

    Posterior probabilistic statistical inference without priors is an important but so far elusive goal. Fisher's fiducial inference, the Dempster-Shafer theory of belief functions, and Bayesian inference with default priors are attempts to achieve this goal but, to date, none has given a completely satisfactory picture. This paper presents a new framework for probabilistic inference, based on inferential models (IMs), which not only provides data-dependent probabilistic measures of uncertainty about the unknown parameter, but does so with an automatic long-run frequency-calibration property. The key to this new approach is the identification of an unobservable auxiliary variable associated with the observable data and unknown parameter, and the prediction of this auxiliary variable with a random set before conditioning on the data. Here we present a three-step IM construction and prove a frequency-calibration property of the IM's belief function under mild conditions. A corresponding optimality theory is developed, which helps to resolve the non-uniqueness issue. Several examples are presented to illustrate this new approach. Comment: 29 pages with 3 figures. Main text is the same as the published version. Appendix B is an addition, not in the published version, that contains some corrections and extensions of two of the main theorems.
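    As a concrete illustration of the three steps (association, prediction, combination), here is a minimal sketch for the textbook normal-mean case, using a default symmetric predictive random set common in the IM literature; the example and all names in it are ours, not code from the paper:

    ```python
    from math import erf, sqrt

    def std_normal_cdf(u):
        """CDF of the standard normal distribution."""
        return 0.5 * (1.0 + erf(u / sqrt(2.0)))

    def plausibility(theta, x):
        """IM plausibility of the singleton {theta} given one observation x.

        Association: X = theta + U with U ~ N(0, 1).
        Prediction:  predict the unobserved U with the random set
                     S = {u : |u| <= |U~|}, where U~ ~ N(0, 1).
        Combination: pl_x(theta) = P(|U~| >= |x - theta|)
                                 = 1 - |2*Phi(x - theta) - 1|.
        """
        return 1.0 - abs(2.0 * std_normal_cdf(x - theta) - 1.0)

    if __name__ == "__main__":
        x = 1.3  # single observed data point
        # A 95% plausibility region {theta : pl_x(theta) > 0.05}
        # coincides here with the usual x +/- 1.96 interval.
        for theta in (x - 1.96, x, x + 1.96):
            print(f"theta = {theta:+.2f}  pl = {plausibility(theta, x):.3f}")
    ```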

    On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models

    A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been used extensively to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders, (ii) careful programming techniques that allow orders as large as sixteen, (iii) adequate handling of inverted repeats, and (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods such as XM, which rely on very different statistical models. This is surprising because Markov models are local (short-range), in contrast with the statistical models underlying the other methods, which exploit the extensive repetitions in DNA sequences and therefore have a non-local character.
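    To give the flavor of the approach, here is a minimal sketch of a single adaptive order-k finite-context model with a Laplace-style estimator and the per-symbol information profile described above; it is our own illustration and deliberately omits the paper's competing models, inverted-repeat handling, and depth-adapted estimators:

    ```python
    from collections import defaultdict
    from math import log2

    ALPHABET = "ACGT"

    def information_profile(seq, k=3, alpha=1.0):
        """Per-symbol information content -log2 P(symbol | context)
        from an adaptively updated order-k finite-context model.

        Uses counts seen so far plus a Laplace-style estimator:
        P(s | ctx) = (n(ctx, s) + alpha) / (n(ctx, .) + alpha * |A|).
        """
        counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
        profile = []
        for i in range(k, len(seq)):
            ctx, sym = seq[i - k:i], seq[i]
            total = sum(counts[ctx].values())
            p = (counts[ctx][sym] + alpha) / (total + alpha * len(ALPHABET))
            profile.append(-log2(p))
            counts[ctx][sym] += 1  # adaptive update after coding the symbol
        return profile

    if __name__ == "__main__":
        seq = "ACGT" * 256  # toy, highly repetitive "genome"
        prof = information_profile(seq, k=3)
        # Global performance measure: average bits per base.
        print(f"average bits per base: {sum(prof) / len(prof):.3f}")
    ```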

    Bayesian Probability and Statistics in Management Research: A New Horizon

    This special issue is focused on how a Bayesian approach to estimation, inference, and reasoning in organizational research might supplement—and in some cases supplant—traditional frequentist approaches. Bayesian methods are well suited to address the increasingly complex phenomena and problems faced by 21st-century researchers and organizations, where very complex data abound and the validity of knowledge and methods is often seen as contextually driven and constructed. Traditional modeling techniques and a frequentist view of probability and method are challenged by this new reality.

    A Finitary Characterization of the Ewens Sampling Formula

    Since the Ewens sampling formula represents an equilibrium distribution satisfying detailed balance, some properties that are otherwise difficult to prove can be derived in a simple way.
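    For reference, the Ewens sampling formula in its standard form (well-known background, not quoted from the paper): for a sample of $n$ genes in which $a_j$ allelic types appear exactly $j$ times,

    $$P(a_1,\ldots,a_n) \;=\; \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{\theta^{a_j}}{j^{a_j}\, a_j!}, \qquad \sum_{j=1}^{n} j\, a_j = n.$$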