Large deviations of the sample mean in general vector spaces
Let X_1, X_2, ... be a sequence of i.i.d. random vectors taking values in a space V, let X̄_n = (X_1 + ··· + X_n)/n, and for J ⊂ V let a_n(J) = n^{-1} log P(X̄_n ∈ J). A powerful theory concerning the existence and value of lim_{n→∞} a_n(J) has been developed by Lanford for the case when V is finite-dimensional and X_1 is bounded. The present paper is both an exposition of Lanford's theory and an extension of it to the general case. A number of examples are considered; these include the cases when X_1 is a Brownian motion or Brownian bridge on the real line, and the case when X̄_n is the empirical distribution function based on the first n values in an i.i.d. sequence of random variables (the Sanov problem).
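As a quick numerical illustration of the limit lim_{n→∞} a_n(J) (a sketch added here, not part of the abstract), take V = ℝ, X_1 ~ Bernoulli(1/2) and J = [a, 1]; Cramér's theorem gives lim a_n(J) = -I(a) with I(a) = a log(a/p) + (1-a) log((1-a)/(1-p)):

```python
import math

def rate(a, p=0.5):
    """Cramer rate function I(a) for means of i.i.d. Bernoulli(p) variables."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def tail_exponent(n, a, p=0.5):
    """a_n(J) = n^{-1} log P(Xbar_n in J) for J = [a, 1], via the exact binomial tail."""
    k0 = math.ceil(n * a)
    prob = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
    return math.log(prob) / n

# a_n([0.7, 1]) should approach -I(0.7) as n grows
for n in (50, 200, 800):
    print(n, tail_exponent(n, 0.7), -rate(0.7))
```

The exponent converges to -I(a) only up to a subexponential prefactor, so agreement is slow but visibly improves with n.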
Inferential models: A framework for prior-free posterior probabilistic inference
Posterior probabilistic statistical inference without priors is an important
but so far elusive goal. Fisher's fiducial inference, Dempster-Shafer theory of
belief functions, and Bayesian inference with default priors are attempts to
achieve this goal but, to date, none has given a completely satisfactory
picture. This paper presents a new framework for probabilistic inference, based
on inferential models (IMs), which not only provides data-dependent
probabilistic measures of uncertainty about the unknown parameter, but does so
with an automatic long-run frequency calibration property. The key to this new
approach is the identification of an unobservable auxiliary variable associated
with observable data and unknown parameter, and the prediction of this
auxiliary variable with a random set before conditioning on data. Here we
present a three-step IM construction, and prove a frequency-calibration
property of the IM's belief function under mild conditions. A corresponding
optimality theory is developed, which helps to resolve the non-uniqueness
issue. Several examples are presented to illustrate this new approach.
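The flavour of the three-step construction can be conveyed with the standard single-observation Gaussian example from the IM literature (a sketch under that assumption, not text from this abstract): associate X = θ + U with U ~ N(0, 1), predict the auxiliary variable U with the default random set S = {u : |u| ≤ |U*|}, U* ~ N(0, 1), and combine with the observed x; the resulting plausibility function is pl_x(θ) = 1 − |2Φ(x − θ) − 1|:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def plausibility(theta, x):
    """IM plausibility of theta for the model X ~ N(theta, 1), using the
    default random set S = {u : |u| <= |U*|} with U* ~ N(0, 1)."""
    return 1 - abs(2 * Phi(x - theta) - 1)

# plausibility is maximal at theta = x and decays symmetrically
print(plausibility(0.0, 0.0), plausibility(1.96, 0.0))
```

The 95% plausibility region {θ : pl_x(θ) ≥ 0.05} is exactly x ± 1.96, the classical interval, which illustrates the frequency-calibration property mentioned above.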
Dynamics on expanding spaces: modeling the emergence of novelties
Novelties are part of our daily lives. We constantly adopt new technologies,
conceive new ideas, meet new people, experiment with new situations.
Occasionally, we as individuals, in a complicated cognitive and sometimes
fortuitous process, come up with something that is not only new to us, but to
our entire society so that what is a personal novelty can turn into an
innovation at a global level. Innovations occur throughout social, biological
and technological systems and, though we perceive them as a very natural
ingredient of our human experience, little is known about the processes
determining their emergence. Still, the statistical occurrence of innovations
shows striking regularities that offer a starting point for deeper
insight into the whole phenomenology. This paper represents a small step in that
direction, focusing on reviewing the scientific attempts to effectively model
the emergence of the new and its regularities, with an emphasis on more recent
contributions: from Simon's classic model, dating back to the 1950s, to the
more recent model of Polya's urn with triggering, in which one novelty sets off another. What
seems to be key in the successful modelling schemes proposed so far is the idea
of looking at evolution as a path in a complex space, physical, conceptual,
biological, technological, whose structure and topology get continuously
reshaped and expanded by the occurrence of the new. Mathematically it is very
interesting to look at the consequences of the interplay between the "actual"
and the "possible", and this is the aim of this short review.
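The interplay between the "actual" and the "possible" can be sketched in a few lines (a hedged toy version in the spirit of the urn model with triggering; parameter names and values here are my own, not from the review). Drawing an old color reinforces it; drawing a color for the first time also adds brand-new colors, expanding the adjacent possible:

```python
import random

def urn_with_triggering(steps, rho=4, nu=2, seed=0):
    """Toy Polya urn with triggering: drawing a color adds rho copies of it
    (reinforcement); drawing a color never seen before also adds nu + 1
    entirely new colors (expansion of the possible). Returns D(t), the
    number of distinct colors seen up to each time t."""
    rng = random.Random(seed)
    urn = list(range(nu + 1))          # initial colors, none drawn yet
    next_color = nu + 1
    seen = set()
    distinct = []
    for _ in range(steps):
        c = rng.choice(urn)
        urn.extend([c] * rho)          # reinforcement of the actual
        if c not in seen:              # a novelty triggers new possibilities
            seen.add(c)
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1
        distinct.append(len(seen))
    return distinct

D = urn_with_triggering(5000)
print(D[-1])
```

With rho = 4 and nu = 2 the number of distinct colors grows sublinearly in t, the Heaps-law behaviour discussed in this literature.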
On the Representability of Complete Genomes by Multiple Competing Finite-Context (Markov) Models
A finite-context (Markov) model of a given order yields the probability distribution of the next symbol in a sequence of symbols, given the most recent symbols up to a depth equal to the order. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been used extensively to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study of the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders; (ii) careful programming techniques that allow orders as large as sixteen; (iii) adequate handling of inverted repeats; (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence, we use the negative logarithm of the probability estimate at that position. This measure yields information profiles of the sequence, which are of independent interest. Its average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information-theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well as, or even better than, state-of-the-art DNA compression methods such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), in contrast to the statistical models underlying the other methods, which exploit the extensive repetitions present in DNA sequences and therefore have a non-local character.
Bayesian Probability and Statistics in Management Research: A New Horizon
This special issue is focused on how a Bayesian approach to estimation, inference, and reasoning
in organizational research might supplement—and in some cases supplant—traditional frequentist
approaches. Bayesian methods are well suited to address the increasingly complex phenomena
and problems faced by 21st-century researchers and organizations, where very complex data
abound and the validity of knowledge and methods is often seen as contextually driven and
constructed. Traditional modeling techniques and a frequentist view of probability and method
are challenged by this new reality.
Symmetry and its discontents: essays on the history of inductive probability
Containing essays on the history and philosophy of probability and statistics.