Statistical inference of the generation probability of T-cell receptors from sequence repertoires
Stochastic rearrangement of germline DNA by VDJ recombination is at the
origin of immune system diversity. This process is implemented via a series of
stochastic molecular events involving gene choices and random nucleotide
insertions between, and deletions from, genes. We use large sequence
repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta
chains to infer the statistical properties of these basic biochemical events.
Since any given CDR3 sequence can be produced in multiple ways, the probability
distribution of hidden recombination events cannot be inferred directly from
the observed sequences; we therefore develop a maximum likelihood inference
method to achieve this end. To separate the properties of the molecular
rearrangement mechanism from the effects of selection, we focus on
non-productive CDR3 sequences in T-cell DNA. We infer the joint distribution of
the various generative events that occur when a new T-cell receptor gene is
created. We find a rich picture of correlation (and absence thereof), providing
insight into the molecular mechanisms involved. The generative event statistics
are consistent between individuals, suggesting a universal biochemical process.
Our distribution predicts the generation probability of any specific CDR3
sequence by the primitive recombination process, allowing us to quantify the
potential diversity of the T-cell repertoire and to understand why some
sequences are shared between individuals. We argue that the use of formal
statistical inference methods, of the kind presented in this paper, will be
essential for quantitative understanding of the generation and evolution of
diversity in the adaptive immune system.
Comment: 20 pages, including Appendix
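As a toy illustration of why formal inference is needed here, consider a miniature version of this setting in which a sequence is built as a V segment, random nucleotide insertions, and a J segment. All segments and probabilities below are invented for illustration, not taken from the paper; the point is that the generation probability of a sequence is a marginal over every hidden scenario that could have produced it:

```python
from itertools import product

# Toy generative model (hypothetical numbers, for illustration only):
# a sequence is built as V-segment + inserted nucleotides + J-segment.
P_V = {"CA": 0.6, "CAT": 0.4}          # choice of (trimmed) V segment
P_J = {"GG": 0.7, "TGG": 0.3}          # choice of (trimmed) J segment
P_ins_len = {0: 0.5, 1: 0.3, 2: 0.2}   # number of inserted nucleotides
P_nt = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # inserted-base identity

def generation_probability(seq):
    """Marginal probability of seq: sum over all hidden
    (V choice, insertions, J choice) scenarios that produce it."""
    total = 0.0
    for v, j in product(P_V, P_J):
        if not (seq.startswith(v) and seq.endswith(j)):
            continue
        middle = seq[len(v):len(seq) - len(j)]
        # reject scenarios where V and J would overlap
        if len(v) + len(j) + len(middle) != len(seq):
            continue
        if len(middle) not in P_ins_len:
            continue
        p_mid = P_ins_len[len(middle)]
        for nt in middle:
            p_mid *= P_nt[nt]
        total += P_V[v] * P_J[j] * p_mid
    return total

# "CATGG" can arise from three distinct hidden scenarios,
# so its generation probability is a sum of three terms.
print(generation_probability("CATGG"))
```

In the real problem the sum runs over gene choices, deletions, and insertions on both junctions, so the scenario space is vastly larger, but the same degeneracy is what prevents reading the hidden-event distribution directly off the observed sequences.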
A Natural Law of Succession
Consider the problem of multinomial estimation. You are given an alphabet of
k distinct symbols and are told that the i-th symbol occurred exactly n_i times
in the past. On the basis of this information alone, you must now estimate the
conditional probability that the next symbol will be i. In this report, we
present a new solution to this fundamental problem in statistics and
demonstrate that our solution outperforms standard approaches, both in theory
and in practice.
Comment: 23 pages
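For context, the classical baseline this problem is usually measured against is Laplace's rule of succession, which adds one phantom count to every symbol. The sketch below shows that baseline, not the report's own estimator:

```python
def laplace_estimate(counts):
    """Laplace's classical rule of succession (add-one smoothing).
    counts: observed counts n_i over a k-symbol alphabet.
    Returns the estimated probability that the next symbol is i."""
    n = sum(counts)
    k = len(counts)
    return [(c + 1) / (n + k) for c in counts]

# With no observations at all, the estimate is uniform over the alphabet,
# and unseen symbols always receive nonzero probability.
print(laplace_estimate([3, 1, 0]))
```

Any improved law of succession has to beat this estimator's known weaknesses, e.g. that it assigns the same total mass to unseen symbols regardless of how the seen counts are distributed.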
Probabilities in Statistical Mechanics: What are they?
This paper addresses the question of how we should regard the probability distributions introduced into statistical mechanics. It will be argued that it is problematic to take them either as purely ontic or as purely epistemic. I will propose a third alternative: they are almost objective probabilities, or epistemic chances. The definition of such probabilities involves an interweaving of epistemic and physical considerations, and thus they cannot be classified as either purely epistemic or purely ontic. This conception, it will be argued, resolves some of the puzzles associated with statistical mechanical probabilities: it explains how probabilistic posits introduced on the basis of incomplete knowledge can yield testable predictions, and it also bypasses the problem of disastrous retrodictions, that is, the fact that the standard equilibrium measures yield a high probability of the system having been in equilibrium in the recent past, even when we know otherwise. As the problem does not arise on the conception of probabilities considered here, there is no need to invoke a Past Hypothesis as a special posit to avoid it.
Estimating rate of occurrence of rare events with empirical Bayes: a railway application
Classical approaches to estimating the rate of occurrence of events perform poorly when data are few. Maximum likelihood estimators yield overly optimistic point estimates of zero for situations in which there have been no events. Alternative empirical approaches have been proposed based on median estimators or non-informative prior distributions. While these alternatives offer an improvement over point estimates of zero, they can be overly conservative. Empirical Bayes procedures offer an unbiased approach by pooling data across different hazards to support stronger statistical inference. This paper considers the application of empirical Bayes to high-consequence, low-frequency events, where estimates are required for risk mitigation decision support, such as demonstrating that risk is as low as reasonably practicable (ALARP). A summary of empirical Bayes methods is given, and the choices of estimation procedures to obtain interval estimates are discussed. The approaches illustrated within the case study are based on the estimation of the rate of occurrence of train derailments within the UK. The usefulness of empirical Bayes within this context is discussed.
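One common empirical Bayes construction for this problem, and not necessarily the paper's exact procedure, places a moment-matched gamma prior on Poisson rates pooled across hazards; each hazard's posterior mean then shrinks its raw rate toward the pooled mean, so zero-count hazards no longer get an estimate of exactly zero. The function name and all numbers below are illustrative:

```python
def eb_rates(counts, exposures):
    """Gamma-Poisson empirical Bayes sketch: fit a Gamma(alpha, beta)
    prior to the raw rates x_i / t_i by moment matching, then return
    each hazard's posterior mean rate (alpha + x_i) / (beta + t_i)."""
    raw = [x / t for x, t in zip(counts, exposures)]
    m = sum(raw) / len(raw)                          # prior mean
    v = sum((r - m) ** 2 for r in raw) / len(raw)    # prior variance
    v = max(v, 1e-12)        # guard against identical raw rates
    alpha, beta = m * m / v, m / v
    return [(alpha + x) / (beta + t) for x, t in zip(counts, exposures)]

# Three hazards with equal exposure; the zero-count hazard gets a
# positive estimate, and the high-count hazard is shrunk toward the mean.
print(eb_rates([0, 2, 5], [10.0, 10.0, 10.0]))
```

Matching moments of the raw rates is a deliberately crude way to set the prior; interval estimates of the kind the paper discusses would require propagating the uncertainty in alpha and beta as well.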
Empirical interpretation of imprecise probabilities
This paper investigates the possibility of a frequentist interpretation of imprecise probabilities by generalizing the approach of Bernoulli's Ars Conjectandi: that is, by studying, in the case of games of chance, under which assumptions imprecise probabilities can be satisfactorily estimated from data. In fact, estimability on the basis of finite amounts of data is a necessary condition for imprecise probabilities to have a clear empirical meaning. Unfortunately, imprecise probabilities can be estimated arbitrarily well from data only in very limited settings.
Network Inference from Co-Occurrences
The recovery of network structure from experimental data is a basic and
fundamental problem. Unfortunately, experimental data often do not directly
reveal structure due to inherent limitations such as imprecision in timing or
other observation mechanisms. We consider the problem of inferring network
structure in the form of a directed graph from co-occurrence observations. Each
observation arises from a transmission made over the network and indicates
which vertices carry the transmission without explicitly conveying their order
in the path. Without order information, there are exponentially many feasible
graphs that agree with the observed data equally well. Yet the basic physical
principles underlying most networks strongly suggest that not all feasible
graphs are equally likely. In particular, vertices that co-occur in many
observations are probably closely connected. Previous approaches to this
problem are based on ad hoc heuristics. We model the experimental observations
as independent realizations of a random walk on the underlying graph, subjected
to a random permutation which accounts for the lack of order information.
Treating the permutations as missing data, we derive an exact
expectation-maximization (EM) algorithm for estimating the random walk
parameters. For long transmission paths the exact E-step may be computationally
intractable, so we also describe an efficient Monte Carlo EM (MCEM) algorithm
and derive conditions which ensure convergence of the MCEM algorithm with high
probability. Simulations and experiments with Internet measurements demonstrate
the promise of this approach.
Comment: Submitted to IEEE Transactions on Information Theory. An extended version is available as University of Wisconsin Technical Report ECE-06-
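The exact E-step can be illustrated on a toy instance: given a known random-walk transition matrix and one unordered co-occurrence set, enumerate the feasible orderings, weight each by its path probability, and normalize to obtain a posterior over orderings. The graph and all numbers below are invented for illustration:

```python
from itertools import permutations

# Toy random walk on vertices {a, b, c} (made-up transition probabilities).
P = {("a", "b"): 0.8, ("a", "c"): 0.2,
     ("b", "a"): 0.1, ("b", "c"): 0.9,
     ("c", "a"): 0.5, ("c", "b"): 0.5}
start = {"a": 1.0, "b": 0.0, "c": 0.0}   # the walk always starts at 'a'

def posterior_over_orders(vertex_set):
    """Exact E-step for one co-occurrence observation: weight each
    feasible ordering of the unordered set by its path probability
    under the random walk, then normalize."""
    weights = {}
    for order in permutations(vertex_set):
        p = start[order[0]]
        for u, v in zip(order, order[1:]):
            p *= P[(u, v)]
        if p > 0:
            weights[order] = p
    z = sum(weights.values())
    return {o: w / z for o, w in weights.items()}

# Only orderings starting at 'a' are feasible here, and a->b->c
# dominates a->c->b under this transition matrix.
print(posterior_over_orders(["a", "b", "c"]))
```

The enumeration is factorial in the path length, which is precisely why the abstract resorts to a Monte Carlo EM algorithm for long transmission paths.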
Correction algorithm for finite sample statistics
Assume in a sample of size M one finds M_i representatives of species i with
i=1...N^*. The normalized frequency p^*_i=M_i/M, based on the finite sample,
may deviate considerably from the true probabilities p_i. We propose a method
to infer rank-ordered true probabilities r_i from measured frequencies M_i. We
show that the rank-ordered probabilities provide important information about the
system, e.g., the true number of species and the Shannon and Rényi entropies.
Comment: 11 pages, 9 figures
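The finite-sample deviation the abstract refers to is easy to exhibit: with far fewer observations than species, the naive plug-in entropy computed from the frequencies p^*_i is systematically too low, because at most M distinct species can appear in the sample. A small sketch with a uniform ground truth and invented sizes:

```python
import math
import random

random.seed(0)
N, M = 1000, 100                 # number of species, sample size (M << N)
true_p = [1 / N] * N             # uniform ground-truth distribution

# Draw a finite sample and tally the observed counts M_i.
counts = {}
for _ in range(M):
    s = random.randrange(N)
    counts[s] = counts.get(s, 0) + 1

def shannon(ps):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in ps if p > 0)

H_true = shannon(true_p)                            # log(N) ~ 6.91 nats
H_plugin = shannon([c / M for c in counts.values()])
# The plug-in estimate is biased low: it can never exceed log(M) ~ 4.61
# nats, since at most M distinct species occur in the sample.
print(H_true, H_plugin)
```

This is the kind of systematic bias a correction scheme based on inferred rank-ordered true probabilities is meant to remove.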
Adaptive Bayesian and frequentist data processing for quantum tomography
The outcome statistics of an informationally complete quantum measurement for
a system in a given state can be used to evaluate the ensemble expectation of
any linear operator in the same state, by averaging a function of the outcomes
that depends on the specific operator. Here we introduce two novel
data-processing strategies, non-linear in the frequencies, which lead to faster
convergence to theoretical expectations.
Comment: 12 pages, 2 figures, revised version
Using Bayes formula to estimate rates of rare events in transition path sampling simulations
Transition path sampling is a method for estimating the rates of rare events
in molecular systems based on the gradual transformation of a path distribution
containing a small fraction of reactive trajectories into a biased distribution
in which these rare trajectories have become frequent. Then, a multistate
reweighting scheme is implemented to postprocess data collected from the staged
simulations. Herein, we show how Bayes' formula allows one to directly construct a
biased sample containing an enhanced fraction of reactive trajectories and to
concomitantly estimate the transition rate from this sample. The approach can
remediate the convergence issues encountered in free energy perturbation or
umbrella sampling simulations when the transformed distribution insufficiently
overlaps with the reference distribution.
Comment: 11 pages, 8 figures