15,804 research outputs found

    Statistical inference of the generation probability of T-cell receptors from sequence repertoires

    Stochastic rearrangement of germline DNA by VDJ recombination is at the origin of immune system diversity. This process is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Since any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on non-productive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our distribution predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system. Comment: 20 pages, including Appendi
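
    The statement that a sequence "can be produced in multiple ways" can be written as a sum over hidden scenarios. The notation below is an assumption made for illustration and is not taken from the abstract: E denotes one recombination scenario (gene choices, insertions, deletions), s(E) the sequence it yields, and s_1, ..., s_N the observed sequences.

```latex
% Generation probability: sum over every hidden scenario E that yields sequence s
P_{\mathrm{gen}}(s) = \sum_{E \,:\, s(E) = s} P_{\mathrm{rearr}}(E)

% Likelihood of N observed sequences, assumed independent; maximum likelihood
% inference picks the scenario distribution P_rearr that maximizes it
\mathcal{L} = \prod_{a=1}^{N} \; \sum_{E \,:\, s(E) = s_a} P_{\mathrm{rearr}}(E)
```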

    A Natural Law of Succession

    Consider the problem of multinomial estimation. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice. Comment: 23 pages
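
    For orientation, the problem setup can be sketched with one standard baseline, the add-constant rule (Laplace's rule when the constant is 1). This is not the new solution the abstract announces; the function name and the toy counts are hypothetical.

```python
def add_constant_estimate(counts, alpha=1.0):
    """Estimate next-symbol probabilities as (n_i + alpha) / (n + k * alpha).

    counts maps each of the k symbols to its observed count n_i.
    alpha=1.0 gives Laplace's rule of succession (an assumed baseline,
    not the estimator proposed in the report).
    """
    k = len(counts)
    total = sum(counts.values())
    return {s: (n + alpha) / (total + k * alpha) for s, n in counts.items()}


# Toy counts, made up for illustration: symbol "c" was never observed.
counts = {"a": 3, "b": 1, "c": 0}
probs = add_constant_estimate(counts)
```

    The unseen symbol "c" receives a non-zero probability, which is the property that separates smoothed estimates from raw relative frequencies.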

    Probabilities in Statistical Mechanics: What are they?

    This paper addresses the question of how we should regard the probability distributions introduced into statistical mechanics. It will be argued that it is problematic to take them either as purely ontic, or purely epistemic. I will propose a third alternative: they are almost objective probabilities, or epistemic chances. The definition of such probabilities involves an interweaving of epistemic and physical considerations, and thus they cannot be classified as either purely epistemic or purely ontic. This conception, it will be argued, resolves some of the puzzles associated with statistical mechanical probabilities: it explains how probabilistic posits introduced on the basis of incomplete knowledge can yield testable predictions, and it also bypasses the problem of disastrous retrodictions, that is, the fact that the standard equilibrium measures yield a high probability of the system being in equilibrium in the recent past, even when we know otherwise. As the problem does not arise on the conception of probabilities considered here, there is no need to invoke a Past Hypothesis as a special posit to avoid it.

    Estimating rate of occurrence of rare events with empirical Bayes : a railway application

    Classical approaches to estimating the rate of occurrence of events perform poorly when data are few. Maximum likelihood estimators result in overly optimistic point estimates of zero for situations where there have been no events. Alternative empirical-based approaches have been proposed based on median estimators or non-informative prior distributions. While these alternatives offer an improvement over point estimates of zero, they can be overly conservative. Empirical Bayes procedures offer an unbiased approach through pooling data across different hazards to support stronger statistical inference. This paper considers the application of empirical Bayes to high-consequence, low-frequency events, where estimates are required for risk mitigation decision support under criteria such as "as low as reasonably possible". A summary of empirical Bayes methods is given and the choices of estimation procedures to obtain interval estimates are discussed. The approaches illustrated within the case study are based on the estimation of the rate of occurrence of train derailments within the UK. The usefulness of empirical Bayes within this context is discussed.
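
    The pooling idea can be illustrated with a generic gamma-Poisson model. This is a minimal sketch under stated assumptions, not the procedure used in the paper: every hazard is assumed to share the same exposure, the gamma prior is fitted by moment matching, and the counts are made up for illustration.

```python
from statistics import mean, variance


def gamma_poisson_eb(counts, exposure):
    """Shrink per-hazard Poisson rates towards a pooled gamma prior.

    Assumes counts[i] ~ Poisson(rate_i * exposure) and rate_i ~ Gamma(shape, rate),
    with one common exposure. The prior is fitted by moment matching:
    E[x] = t * E[rate] and Var[x] = t * E[rate] + t^2 * Var[rate].
    """
    m = mean(counts)
    excess = variance(counts) - m  # variance beyond pure Poisson noise
    if excess <= 0:
        raise ValueError("no overdispersion: moment estimate of the prior is undefined")
    prior_mean = m / exposure
    prior_var = excess / exposure ** 2
    shape = prior_mean ** 2 / prior_var
    rate = prior_mean / prior_var
    # Posterior mean of a Gamma(shape + x, rate + exposure) distribution.
    posterior = [(shape + x) / (rate + exposure) for x in counts]
    return posterior, shape, rate


# Hypothetical event counts for eight hazards, each observed for 10 time units.
counts = [0, 0, 1, 0, 4, 0, 2, 0]
posterior_rates, shape, rate = gamma_poisson_eb(counts, exposure=10.0)
```

    Hazards with zero observed events receive a small positive rate rather than the maximum likelihood estimate of zero, while the hazard with four events is pulled below its raw rate of 0.4.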

    Empirical interpretation of imprecise probabilities

    This paper investigates the possibility of a frequentist interpretation of imprecise probabilities by generalizing the approach of Bernoulli’s Ars Conjectandi, that is, by studying, in the case of games of chance, under which assumptions imprecise probabilities can be satisfactorily estimated from data. In fact, estimability on the basis of finite amounts of data is a necessary condition for imprecise probabilities to have a clear empirical meaning. Unfortunately, imprecise probabilities can be estimated arbitrarily well from data only in very limited settings.

    Network Inference from Co-Occurrences

    The recovery of network structure from experimental data is a fundamental problem. Unfortunately, experimental data often do not directly reveal structure due to inherent limitations such as imprecision in timing or other observation mechanisms. We consider the problem of inferring network structure in the form of a directed graph from co-occurrence observations. Each observation arises from a transmission made over the network and indicates which vertices carry the transmission without explicitly conveying their order in the path. Without order information, there are an exponential number of feasible graphs which agree with the observed data equally well. Yet, the basic physical principles underlying most networks strongly suggest that not all feasible graphs are equally likely. In particular, vertices that co-occur in many observations are probably closely connected. Previous approaches to this problem are based on ad hoc heuristics. We model the experimental observations as independent realizations of a random walk on the underlying graph, subjected to a random permutation which accounts for the lack of order information. Treating the permutations as missing data, we derive an exact expectation-maximization (EM) algorithm for estimating the random walk parameters. For long transmission paths the exact E-step may be computationally intractable, so we also describe an efficient Monte Carlo EM (MCEM) algorithm and derive conditions which ensure convergence of the MCEM algorithm with high probability. Simulations and experiments with Internet measurements demonstrate the promise of this approach. Comment: Submitted to IEEE Transactions on Information Theory. An extended version is available as University of Wisconsin Technical Report ECE-06-

    Correction algorithm for finite sample statistics

    Assume that in a sample of size M one finds M_i representatives of species i, with i=1...N^*. The normalized frequency p^*_i=M_i/M, based on the finite sample, may deviate considerably from the true probabilities p_i. We propose a method to infer rank-ordered true probabilities r_i from measured frequencies M_i. We show that the rank-ordered probabilities provide important information on the system, e.g., the true number of species and the Shannon and Renyi entropies. Comment: 11 pages, 9 figures
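
    The naive quantities that the abstract starts from can be computed directly. The sketch below shows only the uncorrected plug-in estimate built from p^*_i = M_i/M, not the correction algorithm the paper proposes; the function name and the sample counts are hypothetical.

```python
import math


def plug_in_entropy(sample_counts):
    """Shannon entropy (in nats) of the normalized frequencies M_i / M.

    This is the naive finite-sample estimate; species with zero count
    contribute nothing, so unseen species are silently ignored.
    """
    total = sum(sample_counts)
    freqs = [m / total for m in sample_counts if m > 0]
    return -sum(p * math.log(p) for p in freqs)


# Made-up counts M_i for three observed species in a sample of size M = 10.
counts = [5, 3, 2]
h = plug_in_entropy(counts)
```

    Because species that never appear in the sample are dropped, this estimate cannot exceed the logarithm of the observed number of species, which is one reason finite-sample frequencies can mislead.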

    Adaptive Bayesian and frequentist data processing for quantum tomography

    The outcome statistics of an informationally complete quantum measurement for a system in a given state can be used to evaluate the ensemble expectation of any linear operator in the same state, by averaging a function of the outcomes that depends on the specific operator. Here we introduce two novel data-processing strategies, non-linear in the frequencies, which lead to faster convergence to theoretical expectations. Comment: 12 pages, 2 figures, revised version

    Using Bayes formula to estimate rates of rare events in transition path sampling simulations

    Transition path sampling is a method for estimating the rates of rare events in molecular systems based on the gradual transformation of a path distribution containing a small fraction of reactive trajectories into a biased distribution in which these rare trajectories have become frequent. Then, a multistate reweighting scheme is implemented to postprocess data collected from the staged simulations. Herein, we show how Bayes' formula allows one to directly construct a biased sample containing an enhanced fraction of reactive trajectories and to concomitantly estimate the transition rate from this sample. The approach can remedy the convergence issues encountered in free energy perturbation or umbrella sampling simulations when the transformed distribution insufficiently overlaps with the reference distribution. Comment: 11 pages, 8 figures