
    Quantifying selection in immune receptor repertoires

    The efficient recognition of pathogens by the adaptive immune system relies on the diversity of receptors displayed on the surface of immune cells. T-cell receptor diversity results from an initial random DNA editing process, called VDJ recombination, followed by functional selection of cells according to the interaction of their surface receptors with self and foreign antigenic peptides. To quantify the effect of selection on the highly variable elements of the receptor, we apply a probabilistic maximum likelihood approach to the analysis of high-throughput sequence data from the β-chain of human T-cell receptors. We quantify selection factors for V and J gene choice, and for the length and amino-acid composition of the variable region. Our approach is necessary to disentangle the effects of selection from biases inherent in the recombination process. Inferred selection factors differ little between donors, or between naive and memory repertoires. The number of sequences shared between donors is well predicted by the model, indicating a purely stochastic origin of such "public" sequences. We find a significant correlation between biases induced by VDJ recombination and our inferred selection factors, together with a reduction of diversity during selection. Both effects suggest that natural selection acting on the recombination process has anticipated the selection pressures experienced during somatic evolution.
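As a minimal sketch of the maximum likelihood idea, assume a simplified model with a single selection factor q_L that depends only on CDR3 length, so that P_post(L) = P_gen(L) q_L / Z. The full model described above also includes V/J and amino-acid factors. For this one-factor toy, the likelihood is maximized by q_L proportional to P_data(L) / P_gen(L). The function name and all numbers below are hypothetical.

```python
from collections import Counter


def length_selection_factors(observed_lengths, p_gen_length):
    """Maximum-likelihood selection factors q_L for a toy model
    P_post(L) = P_gen(L) * q_L / Z with a single length-dependent factor.

    Maximizing sum_s log P_post(L_s) gives q_L proportional to
    P_data(L) / P_gen(L); the scale is fixed so that
    Z = sum_L P_gen(L) * q_L = 1.
    """
    counts = Counter(observed_lengths)
    n = len(observed_lengths)
    missing = set(counts) - set(p_gen_length)
    if missing:
        raise ValueError(f"no generation probability for lengths {sorted(missing)}")
    return {
        length: (counts.get(length, 0) / n) / p_gen
        for length, p_gen in p_gen_length.items()
    }


# Hypothetical example: made-up generation probabilities for lengths 12-14.
p_gen = {12: 0.25, 13: 0.5, 14: 0.25}
observed = [12, 13, 13, 13, 14, 14, 14, 14]
q = length_selection_factors(observed, p_gen)
print(q)  # {12: 0.5, 13: 0.75, 14: 2.0}
```

Here q_L > 1 marks lengths enriched by selection relative to what recombination alone would produce, and q_L < 1 marks depleted ones.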

    OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs

    Motivation: High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability that V(D)J recombination generates any specific T- or B-cell receptor nucleotide sequence. These generation probabilities are highly non-uniform, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem. Results: We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model predictions agree very well with published data. We suggest that OLGA may be a useful tool to guide vaccine design. Availability: Source code is available at https://github.com/zsethna/OLG
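OLGA's own algorithm models V, D and J gene choice, deletions and insertions. The toy below only illustrates why dynamic programming avoids the brute-force sum. It assumes a hypothetical first-order Markov model over nucleotides, with initial probabilities `p0` and transition table `trans`; these names, like the functions, are invented for illustration. The probability of an amino acid sequence is then the sum over every nucleotide sequence that translates to it. Instead of enumerating 4^(3L) sequences, the sketch carries forward, codon by codon, the total probability mass indexed by the last nucleotide emitted. A brute-force version is included for checking.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (third base fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def codon_weight(prev, codon, p0, trans):
    """Markov probability of emitting `codon` after nucleotide `prev`
    (prev is None at the start of the sequence)."""
    first = p0[codon[0]] if prev is None else trans[prev][codon[0]]
    return first * trans[codon[0]][codon[1]] * trans[codon[1]][codon[2]]


def p_amino_acid_dp(aa_seq, p0, trans):
    """Dynamic programming: forward[n] is the total probability of all
    nucleotide prefixes translating to the amino acids seen so far
    and ending in nucleotide n."""
    forward = {None: 1.0}
    for aa in aa_seq:
        new_forward = {}
        for prev, weight in forward.items():
            for codon in CODONS_FOR[aa]:
                w = weight * codon_weight(prev, codon, p0, trans)
                new_forward[codon[2]] = new_forward.get(codon[2], 0.0) + w
        forward = new_forward
    return sum(forward.values())


def p_amino_acid_brute(aa_seq, p0, trans):
    """Exponential-time reference: enumerate every nucleotide sequence."""
    total = 0.0
    for nts in product(BASES, repeat=3 * len(aa_seq)):
        seq = "".join(nts)
        if "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3)) != aa_seq:
            continue
        p = p0[seq[0]]
        for x, y in zip(seq, seq[1:]):
            p *= trans[x][y]
        total += p
    return total


# Hypothetical toy model: uniform start, mild preference for repeated bases.
p0 = {b: 0.25 for b in BASES}
trans = {x: {y: (0.4 if x == y else 0.2) for y in BASES} for x in BASES}
print(p_amino_acid_dp("CA", p0, trans), p_amino_acid_brute("CA", p0, trans))
```

The forward pass costs a few operations per codon and amino acid rather than growing exponentially with sequence length. This is the essential saving that motivates a dynamic programming approach.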

    On generative models of T-cell receptor sequences

    T-cell receptors (TCR) are key proteins of the adaptive immune system, generated randomly in each individual, whose diversity underlies our ability to recognize infections and malignancies. Modeling the distribution of TCR sequences is of key importance for immunology and medical applications. Here, we compare two inference methods trained on high-throughput sequencing data: a knowledge-guided approach, which accounts for the details of sequence generation, supplemented by a physics-inspired model of selection; and a knowledge-free Variational Auto-Encoder based on deep artificial neural networks. We show that the knowledge-guided model outperforms the deep network approach at predicting TCR probabilities, while being more interpretable and having a lower computational cost.

    Inferring processes underlying B-cell repertoire diversity

    We quantify the VDJ recombination and somatic hypermutation processes in human B-cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, due to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.
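The following is a minimal sketch of what an independent site context model can look like. It assumes that log-mutability is a sum of independent contributions, one from each nucleotide in a window of ±2 positions around the site. The window width, the weight table and the function names are hypothetical and are not the fitted model from the work above.

```python
import math


def site_mutability(seq, pos, weights, half_width=2):
    """Relative mutability of seq[pos] under an independent site model.

    `weights` maps (offset, nucleotide) -> additive log contribution.
    Missing entries contribute 0, and offsets falling outside the
    sequence are skipped.
    """
    log_m = 0.0
    for offset in range(-half_width, half_width + 1):
        j = pos + offset
        if 0 <= j < len(seq):
            log_m += weights.get((offset, seq[j]), 0.0)
    return math.exp(log_m)


def mutability_profile(seq, weights, half_width=2):
    """Probability that a single mutation falls at each position of seq."""
    raw = [site_mutability(seq, i, weights, half_width) for i in range(len(seq))]
    total = sum(raw)
    return [m / total for m in raw]


# Hypothetical weights: a G at the site itself triples its mutability,
# and a C immediately upstream doubles it.
weights = {(0, "G"): math.log(3.0), (-1, "C"): math.log(2.0)}
profile = mutability_profile("AACGAAGA", weights)
print([round(p, 3) for p in profile])
```

Because each context position contributes a separate factor, the number of parameters grows linearly with the window width rather than exponentially, as it would for a full k-mer table.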

    Relating trajectories of uniform and variable yield populations.

    All gray trajectories that end on the solid red line (the variable-yield stopping line) at points below a given number of type-1 cells build up the cumulative probability that the final population has fewer than that number of cells of type 1. Because trajectories are monotone, they all also cross the dashed line (the uniform-yield stopping line) that passes through the point obeying the same constraint, so the cumulative probability is the same. The parameters of the two lines are simply related (see Eq. (11)).

    Final population size for a heterogeneous micro-population with metabolic tradeoff.

    (A) Distributions of the final population size from simulations at a fixed division-rate ratio and yield ratio, for four different initial population sizes (solid, dashed, dotted, and dash-dot lines). (B) Standard deviation of the final population size as a function of initial population size, in good agreement with the analytic approximation.
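The captions do not spell out the simulation model, so the following is only a sketch under stated assumptions. A type-k cell divides at rate `rates[k]`, and daughters inherit their parent's type. Each division of type k consumes 1 / `yields[k]` units of resource, and growth stops when the resource cannot pay for the next chosen division. All function names and parameter values are hypothetical. Only the order of divisions affects the final size, so the sketch samples the jump chain and ignores waiting times.

```python
import random
import statistics


def final_population(n1, n2, resource, rates, yields, rng):
    """Grow a two-type population until the next division is unaffordable.

    Assumed dynamics (illustrative only): the next dividing cell is of
    type k with probability proportional to rates[k] * n_k, its daughter
    inherits type k, and the division costs 1 / yields[k] resource units.
    """
    counts = [n1, n2]
    while True:
        w1 = rates[0] * counts[0]
        w2 = rates[1] * counts[1]
        k = 0 if rng.random() < w1 / (w1 + w2) else 1
        cost = 1.0 / yields[k]
        if cost > resource:
            return counts
        resource -= cost
        counts[k] += 1


def final_size_stats(n0, resource, rates, yields, trials=1000, seed=0):
    """Mean and standard deviation of the final size over many runs,
    starting from a symmetric composition (assumption)."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(trials):
        n1 = n0 // 2
        sizes.append(sum(final_population(n1, n0 - n1, resource, rates, yields, rng)))
    return statistics.mean(sizes), statistics.pstdev(sizes)


# Hypothetical "metabolic tradeoff": type 1 divides faster but has half the yield.
for n0 in (10, 20, 40, 80):
    mean, sd = final_size_stats(n0, 200.0, rates=(2.0, 1.0), yields=(1.0, 2.0))
    print(n0, round(mean, 1), round(sd, 2))
```

With equal yields the final size is deterministic (n0 plus the resource times the yield), so any spread in the final size comes from variability in yield.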

    Average final vs. initial population size in micro-populations grown to saturation of resource.

    Dotted line: a population with uniform yield. Symbols: Monte Carlo results for two-state populations with variability in yield and in growth rate. Dashed lines: analytic approximations, applicable only to special parameter values. Crosses: Monte Carlo simulation of a “metabolic tradeoff” for two parameter settings (lower and upper crosses). Circles: variable yield positively correlated with division rate, for two parameter settings (upper and lower circles).

    Scaled distribution of the number of cells of metabolic type 1 in the final population.

    All distributions are for a symmetric initial composition, equal yields, and a large number of divisions. Different distributions in a plot correspond to different initial population sizes (blue, green, red). The distributions are plotted as a function of the scaling variable (see text for details); their shape does not depend on the number of divisions but does depend on the initial number of cells. (A) The two types have the same growth rate and are therefore identical in all their properties. Population composition varies only because individual trajectories are composed of different sequences of divisions of the two types. Because of the symmetry between the types, all distributions are symmetric. (B) The two types have different growth rates, and the distribution of the final composition becomes skewed.
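Assume that, with equal yields, the resource fixes the total number of divisions. Assume also that the next cell to divide is chosen with probability proportional to its type's growth rate times its count. Panel (A) then corresponds to a classical Pólya urn. For that urn, the type-1 fraction converges to a Beta(n1, n2) distribution for initial counts n1 and n2. This is consistent with the caption: the limit depends on the initial cell numbers but not on the number of divisions. The sketch below compares simulated moments with the Beta prediction; the function names are hypothetical. Setting unequal rates breaks the symmetry, as in panel (B).

```python
import random
import statistics


def final_type1_fraction(n1, n2, divisions, rng, rates=(1.0, 1.0)):
    """Fraction of type-1 cells after a fixed number of divisions.

    Assumed dynamics: the next cell to divide is of type k with probability
    proportional to rates[k] * n_k, and its daughter inherits type k.
    With equal rates this is a Polya urn.
    """
    for _ in range(divisions):
        w1 = rates[0] * n1
        w2 = rates[1] * n2
        if rng.random() < w1 / (w1 + w2):
            n1 += 1
        else:
            n2 += 1
    return n1 / (n1 + n2)


def beta_moments(a, b):
    """Mean and variance of Beta(a, b), the large-division limit of the
    type-1 fraction for a Polya urn started from a and b cells."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


rng = random.Random(1)
fractions = [final_type1_fraction(5, 5, 500, rng) for _ in range(2000)]
print(statistics.mean(fractions), statistics.variance(fractions))
print(beta_moments(5, 5))  # mean 0.5, variance 25/1100 (about 0.0227)
```

Starting from more cells narrows the Beta distribution, since its variance falls roughly as 1/(n1 + n2). Running the urn longer, by contrast, barely changes its shape.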