Quantifying selection in immune receptor repertoires
The efficient recognition of pathogens by the adaptive immune system relies
on the diversity of receptors displayed at the surface of immune cells. T-cell
receptor diversity results from an initial random DNA editing process, called
VDJ recombination, followed by functional selection of cells according to the
interaction of their surface receptors with self and foreign antigenic
peptides. To quantify the effect of selection on the highly variable elements
of the receptor, we apply a probabilistic maximum likelihood approach to the
analysis of high-throughput sequence data from the β-chain of human
T-cell receptors. We quantify selection factors for V and J gene choice, and
for the length and amino-acid composition of the variable region. Our approach
is necessary to disentangle the effects of selection from biases inherent in
the recombination process. Inferred selection factors differ little between
donors, or between naive and memory repertoires. The number of sequences shared
between donors is well-predicted by the model, indicating a purely stochastic
origin of such "public" sequences. We find a significant correlation between
biases induced by VDJ recombination and our inferred selection factors,
together with a reduction of diversity during selection. Both effects suggest
that natural selection acting on the recombination process has anticipated the
selection pressures experienced during somatic evolution.
OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs
Motivation: High-throughput sequencing of large immune repertoires has
enabled the development of methods to predict the probability of generation by
V(D)J recombination of T- and B-cell receptors of any specific nucleotide
sequence. These generation probabilities are very non-homogeneous, ranging over
20 orders of magnitude in real repertoires. Since the function of a receptor
really depends on its protein sequence, it is important to be able to predict
this probability of generation at the amino acid level. However, brute-force
summation over all the nucleotide sequences with the correct amino acid
translation is computationally intractable. The purpose of this paper is to
present a solution to this problem.
Results: We use dynamic programming to construct an efficient and flexible
algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin
Amino-acid sequences), for calculating the probability of generating a given
CDR3 amino acid sequence or motif, with or without V/J restriction, as a result
of V(D)J recombination in B or T cells. We apply it to databases of
epitope-specific T-cell receptors to evaluate the probability that a typical
human subject will possess T cells responsive to specific disease-associated
epitopes. The model prediction shows an excellent agreement with published
data. We suggest that OLGA may be a useful tool to guide vaccine design.
Availability: Source code is available at https://github.com/zsethna/OLG
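The contrast between brute-force summation and dynamic programming described above can be illustrated with a toy sketch. This is not OLGA's algorithm or API: the first-order Markov chain over nucleotides (`P0`, `T`) is a hypothetical stand-in for a real V(D)J recombination model, and the function names `nt_prob`, `aa_prob_brute`, and `aa_prob_dp` are invented for illustration. Only the idea is the same: sum the probability of every nucleotide sequence that translates to a given amino acid sequence, without enumerating those sequences one by one.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the codons that encode it.
CODONS = {}
for i, (a, b, c) in enumerate(product(BASES, repeat=3)):
    CODONS.setdefault(AMINO[i], []).append(a + b + c)

# Hypothetical toy generative model: a first-order Markov chain on nucleotides.
P0 = {b: 0.25 for b in BASES}
T = {x: {y: (0.4 if x == y else 0.2) for y in BASES} for x in BASES}


def nt_prob(seq):
    """Probability of one non-empty nucleotide sequence under the toy model."""
    p = P0[seq[0]]
    for x, y in zip(seq, seq[1:]):
        p *= T[x][y]
    return p


def aa_prob_brute(aa):
    """Enumerate every codon combination; cost grows exponentially with len(aa)."""
    return sum(nt_prob("".join(cs)) for cs in product(*(CODONS[a] for a in aa)))


def aa_prob_dp(aa):
    """Same sum by dynamic programming; cost grows linearly with len(aa).

    mass[b] is the total probability of all nucleotide prefixes that
    translate to the amino acids processed so far and end in base b.
    Assumes a non-empty amino acid sequence.
    """
    mass = None
    for a in aa:
        new = dict.fromkeys(BASES, 0.0)
        for c1, c2, c3 in CODONS[a]:
            if mass is None:
                start = P0[c1]
            else:
                start = sum(mass[b] * T[b][c1] for b in BASES)
            new[c3] += start * T[c1][c2] * T[c2][c3]
        mass = new
    return sum(mass.values())
```

For a sequence such as "CASSL" the brute-force version visits 2 x 4 x 6 x 6 x 6 = 1728 nucleotide sequences, while the dynamic-programming version performs a few dozen updates per residue and returns the same value up to floating-point error.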
On generative models of T-cell receptor sequences
T-cell receptors (TCR) are key proteins of the adaptive immune system,
generated randomly in each individual, whose diversity underlies our ability to
recognize infections and malignancies. Modeling the distribution of TCR
sequences is of key importance for immunology and medical applications. Here,
we compare two inference methods trained on high-throughput sequencing data: a
knowledge-guided approach, which accounts for the details of sequence
generation, supplemented by a physics-inspired model of selection; and a
knowledge-free Variational Auto-Encoder based on deep artificial neural
networks. We show that the knowledge-guided model outperforms the deep network
approach at predicting TCR probabilities, while being more interpretable, at a
lower computational cost.
Inferring processes underlying B-cell repertoire diversity
We quantify the VDJ recombination and somatic hypermutation processes in
human B-cells using probabilistic inference methods on high-throughput DNA
sequence repertoires of human B-cell receptor heavy chains. Our analysis
captures the statistical properties of the naive repertoire, first after its
initial generation via VDJ recombination and then after selection for
functionality. We also infer statistical properties of the somatic
hypermutation machinery (exclusive of subsequent effects of selection). Our
main results are the following: the B-cell repertoire is substantially more
diverse than T-cell repertoires, due to longer junctional insertions; sequences
that pass initial selection are distinguished by having a higher probability of
being generated in a VDJ recombination event; somatic hypermutations have a
non-uniform distribution along the V gene that is well explained by an
independent site model for the sequence context around the hypermutation site.
Relating trajectories of uniform and variable yield populations.
All gray trajectories, ending on the solid red line (variable-yield stopping line) at points corresponding to , build up the cumulative probability for the final population to have less than cells of type 1. Due to the monotone property of trajectories, they all also cross the dashed line (uniform-yield stopping line) that passes through the point obeying the same constraint, and thus the cumulative probability is the same. The parameters of the two lines are simply related through (see Eq. (11)).
Final population size for a heterogeneous micro-population with metabolic tradeoff.
(A) Distributions of the final population size from simulations with division rate ratio and yield ratio , for different initial population sizes: cells (solid line), cells (dashed line), cells (dotted line), cells (dash-dot line). (B) Standard deviation of the final population size as a function of initial population size, in good agreement with the analytic approximation.
Average final vs. initial population size in micro-populations grown to saturation of resource.
Dotted line: a population with a uniform yield. Symbols: Monte Carlo results for two-state populations with variability in yield and in growth rate. Dashed lines: analytic approximations relevant only for special parameter values. Crosses: Monte Carlo simulation for "metabolic tradeoff" (lower crosses), (upper crosses). Circles: variable yield positively correlated with division rate (upper circles), (lower circles).
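A Monte Carlo run of the kind this caption refers to might look like the following minimal sketch. The growth rule is an assumption, not the authors' stated procedure: two heritable cell types without switching, the next dividing cell chosen with probability proportional to its division rate, each division consuming `1 / yield` units of a shared resource, and growth stopping as soon as the chosen division cannot be paid for. The function name `grow_to_saturation` and all of its parameters are hypothetical.

```python
import random


def grow_to_saturation(n1, n2, resource, rates=(1.0, 1.0), yields=(1.0, 1.0), rng=None):
    """Grow a two-type population until the shared resource runs out.

    Assumed rule (illustrative only): a division of type i costs
    1 / yields[i] units of resource and adds one cell of type i.
    Returns the final (n1, n2).
    """
    if n1 + n2 <= 0:
        raise ValueError("population must start with at least one cell")
    rng = rng or random
    n = [n1, n2]
    while True:
        # Pick the type of the next division, weighted by cell count and rate.
        w1 = n[0] * rates[0]
        w2 = n[1] * rates[1]
        i = 0 if rng.random() * (w1 + w2) < w1 else 1
        cost = 1.0 / yields[i]
        if cost > resource:
            break  # saturation: the chosen division cannot be paid for
        resource -= cost
        n[i] += 1
    return n[0], n[1]


def mean_final_size(n0, resource, runs=200, seed=0, **kwargs):
    """Average final population size over repeated runs from a symmetric start."""
    rng = random.Random(seed)
    sizes = [sum(grow_to_saturation(n0, n0, resource, rng=rng, **kwargs)) for _ in range(runs)]
    return sum(sizes) / len(sizes)
```

With equal yields the final size is fixed by the resource alone, which corresponds to the uniform-yield reference case. With unequal yields the final size depends on the random order of divisions, so repeated calls to `grow_to_saturation` give a distribution of final sizes.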
Scaled distribution of the number of cells of metabolic type 1 in the final population.
All distributions are for symmetric initial composition, equal yields and a large number of divisions. Different distributions in a plot are for different initial populations (Blue - , Green - , Red - ). These distributions are plotted as a function of the scaling variable (see text for details), and their shape does not depend on the number of divisions but does depend on the initial number of cells. (A) The two types have the same growth rate and are therefore equal in all their properties. Population composition varies only because individual trajectories are composed of different sequences of divisions of the two types. Because of the symmetry between types, all distributions are symmetric around . (B) The two types have different growth rates, and the distribution of final composition becomes skewed.