Computational strategies for dissecting the high-dimensional complexity of adaptive immune repertoires
The adaptive immune system recognizes antigens via an immense array of
antigen-binding antibodies and T-cell receptors, the immune repertoire. The
interrogation of immune repertoires is of high relevance for understanding the
adaptive immune response in disease and infection (e.g., autoimmunity, cancer,
HIV). Adaptive immune receptor repertoire sequencing (AIRR-seq) has driven the
quantitative and molecular-level profiling of immune repertoires thereby
revealing the high-dimensional complexity of the immune receptor sequence
landscape. Several methods for the computational and statistical analysis of
large-scale AIRR-seq data have been developed to resolve immune repertoire
complexity in order to understand the dynamics of adaptive immunity. Here, we
review the current research on (i) diversity, (ii) clustering and network,
(iii) phylogenetic and (iv) machine learning methods applied to dissect,
quantify and compare the architecture, evolution, and specificity of immune
repertoires. We summarize outstanding questions in computational immunology and
propose future directions for systems immunology towards coupling AIRR-seq with
the computational discovery of immunotherapeutics, vaccines, and
immunodiagnostics.
Comment: 27 pages, 2 figures
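As an illustration of the diversity methods mentioned in the abstract above, the following is a minimal sketch of Hill diversity numbers computed from clone counts. The function name `hill_number` and the example counts are made up for illustration and are not taken from the review.

```python
import math


def hill_number(counts, q):
    """Hill diversity of order q for a list of clone counts.

    q = 0 gives richness (number of clones), q = 1 gives the exponential
    of the Shannon entropy, q = 2 gives the inverse Simpson index.
    """
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


# Hypothetical clone counts for a toy repertoire.
clone_counts = [50, 30, 10, 5, 5]
for order in (0, 1, 2):
    print(order, hill_number(clone_counts, order))
```

Sweeping `q` yields a diversity profile: low orders weight rare clones more heavily, while high orders are dominated by expanded clones.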
Quantifying evolutionary constraints on B cell affinity maturation
The antibody repertoire of each individual is continuously updated by the
evolutionary process of B cell receptor mutation and selection. It has recently
become possible to gain detailed information concerning this process through
high-throughput sequencing. Here, we develop modern statistical molecular
evolution methods for the analysis of B cell sequence data, and then apply them
to a very deep short-read data set of B cell receptors. We find that the
substitution process is conserved across individuals but varies significantly
across gene segments. We investigate selection on B cell receptors using a
novel method that side-steps the difficulties encountered by previous work in
differentiating between selection and motif-driven mutation; this is done
through stochastic mapping and empirical Bayes estimators that compare the
evolution of in-frame and out-of-frame rearrangements. We use this new method
to derive a per-residue map of selection, which provides a more nuanced view of
the constraints on framework and variable regions.
Comment: Previously entitled "Substitution and site-specific selection driving
B cell affinity maturation is consistent across individuals"
Assessing T cell clonal size distribution: a non-parametric approach
Clonal structure of the human peripheral T-cell repertoire is shaped by a
number of homeostatic mechanisms, including antigen presentation, cytokine and
cell regulation. Its accurate tuning leads to a remarkable ability to combat
pathogens in all their variety, while systemic failures may lead to severe
consequences like autoimmune diseases. Here we develop and make use of a
non-parametric statistical approach to assess T cell clonal size distributions
from recent next-generation sequencing data. For 41 healthy individuals and a
patient with ankylosing spondylitis who underwent treatment, we invariably
find power law scaling over several decades and for the first time calculate
quantitatively meaningful values of the decay exponent. The exponent proved to
be much the same among healthy donors, significantly different for the
autoimmune patient before therapy, and converging towards a typical value afterwards.
We discuss implications of the findings for theoretical understanding and
mathematical modeling of adaptive immunity.
Comment: 13 pages, 3 figures, 2 tables
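To make the notion of a power-law decay exponent concrete, here is a minimal sketch that estimates it from clone sizes. It uses the standard continuous maximum-likelihood (Hill) estimator, which is a parametric substitute and not the non-parametric approach developed in the paper. The function names, the cutoff `x_min`, and the synthetic data are all assumptions for illustration.

```python
import math
import random


def powerlaw_mle(sizes, x_min):
    """Continuous maximum-likelihood estimate of alpha in p(x) ~ x^(-alpha).

    Only sizes >= x_min are used; x_min is an assumed lower cutoff of the
    power-law regime and must be chosen by the analyst.
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, rng):
    """Draw n synthetic sizes from a continuous power law (inverse transform)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic check: recover a known exponent from simulated clone sizes.
rng = random.Random(0)
sizes = sample_powerlaw(alpha=2.5, x_min=1.0, n=20000, rng=rng)
print(powerlaw_mle(sizes, x_min=1.0))
```

Real clone sizes are integer read or cell counts, so a discrete estimator and a principled choice of `x_min` would be needed before comparing exponents between donors.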
Inferring processes underlying B-cell repertoire diversity
We quantify the VDJ recombination and somatic hypermutation processes in
human B-cells using probabilistic inference methods on high-throughput DNA
sequence repertoires of human B-cell receptor heavy chains. Our analysis
captures the statistical properties of the naive repertoire, first after its
initial generation via VDJ recombination and then after selection for
functionality. We also infer statistical properties of the somatic
hypermutation machinery (exclusive of subsequent effects of selection). Our
main results are the following: the B-cell repertoire is substantially more
diverse than T-cell repertoires, due to longer junctional insertions; sequences
that pass initial selection are distinguished by having a higher probability of
being generated in a VDJ recombination event; somatic hypermutations have a
non-uniform distribution along the V gene that is well explained by an
independent site model for the sequence context around the hypermutation site.
Comment: acknowledgement added
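The idea of an independent site model for hypermutation context can be sketched as follows: each position around the candidate site contributes its own additive log-weight, so the relative mutability factorizes over positions. The weights in `SITE_WEIGHTS`, the context width, and the function name are invented toy values, not parameters inferred in the paper.

```python
import math

# Hypothetical log-weights per context offset and nucleotide (toy values).
# Offset 0 is the candidate hypermutation site itself.
SITE_WEIGHTS = {
    -2: {"A": 0.3, "C": -0.2, "G": -0.1, "T": 0.4},
    -1: {"A": 0.5, "C": -0.4, "G": 0.6, "T": -0.3},
    0: {"A": 0.2, "C": 0.1, "G": 0.4, "T": -0.5},
    1: {"A": -0.1, "C": 0.2, "G": -0.3, "T": 0.5},
    2: {"A": 0.0, "C": 0.3, "G": -0.2, "T": 0.1},
}


def context_mutability(seq, pos):
    """Relative mutability of seq[pos] under an independent site model.

    Each context position contributes independently, so the score is the
    exponential of a sum of per-position log-weights. Positions that fall
    outside the sequence are skipped.
    """
    log_score = 0.0
    for offset, weights in SITE_WEIGHTS.items():
        i = pos + offset
        if 0 <= i < len(seq):
            log_score += weights[seq[i]]
    return math.exp(log_score)


# Relative mutability at each position of a toy sequence.
toy = "AGCTAGGT"
print([round(context_mutability(toy, i), 3) for i in range(len(toy))])
```

Because contributions are independent, changing one context nucleotide rescales the mutability by a single factor regardless of the other positions, which is what keeps the number of parameters linear in the context width.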
Worldwide genetic variation of the IGHV and TRBV immune receptor gene families in humans.
The immunoglobulin heavy variable (IGHV) and T cell beta variable (TRBV) loci are among the most complex and variable regions in the human genome. Generated through a process of gene duplication/deletion and diversification, these loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Here, we present a comprehensive study of the functional gene segments in the IGHV and TRBV loci, quantifying their copy number and single-nucleotide variation in a globally diverse sample of 109 (IGHV) and 286 (TRBV) humans from over 100 populations. We find that the IGHV and TRBV gene families exhibit starkly different patterns of variation. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequencies of common alleles and copy number variants is hampering existing analytical pipelines.
Statistical inference of the generation probability of T-cell receptors from sequence repertoires
Stochastic rearrangement of germline DNA by VDJ recombination is at the
origin of immune system diversity. This process is implemented via a series of
stochastic molecular events involving gene choices and random nucleotide
insertions between, and deletions from, genes. We use large sequence
repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta
chains to infer the statistical properties of these basic biochemical events.
Since any given CDR3 sequence can be produced in multiple ways, the probability
distribution of hidden recombination events cannot be inferred directly from
the observed sequences; we therefore develop a maximum likelihood inference
method to achieve this end. To separate the properties of the molecular
rearrangement mechanism from the effects of selection, we focus on
non-productive CDR3 sequences in T-cell DNA. We infer the joint distribution of
the various generative events that occur when a new T-cell receptor gene is
created. We find a rich picture of correlation (and absence thereof), providing
insight into the molecular mechanisms involved. The generative event statistics
are consistent between individuals, suggesting a universal biochemical process.
Our distribution predicts the generation probability of any specific CDR3
sequence by the primitive recombination process, allowing us to quantify the
potential diversity of the T-cell repertoire and to understand why some
sequences are shared between individuals. We argue that the use of formal
statistical inference methods, of the kind presented in this paper, will be
essential for quantitative understanding of the generation and evolution of
diversity in the adaptive immune system.
Comment: 20 pages, including Appendix
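The point that one CDR3 sequence can arise from several hidden recombination events can be illustrated with a toy model in which the generation probability is a sum over all scenarios that produce the sequence. This sketch is not the paper's inference method: the gene segments, probability tables, and function name are invented, and the events are assumed independent, whereas the abstract reports correlations between some of them.

```python
from itertools import product

# Toy model, all values hypothetical: (segment sequence, choice probability).
V_GENES = {"V1": ("TGTGCC", 0.6), "V2": ("TGTGCA", 0.4)}
J_GENES = {"J1": ("GAGTTC", 0.7), "J2": ("CAGTTC", 0.3)}
P_DEL = {0: 0.5, 1: 0.3, 2: 0.2}    # deletions from the V 3' end or J 5' end
P_INS = {0: 0.4, 1: 0.35, 2: 0.25}  # number of inserted nucleotides


def generation_probability(seq):
    """Sum the probabilities of all toy scenarios that produce seq.

    A scenario is a choice of V gene, J gene, V deletions and J deletions;
    the insertion is then whatever fills the gap, with each inserted
    nucleotide assumed uniform over A, C, G, T.
    """
    total = 0.0
    for (v_seq, p_v), (j_seq, p_j) in product(V_GENES.values(), J_GENES.values()):
        for (d_v, p_dv), (d_j, p_dj) in product(P_DEL.items(), P_DEL.items()):
            v_part = v_seq[:len(v_seq) - d_v]
            j_part = j_seq[d_j:]
            n_ins = len(seq) - len(v_part) - len(j_part)
            if n_ins not in P_INS:
                continue  # insertion length impossible under the model
            if not (seq.startswith(v_part) and seq.endswith(j_part)):
                continue  # scenario cannot produce this sequence
            total += p_v * p_j * p_dv * p_dj * P_INS[n_ins] * 0.25 ** n_ins
    return total


# This sequence is reachable both without insertions and via deletion
# followed by re-insertion of matching nucleotides.
print(generation_probability("TGTGCCGAGTTC"))
```

In the setting described in the abstract, the probability tables themselves are unknown and are fitted by maximum likelihood to non-productive sequences; the sum over scenarios above is only the forward calculation once those tables are fixed.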