564 research outputs found
Recommended from our members
Worldwide genetic variation of the IGHV and TRBV immune receptor gene families in humans.
The immunoglobulin heavy variable (IGHV) and T cell beta variable (TRBV) loci are among the most complex and variable regions in the human genome. Generated through a process of gene duplication/deletion and diversification, these loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Here, we present a comprehensive study of the functional gene segments in the IGHV and TRBV loci, quantifying their copy number and single-nucleotide variation in a globally diverse sample of 109 (IGHV) and 286 (TRBV) humans from more than a hundred populations. We find that the IGHV and TRBV gene families exhibit starkly different patterns of variation. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequencies of common alleles and copy number variants is hampering existing analytical pipelines.
Computational Tools for Immune Repertoire Characterization and Primer Set Design
The enormous decrease in the cost of genomic sequencing over the past two decades has enabled researchers to revisit previously unaddressable questions in sequence analysis. However, this boom of genomic information has introduced new sets of problems that often demand computationally efficient methods. In this work, we describe computational tools for two such settings involving large-scale genomic data: 1) estimating copy number and allelic variation in two highly complex gene families, and 2) selective sequencing of a target genome in a complex DNA sample. We first describe a method that takes short reads from high-throughput sequencing and characterizes both copy number and allelic variation in the IGHV and TRBV loci. These two loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Additionally, we have conducted the first study of a globally diverse sample of hundreds of individuals in these two loci from over a hundred populations. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequencies of common alleles and copy number variants is hampering existing analytical pipelines. In our second problem setting, we describe SOAPswga, an optimized and parallelized pipeline for primer design in the context of selective amplification. Unlike previous heuristic-based methods, SOAPswga uses machine learning methods to evaluate both individual primers and primer sets. Additionally, rather than a brute-force search for primer sets, as in predecessor methods, SOAPswga uses branch-and-bound principles to pursue only the most promising sets. These optimizations, including the parallelization of each step, allow for a huge decrease in runtime, from the order of weeks to minutes.
We also discuss the results of our pipeline applied to the selective amplification of Mycobacterium tuberculosis in a sample of human blood. Lastly, we expand on the importance of this work and, in general, its potential usefulness to any setting involving targeted sequencing.
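The branch-and-bound idea mentioned in this abstract can be illustrated with a minimal sketch. This is not SOAPswga's actual algorithm: the per-primer scores, the pairwise conflict set, and the function name `best_primer_set` are all hypothetical stand-ins, and the objective (maximize the summed score of a fixed-size, conflict-free set) is an assumption made only for illustration.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple


def best_primer_set(
    scores: Sequence[float],
    conflicts: Set[FrozenSet[int]],
    k: int,
) -> Tuple[float, Tuple[int, ...]]:
    """Pick k mutually compatible primers with the highest total score.

    scores    -- hypothetical per-primer quality scores (higher is better)
    conflicts -- hypothetical set of incompatible primer index pairs
    """
    # Visit primers from best to worst so the optimistic bound is cheap.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best_score = float("-inf")
    best_set: Tuple[int, ...] = ()

    def search(pos: int, chosen: List[int], total: float) -> None:
        nonlocal best_score, best_set
        if len(chosen) == k:
            if total > best_score:
                best_score, best_set = total, tuple(chosen)
            return
        need = k - len(chosen)
        remaining = order[pos:]
        if len(remaining) < need:
            return  # not enough primers left to complete a set
        # Bound: pretend the next `need` best primers are all compatible.
        bound = total + sum(scores[i] for i in remaining[:need])
        if bound <= best_score:
            return  # this branch cannot beat the best set found so far
        i = order[pos]
        if all(frozenset((i, j)) not in conflicts for j in chosen):
            search(pos + 1, chosen + [i], total + scores[i])  # include primer i
        search(pos + 1, chosen, total)  # exclude primer i

    search(0, [], 0.0)
    return best_score, best_set
```

The pruning step is what separates this from a brute-force enumeration: a partial set is abandoned as soon as even an optimistic completion cannot improve on the best complete set seen so far.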
Characterization of the immunoglobulin lambda chain across diverse human populations.
The adaptive immune system relies on a diverse set of over one hundred immunoglobulin (IG) genes across three genomic loci that are variably combined to form antibodies (Ab). The IG lambda (IGL) locus is one of two loci that encode the IG light chain. The complexity of the IGL locus severely limits the effective use of standard short-read sequencing, limiting our knowledge of population diversity in this locus. We leveraged single molecule real-time (SMRT) long-read sequencing in conjunction with IGL-targeted DNA capture to develop the method IG-Cap for accurate and high-throughput sequencing of the IGL locus. We benchmarked this method using six gold-standard assemblies of the IGL locus. Using IG-Cap and whole genome long-read sequencing data, we resolved the IGL locus in 238 individuals of diverse population origins. From these individuals, we identified 207 novel IGL alleles and resolved multiple large structural variations, including a large 60 kb deletion affecting 6 functional IGLV genes and population-variable duplications in the IGL constant region. Additionally, we identified signatures of balancing and purifying selection in and around functional genes across the IGL locus, including gene-specific patterns of heterozygosity and allelic richness. Finally, we found that IGLV alleles are enriched for nonsynonymous mutations resulting in disparate amino acid changes. Overall, this work revealed significant unexplored diversity in the IGL locus and provides an important set of genomic tools and resources to enable future functional studies, disease association studies, and targeted therapeutic development.
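As a rough illustration of the two per-gene diversity measures named in this abstract, the sketch below computes allelic richness as a plain count of distinct alleles and heterozygosity as the standard expected heterozygosity, 1 minus the sum of squared allele frequencies. The abstract does not state how the authors computed either quantity (richness, for example, is often rarefied to a common sample size), so the function name `allele_diversity` and the input layout are assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple


def allele_diversity(calls: Sequence[str]) -> Tuple[int, float]:
    """Return (allelic richness, expected heterozygosity) for one gene.

    calls -- one allele label per sampled haplotype (hypothetical input format)
    """
    if not calls:
        raise ValueError("at least one allele call is required")
    counts = Counter(calls)
    n = sum(counts.values())
    # Expected heterozygosity: chance that two haplotypes drawn with
    # replacement carry different alleles.
    heterozygosity = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return len(counts), heterozygosity
```

For example, four haplotypes carrying alleles `*01`, `*01`, `*02`, and `*03` give a richness of 3 and an expected heterozygosity of 0.625.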
Immunogenomics of the Rhesus macaque, an animal model for HIV vaccine development
Human Immunodeficiency Virus (HIV) is a lentivirus that causes Acquired Immunodeficiency Syndrome (AIDS), resulting in the progressive failure of the immune system. Due to its rapid replication rate and high mutation frequency, the virus is able to evade the immune system and thwart an efficacious response. Current HIV infection prophylaxes and therapeutics are not optimal and there is an urgent need to develop an efficacious HIV vaccine. Recently, high-throughput sequencing of the Immunoglobulin (Ig) repertoire from HIV-infected humans and immunized Rhesus macaques has led to important insights into vaccines against HIV-1. Further elucidation of the antibody response in these crucial animal studies will require substantially greater power to analyze the Ig repertoires than is currently possible. Reliable information on macaque Ig genes is insufficient due to the incompleteness of the whole genome sequence (WGS) and the inherent difficulty of obtaining complete Ig sequences, given the complex and repetitive nature of the Ig loci. To address this issue, we have generated a high-quality, annotated WGS with precisely annotated Ig loci from ten macaques. We used low-error synthetic long reads generated by Illumina TruSeq technology, Illumina 150 bp paired-end reads (110X coverage), and Irys genome mapping technology to assemble the genome de novo. We employed a bait-and-sequence strategy using human Ig probes to capture macaque Ig genes for the accurate assembly and annotation of Ig genes and alleles. Together, these data will generate a complete Rhesus macaque genome with detailed information on allelic diversity at the Ig loci. This study is essential for making the macaque a viable model for adaptive immunity. In addition, it will provide information on the similarities and differences between macaque and human Ig genes that will aid in the design and interpretation of vaccine studies.
Inferred Allelic Variants of Immunoglobulin Receptor Genes: a system for their evaluation, documentation, and naming
Immunoglobulins or antibodies are the main effector molecules of the B-cell lineage and are encoded by hundreds of variable (V), diversity (D), and joining (J) germline genes, which recombine to generate enormous IG diversity. Recently, high-throughput adaptive immune receptor repertoire sequencing (AIRR-seq) of recombined V-(D)-J genes has offered unprecedented insights into the dynamics of IG repertoires in health and disease. Faithful biological interpretation of AIRR-seq studies depends upon the annotation of raw AIRR-seq data, using reference germline gene databases to identify the germline genes within each rearrangement. Existing reference databases are incomplete, as shown by recent AIRR-seq studies that have inferred the existence of many previously unreported polymorphisms. Completing the documentation of genetic variation in germline gene databases is therefore of crucial importance. Lymphocyte receptor genes and alleles are currently assigned by the Immunoglobulins, T cell Receptors and Major Histocompatibility Nomenclature Subcommittee of the International Union of Immunological Societies (IUIS) and managed in IMGT®, the international ImMunoGeneTics information system® (IMGT). In 2017, the IMGT Group reached agreement with a group of AIRR-seq researchers on the principles of a streamlined process for identifying and naming inferred allelic sequences, for their incorporation into IMGT®. These researchers represented the AIRR Community, a network of over 300 researchers whose objective is to promote all aspects of immunoglobulin and T-cell receptor repertoire studies, including the standardization of experimental and computational aspects of AIRR-seq data generation and analysis. The Inferred Allele Review Committee (IARC) was established by the AIRR Community to devise policies, criteria, and procedures to perform this function. 
Formalized evaluations of novel inferred sequences have now begun and submissions are invited via a new dedicated portal (https://ogrdb.airr-community.org). Here, we summarize recommendations developed by the IARC, focusing initially on human IGHV genes, with the goal of facilitating the acceptance of inferred allelic variants of germline IGHV genes. We believe that this initiative will improve the quality of AIRR-seq studies by facilitating the description of human IG germline gene variation, and that in time, it will expand to the documentation of TR and IG genes in many vertebrate species.
Population matched (pm) germline allelic variants of immunoglobulin (IG) loci: relevance in infectious diseases and vaccination studies in human populations
Immunoglobulin (IG) loci harbor inter-individual allelic variants in many different germline IG variable, diversity and joining genes of the IG heavy (IGH), kappa (IGK) and lambda (IGL) loci, which together form the genetic basis of the highly diverse antigen-specific B-cell receptors. These allelic variants can be shared between or be specific to human populations. Current immunogenetics resources gather the germline alleles but lack information on their population specificity, which limits disease-association studies of immune responses in different human populations. Therefore, we systematically identified germline alleles from 26 different human populations around the world, profiled by "1000 Genomes" data. We identified 409 IGHV, 179 IGKV, and 199 IGLV germline alleles supported by at least seven haplotypes. Germline allele diversity is highest in Africans. Remarkably, the variants in the identified novel alleles show strikingly conserved patterns, the same as found in other IG databases, suggesting evolutionary selection processes acting over time. We could relate the genetic variants to population-specific immune responses, e.g. IGHV1-69 for flu in Africans. The population matched IG (pmIG) resource will enhance our understanding of the SHM-related B-cell receptor selection processes in (infectious) diseases and vaccination within and between different human populations.
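The "supported by at least seven haplotypes" criterion in this abstract can be sketched as a simple support-count filter. The input layout (one collection of allele labels per phased haplotype) and the function name `supported_alleles` are assumptions for illustration; the abstract does not describe the authors' actual pipeline.

```python
from collections import Counter
from typing import Dict, Iterable


def supported_alleles(
    haplotype_alleles: Iterable[Iterable[str]],
    min_haplotypes: int = 7,
) -> Dict[str, int]:
    """Keep alleles observed on at least `min_haplotypes` haplotypes.

    haplotype_alleles -- for each haplotype, the allele labels it carries
                         (hypothetical input format)
    """
    support: Counter = Counter()
    for alleles in haplotype_alleles:
        # Count each allele at most once per haplotype.
        support.update(set(alleles))
    return {allele: n for allele, n in support.items() if n >= min_haplotypes}
```

Counting an allele once per haplotype, rather than once per occurrence, keeps a duplicated gene copy on a single haplotype from inflating the support.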
A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus
An incomplete ascertainment of genetic variation within the highly polymorphic immunoglobulin heavy chain locus (IGH) has hindered our ability to define genetic factors that influence antibody-mediated processes. Due to locus complexity, standard high-throughput approaches have failed to accurately and comprehensively capture IGH polymorphism. As a result, the locus has only been fully characterized twice, severely limiting our knowledge of human IGH diversity. Here, we combine targeted long-read sequencing with a novel bioinformatics tool, IGenotyper, to fully characterize IGH variation in a haplotype-specific manner. We apply this approach to eight human samples, including a haploid cell line and two mother-father-child trios, and demonstrate the ability to generate high-quality assemblies (>98% complete and >99% accurate), genotypes, and gene annotations, identifying 2 novel structural variants and 15 novel IGH alleles. We show that multiplexing allows for scaling of the approach without impacting data quality, and that our genotype call sets are more accurate than short-read (>35% increase in true positives and >97% decrease in false positives) and array/imputation-based datasets. This framework establishes a desperately needed foundation for leveraging IG genomic data to study population-level variation in antibody-mediated immunity, critical for bettering our understanding of disease risk and responses to vaccines and therapeutics.
Immunoglobulin gene usage and affinity maturation in antiviral antibodies
The ability of antibodies to block infections makes them highly relevant for successful
vaccine development. Through the papers described in this thesis, I attempt to characterize
the functional and genetic aspects of antiviral antibodies induced by infection and
vaccination.
In Paper I, we characterized the distribution and maturation of HIV-1 envelope
glycoproteins (Env)-specific antibody lineages post-vaccination in different immune
compartments of rhesus macaques. Vaccine-induced Env-specific antibody lineages were
disseminated across the periphery, lymph node, spleen, and bone marrow (BM) but not in
gut tissue. We observed a consistent increase in the somatic hypermutation (SHM) levels
of Env-specific antibody sequences after each boost and the SHM levels strongly
correlated with the affinity of members from a potent neutralizing antibody lineage.
In Paper II, we set out to understand the role of SHM in a broad, potent, public class of
antibodies isolated from a healthcare worker who was previously infected with SARS-CoV-2.
I selected a potent neutralizing antibody and reverted the heavy chain (HC) to the
germline sequence. I then sequentially introduced individual or combinations of SHM so
that we could test the functional impact of this. We found a substantial gain of antibody
potency and breadth when certain SHM mutations were reintroduced, and we identified
two key mutations that largely contributed to the breadth of this lineage. Furthermore, we
showed that the mature antibody retained neutralizing activity against potential future viral
variants by deep mutational scanning (DMS) experiments. A high-resolution structure of
this antibody obtained by cryo-electron microscopy (cryo-EM) confirmed important
interactions made by the identified SHMs with the SARS-CoV-2 spike (S).
In Papers III and IV, we investigated the effect of immunoglobulin heavy chain variable
(IGHV) gene polymorphisms on the function of human SARS-CoV-2 antibodies isolated
post-infection. We genotyped a cohort of previously infected healthcare workers and
evaluated the neutralization activity of germline-reverted and allele-swapped S-specific
IGHV1-69*20-using antibodies from two independent donors carrying this allele.
Neutralization was retained when reverting the IGHV region to the germline IGHV1-69*20
allele but lost when reverting to other IGHV1-69 alleles, demonstrating a strong
allele-dependence in these antibodies. A high resolution cryo-EM structure of one of the
antibodies revealed significant contacts made by two IGHV1-69*20-germline encoded
amino acid residues with the S, illustrating the impact of IGHV polymorphisms on
antibody functions. We next focused on the IGHV3-30 group of genes, which are
frequently used by S-specific antibodies. By IGHV genotype and haplotype analysis we
observed that the IGHV3-30-3 gene was deleted in many individuals, and the IGHV3-30
alleles were heterogeneously distributed in our cohort. When the IGHV region of an
IGHV3-30-3*01 neutralizing antibody was swapped with IGHV3-30 alleles, the
neutralization remained unaffected demonstrating functional redundancy within this gene
group, at least for this antibody lineage.
The results from my doctoral research provide insight into functional and genetic
properties of antibodies induced by viral antigens, which have important clinical relevance
both for guided-vaccine design and monoclonal antibody therapeutics, and for our general
understanding of antibody responses in the population.
The effect of genetic variation at the immunoglobulin heavy chain variable region gene loci on biases in the generation of the human primary antibody repertoire
The human primary antibody repertoire must be incredibly diverse in order to combat a constantly evolving array of pathogens. Random events contribute to the repertoire diversity that is created within an individual on a daily basis. Similarly, much of the genetic variation existing between individuals and populations at the immunoglobulin loci has been generated via stochastic processes. However, deviations from randomness have been detected both during the processes that generate the primary repertoire within an individual and during the evolution of the immunoglobulin genes. This complex set of events and genes has been notoriously difficult to investigate. However, next-generation sequencing technologies have recently allowed the creation of datasets containing thousands of rearranged sequences. This has allowed great insight into the events taking place during the formation of a B cell in the bone marrow and into genetic diversity within and between human populations. The information contained in such large datasets allows many questions to be asked. Preferential IGHD-IGHJ pairing has been reported, but the mechanism involved is unclear. Using very large datasets and the fact that VDJ rearrangement is an intrachromosomal event, complex patterns were detected. These patterns changed in a predictable way in individuals carrying IGHD deletion polymorphisms, suggesting a strong positional influence and the involvement of other factors too. Results also suggest that the recombinase associates with an IGHJ gene before associating with an IGHD partner. Allele frequencies of structural IGH polymorphisms vary between different ethnic human populations. Differentiation of sequence variants of the IGH locus was investigated in individuals from six different ethnic backgrounds, including a rarely studied Amerindian population. Significant differentiation was observed between several pairs of populations. 
This work has important implications for a more personalised approach to vaccines and therapeutics. Analysis of these diverse populations suggested the existence of many novel unreported allelic variants. These alleles were then included in the repertoire used previously to study the evolution of the subregions of IGHV genes. The analysis produced remarkably similar results, suggesting that the implications for selection in these regions will not change regardless of the number of extra alleles that remain to be reported.
Definition of immunoglobulin germline genes by next generation sequencing for studies of antigen-specific B cell responses
Immunoglobulins play a critical role in the adaptive immune system, existing as cell surface-expressed B cell receptors and secreted antibodies. Circulating antibodies are the main correlate of protective immunity for most vaccines. An improved understanding of the germline genes that rearrange to encode the vast repertoire of antibodies is therefore of central interest. Despite this, current databases of immunoglobulin germline gene variation are incomplete, both for humans and research animal models, limiting studies of antigen-specific B cell responses.
In Paper I, we developed a computational tool, IgDiscover, which infers germline immunoglobulin V alleles from the repertoire of expressed antibodies in a given individual. We validated IgDiscover for the identification of human, mouse and rhesus macaque IGHV alleles and described novel IGHV alleles in all three species. Our results highlighted a high degree of inter-individual allelic diversity in rhesus macaques. In Paper II, we optimized and compared two major immunoglobulin library production methods based on 5′RACE and 5′multiplex PCR, respectively. We observed that, despite 5′RACE being unbiased in terms of amplification and having the advantage of not requiring 5′ end IGHV genomic information, current limitations on high-throughput sequence read length resulted in the 5′ multiplex method delivering a higher quality output due to its shorter amplicon size. In Paper III, we inferred germline immunoglobulin alleles in 45 macaques from four sub-populations of the two most common species used in biomedical research, rhesus and cynomolgus macaques. We confirmed and extended our observations concerning high inter-individual diversity, demonstrating that it was highest among Indonesian cynomolgus macaques and lowest among Mauritian cynomolgus macaques in the sub-populations studied. We compiled comprehensive IGHV, D and J allele databases and used several methods to independently validate novel alleles.
In conclusion, the work presented in this thesis establishes a road map to generate individualized immunoglobulin germline gene databases from diverse species, even if genomic immunoglobulin loci information is limited. This thesis also examines the advantages and disadvantages of commonly used next generation sequencing library preparation methods. Finally, it reports novel inferred immunoglobulin alleles in humans and macaques and illustrates a high degree of inter-individual immunoglobulin allelic diversity in primates, underlining the utility of generating individualized immunoglobulin databases for studies of immune repertoires.