
    Cis-regulatory variation: significance in biomedicine and evolution

    Cis-regulatory regions (CRR) control gene expression and chromatin modifications. Genetic variation at CRR in individuals across a population contributes to phenotypic differences of biomedical relevance. This standing variation is important for personalized genomic medicine as well as for adaptive evolution and speciation. This review focuses on genetic variation at CRR, its influence on chromatin, gene expression, and ultimately disease phenotypes. In addition, we summarize our understanding of how this variation may contribute to evolution. Recent technological and computational advances have accelerated research in the direction of personalized medicine, combining strengths of molecular biology and genomics. This will pave new ways to understand how CRR variation affects phenotypes and chart possible avenues of intervention.

    Correlations of antibody response phenotype to genotype revealed by molecular amplification fingerprinting

    It has long been possible to measure the phenotype of antibody responses (antigen-specific titers) through conventional serological assays (e.g., ELISA). In contrast, the ability to measure the genotype of antibody responses has only recently become possible through the advent of high-throughput antibody repertoire sequencing (Ig-seq), which provides quantitative molecular information on clonal expansion, diversity and somatic hypermutation. However, Ig-seq is compromised by the presence of bias and errors introduced during library preparation and sequencing, which prevent reliable immunological conclusions from being drawn. By using synthetic antibody spike-in genes, we determined that Ig-seq data overestimated antibody diversity measurements by up to 5000-fold and was less than 60% accurate in clonal frequency measurements.

    Synthetic Standards Combined With Error and Bias Correction Improve the Accuracy and Quantitative Resolution of Antibody Repertoire Sequencing in Human Naïve and Memory B Cells

    High-throughput sequencing of immunoglobulin (Ig) repertoires (Ig-seq) is a powerful method for quantitatively interrogating B cell receptor sequence diversity. When applied to human repertoires, Ig-seq provides insight into fundamental immunological questions, and can be implemented in diagnostic and drug discovery projects. However, a major challenge in Ig-seq is ensuring accuracy, as library preparation protocols and sequencing platforms can introduce substantial errors and bias that compromise immunological interpretation. Here, we have established an approach for performing highly accurate human Ig-seq by combining synthetic standards with a comprehensive error and bias correction pipeline. First, we designed a set of 85 synthetic antibody heavy-chain standards (in vitro transcribed RNA) to assess correction workflow fidelity. Next, we adapted a library preparation protocol that incorporates unique molecular identifiers (UIDs) for error and bias correction which, when applied to the synthetic standards, resulted in highly accurate data. Finally, we performed Ig-seq on purified human circulating B cell subsets (naïve and memory), combined with a cellular replicate sampling strategy. This strategy enabled robust and reliable estimation of key repertoire features such as clonotype diversity, germline segment and isotype subclass usage, and somatic hypermutation. We anticipate that our standards and error and bias correction pipeline will become a valuable tool for researchers to validate and improve accuracy in human Ig-seq studies, thus leading to potentially new insights and applications in human antibody repertoire profiling.

    Mining the sequence space of antibody repertoires to predict and design antigen-specific antibodies

    The mammalian adaptive immune system is able to identify specific molecular structures on foreign pathogens. Specificity to these epitopes is achieved through a group of receptors belonging to the immunoglobulin superfamily: B cell receptors (BCRs), their secreted version (antibodies), and T cell receptors (TCRs). Each of these receptors carries highly variable regions, which facilitate antigen recognition and which are generated during progenitor cell development (and thus are thought to be unique clones or clonal lineages). The current estimate for the theoretical diversity of unique naïve BCR sequences is around 5 × 10¹³ clonal combinations for humans and at least 10¹² for mice. The diverse population of BCRs, antibodies, or TCRs in a given individual is referred to as the immune repertoire. Immune repertoire sequencing (AIRR-Seq, Ig-Seq) uses deep sequencing to access and analyze this vast diversity in different immunological compartments and immune cell subsets. This wealth of information has generated novel insights in the fields of antibody engineering, immunodiagnostics, and vaccine design, as well as in basic immunology. In Chapter 1 of this thesis, I review current trends in immune repertoire sequencing and the efforts taken to improve existing protocols in terms of the accuracy and quality of the sequencing data. I highlight several of the major challenges in the field, such as obtaining paired variable region sequences (e.g., variable heavy and variable light) and a lack of accuracy. For example, because sequencing library preparation and deep sequencing platforms can introduce errors and biases, they can compromise immunological interpretation. This is especially confounding for B cells that undergo somatic hypermutation, a natural process that introduces mutations into antibody variable regions.
    In Chapter 2, I describe an experimental and computational method we developed based on synthetic standards and molecular barcoding, which has been implemented to achieve highly accurate antibody repertoire sequencing. We show how this conceptually simple procedure allows us to significantly reduce error rates across the whole sequenced region. By applying this technique to human B cell samples, we demonstrate that it can improve the measurement of antibody repertoires across various dimensions. Although it is now possible to produce high-quality Ig-Seq datasets, linking sequence to antigen specificity remains an immensely challenging task. In Chapter 3, I introduce the concept of modeling the large sequence space of immune repertoires in order to extract deterministic sequence motifs that correlate with antigen exposure and specificity, and I review various classes of statistical and machine learning algorithms that can be used to model sequence generation. In Chapter 4, I develop a novel approach to identify antigen-specific sequence patterns in antibody repertoires based on deep generative models. To model the underlying process of BCR generation, variational autoencoders (VAEs) were used, under the assumption that data generation follows a Gaussian mixture model (GMM) in latent space. This provided both a latent embedding and cluster labels that group similar sequences together, which revealed a multitude of convergent, antigen-associated sequence patterns. These antigen-associated sequence patterns were predictive of immunological history and represent antigen-binding antibodies. Finally, I demonstrate how these sequence patterns can be used to generate further antigen-specific antibodies in silico, which were experimentally verified to retain antigen specificity.
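
Under the GMM assumption, each latent embedding z receives a cluster label from the mixture posterior p(c | z) ∝ π_c N(z; μ_c, σ_c²). The sketch below shows only that assignment step, assuming diagonal covariances and mixture weights, means, and variances that have already been learned. The function names and the two-component example are hypothetical, and this is the standard responsibility rule rather than the thesis's own implementation.

```python
import math


def log_gaussian_diag(z, mean, var):
    """Log-density of latent vector z under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (zi - m) ** 2 / v)
        for zi, m, v in zip(z, mean, var)
    )


def cluster_responsibilities(z, weights, means, variances):
    """Posterior p(c | z) for each mixture component, normalized via log-sum-exp."""
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def assign_cluster(z, weights, means, variances):
    """Cluster label = index of the component with the highest responsibility."""
    resp = cluster_responsibilities(z, weights, means, variances)
    return max(range(len(resp)), key=resp.__getitem__)


# Hypothetical two-component mixture in a 2-D latent space.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
print(assign_cluster([2.8, 3.1], weights, means, variances))  # 1
```

Sequences whose embeddings share a label would form one candidate sequence pattern in this sketch.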

    High-throughput sequencing error and bias correction increases the quantitative resolution of human naïve and memory B-cell receptor repertoires

    Accurate high-throughput sequencing of immunoglobulin (Ig) chains (Ig-Seq) is often problematic due to primer bias and sequencing errors. Human Ig sequencing is further complicated by factors such as greater population-level germline allelic diversity, longer CDR3 regions relative to murine sequences, and a more complex antigenic history combined with a higher frequency of somatic hypermutation (SHM), particularly in affinity-matured memory B-cell subsets. As a result, Ig heavy chain repertoire analysis tends to underestimate combinatorial diversity while simultaneously overestimating SHM. To overcome these issues, we developed a workflow for highly accurate human antibody heavy chain sequencing. First, we designed a set of 85 synthetic (in vitro transcribed RNA) Ig heavy chain standards representing all known IGHV and IGHJ alleles and unique CDR3s, and incorporating point mutations to mimic SHM. These standards are used in both isotype-dependent and -independent manners at predetermined ratios as spike-ins with biological samples to control for sequencing accuracy. Next, we prepared antibody libraries from purified circulating human B cells and spike-in RNA using a protocol known as molecular amplification fingerprinting (MAF), which incorporates unique molecular identifiers before and during multiplexed PCR amplification. We then performed MAF-based error and bias correction, and cellular replicate sampling to generate a robust, reliable, and highly accurate analysis of human antibody repertoires. We applied the workflow to estimate clonal diversity, gene segment usage, and SHM in naïve (IgM+ CD27-) and memory (IgG+ CD27+) B-cell subsets isolated from three different donors. Based on the sampling size, we are able to estimate the clonal diversity of the human naïve B-cell repertoire and that of the IgG memory B-cell repertoire combined with the level of SHM.
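
Because the spike-ins enter at predetermined ratios, the known input composition can be compared directly with what the sequencing run reports. The sketch below assumes per-standard counts are already available. The standard names, counts, and the agreement score (1 minus the total variation distance) are illustrative choices made here, not the accuracy metric used in the study.

```python
def observed_frequencies(counts):
    """Convert per-standard counts into relative frequencies."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def fold_deviation(observed, expected):
    """Observed/expected ratio per standard; 1.0 means the spike-in was recovered exactly."""
    return {name: observed.get(name, 0.0) / f for name, f in expected.items()}


def frequency_agreement(observed, expected):
    """1 minus the total variation distance between observed and expected frequencies."""
    names = set(observed) | set(expected)
    tvd = 0.5 * sum(abs(observed.get(n, 0.0) - expected.get(n, 0.0)) for n in names)
    return 1.0 - tvd


# Hypothetical spike-in design and counts (not data from the study).
expected = {"std_A": 0.50, "std_B": 0.25, "std_C": 0.25}
counts = {"std_A": 600, "std_B": 300, "std_C": 100}
observed = observed_frequencies(counts)
print(fold_deviation(observed, expected))
print(round(frequency_agreement(observed, expected), 2))  # 0.85
```

Per-standard fold deviations show which standards are over- or under-represented, for example from primer bias. The single agreement score summarizes the whole spike-in set.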

    Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, enabling tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, achieving 98 to 100% error correction and thereby enhancing the accuracy of full-length antibody diversity measurements. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion, the intraclonal diversity index, which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence, but only when using MAF error- and bias-corrected Ig-seq data. The improved accuracy of MAF has the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. ISSN: 2375-254
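
MAF itself tags molecules both before and during multiplex PCR. The core idea behind UID-based correction can be shown with a simpler scheme. The sketch below assumes reads are already aligned to equal length and each carries a single UID, which simplifies the actual protocol; the function names and reads are hypothetical. Reads sharing a UID collapse to a per-position majority consensus, which removes isolated PCR or sequencing errors. Abundance is then counted in UIDs (molecules) rather than reads, which removes amplification bias.

```python
from collections import Counter, defaultdict


def consensus(reads):
    """Majority base at each position across equal-length reads that share one UID."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def collapse_by_uid(tagged_reads):
    """Group (uid, sequence) pairs by UID and reduce each group to its consensus."""
    groups = defaultdict(list)
    for uid, seq in tagged_reads:
        groups[uid].append(seq)
    return {uid: consensus(seqs) for uid, seqs in groups.items()}


def uid_frequencies(tagged_reads):
    """Relative abundance of each corrected sequence, counting UIDs (molecules), not reads."""
    molecules = Counter(collapse_by_uid(tagged_reads).values())
    total = sum(molecules.values())
    return {seq: n / total for seq, n in molecules.items()}


# Hypothetical reads: UID "U1" was over-amplified and one of its reads carries an error.
reads = [
    ("U1", "ACGT"), ("U1", "ACGT"), ("U1", "ACCT"), ("U1", "ACGT"), ("U1", "ACGT"),
    ("U2", "TTGA"),
]
print(uid_frequencies(reads))  # {'ACGT': 0.5, 'TTGA': 0.5}
```

Counting raw reads instead would report three distinct sequences, including the erroneous ACCT, and would give ACGT a frequency of 4/6. Counting consensus molecules recovers two sequences at equal abundance.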

    Deep learning enables therapeutic antibody optimization in mammalian cells by deciphering high-dimensional protein sequence space

    Therapeutic antibody optimization is time and resource intensive, largely because it requires low-throughput screening (10³ variants) of full-length IgG in mammalian cells, typically resulting in only a few optimized leads. Here, we use deep learning to interrogate and predict antigen specificity from a massively diverse sequence space to identify globally optimized antibody variants. Using a mammalian display platform and the therapeutic antibody trastuzumab, rationally designed site-directed mutagenesis libraries are introduced by CRISPR/Cas9-mediated homology-directed repair (HDR). Screening and deep sequencing of relatively small libraries (10⁴ variants) produced high-quality data capable of training deep neural networks that accurately predict antigen binding based on antibody sequence. Deep learning is then used to predict millions of antigen binders from an in silico library of ~10⁸ variants; experimental testing of 30 randomly selected variants showed that all 30 retained antigen specificity. The full set of in silico predicted binders is then subjected to multiple developability filters, resulting in thousands of highly optimized lead candidates. With its scalability and capacity to interrogate high-dimensional protein sequence space, deep learning offers great potential for antibody engineering and optimization.
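
Before a neural network can score sequences, each variant has to be converted into a numeric vector, and the in silico library has to be enumerated. The sketch below assumes one-hot encoding of the mutated positions and a library defined by a set of allowed residues per position. Both are common choices but are assumptions here. The residue sets and function names are hypothetical, and the trained classifier is not shown.

```python
from itertools import product
from math import prod

# The 20 standard amino acids; a real pipeline may order or extend these differently.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Flattened one-hot encoding: one block of 20 indicators per position."""
    vec = [0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vec


def library_size(allowed):
    """Number of combinatorial variants given the allowed residues at each position."""
    return prod(len(choices) for choices in allowed)


def enumerate_variants(allowed):
    """Lazily yield every variant, so a large in silico library is never held in memory."""
    for combo in product(*allowed):
        yield "".join(combo)


# Hypothetical 3-position library (residue sets are made up for illustration).
allowed = ["AG", "SY", "W"]
print(library_size(allowed))              # 4
print(list(enumerate_variants(allowed)))  # ['ASW', 'AYW', 'GSW', 'GYW']
print(sum(one_hot("ASW")))                # 3
```

Library size is a product over positions. A handful of allowed residues at ten positions therefore already yields tens of millions of variants (6¹⁰ ≈ 6 × 10⁷), which is why the generator enumerates variants lazily before each one is encoded and scored.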

    Antibody discovery and engineering by enhanced CRISPR-Cas9 integration of variable gene cassette libraries in mammalian cells

    Antibody engineering in mammalian cells offers the important advantage of expression and screening of libraries in their native conformation, increasing the likelihood of generating candidates with more favorable molecular properties. Major advances in cellular engineering enabled by CRISPR-Cas9 genome editing have made it possible to expand the use of mammalian cells in biotechnological applications. Here, we describe an antibody engineering and screening approach where complete variable light (VL) and heavy (VH) chain cassette libraries are stably integrated into the genome of hybridoma cells by enhanced Cas9-driven homology-directed repair (HDR), resulting in their surface display and secretion. By developing an improved HDR donor format that utilizes in situ linearization, we are able to achieve >15-fold improvement of genomic integration, resulting in a screening workflow that only requires a simple plasmid electroporation. This proved suitable for different applications in antibody discovery and engineering. By integrating and screening an immune library obtained from the variable gene repertoire of an immunized mouse, we could isolate a diverse panel of >40 unique antigen-binding variants. Additionally, we successfully performed affinity maturation by directed evolution screening of an antibody library based on random mutagenesis, leading to the isolation of several clones with affinities in the picomolar range. ISSN: 1942-0862; ISSN: 1942-087

    The Physiological Landscape and Specificity of Antibody Repertoires

    Diverse antibody repertoires spanning multiple lymphoid organs (e.g., bone marrow, spleen, lymph nodes) form the foundation of protective humoral immunity. Changes in their composition across lymphoid organs are a consequence of B-cell selection and migration events, leading to a highly dynamic and unique physiological landscape of antibody repertoires upon antigenic challenge (e.g., vaccination). However, the extent to which B cells encoding identical or similar antibody sequences (clones) are distributed across multiple lymphoid organs, and how this is shaped by the strength of a humoral response, remains largely unexplored. Here, we performed an in-depth systems analysis of antibody repertoires across multiple distinct lymphoid organs of immunized mice, and discovered that organ-specific antibody repertoire features (e.g., germline V-gene usage and clonal expansion profiles) equilibrated upon a strong humoral response (multiple immunizations and high serum titers). This resulted in a surprisingly high degree of repertoire consolidation, characterized by highly connected and overlapping B-cell clones across multiple lymphoid organs. Finally, we revealed distinct physiological axes indicating clonal migrations and showed that antibody repertoire consolidation directly correlated with antigen specificity. Our study uncovered how a strong humoral response resulted in a more uniform but redundant physiological landscape of antibody repertoires, indicating that increases in antibody serum titers were a result of synergistic contributions from antigen-specific B-cell clones distributed across multiple lymphoid organs. Our findings provide valuable insights for the assessment and design of vaccine strategies.
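
One simple way to quantify how strongly repertoires overlap across organs is a pairwise set-overlap score on clone identifiers. The sketch below assumes clones have already been defined and labeled, for example by grouping sequences with the same V and J genes and similar CDR3s. The Jaccard index, the function names, and the clone labels are illustrative choices, not the analysis performed in the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared clones divided by all clones seen in either organ (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(repertoires):
    """Jaccard overlap for every pair of organs, keyed by (organ_x, organ_y)."""
    return {
        (x, y): jaccard(repertoires[x], repertoires[y])
        for x, y in combinations(sorted(repertoires), 2)
    }


def shared_by_all(repertoires):
    """Clones detected in every organ."""
    return set.intersection(*repertoires.values())


# Hypothetical clone labels per organ.
repertoires = {
    "bone_marrow": {"c1", "c2", "c3"},
    "spleen": {"c2", "c3", "c4"},
    "lymph_node": {"c3", "c5"},
}
print(pairwise_overlap(repertoires))
# {('bone_marrow', 'lymph_node'): 0.25, ('bone_marrow', 'spleen'): 0.5, ('lymph_node', 'spleen'): 0.25}
print(shared_by_all(repertoires))  # {'c3'}
```

Under this sketch, repertoire consolidation would appear as pairwise scores rising toward 1 and a growing set of clones shared by all organs.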
