14 research outputs found

    Using antibody next generation sequencing data to aid antibody engineering

    No full text
    Future successful exploitation of antibodies as diagnostic and therapeutic agents will greatly benefit from an increased understanding of natural B-cell receptor (BCR) repertoire diversities. The advent of next-generation sequencing of immunoglobulin genes (Ig-seq) has made it possible to sequence large snapshots of BCR repertoires in a single experiment. In the results chapters of this thesis, we begin by describing a method (AntiBOdy Sequence Selector, “ABOSS”) for filtering BCR repertoire data, which considers the structural viability of each sequence and is orthogonal to all other current methods (Chapter 2). ABOSS leverages the presence/absence of a conserved disulphide bridge found in antibodies as a way of both identifying structurally viable BCR sequences and estimating the sequencing error rate. We show that this method is able to identify structurally impossible sequences missed by common error-correction methods. Next, we describe the development of Observed Antibody Space (OAS), the first resource that curates BCR sequences from publicly available studies. As of October 2020, OAS contains more than 1.9 billion sequences from 85 studies. In OAS, all BCR repertoire sequences are annotated and profiled for structural viability. We next describe the development of a novel method (SAAB+) to interrogate complete BCR repertoires at the structural level (Chapter 4). SAAB+ annotates large portions of BCR repertoires with three-dimensional information by mapping sequences to crystallographically solved antibody structures. By applying SAAB+ to BCR repertoires in OAS we, for the first time, document repertoire structural changes along the B-cell maturation axis in humans and mice. In the final experimental chapter, we describe our work in COVID-19 research where we have compared the structural and sequence diversities of SARS-CoV-2 BCR repertoires to healthy repertoires deposited in OAS. 
    We also outline the development of the first organised database (CoV-AbDab) that curates all publicly available anti-SARS-CoV-2 antibodies in a standardised format. Finally, we discuss how recent developments in paired-chain Ig-seq platforms and deep learning algorithms could have a lasting impact on established Ig-seq analysis pipelines. We also outline how the tools described in this thesis can be combined with these field-disruptive technologies to advance our understanding of the immune system and improve computational antibody engineering.

    CoV-AbDab: the coronavirus antibody database

    No full text
    Motivation: The emergence of a novel strain of betacoronavirus, SARS-CoV-2, has led to a pandemic that has been associated with over 700 000 deaths as of August 5, 2020. Research is ongoing around the world to create vaccines and therapies to minimize rates of disease spread and mortality. Crucial to these efforts are molecular characterizations of neutralizing antibodies to SARS-CoV-2. Such antibodies would be valuable for measuring vaccine efficacy, diagnosing exposure and developing effective biotherapeutics. Here, we describe our new database, CoV-AbDab, which already contains data on over 1400 published/patented antibodies and nanobodies known to bind to at least one betacoronavirus. This database is the first consolidation of antibodies known to bind SARS-CoV-2 as well as other betacoronaviruses such as SARS-CoV-1 and MERS-CoV. It contains relevant metadata including evidence of cross-neutralization, antibody/nanobody origin, full variable domain sequence (where available) and germline assignments, epitope region, links to relevant PDB entries, homology models and source literature. Results: On August 5, 2020, CoV-AbDab referenced sequence information on 1402 anti-coronavirus antibodies and nanobodies, spanning 66 papers and 21 patents. Of these, 1131 bind to SARS-CoV-2. Availability and implementation: CoV-AbDab is free to access and download without registration at http://opig.stats.ox.ac.uk/webapps/coronavirus. Community submissions are encouraged.

    Looking for therapeutic antibodies in next-generation sequencing repositories

    No full text
    Recently, it has become possible to query the great diversity of natural antibody repertoires using next-generation sequencing (NGS). These methods are capable of producing millions of sequences in a single experiment. Here we compare clinical-stage therapeutic antibodies to the ~1 billion sequences from 60 independent sequencing studies in the Observed Antibody Space database, which includes antibody sequences from NGS analysis of immunoglobulin gene repertoires. Of 242 post-Phase 1 antibodies, we found 16 with sequence identity matches of 95% or better for both heavy and light chains. There are also 54 perfect matches to therapeutic CDR-H3 regions in the NGS outputs, suggesting a nontrivial amount of convergence between naturally observed sequences and those developed artificially. This has potential implications for both the legal protection of commercial antibodies and the discovery of antibody therapeutics.
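
    The 95% identity criterion in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how sequences were aligned, so this example assumes that sequences are already aligned and only compares strings of equal length. The function names and the separate heavy/light candidate pools are hypothetical, chosen for illustration only.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_identity(query, candidates):
    """Highest identity between the query and any same-length candidate.

    Candidates of a different length are skipped (an assumption of this
    sketch, standing in for a real alignment step). Returns 0.0 if none fit.
    """
    return max(
        (percent_identity(query, c) for c in candidates if len(c) == len(query)),
        default=0.0,
    )


def matches_both_chains(heavy, light, ngs_heavy, ngs_light, threshold=95.0):
    """True if both chains have an NGS candidate at or above the threshold."""
    return (
        best_identity(heavy, ngs_heavy) >= threshold
        and best_identity(light, ngs_light) >= threshold
    )
```

    Here the heavy and light chains are searched against independent pools, which is a simplifying assumption rather than a description of the study's procedure.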

    Filtering next-generation sequencing of the Ig gene repertoire data using antibody structural information

    No full text
    Next-generation sequencing of the Ig gene repertoire (Ig-seq) produces large volumes of information at the nucleotide sequence level. Such data have improved our understanding of immune systems across numerous species and have already been successfully applied in vaccine development and drug discovery. However, the high-throughput nature of Ig-seq means that it is afflicted by high error rates. This has led to the development of error-correction approaches. Computational error-correction methods use sequence information alone, primarily designating sequences as likely to be correct if they are observed frequently. In this work, we describe an orthogonal method for filtering Ig-seq data, which considers the structural viability of each sequence. A typical natural Ab structure requires the presence of a disulfide bridge within each of its variable chains to maintain the fold. Our Ab Sequence Selector (ABOSS) uses the presence/absence of this bridge as a way of both identifying structurally viable sequences and estimating the sequencing error rate. On simulated Ig-seq datasets, ABOSS is able to identify more than 99% of structurally viable sequences. Applying our method to six independent Ig-seq datasets (one mouse and five human), we show that our error calculations are in line with previous experimental and computational error estimates. We also show how ABOSS is able to identify structurally impossible sequences missed by other error-correction methods.
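
    The idea of filtering on the conserved disulfide bridge can be sketched in Python as follows. This is not the ABOSS implementation; it is a minimal illustration that assumes each sequence has already been numbered in the IMGT scheme (where the conserved cysteines sit at positions 23 and 104) and is supplied as a dictionary from position to residue. All function names are hypothetical.

```python
# IMGT positions of the two conserved cysteines that form the
# intra-domain disulfide bridge of an antibody variable domain.
CONSERVED_CYS_POSITIONS = (23, 104)


def has_conserved_bridge(numbered):
    """True if both conserved positions hold a cysteine.

    `numbered` maps an IMGT position (int) to a one-letter amino acid;
    producing this numbering is assumed to happen upstream.
    """
    return all(numbered.get(pos) == "C" for pos in CONSERVED_CYS_POSITIONS)


def split_by_bridge(sequences):
    """Partition sequences into (viable, rejected) by the bridge check."""
    viable, rejected = [], []
    for seq in sequences:
        (viable if has_conserved_bridge(seq) else rejected).append(seq)
    return viable, rejected


def missing_bridge_fraction(sequences):
    """Fraction of sequences lacking the bridge (0.0 for an empty input).

    This is only a crude proxy; the error-rate estimate described in the
    abstract is not specified there and is not reproduced here.
    """
    if not sequences:
        return 0.0
    _, rejected = split_by_bridge(sequences)
    return len(rejected) / len(sequences)
```

    A sequence with a missing or substituted cysteine at either position is rejected, which is the sense in which such a filter is independent of how often a sequence is observed.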

    How B cell receptor repertoire sequencing can be enriched with structural antibody data

    No full text
    Next-generation sequencing of immunoglobulin gene repertoires (Ig-seq) allows the investigation of large-scale antibody dynamics at a sequence level. However, structural information, a crucial descriptor of antibody binding capability, is not collected in Ig-seq protocols. Developing systematic relationships between the antibody sequence information gathered from Ig-seq and low-throughput techniques such as X-ray crystallography could radically improve our understanding of antibodies. The mapping of Ig-seq datasets to known antibody structures can indicate structurally, and perhaps functionally, uncharted areas. Furthermore, contrasting naïve and antigenically challenged datasets using structural antibody descriptors should provide insights into antibody maturation. As the number of antibody structures steadily increases and more and more Ig-seq datasets become available, the opportunities that arise from combining the two types of information increase as well. Here we review how these data types enrich one another and show potential for advancing our knowledge of the immune system and improving antibody engineering.

    Public Baseline and shared response structures support the theory of antibody repertoire functional commonality

    No full text
    The naïve antibody/B-cell receptor (BCR) repertoires of different individuals ought to exhibit significant functional commonality, given that most pathogens trigger an effective antibody response to immunodominant epitopes. Sequence-based repertoire analysis has so far offered little evidence for this phenomenon. For example, a recent study estimated the number of shared (‘public’) antibody clonotypes in circulating baseline repertoires to be around 0.02% across ten unrelated individuals. However, to engage the same epitope, antibodies only require a similar binding site structure and the presence of key paratope interactions, which can occur even when their sequences are dissimilar. Here, we search for evidence of geometric similarity/convergence across human antibody repertoires. We first structurally profile naïve (‘baseline’) antibody diversity using snapshots from 41 unrelated individuals, predicting all modellable distinct structures within each repertoire. This analysis uncovers a high (much greater than random) degree of structural commonality. For instance, around 3% of distinct structures are common to the ten most diverse individual samples (‘Public Baseline’ structures). Our approach is the first computational method to find levels of BCR commonality commensurate with epitope immunodominance and could therefore be harnessed to find more genetically distant antibodies with same-epitope complementarity. We then apply the same structural profiling approach to repertoire snapshots from three individuals before and after flu vaccination, detecting a convergent structural drift indicative of recognising similar epitopes (‘Public Response’ structures). We show that Antibody Model Libraries derived from Public Baseline and Public Response structures represent a powerful geometric basis set of low-immunogenicity candidates exploitable for general or target-focused therapeutic antibody screening.

    Large scale paired antibody language models

    Full text link
    Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best-performing antibody-specific language models developed to date, which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using the more than two billion unpaired sequences and two million paired sequences of light and heavy chains present in the Observed Antibody Space dataset. We show that our models outperform existing antibody and protein language models on a diverse range of design and regression tasks relevant to antibody engineering. This advancement marks a significant leap forward in leveraging machine learning, large-scale data sets and high-performance computing for enhancing antibody design for therapeutic development.
    Comment: 14 pages, 2 figures, 6 tables, model weights available at https://zenodo.org/doi/10.5281/zenodo.1087690

    Different B cell subpopulations show distinct patterns in their IgH repertoire metrics

    No full text
    Several human B cell subpopulations are recognised in the peripheral blood, which play distinct roles in the humoral immune response. These cells undergo developmental and maturational changes involving VDJ recombination, somatic hypermutation and class switch recombination, altogether shaping their immunoglobulin heavy chain (IgH) repertoire. Here, we sequenced the IgH repertoire of naïve, marginal zone, switched and plasma cells from 10 healthy adults along with matched unsorted and in silico separated CD19+ bulk B cells. Using advanced bioinformatic analysis and machine learning, we show that sorted B cell subpopulations are characterised by distinct repertoire characteristics on both the individual sequence and the repertoire level. Sorted subpopulations shared similar repertoire characteristics with their corresponding in silico separated subsets. Furthermore, certain IgH repertoire characteristics correlated with the position of the constant region on the IgH locus. Overall, this study provides unprecedented insight into the mechanisms of B cell repertoire control in peripherally circulating B cell subpopulations.

    Observed antibody space: a resource for data mining next generation sequencing of antibody repertoires

    No full text
    Abs are immune system proteins that recognize noxious molecules for elimination. Their sequence diversity and binding versatility have made Abs the primary class of biopharmaceuticals. Recently, it has become possible to query their immense natural diversity using next-generation sequencing of Ig gene repertoires (Ig-seq). However, Ig-seq outputs are currently fragmented across repositories and tend to be presented as raw nucleotide reads, which means nontrivial effort is required to reuse the data for analysis. To address this issue, we have collected Ig-seq outputs from 55 studies, covering more than half a billion Ab sequences across diverse immune states, organisms (primarily human and mouse), and individuals. We have sorted, cleaned, annotated, translated, and numbered these sequences and make the data available via our Observed Antibody Space (OAS) resource at http://antibodymap.org. The data within OAS will be regularly updated with newly released Ig-seq datasets. We believe OAS will facilitate data mining of immune repertoires for improved understanding of the immune system and development of better biotherapeutics.