7 research outputs found

    Entropy is a Simple Measure of the Antibody Profile and is an Indicator of Health Status: A Proof of Concept

    abstract: We have previously shown that the diversity of antibodies in an individual can be displayed on chips on which 130,000 peptides chosen from random sequence space have been synthesized. This immunosignature technology is unbiased in displaying antibody diversity relative to natural sequence space, and has been shown to have diagnostic and prognostic potential for a wide variety of diseases and vaccines. Here we show that a global measure such as Shannon's entropy can be calculated for each immunosignature. The immune entropy was measured across a diverse set of 800 people and in 5 individuals over 3 months. The immune entropy is affected by some population characteristics and varies widely across individuals. We find that people with infections or breast cancer generally have higher entropy values than non-diseased individuals. We propose that the immune entropy as measured from immunosignatures may be a simple method to monitor health in individuals and populations. The final version of this article, as published in Scientific Reports, can be viewed online at: http://www.nature.com/articles/s41598-017-18469-
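
The abstract above says that Shannon's entropy can be calculated for each immunosignature but does not spell out the calculation. A minimal sketch, assuming the peptide intensities are non-negative and are normalized to sum to 1 before the entropy is taken (the function name, the normalization, and the base-2 logarithm are illustrative assumptions, not the authors' stated procedure):

```python
import math


def shannon_entropy(intensities, base=2.0):
    """Shannon entropy of non-negative intensities normalized to sum to 1.

    Assumption: each peptide's share of the total signal is treated as a
    probability. The published method may normalize differently.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    entropy = 0.0
    for value in intensities:
        if value < 0:
            raise ValueError("intensities must be non-negative")
        if value > 0:  # zero-intensity peptides contribute nothing
            p = value / total
            entropy -= p * math.log(p, base)
    return entropy
```

Under this definition a perfectly flat profile over N peptides gives the maximum value, log2(N), and a profile dominated by a single peptide gives a value near zero.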

    Maximum-Entropy Models of Sequenced Immune Repertoires Predict Antigen-Antibody Affinity

    The immune system has developed a number of distinct complex mechanisms to shape and control the antibody repertoire. One of these mechanisms, the affinity maturation process, works in an evolutionary-like fashion: after binding to a foreign molecule, the antibody-producing B-cells exhibit a high-frequency mutation rate in the genome region that codes for the antibody active site. Eventually, cells that produce antibodies with higher affinity for their cognate antigen are selected and clonally expanded. Here, we propose a new statistical approach based on maximum entropy modeling in which a scoring function related to the binding affinity of antibodies against a specific antigen is inferred from a sample of sequences of the immune repertoire of an individual. We use our inference strategy to infer a statistical model on a data set obtained by sequencing a fairly large portion of the immune repertoire of an HIV-1 infected patient. The Pearson correlation coefficient between our scoring function and the IC50 neutralization titer measured on 30 different antibodies of known sequence is as high as 0.77 (p-value 10^-6), outperforming other sequence- and structure-based models.

    Maximum-Entropy Models of Sequenced Immune Repertoires Predict Antigen-Antibody Affinity - Fig 6

    Left Panel: Direct Information map computed on the hypermutated cluster. The internal contact map of the VRC-PG04 heavy chain is shown in gray (PDB 3SE9). Two residues are considered to be in contact if at least one pair of their heavy atoms is closer than 8 Å. The 300 residue pairs with the highest Direct Information (DI) [13] are displayed in green when they coincide with internal contacts (true-positive internal contact predictions) and in red when they do not (false-positive internal contact predictions). Right Panel: Sensitivity plot of the Direct Information (DI) and Mutual Information (MI).
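
The contact criterion in the caption above (at least one heavy-atom pair closer than 8 Å) and the green/red split of predicted pairs can be sketched as follows. This is an illustration only: the function names and the data layout (a mapping from residue index to a list of heavy-atom coordinates) are assumptions, and the DI ranking itself is not computed here.

```python
import math

CONTACT_CUTOFF_ANGSTROM = 8.0  # cutoff quoted in the caption


def in_contact(atoms_a, atoms_b, cutoff=CONTACT_CUTOFF_ANGSTROM):
    """True if any heavy-atom pair across the two residues is closer than cutoff."""
    return any(math.dist(a, b) < cutoff for a in atoms_a for b in atoms_b)


def split_predictions(predicted_pairs, residues, cutoff=CONTACT_CUTOFF_ANGSTROM):
    """Split predicted residue pairs into (true positives, false positives).

    residues: hypothetical mapping {residue_index: [(x, y, z), ...]} holding
    heavy-atom coordinates in angstroms.
    """
    true_pos, false_pos = [], []
    for i, j in predicted_pairs:
        target = true_pos if in_contact(residues[i], residues[j], cutoff) else false_pos
        target.append((i, j))
    return true_pos, false_pos
```

In the figure's terms, the first returned list corresponds to the pairs drawn in green and the second to the pairs drawn in red.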

    Use of Large, Immunosignature Databases to Pose New Questions About Infection and Health Status

    abstract: Immunosignature is a technology that retrieves information from the immune system. The technology is based on microarrays with peptides chosen from random sequence space. My thesis focuses on improving the Immunosignature platform and using Immunosignatures to improve disease diagnosis. I first contributed to the optimization of the immunosignature platform by introducing scoring metrics to select optimal parameters, considering performance as well as practicality. Next, I primarily worked on identifying a signature shared across various pathogens that can distinguish them from the healthy population. I further retrieved consensus epitopes from the disease common signature and proposed that most pathogens could share the signature by studying the enrichment of the common signature in the pathogen proteomes. Following this, I studied cancer samples from different stages and correlated the immune response with whether the epitope presented by the tumor is similar to the pathogen proteome. An effective immune response is defined as an antibody titer that increases and then decreases, suggesting elimination of the epitope. I found that an effective immune response usually correlates with epitopes that are more similar to pathogens. This suggests that the immune system might occupy a limited space and can be effective against only certain epitopes that have similarity with pathogens. I then participated in the attempt to solve the antibiotic resistance problem by developing a classification algorithm that can distinguish bacterial from viral infection. This algorithm outperforms other currently available classification methods. Finally, I worked on the concept of deriving a single number to represent all the data on the immunosignature platform. This resembles the concept of temperature, which is an approximate measurement of whether an individual is healthy. The measure of Immune Entropy was found to work best as a single measurement to describe the immune system information derived from the immunosignature. Entropy is relatively invariant in the healthy population, but shows significant differences when comparing healthy donors with patients who either are infected with a pathogen or have cancer.
    Dissertation/Thesis. Doctoral Dissertation, Molecular and Cellular Biology 201

    Bayesian inference of virus evolutionary models from next-generation sequencing data

    There is a rich tradition in mathematical biology of modeling virus population dynamics within hosts. Such models can reproduce trends in the progression of viral infections such as HIV-1, and have also generated insights on the emergence of drug resistance and treatment strategies. Existing mathematical work has focused on the problem of predicting dynamics given model parameters. The problem of estimating model parameters from observed data has received little attention. One reason is likely the historical difficulty of obtaining high-resolution samples of virus diversity within hosts. Now, next-generation sequencing (NGS) approaches developed in the past decade can supply such data. This thesis presents two Bayesian methods that harness classical models to generate testable hypotheses from NGS datasets. The quasispecies equilibrium explains genetic variation in virus populations as a balance between mutation and selection. We use this model to infer fitness effects of individual mutations and pairs of interacting mutations. Although our method provides a high-resolution and accurate picture of the fitness landscape when equilibrium holds, we demonstrate that the common observation of populations with coexisting, divergent viruses is unlikely to be consistent with equilibrium. Our second statistical method estimates virus growth rates and binding affinity between viruses and antibodies using the generalized Lotka-Volterra model. Immune responses can explain coexistence of abundant virus variants and their trajectories through time. Additionally, we can draw inferences about immune escape and antibody genetic variants responsible for improved virus recognition.
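
For orientation, the generalized Lotka-Volterra model named in the abstract above has the standard form dx_i/dt = x_i (r_i + sum_j A_ij x_j), where r_i is a growth rate and A_ij an interaction coefficient. A minimal forward-simulation sketch, assuming an explicit Euler step (the function name, the step scheme, and any mapping of species to virus variants or antibodies are illustrative assumptions; the thesis's Bayesian inference of these parameters is not shown):

```python
def glv_step(x, r, A, dt):
    """One explicit Euler step of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j).

    x: current abundances, r: growth rates, A: interaction matrix (list of
    rows), dt: step size. All names are hypothetical, chosen for illustration.
    """
    n = len(x)
    return [
        x[i] + dt * x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n)))
        for i in range(n)
    ]
```

With a single species, growth rate 1.0, and self-interaction -0.5, the abundance 2.0 is a fixed point of this step, which is a quick sanity check on the signs.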

    Bayesian statistical approach for protein residue-residue contact prediction

    Despite continuous efforts in automating experimental structure determination and systematic target selection in structural genomics projects, the gap between the number of known amino acid sequences and solved 3D structures for proteins is constantly widening. While DNA sequencing technologies are advancing at an extraordinary pace, thereby constantly increasing throughput while at the same time reducing costs, protein structure determination is still labour intensive, time-consuming and expensive. This trend illustrates the essential importance of complementary computational approaches in order to bridge the so-called sequence-structure gap. About half of the protein families lack structural annotation and therefore are not amenable to techniques that infer protein structure from homologs. These protein families can be addressed by de novo structure prediction approaches that in practice are often limited by the immense computational costs required to search the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been demonstrated to sufficiently constrain the overall protein fold and thereby extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. This leaves an echo of correlation signatures that can be traced down from the evolutionary record. Despite the success of contact prediction methods, there are several challenges. The most evident limitation lies in the requirement of deep alignments, which excludes the majority of protein families without associated structural information that are the focus for contact guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they omit available coevolutionary information. 
This work presents two different approaches for addressing the limitations of contact prediction methods. Instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved comparable precision, with minor improvements over the pseudo-likelihood methods for protein families with few homologous sequences. A Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information of coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings that reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended towards the derivation of probability estimates for residue-residue distances to enhance the precision of predicted structures.