16 research outputs found

    New methods to measure residues coevolution in proteins

    Background: The covariation of two sites in a protein is often used as a measure of their coevolution. Many methods have been developed to quantify covariation, and most are based on position-specific residue frequencies using the mutual information (MI) model. Results: We propose several new measures that incorporate additional biological constraints when quantifying covariation. The first measure, mutual information with the amino acid background distribution (MIB), incorporates the amino acid background distribution into the marginal distribution of the MI model. This modification removes the effect of amino acid evolutionary pressure when measuring covariation. The second measure, mutual information of residue physicochemical properties (MIP), measures the covariation of the physicochemical properties of two sites. The third measure, MIBP, applies residue physicochemical properties to the MIB model. Moreover, scores from the new measures are applied to a robust indicator, conn(k), to find the covariation signal of each site. Conclusions: We find that incorporating the amino acid background distribution is effective in removing the effect of evolutionary pressure on amino acids. The MIB measure thus captures more biological background information about the coevolution of residues. Our analysis also reveals that the covariation of physicochemical properties is a new aspect of coevolution information.
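For readers unfamiliar with the MI model that this abstract builds on, the following is a minimal sketch of plain mutual information between two alignment columns, estimated from observed frequencies. It is not the authors' MIB, MIP, or MIBP measure, whose exact formulas are not given here; the function name `mutual_information` and the toy columns are illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Plain MI (in bits) between two alignment columns.

    Each column is a string with one residue per sequence. Frequencies are
    estimated directly from the columns (no pseudocounts, no background
    distribution), so this is the baseline MI model only.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))  # joint counts for (a, b)
    counts_i = Counter(col_i)                 # marginal counts at site i
    counts_j = Counter(col_j)                 # marginal counts at site j
    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * log2(c * n / (counts_i[a] * counts_j[b]))
    return mi


# Hypothetical toy columns from a four-sequence alignment.
covarying = mutual_information("AAVV", "LLII")    # sites change together
independent = mutual_information("AAVV", "LILI")  # no association
```

A fully conserved column yields zero MI against any other column, which is one reason raw MI is usually corrected before it is read as a coevolution signal.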

    Epitope mapping using combinatorial phage-display libraries: a graph-based algorithm

    A phage-display library of random peptides is a combinatorial experimental technique that can be harnessed for studying antibody–antigen interactions. In this technique, a phage peptide library is scanned against an antibody molecule to obtain a set of peptides that are bound by the antibody with high affinity. This set of peptides is regarded as mimicking the genuine epitope of the antibody's interacting antigen and can be used to define it. Here we present PepSurf, an algorithm for mapping a set of affinity-selected peptides onto the solved structure of the antigen. The problem of epitope mapping is converted into the task of aligning a set of query peptides to a graph representing the surface of the antigen. The best match of each peptide is found by aligning it against virtually all possible paths in the graph. Following a clustering step, which combines the most significant matches, a predicted epitope is inferred. We show that PepSurf accurately predicts the epitope in four cases for which the epitope is known from a solved antibody–antigen co-crystal complex. We further examine the capabilities of PepSurf for predicting other types of protein–protein interfaces. The performance of PepSurf is compared to other available epitope mapping programs
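The idea of aligning a peptide to paths in a surface graph can be illustrated with a deliberately simplified sketch. It assumes exact residue matches and simple paths only. It does not reproduce PepSurf's actual scoring, gap handling, or clustering step, and the function name, graph layout, and residue labels are all hypothetical.

```python
def find_matching_paths(residue_of, neighbors, peptide):
    """Return every simple path whose residue labels spell `peptide`.

    residue_of: dict mapping node id -> one-letter amino acid code
    neighbors:  dict mapping node id -> list of adjacent node ids
    peptide:    query string of one-letter codes

    Exact-match depth-first search; a toy stand-in for a scored alignment.
    """
    paths = []
    if not peptide:
        return paths

    def extend(path):
        if len(path) == len(peptide):
            paths.append(tuple(path))
            return
        wanted = peptide[len(path)]
        for nxt in neighbors[path[-1]]:
            # Keep paths simple: never revisit a surface residue.
            if nxt not in path and residue_of[nxt] == wanted:
                path.append(nxt)
                extend(path)
                path.pop()

    for node, aa in residue_of.items():
        if aa == peptide[0]:
            extend([node])
    return paths


# Hypothetical five-residue surface patch; edges join spatially adjacent residues.
residue_of = {1: "K", 2: "L", 3: "D", 4: "L", 5: "G"}
neighbors = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3, 5], 5: [4]}
hits = find_matching_paths(residue_of, neighbors, "KLD")
```

Because a peptide can match several paths, a realistic method needs a score to rank the matches and a later step to combine the significant ones, as the abstract describes.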

    Using Shifts in Amino Acid Frequency and Substitution Rate to Identify Latent Structural Characters in Base-Excision Repair Enzymes

    Protein evolution includes the birth and death of structural motifs. For example, a zinc finger or a salt bridge may be present in some, but not all, members of a protein family. We propose that such transitions are manifest in sequence phylogenies as concerted shifts in substitution rates of amino acids that are neighbors in a representative structure. First, we identified rate shifts in a quartet from the Fpg/Nei family of base excision repair enzymes using a method developed by Xun Gu and coworkers. We found the shifts to be spatially correlated, more precisely, associated with a flexible loop involved in bacterial Fpg substrate specificity. Consistent with our result, sequences and structures provide convincing evidence that this loop plays a very different role in other family members. Second, then, we developed a method for identifying latent protein structural characters (LSC) given a set of homologous sequences based on Gu's method and proximity in a high-resolution structure. Third, we identified LSC and assigned states of LSC to clades within the Fpg/Nei family of base excision repair enzymes. We describe seven LSC; an accompanying Proteopedia page (http://proteopedia.org/wiki/index.php/Fpg_Nei_Protein_Family) describes these in greater detail and facilitates 3D viewing. The LSC we found provided a surprisingly complete picture of the interaction of the protein with the DNA capturing familiar examples, such as a Zn finger, as well as more subtle interactions. Their preponderance is consistent with an important role as phylogenetic characters. Phylogenetic inference based on LSC provided convincing evidence of independent losses of Zn fingers. Structural motifs may serve as important phylogenetic characters and modeling transitions involving structural motifs may provide a much deeper understanding of protein evolution

    An association-adjusted consensus deleterious scheme to classify homozygous Mis-sense mutations for personal genome interpretation

    BACKGROUND: Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious. METHOD: This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults. RESULTS: Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals. CONCLUSIONS: The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors’ website.

    Classifying RNA-Binding Proteins Based on Electrostatic Properties

    Protein structure can provide new insight into the biological function of a protein and can enable the design of better experiments to learn its biological roles. Moreover, deciphering the interactions of a protein with other molecules can contribute to the understanding of the protein's function within cellular processes. In this study, we apply a machine learning approach for classifying RNA-binding proteins based on their three-dimensional structures. The method is based on characterizing unique properties of electrostatic patches on the protein surface. Using an ensemble of general protein features and specific properties extracted from the electrostatic patches, we have trained a support vector machine (SVM) to distinguish RNA-binding proteins from other positively charged proteins that do not bind nucleic acids. Specifically, the method was applied on proteins possessing the RNA recognition motif (RRM) and successfully classified RNA-binding proteins from RRM domains involved in protein–protein interactions. Overall the method achieves 88% accuracy in classifying RNA-binding proteins, yet it cannot distinguish RNA from DNA binding proteins. Nevertheless, by applying a multiclass SVM approach we were able to classify the RNA-binding proteins based on their RNA targets, specifically, whether they bind a ribosomal RNA (rRNA), a transfer RNA (tRNA), or messenger RNA (mRNA). Finally, we present here an innovative approach that does not rely on sequence or structural homology and could be applied to identify novel RNA-binding proteins with unique folds and/or binding motifs

    A broadly applicable artificial selection system for biomolecule evolution

    Biocatalysis offers an attractive alternative to traditional chemical catalysis. However, an enzyme with the optimal properties for a specific application is often not available within the natural repertoire of enzymes. It is then desirable to obtain an improved variant by altering the sequence of a known enzyme, in a process known as protein engineering. Directed evolution is one of the most powerful tools for protein engineering. In directed evolution, the process of natural evolution is mimicked in the laboratory on a much shorter timescale, selecting for properties that make the enzyme (or any other type of biomolecule) more suitable for an application of human interest. The main bottleneck of directed evolution is identifying the desired variants among a majority of variants that lack the sought altered or improved property. Selection approaches link the desired activity to an increased survival rate or improved growth. While in principle such methodologies allow for ultra-high-throughput analysis of libraries, most selection techniques have a limited scope and can only be applied to a relatively small set of biomolecules or properties. This thesis presents the most broadly applicable artificial selection system for the evolution of biomolecules ever reported. The selection platform is based on an engineered E. coli strain with impaired regeneration of NAD+, causing a conditional growth defect during anaerobic fermentation. By directly or indirectly linking the activity of the biomolecules of interest to the oxidation of NADH, cells can be rescued from this growth defect. The efficacy of this selection system has been demonstrated by using it to select alcohol dehydrogenase, imine reductase and nitroreductase variants with altered or enhanced catalytic properties, as well as an isopropanol-producing metabolic pathway with optimised regulatory elements leading to a maximised yield of isopropanol. These results confirm the wide scope of the developed selection system, which can replace the conventional screening currently used in many cases of direct relevance for industrial processes. Increasing the throughput of the variant search process by many orders of magnitude will lead to the discovery of novel biomolecules and accelerate the implementation of biocatalysis.