4 research outputs found

    Secondary Structure Segments are Much More Conserved than Primary Sequence Segments

    oai:ojs.scjournal.ius.edu.ba:article/99
    To be biologically functional, all proteins must adopt specific folded three-dimensional structures. It is widely held that the genetic information for a protein specifies only the primary structure, the linear sequence of amino acids in the polypeptide backbone. Because most purified proteins can spontaneously refold in vitro after being completely unfolded, the three-dimensional structure must be determined by the primary structure (Creighton, 1990). How this occurs has come to be known as 'the protein folding problem'. As part of the protein folding problem, the existence of similar substrings in diverse proteins is remarkable. Some scientists call these the "conserved core", which echoes the claim that all proteins diversified from a common ancestor protein and that the similar pieces of two or more proteins are the substrings that resisted the pressure of evolution. Owing to naturally occurring causes (DNA fails to copy accurately) and external influences such as ultraviolet radiation, electromagnetic fields and atomic radiation, protein-coding genes and proteins may change over time in response to mutations. The rate of these mutations is strongly correlated with the intensity of the environmental conditions, and it is not possible to estimate a constant rate as one can in the case of radioactive decay. There is also little evidence that the diversity of proteins relies on these mutations alone. For this reason we prefer the term "similar substrings". In this paper we focus on the relation between primary and secondary structure mismatches in substrings of seventeen residues. We have seen that the number of mismatches in the corresponding secondary structure substrings of the same length lags behind the number of primary mismatches. We constructed a conditional probability landscape that represents the conditional probability of a certain secondary substring mismatch given the primary substring mismatch. This landscape shows that even when 6-7 mismatches exist between two primary substrings of length 17 that belong to two different proteins, the probability of a full match of the corresponding secondary structure substrings is remarkable. We downloaded the primary and secondary sequences of all 303,524 proteins in the PDB protein databank. After eliminating duplicates and proteins shorter than 30 residues, we obtained a non-redundant database of 80,592 proteins. We developed a search algorithm, FIND-SIM, to find similar primary sequence substrings in a query protein and target proteins. Some examples are given of full secondary structure matches of short substrings that correspond to primary structure substrings with many mismatches.
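    The conditional probability landscape described in this abstract can be illustrated with a minimal sketch. It is not the FIND-SIM algorithm, whose details the abstract does not give. It assumes that each protein is supplied as a primary string and a secondary structure string of equal length, that only windows at the same offset in both proteins are compared, and that a mismatch means a differing character at the same position. The names `mismatches` and `conditional_landscape` are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW = 17  # substring length quoted in the abstract


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def conditional_landscape(pairs, window=WINDOW):
    """Estimate P(secondary mismatches = s | primary mismatches = p).

    pairs: iterable of ((prim_a, sec_a), (prim_b, sec_b)), where each
    secondary string has the same length as its primary string.
    Assumption: only same-offset windows are compared; a real search
    would also have to locate similar substrings at different offsets.
    """
    counts = defaultdict(Counter)
    for (prim_a, sec_a), (prim_b, sec_b) in pairs:
        n = min(len(prim_a), len(prim_b))
        for i in range(n - window + 1):
            p = mismatches(prim_a[i:i + window], prim_b[i:i + window])
            s = mismatches(sec_a[i:i + window], sec_b[i:i + window])
            counts[p][s] += 1
    # Normalise each row so that it sums to 1.
    return {
        p: {s: c / sum(row.values()) for s, c in row.items()}
        for p, row in counts.items()
    }


# Toy usage with a short window so the example stays readable.
toy_pairs = [(("ACDEF", "HHHEE"), ("ACDKF", "HHHEE"))]
landscape = conditional_landscape(toy_pairs, window=3)
```

    In this toy case `landscape[1][0]` is the estimated probability that the secondary structure windows match fully when the primary windows differ at exactly one position.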

    Leveraging Structural Flexibility to Predict Protein Function

    Proteins are essentially versatile and flexible molecules, and understanding protein function plays a fundamental role in understanding biological systems. Protein structure comparisons are widely used for revealing protein function. However, because they assume rigidity or partial rigidity, most existing comparison methods do not consider conformational flexibility in protein structures. To address this issue, this thesis seeks to develop algorithms for flexible structure comparisons to predict one specific aspect of protein function, binding specificity. Given conformational samples as a representation of flexibility, we focus on two predictive problems related to specificity: aggregate prediction and individual prediction. For aggregate prediction, we have designed FAVA (Flexible Aggregate Volumetric Analysis). FAVA is the first conformationally general method to compare proteins with identical folds but different specificities. FAVA is able to correctly categorize members of protein superfamilies and to identify influential amino acids that cause different specificities. A second method, PEAP (Point-based Ensemble for Aggregate Prediction), employs ensemble clustering techniques that combine many base clusterings to predict binding specificity. This method incorporates structural motions of functional substructures and is capable of mitigating prediction errors. For individual prediction, the first method is an atomic point representation of flexibility in the binding cavity. This representation is able to predict binding specificity on each protein conformation with high accuracy, and it is the first to analyze maps of binding cavity conformations that describe proteins with different specificities. Our second method introduces a volumetric lattice representation. This representation localizes the solvent-accessible shape of the binding cavity by computing cavity volume in each user-defined space. It proves to be more informative than point-based representations. Last but not least, we discuss a structure-independent representation. This representation builds a lattice model on protein electrostatic isopotentials. This is the first known method to predict binding specificity explicitly from the perspective of electrostatic fields. The methods presented in this thesis incorporate the variety of protein conformations into the analysis of protein-ligand binding, and provide more views on flexible structure comparisons and structure-based function annotation of molecular design.
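    The idea of computing cavity volume in each lattice cell can be illustrated with a rough sketch. It is not the thesis's method, which the abstract does not specify. It assumes the cavity is already available as a set of sample points, each standing for a fixed amount of volume, and that the user-defined spaces are cubic cells of side `cell_size`. The name `lattice_volumes` and all of its parameters are hypothetical.

```python
import math
from collections import Counter


def lattice_volumes(points, cell_size, sample_volume):
    """Approximate the cavity volume that falls in each cubic lattice cell.

    points: iterable of (x, y, z) coordinates sampled inside the cavity.
    cell_size: edge length of one cubic cell (assumed user-defined).
    sample_volume: volume represented by a single sample point (assumed).
    Returns a dict mapping integer cell indices (i, j, k) to volume.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    counts = Counter(
        tuple(math.floor(coord / cell_size) for coord in point)
        for point in points
    )
    return {cell: n * sample_volume for cell, n in counts.items()}


# Toy usage: four sample points, unit cells, 0.5 volume units per point.
toy_points = [(0.5, 0.5, 0.5), (0.9, 0.1, 0.3), (1.5, 0.2, 0.1), (-0.1, 0.0, 0.0)]
volumes = lattice_volumes(toy_points, cell_size=1.0, sample_volume=0.5)
```

    Computing one such vector of per-cell volumes for each conformation would give a fixed-length description of cavity shape that can be compared across proteins, which is the kind of comparison the abstract describes.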

    Role of complement genetic variants in inflammatory diseases by an interactive database and protein structure modelling

    The rare diseases atypical haemolytic uraemic syndrome (aHUS) and C3 glomerulopathy (C3G) are associated with dysregulation of complement activation. It is unclear which genes most frequently predispose to aHUS or C3G. Accordingly, a six-centre analysis of 610 rare genetic variants in 13 mostly complement genes from >3500 patients with aHUS and C3G was performed. A new interactive Database of Complement Gene Variants was developed to extract allele frequencies for these 13 genes using the Exome Aggregation Consortium (ExAC) server as the reference genome. For aHUS, significantly more protein-altering rare variation was found in the five genes CFH, CFI, CD46, C3 and DGKE than in ExAC. For C3G, an association was only found for rare variants in C3 and the N-terminal C3b-binding or C-terminal non-surface-associated regions of factor H (FH). FH is the major regulator of C3b and its Tyr402His polymorphism is a risk factor for age-related macular degeneration. To better understand FH complement binding, the solution structures of both allotypes were studied. Starting from known FH short complement regulator domains and glycan structures, small angle X-ray scattering data were fitted using Monte Carlo methods to determine atomistic structures for monomeric FH. The analysis of 29,715 physically realistic but randomised FH conformations resulted in 100 similar best-fit FH structures for each allotype. Two distinct molecular structures resulted: an extended N-terminal domain arrangement with a folded-back C-terminus, or an extended C-terminus and folded-back N-terminus. To clarify FH functional roles in host protection, crystal structures for the FH complexes with C3b and C3dg revealed that the extended N-terminal conformation accounted for C3b fluid phase regulation, the extended C-terminal conformation accounted for C3d binding, and both conformations accounted for bivalent FH binding to the host cell-surface. Finally, statistical analyses indicated that the structural location of rare variants in complement may predict the occurrences of aHUS or C3G.