    Determining a substitution matrix for the alignment of disordered proteins

    As research on disordered proteins progresses and more disordered protein sequences are discovered, an optimal substitution matrix for aligning these sequences must be determined. The currently used substitution matrices, PAM and BLOSUM, are ideal for the alignment of general protein sequences, but they have been found inadequate for the specific alignment of disordered protein sequences. By implementing genetic algorithms, a substitution matrix improved for the alignment of disordered proteins has been obtained. The matrix determined by the genetic algorithm performed two times better than BLOSUM62 and PAM250.
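To make the idea of evolving a substitution matrix concrete, the following is a minimal sketch of a genetic algorithm over symmetric score matrices. Everything in it is an assumption for illustration: the four-letter toy alphabet, the names `evolve`, `fitness`, `mutate`, and `crossover`, the score range of -4 to 4, and above all the objective (related pairs should outscore unrelated pairs in an ungapped alignment). The abstract does not describe the fitness function or the operators that were actually used.

```python
import random

ALPHABET = "ADEK"  # toy alphabet (assumption); real matrices cover all 20 amino acids


def score(matrix, a, b):
    """Ungapped alignment score of two equal-length sequences."""
    return sum(matrix[x, y] for x, y in zip(a, b))


def random_matrix(rng):
    """Random symmetric matrix with integer entries in [-4, 4]."""
    m = {}
    for i, x in enumerate(ALPHABET):
        for y in ALPHABET[i:]:
            m[x, y] = m[y, x] = rng.randint(-4, 4)
    return m


def crossover(p1, p2, rng):
    """Child inherits each symmetric entry from one of the two parents."""
    child = {}
    for i, x in enumerate(ALPHABET):
        for y in ALPHABET[i:]:
            parent = p1 if rng.random() < 0.5 else p2
            child[x, y] = child[y, x] = parent[x, y]
    return child


def mutate(matrix, rng):
    """Nudge one entry by +/-1, keeping symmetry and the [-4, 4] range."""
    child = dict(matrix)
    x, y = rng.choice(ALPHABET), rng.choice(ALPHABET)
    value = max(-4, min(4, child[x, y] + rng.choice((-1, 1))))
    child[x, y] = child[y, x] = value
    return child


def fitness(matrix, related, unrelated):
    """Hypothetical objective: mean score gap between related and unrelated pairs."""
    rel = sum(score(matrix, a, b) for a, b in related) / len(related)
    unrel = sum(score(matrix, a, b) for a, b in unrelated) / len(unrelated)
    return rel - unrel


def evolve(related, unrelated, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_matrix(rng) for _ in range(pop_size)]

    def key(m):
        return fitness(m, related, unrelated)

    for _ in range(generations):
        population.sort(key=key, reverse=True)
        survivors = population[: pop_size // 2]  # elitist selection
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=key)
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A realistic objective would score alignments of actual disordered protein families rather than toy strings.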

    Comparing Models of Evolution for Ordered and Disordered Proteins

    Most models of protein evolution are based upon proteins that form relatively rigid 3D structures. A significant fraction of proteins, the so-called disordered proteins, do not form rigid 3D structures and sample a broad conformational ensemble. Disordered proteins do not typically maintain long-range interactions, so the constraints on their evolution should be different from those on ordered proteins. To test this hypothesis, we developed and compared models of evolution for disordered and ordered proteins. Substitution matrices were constructed using the sequences of putative homologs for sets of experimentally characterized disordered and ordered proteins. Separate matrices, at three levels of sequence similarity (>85%, 85–60%, and 60–40%), were inferred for each type of protein structure. The substitution matrices for disordered and ordered proteins differed significantly at each level of sequence similarity. The disordered matrices reflected a greater likelihood of evolutionary changes, relative to the ordered matrices, and these changes involved nonconservative substitutions. Glutamic acid and asparagine were interesting exceptions to this result. Important differences between the substitutions that are accepted in disordered proteins relative to ordered proteins were also identified. In general, disordered proteins have fewer evolutionary constraints than ordered proteins. However, some residues like tryptophan and tyrosine are highly conserved in disordered proteins. This is due to their important role in forming protein–protein interfaces. Finally, the amino acid frequencies for disordered proteins, computed during the development of the matrices, were compared with amino acid frequencies for different categories of secondary structure in ordered proteins. The highest correlations were observed between the amino acid frequencies in disordered proteins and the solvent-exposed loops and turns of ordered proteins, supporting an emerging structural model for disordered proteins.
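As an illustration of how a substitution matrix can be inferred from aligned homologs, here is a minimal sketch of the standard log-odds construction, s(a, b) = log2(q_ab / (p_a p_b)), where q_ab is the observed frequency of the aligned pair and p_a is the background frequency of residue a. This is an assumption about the general approach, not the authors' exact procedure, which the abstract does not specify; the function name `log_odds_matrix` and the use of `-` for gaps are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Log-odds scores (in bits) from pairs of aligned, equal-length sequences.

    Each aligned column is counted in both orders so the result is symmetric.
    Columns containing a gap character '-' are skipped.
    """
    pair_counts = Counter()
    for s, t in aligned_pairs:
        for a, b in zip(s, t):
            if a == "-" or b == "-":
                continue
            pair_counts[a, b] += 1
            pair_counts[b, a] += 1

    total = sum(pair_counts.values())
    # Joint frequency of each ordered residue pair.
    q = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue (marginal of q).
    p = Counter()
    for (a, _), freq in q.items():
        p[a] += freq

    return {(a, b): math.log2(q_ab / (p[a] * p[b])) for (a, b), q_ab in q.items()}
```

Pairs that never occur in the input are absent from the returned dictionary; a practical implementation would add pseudocounts and would build one matrix per sequence-similarity band, as the abstract describes.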

    Length-dependent prediction of protein intrinsic disorder

    BACKGROUND: Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is the length dependence of the amino acid compositions and sequence properties of disordered regions. RESULTS: We proposed two new predictor models, VSL2-M1 and VSL2-M2, to address this length-dependency problem in the prediction of intrinsic protein disorder. These two predictors are similar to the original VSL1 predictor used in the CASP6 experiment. In both models, two specialized predictors were first built and optimized for short (≤30 residues) and long (>30 residues) disordered regions, respectively. A meta predictor was then trained to integrate the specialized predictors into the final predictor model. As the 10-fold cross-validation results showed, the VSL2 predictors achieved well-balanced prediction accuracies of 81% on both short and long disordered regions. Comparisons over the VSL2 training dataset via 10-fold cross-validation and a blind-test set of unrelated recent PDB chains indicated that VSL2 predictors were significantly more accurate than several existing predictors of intrinsic protein disorder. CONCLUSION: The VSL2 predictors are applicable to disordered regions of any length and can accurately identify the short disordered regions that are often misclassified by our previous disorder predictors. The success of the VSL2 predictors further confirmed the previously observed differences in amino acid compositions and sequence properties between short and long disordered regions, and justified our approaches for modelling short and long disordered regions separately. The VSL2 predictors are freely accessible for non-commercial use a
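The integration of two specialized predictors by a meta predictor can be pictured with the following minimal sketch. It assumes, purely for illustration, that the meta predictor emits a per-residue weight for the long-region predictor and that the final score is a weighted blend of the two specialized outputs; the abstract does not state how VSL2 combines them, and the names `combine` and `call_disorder` and the 0.5 threshold are hypothetical.

```python
def combine(p_short, p_long, w_long):
    """Blend per-residue disorder probabilities from two specialized predictors.

    p_short, p_long: outputs of the short- and long-region predictors.
    w_long: per-residue weight in [0, 1] from a (hypothetical) meta predictor.
    """
    if not (len(p_short) == len(p_long) == len(w_long)):
        raise ValueError("all inputs must have one value per residue")
    return [w * pl + (1 - w) * ps for ps, pl, w in zip(p_short, p_long, w_long)]


def call_disorder(scores, threshold=0.5):
    """Label each residue 'D' (disordered) or 'O' (ordered)."""
    return "".join("D" if s >= threshold else "O" for s in scores)
```

For example, `call_disorder(combine([0.9, 0.2], [0.7, 0.1], [0.5, 0.5]))` labels the first residue as disordered and the second as ordered.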