
    Prediction of amyloid fibril-forming segments based on a support vector machine

    Background: Amyloid fibrillar aggregates of proteins or polypeptides are known to be associated with many human diseases. Recent studies suggest that short protein regions trigger this aggregation. Thus, identifying these short peptides is critical for understanding diseases and finding potential therapeutic targets.
    Results: We propose a method named Pafig (Prediction of amyloid fibril-forming segments), based on support vector machines, to identify the hexapeptides associated with amyloid fibrillar aggregates. The features of Pafig were obtained by a two-round selection from AAindex. In a 10-fold cross-validation test on the Hexpepset dataset, Pafig achieved an overall accuracy of 81% and a Matthews correlation coefficient of 0.63. Pafig was then used to predict the potential fibril-forming hexapeptides among all 64,000,000 possible hexapeptides. Approximately 5.08% of these hexapeptides showed a high aggregation propensity. In the predicted fibril-forming hexapeptides, the amino acids alanine, phenylalanine, isoleucine, leucine and valine occurred at higher frequencies, and the amino acids aspartic acid, glutamic acid, histidine, lysine, arginine and proline appeared at lower frequencies.
    Conclusion: The performance of Pafig indicates that it is a powerful tool for identifying the hexapeptides associated with fibrillar aggregates and will be useful for large-scale analysis of proteomic data.
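The figures quoted in the abstract above can be sanity-checked with a small sketch. It is not Pafig itself: the helper names `hexapeptide_windows` and `matthews_corrcoef` are illustrative assumptions, and the only numbers taken from the abstract are the 20-letter amino acid alphabet implied by the 64,000,000 count and the 5.08% fraction.

```python
from math import sqrt

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hexapeptide_windows(sequence, width=6):
    """Return every overlapping window of `width` residues in `sequence`."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# All possible hexapeptides over a 20-letter alphabet: 20**6.
total_hexapeptides = len(AMINO_ACIDS) ** 6

# Approximate count implied by the 5.08% figure in the abstract.
approx_predicted_positive = round(total_hexapeptides * 0.0508)
```

A classifier such as an SVM would score each window returned by `hexapeptide_windows`; the feature encoding (for example, AAindex properties) is left out here because the abstract does not specify which properties were selected.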

    Prediction of peptide and protein propensity for amyloid formation

    Our understanding of which peptides and proteins have the potential to undergo amyloid formation, and of the driving forces responsible for amyloid-like fiber formation and stabilization, remains limited. This is mainly because proteins that can undergo the structural changes leading to amyloid formation are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.

    MILAMP : multiple instance prediction of amyloid proteins

    Amyloid proteins are implicated in several diseases, such as Parkinson’s, Alzheimer’s and prion diseases. To characterize the amyloidogenicity of a given protein, it is important to locate the amyloid-forming hotspot regions within the protein and to analyze the effects of mutations on these proteins. The biochemical and biological assays used for this purpose can be facilitated by computational means. This paper presents a machine learning method that can predict hotspot amyloidogenic regions within proteins and characterize changes in their amyloidogenicity due to point mutations. The proposed method, called MILAMP (Multiple Instance Learning of AMyloid Proteins), achieves high accuracy for identification of amyloid proteins, hotspot localization and prediction of mutation effects on amyloidogenicity by integrating heterogeneous data sources and exploiting common predictive patterns across these tasks through multiple instance learning. The paper presents comprehensive benchmarking experiments to test the predictive performance of MILAMP in comparison to previously published state-of-the-art techniques for amyloid prediction. The Python code for the implementation and the webserver for MILAMP are available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#MILAMP

    Cooperativity among Short Amyloid Stretches in Long Amyloidogenic Sequences

    Amyloid fibrillar aggregates of polypeptides are associated with many neurodegenerative diseases. Short peptide segments in protein sequences may trigger aggregation. Identifying these stretches and examining their behavior in longer protein segments is critical for understanding these diseases and obtaining potential therapies. In this study, we combined machine learning and structure-based energy evaluation to examine and predict amyloidogenic segments. Our feature selection method discovered that windows consisting of long amino acid segments of ∼30 residues, instead of the commonly used short hexapeptides, provided the highest accuracy. Weighted contributions of an amino acid at each position in a 27-residue window revealed three cooperative regions of short stretches, resembling the β-strand-turn-β-strand motif in Aβ peptide amyloid and the β-solenoid structure of the HET-s(218–289) prion (C). Using an in-house energy evaluation algorithm, the interaction energy between two short stretches in a long segment is computed and incorporated as an additional feature. The algorithm successfully predicted and classified amyloid segments with an overall accuracy of 75%. Our study revealed that genome-wide amyloid segments depend not only on short high-propensity stretches, but also on nearby residues.

    Computational methods to predict protein aggregation

    Other funding: CRUE-CSIC transformative agreement. In most cases, protein aggregation stems from the establishment of non-native intermolecular contacts. The formation of insoluble protein aggregates is associated with many human diseases and is a major bottleneck for the industrial production of protein-based therapeutics. Strikingly, fibrillar aggregates are naturally exploited for structural scaffolding or to generate molecular switches, and can be artificially engineered to build multi-functional nanomaterials. Thus, there is high interest in rationalizing and forecasting protein aggregation. Here, we review the available computational toolbox to predict protein aggregation propensities, identify sequential or structural aggregation-prone regions, evaluate the impact of mutations on aggregation, or recognize prion-like domains. We discuss the strengths and limitations of these algorithms and how they can evolve in the near future.

    Atomic structures of TDP-43 LCD segments and insights into reversible or pathogenic aggregation.

    The normally soluble TAR DNA-binding protein 43 (TDP-43) is found aggregated both in reversible stress granules and in irreversible pathogenic amyloid. In TDP-43, the low-complexity domain (LCD) is believed to be involved in both types of aggregation. To uncover the structural origins of these two modes of β-sheet-rich aggregation, we have determined ten structures of segments of the LCD of human TDP-43. Six of these segments form steric zippers characteristic of the spines of pathogenic amyloid fibrils; four others form LARKS, the labile amyloid-like interactions characteristic of protein hydrogels and proteins found in membraneless organelles, including stress granules. Supporting a hypothetical pathway from reversible to irreversible amyloid aggregation, we found that familial ALS variants of TDP-43 convert LARKS to irreversible aggregates. Our structures suggest how TDP-43 adopts both reversible and irreversible β-sheet aggregates and the role of mutation in the possible transition of reversible to irreversible pathogenic aggregation

    Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

    Background: All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences.
    Results: The average accuracy based on leave-one-out (LOO) cross-validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprising 103 amyloidogenic and 28 non-amyloidogenic sequences. The LOO cross-validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross-validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%.
    Conclusions: This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study is limited, and is consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and expanding the training set to include not only more derivatives, but also more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
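The leave-one-out (LOO) evaluation mentioned in the abstract above can be sketched generically. This is not the authors' Bayesian classifier or decision tree: `composition`, `leave_one_out_accuracy`, and the majority-class stand-in model are hypothetical names introduced only to show the shape of the procedure, and any sequences passed in would be the reader's own data.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fraction of each standard amino acid in `sequence` (20 values)."""
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def leave_one_out_accuracy(samples, labels, train, predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Stand-in model (an assumption, for illustration only): always predict
# the most common label seen in training. A real study would plug in a
# naive Bayes or decision-tree implementation here instead.
def train_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, sample):
    return model
```

Because `train` and `predict` are passed in as functions, the same loop can evaluate any classifier that works on composition vectors, which is the kind of feature the abstract describes as largely determining amyloidogenicity.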