Alignment of helical membrane protein sequences using AlignMe
Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein-optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T) in addition to evolutionary information and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set.
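The idea of favoring matches between predicted transmembrane segments inside a standard dynamic programming alignment can be illustrated with a minimal sketch. This is not the AlignMe implementation: the function name `align_with_tm_bonus`, the scoring values, the linear (non-affine) gap penalty, and the per-residue label `M` for "predicted transmembrane" are all assumptions made for illustration.

```python
def align_with_tm_bonus(seq_a, seq_b, tm_a, tm_b,
                        match=2, mismatch=-1, gap=-2, tm_bonus=1):
    """Global (Needleman-Wunsch) alignment with a hypothetical bonus
    when both aligned residues are predicted transmembrane ('M')."""
    n, m = len(seq_a), len(seq_b)

    def pair_score(i, j):
        # Score for aligning residue i of seq_a with residue j of seq_b (1-based).
        s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if tm_a[i - 1] == "M" and tm_b[j - 1] == "M":
            s += tm_bonus
        return s

    # Fill the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Setting `tm_bonus=0` recovers a plain sequence-only alignment, which makes it easy to compare alignments with and without the transmembrane term on the same pair of sequences.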
Improving protein order-disorder classification using charge-hydropathy plots
BACKGROUND: The earliest whole protein order/disorder predictor (Uversky et al., Proteins, 41: 415-427 (2000)), herein called the charge-hydropathy (C-H) plot, was originally developed using the Kyte-Doolittle (1982) hydropathy scale (Kyte & Doolittle, J. Mol. Biol., 157: 105-132 (1982)). Here the goal is to determine whether the performance of the C-H plot in separating structured and disordered proteins can be improved by using an alternative hydropathy scale.
RESULTS: Using the performance of the C-H plot as the metric, we compared 19 alternative hydropathy scales, with the finding that the Guy (1985) hydropathy scale (Guy, Biophys. J., 47: 61-70 (1985)) was the best of the tested hydropathy scales for separating large collections of structured proteins and intrinsically disordered proteins (IDPs) on the C-H plot. Next, we developed a new scale, named IDP-Hydropathy, which further improves the discrimination between structured proteins and IDPs. Applying the C-H plot to a dataset containing 109 IDPs and 563 non-homologous fully structured proteins, the Kyte-Doolittle (1982) hydropathy scale, the Guy (1985) hydropathy scale, and the IDP-Hydropathy scale gave balanced two-state classification accuracies of 79%, 84%, and 90%, respectively, indicating that a substantial overall improvement is obtained by using different hydropathy scales. A correlation study shows that IDP-Hydropathy is strongly correlated with other hydropathy scales, suggesting that IDP-Hydropathy probably has only minor contributions from amino acid properties other than hydropathy.
CONCLUSION: We suggest that IDP-Hydropathy would likely be the best scale to use for any type of algorithm developed to predict protein disorder.
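A charge-hydropathy point for a single sequence can be computed as in the sketch below, using the Kyte-Doolittle scale rescaled to the range 0 to 1. The function names are hypothetical, the whole-sequence average is a simplification, and the boundary line (slope 2.785, intercept -1.151) is the commonly cited Uversky et al. boundary, taken here as an assumption rather than from the abstract above. Swapping in another scale, such as Guy or IDP-Hydropathy, would only require replacing the dictionary and refitting the boundary.

```python
# Kyte-Doolittle (1982) hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def ch_point(seq):
    """Return (mean normalized hydropathy, mean absolute net charge)."""
    seq = seq.upper()
    # Rescale each Kyte-Doolittle value from [-4.5, 4.5] to [0, 1].
    hydropathy = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)
    # Net charge counts K/R as +1 and D/E as -1 (histidine ignored).
    positive = sum(1 for aa in seq if aa in "KR")
    negative = sum(1 for aa in seq if aa in "DE")
    return hydropathy, abs(positive - negative) / len(seq)

def predicted_disordered(seq, slope=2.785, intercept=-1.151):
    """True if the point lies above the assumed boundary line
    charge = slope * hydropathy + intercept."""
    hydropathy, charge = ch_point(seq)
    return charge > slope * hydropathy + intercept
```

For example, `predicted_disordered("DDDD")` places a highly charged, low-hydropathy sequence on the disordered side of the line, while a poly-leucine sequence falls on the structured side.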
Investigation of transmembrane proteins using a computational approach
Background: An important subfamily of membrane proteins is the transmembrane α-helical proteins, in which the membrane-spanning regions are made up of α-helices. Given the obvious biological and medical significance of these proteins, it is of tremendous practical importance to identify the location of transmembrane segments. The difficulty of inferring the secondary or tertiary structure of transmembrane proteins using experimental techniques has led to a surge of interest in applying techniques from machine learning and bioinformatics to infer secondary structure from primary structure in these proteins. We are therefore interested in determining which physicochemical properties are most useful for discriminating transmembrane segments from non-transmembrane segments in transmembrane proteins, and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins, and in using the results of these investigations to develop classifiers to identify transmembrane segments in transmembrane proteins. Results: We determined that the most useful properties for discriminating transmembrane segments from non-transmembrane segments and for discriminating intrinsically unstructured segments from intrinsically structured segments in transmembrane proteins were hydropathy, polarity, and flexibility, and used the results of this analysis to construct classifiers to discriminate transmembrane segments from non-transmembrane segments using four classification techniques: two variants of the Self-Organizing Global Ranking algorithm, a decision tree algorithm, and a support vector machine algorithm. All four techniques exhibited good performance, with out-of-sample accuracies of approximately 75%.
Conclusions: Several interesting observations emerged from our study: intrinsically unstructured segments and transmembrane segments tend to have opposite properties; transmembrane proteins appear to be much richer in intrinsically unstructured segments than other proteins; and, in approximately 70% of transmembrane proteins that contain intrinsically unstructured segments, the intrinsically unstructured segments are close to transmembrane segments.
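As an illustration of how hydropathy can serve as a per-segment feature, the sketch below computes a sliding-window average of Kyte-Doolittle values along a sequence. It is not one of the classifiers described above. The function name, the 19-residue window, and the 1.6 cutoff (a conventional Kyte-Doolittle rule of thumb for membrane-spanning stretches) are assumptions chosen for the example.

```python
# Kyte-Doolittle (1982) hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full window; one value per start position."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def candidate_tm_starts(seq, window=19, cutoff=1.6):
    """Start indices (0-based) of windows whose mean hydropathy exceeds cutoff."""
    return [i for i, mean in enumerate(window_hydropathy(seq, window))
            if mean > cutoff]
```

A feature vector for a classifier such as a decision tree or support vector machine could combine these window averages with analogous averages over polarity and flexibility scales.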
Machine learning can guide experimental approaches for protein digestibility estimations
Food protein digestibility and bioavailability are critical aspects in
addressing human nutritional demands, particularly when seeking sustainable
alternatives to animal-based proteins. In this study, we propose a machine
learning approach to predict the true ileal digestibility coefficient of food
items. The model makes use of a unique curated dataset that combines
nutritional information from different foods with FASTA sequences of some of
their protein families. We extracted the biochemical properties of the proteins
and combined these properties with embeddings from a Transformer-based protein
Language Model (pLM). In addition, we used SHAP to identify features that
contribute most to the model prediction and provide interpretability. This
first AI-based model for predicting food protein digestibility has an accuracy
of 90% compared to existing experimental techniques. With this accuracy, our
model can eliminate the need for lengthy in-vivo or in-vitro experiments,
making the process of creating new foods faster, cheaper, and more ethical.
Comment: 50 pages, submitted to Nature Foo
Software tools for simultaneous data visualization and T cell epitopes and disorder prediction in proteins
We have developed EpDis and MassPred, extendable open source software tools that support bioinformatic research and enable parallel use of different methods for the prediction of T cell epitopes, disorder and disordered binding regions, and for hydropathy calculation. These tools offer a semi-automated installation of chosen sets of external predictors and an interface allowing for easy application of the prediction methods, which can be applied either to individual proteins or to datasets containing a large number of proteins. In addition to access to prediction methods, the tools also provide visualization of the obtained results, calculation of a consensus from the results of different methods, as well as import of experimental data and their comparison with results obtained with different predictors. The tools also offer a graphical user interface and the possibility to store data and the results obtained using all of the integrated methods in a relational database or flat file for further analysis. The MassPred part enables massively parallel application of all integrated predictors to a set of proteins. Both tools can be downloaded from http://bioinfo.matf.bg.ac.rs/home/downloads.wafl?cat=Software. Appendix A includes the technical description of the created tools and a list of supported predictors.
Factors Affecting Synonymous Codon Usage Bias in Chloroplast Genome of Oncidium Gower Ramsey
Oncidium Gower Ramsey is a fascinating and important ornamental flower in the floral industry. In this research, the complete nucleotide sequence of the chloroplast genome of Oncidium Gower Ramsey was studied and analyzed using the CodonW software. Correspondence analysis and the effective number of codons (Nc-plot) method were used to analyze synonymous codon usage. According to the correspondence analysis, codon bias in the chloroplast genome of Oncidium Gower Ramsey is related to gene length, mutation bias, the hydropathy level of each protein, and gene function, whereas selection or gene expression only subtly affects codon usage. This study will provide insights into molecular evolution studies and high-level transgene expression.
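An Nc-plot compares each gene's effective number of codons with the value expected if codon usage were shaped by GC composition alone. The sketch below computes the two ingredients of that reference curve, using Wright's (1990) expression Nc = 2 + s + 29 / (s^2 + (1 - s)^2). It is not the CodonW code: the function names are hypothetical, and `gc3` counts every third codon position, a simplification of the GC3s statistic, which is restricted to synonymous sites.

```python
def gc3(cds):
    """Fraction of G or C at the third position of each complete codon.

    Simplification: all codons are counted, not only synonymous sites.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    third_bases = [cds[3 * k + 2] for k in range(n_codons)]
    return sum(1 for base in third_bases if base in "GC") / n_codons

def expected_nc(s):
    """Expected effective number of codons for third-position GC fraction s
    when codon usage is determined by base composition alone."""
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)
```

Genes whose observed Nc falls well below `expected_nc(gc3(cds))` are the ones for which factors other than mutation bias, such as selection, are usually suspected.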
Sensitivity of Water Dynamics to Biologically Significant Surfaces of Monomeric Insulin: Role of Topology and Electrostatic Interactions
In addition to the biologically active monomer of the protein Insulin
circulating in human blood, the molecule also exists in dimeric and hexameric
forms that are used as storage. The Insulin monomer contains two distinct
surfaces, namely the dimer forming surface (DFS) and the hexamer forming
surface (HFS) that are specifically designed to facilitate the formation of the
dimer and the hexamer, respectively. In order to characterize the structural
and dynamical behaviour of interfacial water molecules near these two surfaces
(DFS and HFS), we performed atomistic molecular dynamics simulations of Insulin
with explicit water. Dynamical characterization reveals that the structural
relaxation of the hydrogen bonds formed between the residues of the DFS and the
interfacial water molecules is faster than that of the hydrogen bonds formed
between water and the residues of the HFS. Furthermore, the residence times of
water molecules in the protein
hydration layer for both the DFS and HFS are found to be significantly higher
than those for some of the other proteins studied so far, such as HP-36 and
lysozyme. The surface topography and the arrangement of amino acid residues
work together to organize the water molecules in the hydration layer in order
to provide them with a preferred orientation. The HFS, having a large polar
solvent-accessible surface area and an extensive convex nonpolar region, drives
the surrounding water molecules to acquire a predominantly clathrate-like
structure. In contrast, near the DFS, the surrounding water molecules acquire
an inverted orientation owing to the flat curvature of the hydrophobic surface
and the interrupted alignment of hydrophilic residues. We have followed the
escape trajectories of several such quasi-bound water molecules from both
surfaces and constructed free energy surfaces of these water molecules. These
free energy surfaces reveal the differences between the two hydration layers.
Comment: 34 pages, 10 figure