19,024 research outputs found
Identifying Interaction Sites in "Recalcitrant" Proteins: Predicted Protein and Rna Binding Sites in Rev Proteins of Hiv-1 and Eiav Agree with Experimental Data
Protein-protein and protein nucleic acid interactions are vitally important
for a wide range of biological processes, including regulation of gene
expression, protein synthesis, and replication and assembly of many viruses. We
have developed machine learning approaches for predicting which amino acids of
a protein participate in its interactions with other proteins and/or nucleic
acids, using only the protein sequence as input. In this paper, we describe an
application of classifiers trained on datasets of well-characterized
protein-protein and protein-RNA complexes for which experimental structures are
available. We apply these classifiers to the problem of predicting protein and
RNA binding sites in the sequence of a clinically important protein for which
the structure is not known: the regulatory protein Rev, essential for the
replication of HIV-1 and other lentiviruses. We compare our predictions with
published biochemical, genetic and partial structural information for HIV-1 and
EIAV Rev and with our own published experimental mapping of RNA binding sites
in EIAV Rev. The predicted and experimentally determined binding sites are in
very good agreement. The ability to predict reliably the residues of a protein
that directly contribute to specific binding events - without the requirement
for structural information regarding either the protein or complexes in which
it participates - can potentially generate new disease intervention strategies.Comment: Pacific Symposium on Biocomputing, Hawaii, In press, Accepted, 200
SPRINT: Ultrafast protein-protein interaction prediction of the entire human interactome
Proteins perform their functions usually by interacting with other proteins.
Predicting which proteins interact is a fundamental problem. Experimental
methods are slow, expensive, and have a high rate of error. Many computational
methods have been proposed among which sequence-based ones are very promising.
However, so far no such method is able to predict effectively the entire human
interactome: they require too much time or memory. We present SPRINT (Scoring
PRotein INTeractions), a new sequence-based algorithm and tool for predicting
protein-protein interactions. We comprehensively compare SPRINT with
state-of-the-art programs on seven most reliable human PPI datasets and show
that it is more accurate while running orders of magnitude faster and using
very little memory. SPRINT is the only program that can predict the entire
human interactome. Our goal is to transform the very challenging problem of
predicting the entire human interactome into a routine task. The source code of
SPRINT is freely available from github.com/lucian-ilie/SPRINT/ and the datasets
and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/
An optimized energy potential can predict SH2 domain-peptide interactions
Peptide recognition modules (PRMs) are used throughout biology to mediate protein-protein interactions, and many PRMs are members of large protein domain families. Members of these families are often quite similar to each other, but each domain recognizes a distinct set of peptides, raising the question of how peptide recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, one the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2-peptide interactions to study the physical origin of domain-peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein-DNA interactions
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Machine learning-guided directed evolution for protein engineering
Machine learning (ML)-guided directed evolution is a new paradigm for
biological design that enables optimization of complex functions. ML methods
use data to predict how sequence maps to function without requiring a detailed
model of the underlying physics or biological pathways. To demonstrate
ML-guided directed evolution, we introduce the steps required to build ML
sequence-function models and use them to guide engineering, making
recommendations at each stage. This review covers basic concepts relevant to
using ML for protein engineering as well as the current literature and
applications of this new engineering paradigm. ML methods accelerate directed
evolution by learning from information contained in all measured variants and
using that information to select sequences that are likely to be improved. We
then provide two case studies that demonstrate the ML-guided directed evolution
process. We also look to future opportunities where ML will enable discovery of
new protein functions and uncover the relationship between protein sequence and
function.Comment: Made significant revisions to focus on aspects most relevant to
applying machine learning to speed up directed evolutio
Prediction of protein binding sites in protein structures using hidden Markov support vector machine
<p>Abstract</p> <p>Background</p> <p>Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines, conditional random field, etc. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance.</p> <p>Results</p> <p>In this study, we introduce a novel machine learning model hidden Markov support vector machine for protein binding site prediction. The model treats the protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train hidden Markov support vector machine. When tested on six data sets, the method based on hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random field. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.</p> <p>Conclusion</p> <p>The improved prediction performance and computational efficiency of the method based on hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for hidden Markov support vector machine is linear with the number of training samples by using the cutting-plane algorithm.</p
PUEPro : A Computational Pipeline for Prediction of Urine Excretory Proteins
This work is supported by the National Natural Science Foundation of China (Grant Nos. 81320108025, 61402194, 61572227), Development Project of Jilin Province of China (20140101180JC) and China Postdoctoral Science Foundation (2014T70291).Postprin
- …