4 research outputs found

    Predicting RNA-binding residues from evolutionary information and sequence conservation

    Abstract
    Background: RNA-binding proteins (RBPs) play crucial roles in the post-transcriptional control of RNA. RBPs efficiently recognize specific RNA sequences after the RNA has been transcribed from DNA. To satisfy diverse functional requirements, RNA-binding proteins are composed of multiple blocks of RNA-binding domains (RBDs) presented in various structural arrangements to provide versatile functions. The ability to computationally predict RNA-binding residues in an RNA-binding protein can help biologists choose important residues for site-directed mutagenesis in wet-lab experiments.
    Results: The proposed prediction framework, named “ProteRNA”, combines an SVM-based classifier with conserved residue discovery by WildSpan to identify the residues that interact with RNA in an RNA-binding protein. Although these conserved residues can be either functionally or structurally conserved, they provide clues about the important residues in a protein sequence. On the independent testing dataset, ProteRNA achieved an overall accuracy of 89.78%, an MCC of 0.2628, an F-score of 0.3075, and an F0.5-score of 0.3546.
    Conclusions: This article presents the design of a sequence-based predictor that identifies the RNA-binding residues in an RNA-binding protein by combining machine learning and pattern mining approaches. RNA-binding proteins perform diverse functions when interacting with different categories of RNAs because they are composed of multiple copies of RNA-binding domains presented in various structural arrangements, which expands their functional repertoire. Furthermore, predicting RNA-binding residues in an RNA-binding protein can help biologists choose important residues for site-directed mutagenesis in wet-lab experiments.
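The reported accuracy is high while the MCC and F-scores are modest, a pattern that typically arises when binding residues are a small minority class. The sketch below shows how these four metrics are computed from a confusion matrix. The function names and the counts in the usage lines are hypothetical and purely illustrative; they are not taken from the ProteRNA evaluation.

```python
from math import sqrt


def accuracy(tp, fp, tn, fn):
    """Fraction of all residues classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for an imbalanced residue-level task
# (illustrative only, NOT the ProteRNA results).
tp, fp, tn, fn = 30, 40, 870, 60
print("accuracy:", accuracy(tp, fp, tn, fn))
print("MCC:     ", mcc(tp, fp, tn, fn))
print("F1:      ", f_beta(tp, fp, fn))
print("F0.5:    ", f_beta(tp, fp, fn, beta=0.5))
```

With counts like these, the many true negatives dominate accuracy, while MCC and the F-scores depend on how well the rare positive class is recovered.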

    Predicting RNA-Protein Interactions Using Only Sequence Information

    Abstract
    Background: RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but they are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.
    Results: We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
    Conclusions: Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
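The sequence encoding described in the abstract can be sketched as follows. This is a minimal illustration, not the RPISeq implementation. Three things are assumptions because the abstract does not specify them: the 7-group amino acid alphabet (taken here to be the commonly used conjoint-triad grouping), the normalization (counts divided by their sum), and the concatenation of the two vectors into one feature vector. All function names are hypothetical.

```python
from itertools import product

# ASSUMPTION: conjoint-triad style grouping of the 20 amino acids into
# 7 classes; the abstract only says "7-letter reduced alphabet".
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(GROUPS) for aa in group}


def kmer_composition(seq, k, alphabet):
    """Return the k-mer frequency vector of seq over alphabet.

    ASSUMPTION: normalization divides each count by the total count.
    k-mers containing symbols outside the alphabet are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:
            counts[pos] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def encode_rna(rna):
    """4-mer composition over A, C, G, U: 4**4 = 256 features."""
    return kmer_composition(rna.upper().replace("T", "U"), 4, "ACGU")


def encode_protein(protein):
    """3-mer composition over the reduced alphabet: 7**3 = 343 features."""
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_composition(reduced, 3, "0123456")


def encode_pair(rna, protein):
    """ASSUMPTION: the pair is represented by concatenating both vectors."""
    return encode_rna(rna) + encode_protein(protein)


features = encode_pair("ACGUACGUGGCA", "MKVLAAGICDERK")
print(len(features))  # 256 + 343 = 599
```

A vector like this could then be passed to any standard SVM or Random Forest implementation for training; that step is omitted here.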

    Computational prediction of RNA-protein interaction partners and interfaces

    RNA-protein interactions play important roles in fundamental cellular processes involved in human diseases, viral replication and defense against pathogens in plants, animals and microbes. However, the detailed recognition mechanisms underlying these interactions are poorly understood. To gain a better understanding of the molecular recognition code for RNA-protein interactions, this dissertation has three related goals: i) to develop methods for predicting RNA-protein interaction partners; ii) to develop an approach for predicting interfacial residues in both the RNA and protein components of RNA-protein complexes; and iii) to develop computational tools and resources for investigating RNA-protein interactions. First, we present machine learning classifiers for predicting RNA-protein interaction partners. The classifiers use the amino acid composition of proteins and the ribonucleotide composition of RNAs as input to predict whether a given RNA-protein pair interacts. We show that protein and RNA sequences alone (i.e., in the absence of any structural information) contain enough signal to allow reliable prediction of interaction partners. Second, we present RPISeq, a webserver that predicts the interaction probabilities of input RNA-protein pairs, using the above-mentioned machine learning classifiers. A comprehensive database of RNA-protein interactions, RPIntDB, is integrated with the webserver to allow users to search for homologous proteins and their known interacting RNA partners. Finally, we perform an analysis of contiguous interfacial amino acids and ribonucleotides in RNA-protein complexes for which structures are known. We generate a dataset of bipartite RNA-protein motifs that can be used to predict interfacial residues in both the RNA and protein sequences of a given RNA-protein pair simultaneously. We show that taking binding partner information into account leads to higher precision in the prediction of RNA-binding residues in proteins. 
Taken together, these studies have increased our understanding of how RNA and proteins interact.