7 research outputs found

SPRINT: Ultrafast protein-protein interaction prediction of the entire human interactome

Author: Ilie Lucian
Li Yiwei
Publication date: 18/05/2017

Proteins usually perform their functions by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed, among which sequence-based ones are very promising. However, so far no such method is able to predict effectively the entire human interactome: they require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on the seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only program that can predict the entire human interactome. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/

arXiv.org e-Print Archive

Directory of Open Access Journals

Issues in performance evaluation for host–pathogen protein interaction prediction

Author: Abbasi Wajid Arshad
Minhas Fayyaz ul Amir Afsar
Publication venue: World Scientific Publishing
Publication date: 01/06/2016

The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host–pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose
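
The difference between K-fold and LOPO splitting described in this abstract can be sketched in a few lines. This is a minimal illustration only, assuming each example is a (pathogen protein, host protein) pair; the function name `lopo_splits` and the toy identifiers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def lopo_splits(pairs):
    """Yield (held_out_pathogen, train_idx, test_idx), one fold per pathogen protein.

    `pairs` is a list of (pathogen_id, host_id) tuples. All pairs involving the
    held-out pathogen protein go to the test fold, so the training fold never
    contains an interaction partner of that protein.
    """
    by_pathogen = defaultdict(list)
    for idx, (pathogen_id, _host_id) in enumerate(pairs):
        by_pathogen[pathogen_id].append(idx)

    for pathogen_id in sorted(by_pathogen):
        test_idx = by_pathogen[pathogen_id]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(pairs)) if i not in held_out]
        yield pathogen_id, train_idx, test_idx


# Toy data (hypothetical identifiers).
pairs = [("P1", "H1"), ("P1", "H2"), ("P2", "H1"), ("P3", "H3"), ("P2", "H3")]
folds = list(lopo_splits(pairs))
```

A plain K-fold split shuffles pairs without regard to the pathogen protein, so pairs involving the same pathogen protein can land in both the training and test folds. That is the scenario the abstract argues K-fold fails to model.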

Crossref

Warwick Research Archives Portal Repository

ProfPPIdb: Pairs of physical protein-protein interactions predicted for entire proteomes

Author: Hamp T
Rost B
Tran L
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

Motivation: Protein-protein interactions (PPIs) play a key role in many cellular processes. Most annotations of PPIs mix experimental and computational data. The mix optimizes coverage, but obfuscates the annotation origin. Some resources excel at focusing on reliable experimental data. Here, we focused on new pairs of interacting proteins for several model organisms based solely on sequence-based prediction methods. Results: We extracted reliable experimental data about which proteins interact (binary) for eight diverse model organisms from public databases, namely from Escherichia coli, Schizosaccharomyces pombe, Plasmodium falciparum, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, and for the previously used Homo sapiens and Saccharomyces cerevisiae. Those data were the base to develop a PPI prediction method for each model organism. The method used evolutionary information through a profile-kernel Support Vector Machine (SVM). With the resulting eight models, we predicted all possible protein pairs in each organism and made the top predictions available through a web application. Almost all of the PPIs made available were predicted between proteins that have not been observed in any interaction, in particular for less well-studied organisms. Thus, our work complements existing resources and is particularly helpful for designing experiments because of its uniqueness. Experimental annotations and computational predictions are strongly influenced by the fact that some proteins have many partners and others few. To optimize machine learning, recent methods explicitly ignored such a network-structure and relied either on domain knowledge or sequence-only methods. Our approach is independent of domain-knowledge and leverages evolutionary information. The database interface representing our results is accessible from https://rostlab.org/services/ppipair/. The data can also be downloaded from https://figshare.com/collections/ProfPPI-DB/4141784

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

FigShare

Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias

Author: MacLean Dan
Veevers Ruth
Publication date: 23/10/2023

Protein-protein interactions drive many biological processes, including the detection of phytopathogens by plants' R-Proteins and cell surface receptors. Many machine learning studies have attempted to predict protein-protein interactions but performance is highly dependent on training data; models have been shown to accurately predict interactions when the proteins involved are included in the training data, but achieve consistently poorer results when applied to previously unseen proteins. In addition, models that are trained using proteins that take part in multiple interactions can suffer from representation bias, where predictions are driven not by learned biological features but by learning of the structure of the interaction dataset. We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning. After applying the method to datasets containing _Arabidopsis thaliana_ and pathogen effector interactions, we developed a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes
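
For readers unfamiliar with the Chaos Game Representation (CGR) mentioned in this abstract, the following is a minimal sketch of the standard construction for a nucleotide sequence. The corner assignment, the grid size, and the function names `cgr_points` and `cgr_grid` are assumptions made for illustration; the paper's own encoding and image resolution may differ.

```python
def cgr_points(seq):
    """Return the CGR trajectory of a DNA sequence as a list of (x, y) points.

    Each nucleotide owns one corner of the unit square (assumed layout below).
    Starting from the centre, every base moves the current point halfway
    towards its corner.
    """
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in corners:
            continue  # skip ambiguous symbols such as N
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq, size=8):
    """Bin the CGR points into a size x size count matrix (an image-like input)."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


grid = cgr_grid("ATGGCGTACGNNTTAG", size=4)
```

A count matrix of this kind can be treated as a single-channel image, which is the sort of input a convolutional neural network consumes.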

arXiv.org e-Print Archive

FunFam protein families improve residue level molecular function prediction

Author: Littmann M
Orengo C
Rost B
Scheibenreif L
Publication date: 01/01/2019

BACKGROUND: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. RESULTS: FunFam members agreed, on average, in 36.9 ± 0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding residue prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8 ± 0.4% for a stringent threshold. CONCLUSIONS: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level
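
The threshold-controlled consensus described in this abstract can be illustrated with a small voting sketch. This is only an assumed reading of the idea: per-residue binary predictions for the members of one family are taken as already aligned column by column, gaps are marked `None`, and a column is called binding when the fraction of agreeing members reaches a threshold. The function name `consensus_binding` and the toy data are hypothetical and do not reproduce the paper's pipeline.

```python
def consensus_binding(predictions, threshold=0.5):
    """Column-wise consensus over aligned binary binding predictions.

    `predictions` is a list of equal-length lists, one per family member, with
    True (binding), False (non-binding) or None (gap) in each alignment column.
    Returns one value per column: True/False, or None if every member has a gap.
    """
    n_cols = len(predictions[0])
    consensus = []
    for col in range(n_cols):
        votes = [member[col] for member in predictions if member[col] is not None]
        if not votes:
            consensus.append(None)
            continue
        fraction_binding = sum(votes) / len(votes)
        consensus.append(fraction_binding >= threshold)
    return consensus


# Toy alignment of three family members over three columns.
preds = [
    [True, False, None],
    [True, True, None],
    [False, True, None],
]
lenient = consensus_binding(preds, threshold=0.5)
strict = consensus_binding(preds, threshold=1.0)
```

Raising the threshold makes fewer columns count as binding, which mirrors the trade-off between precision and coverage that the abstract reports.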

Columbia University Academic Commons

UCL Discovery

IDPpi: Protein-protein interaction analyses of human intrinsically disordered proteins

Author: Marsh Lindsey A.
Perovic Vladimir
Radovanovic Sandro
Roberts Stefan G.E.
Sumonja Neven
Veljkovic Nevena
Vukicevic Milan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Intrinsically disordered proteins (IDPs) are characterized by the lack of a fixed tertiary structure and are involved in the regulation of key biological processes via binding to multiple protein partners. IDPs are malleable, adapting to structurally different partners, and this flexibility stems from features encoded in the primary structure. The assumption that universal sequence information will facilitate coverage of the sparse zones of the human interactome motivated us to explore the possibility of predicting protein-protein interactions (PPIs) that involve IDPs based on sequence characteristics. We developed a method that relies on features of the interacting and non-interacting protein pairs and utilizes machine learning to classify and predict IDP PPIs. Consideration of both sequence determinants specific for conformational organizations and the multiplicity of IDP interactions in the training phase ensured a reliable approach that is superior to current state-of-the-art methods. By applying a strict evaluation procedure, we confirm that our method predicts interactions of the IDP of interest even on the proteome-scale. This service is provided as a web tool to expedite the discovery of new interactions and IDP functions with enhanced efficiency. © 2018 The Author(s)

Repository of the Vinča Nuclear Institute (VinaR)

Explore Bristol Research

In silico Prediction and Validations of Domains Involved in Gossypium hirsutum SnRK1 Protein Interaction With Cotton Leaf Curl Multan Betasatellite Encoded βC1

Author: Diwaker Tripathi
Fayyaz-ul-Amir Afsar Minhas
Hanu R. Pappu
Hira Kamal
Imran Amin
Muhammad Farooq
Muhammad Hamza
Muhammad Zuhaib Khan
Roma Mustafa
Shahid Mansoor
Publication venue: Frontiers Media SA
Publication date: 01/05/2019

Cotton leaf curl disease (CLCuD) caused by viruses of genus Begomovirus is a major constraint to cotton (Gossypium hirsutum) production in many cotton-growing regions of the world. Symptoms of the disease are caused by Cotton leaf curl Multan betasatellite (CLCuMB) that encodes a pathogenicity determinant protein, βC1. Here, we report the identification of interacting regions in βC1 protein by using computational approaches including sequence recognition, and binding site and interface prediction methods. We show the domain-level interactions based on the structural analysis of G. hirsutum SnRK1 protein and its domains with CLCuMB-βC1. To verify and validate the in silico predictions, three different experimental approaches, yeast two hybrid, bimolecular fluorescence complementation and pull down assay were used. Our results showed that ubiquitin-associated domain (UBA) and autoinhibitory sequence (AIS) domains of G. hirsutum-encoded SnRK1 are involved in CLCuMB-βC1 interaction. This is the first comprehensive investigation that combined in silico interaction prediction followed by experimental validation of interaction between CLCuMB-βC1 and a host protein. We demonstrated that data from computational biology could provide binding site information between CLCuD-associated viruses/satellites and new hosts that lack known binding site information for protein–protein interaction studies. Implications of these findings are discussed

Directory of Open Access Journals

Warwick Research Archives Portal Repository