189 research outputs found

    Proteome scanning to predict PDZ domain interactions using support vector machines

    BACKGROUND: PDZ domains mediate protein-protein interactions involved in important biological processes through the recognition of short linear motifs in their target proteins. Two recent independent studies have used protein microarray or phage display technology to detect PDZ domain interactions with peptide ligands on a large scale. Several computational predictors of PDZ domain interactions have been developed; however, they are trained using only protein microarray data and focus on limited subsets of PDZ domains. An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders. Such an application would require an accurate and precise predictor to avoid generating too many false positive hits given the large number of possible interactors in a given proteome. Once validated, these predictions will help to increase the coverage of current PDZ domain interaction networks and further our understanding of the roles that PDZ domains play in a variety of biological processes. RESULTS: We developed a PDZ domain interaction predictor using a support vector machine (SVM) trained with both protein microarray and phage display data. In order to use the phage display data for training, which only contains positive interactions, we developed a method to generate artificial negative interactions. Using cross-validation and a series of independent tests, we showed that our SVM successfully predicts interactions in different organisms. We then used the SVM to scan the proteomes of human, worm and fly to predict binders for several PDZ domains. Predictions were validated using known genomic interactions and published protein microarray experiments. Based on our results, new protein interactions potentially associated with Usher and Bardet-Biedl syndromes were predicted. A comparison of performance measures (F1 measure and FPR) for the SVM and published predictors demonstrated our SVM's improved accuracy and precision at proteome scanning. CONCLUSIONS: We built an SVM using mouse and human experimental training data to predict PDZ domain interactions. We showed that it correctly predicts known interactions from proteomes of different organisms and is more accurate and precise at proteome scanning compared with published state-of-the-art predictors.
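The abstract above compares predictors by F1 measure and false positive rate (FPR). As a minimal sketch of how these two measures follow from confusion-matrix counts, the Python below uses the standard definitions; the function name `f1_and_fpr` and the example counts are illustrative assumptions, not values from the paper.

```python
def f1_and_fpr(tp, fp, tn, fn):
    """Return (F1, FPR) from confusion-matrix counts.

    F1  = harmonic mean of precision and recall
    FPR = FP / (FP + TN)
    Zero denominators yield 0.0 rather than raising.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


if __name__ == "__main__":
    # Hypothetical counts for one predictor, chosen only for illustration.
    f1, fpr = f1_and_fpr(tp=8, fp=2, tn=88, fn=2)
    print(f"F1={f1:.3f} FPR={fpr:.4f}")
```

In proteome scanning the negatives vastly outnumber the positives, so even a small FPR can translate into many false hits, which is why the abstract reports FPR alongside F1.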

    Predicting PDZ domain mediated protein interactions from structure

    BACKGROUND: PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. RESULTS: We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. CONCLUSIONS: We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW

    The multiple-specificity landscape of modular peptide recognition domains

    Using large scale experimental datasets, the authors show how modular protein interaction domains such as PDZ, SH3 or WW domains, frequently display unexpected multiple binding specificity. The observed multiple specificity leads to new structural insights and accurately predicts new protein interactions

    The identification of short linear motif-mediated interfaces within the human interactome

    Motivation: Eukaryotic proteins are highly modular, containing multiple interaction interfaces that mediate binding to a network of regulators and effectors. Recent advances in high-throughput proteomics have rapidly expanded the number of known protein–protein interactions (PPIs); however, the molecular basis for the majority of these interactions remains to be elucidated. There has been a growing appreciation of the importance of a subset of these PPIs, namely those mediated by short linear motifs (SLiMs), particularly the canonical and ubiquitous SH2, SH3 and PDZ domain-binding motifs. However, these motif classes represent only a small fraction of known SLiMs and outside these examples little effort has been made, either bioinformatically or experimentally, to discover the full complement of motif instances

    Putting into Practice Domain-Linear Motif Interaction Predictions for Exploration of Protein Networks

    PDZ domains recognise short sequence motifs at the extreme C-termini of proteins. A model based on microarray data has been recently published for predicting the binding preferences of PDZ domains to five residue long C-terminal sequences. Here we investigated the potential of this predictor for discovering novel protein interactions that involve PDZ domains. When tested on real negative data assembled from published literature, the predictor displayed a high false positive rate (FPR). We predicted and experimentally validated interactions between four PDZ domains derived from the human proteins MAGI1 and SCRIB and 19 peptides derived from human and viral C-termini of proteins. Measured binding intensities did not correlate with prediction scores, and the high FPR of the predictor was confirmed. Results indicate that limitations of the predictor may arise from an incomplete model definition and improper training of the model. Taking into account these limitations, we identified several novel putative interactions between PDZ domains of MAGI1 and SCRIB and the C-termini of the proteins FZD4, ARHGAP6, NET1, TANC1, GLUT7, MARCH3, MAS, ABC1, DLL1, TMEM215 and CYSLTR2. These proteins are localised to the membrane or suggested to act close to it and are often involved in G protein signalling. Furthermore, we showed that, while extension of minimal interacting domains or peptides toward tandem constructs or longer peptides never suppressed their ability to interact, the measured affinities and inferred specificity patterns often changed significantly. This suggests that if protein fragments interact, the full length proteins are also likely to interact, albeit possibly with altered affinities and specificities. Therefore, predictors dealing with protein fragments are promising tools for discovering protein interaction networks but their application to predict binding preferences within networks may be limited
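The predictor discussed above scores five-residue C-terminal sequences. A minimal sketch of extracting that peptide from a full-length protein sequence might look like the following; the function name `c_terminal_peptide` and the example sequence are hypothetical and are not taken from the study.

```python
def c_terminal_peptide(sequence, length=5):
    """Return the last `length` residues of a protein sequence.

    The sequence is stripped of surrounding whitespace and upper-cased.
    Raises ValueError if the sequence is shorter than `length`.
    """
    seq = sequence.strip().upper()
    if len(seq) < length:
        raise ValueError("sequence shorter than requested peptide length")
    return seq[-length:]


if __name__ == "__main__":
    # Hypothetical sequence used only to illustrate the slicing.
    print(c_terminal_peptide("MKTAYIAKQRETSV"))
```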

    Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

    Proteins are the fundamental macromolecules within a cell that carry out most of the biological functions. The computational study of protein structure and its functions, using machine learning and data analytics, is elemental in advancing the life-science research due to the fast-growing biological data and the extensive complexities involved in their analyses towards discovering meaningful insights. Mapping of protein’s primary sequence is not only limited to its structure, we extend that to its disordered component known as Intrinsically Disordered Proteins or Regions in proteins (IDPs/IDRs), and hence the involved dynamics, which help us explain complex interaction within a cell that is otherwise obscured. The objective of this dissertation is to develop machine learning based effective tools to predict disordered protein, its properties and dynamics, and interaction paradigm by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of Accessible Surface Area (ASA) of protein residues, a useful structural property that defines protein’s exposure to partners, using regularized regression with 3rd-degree polynomial kernel function and genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy-gain of the folded state of a protein and otherwise the neutral state of the unstructured proteins. 
Moreover, we study the peptide-protein transient interactions that involve the induced folding of short peptides through disorder-to-order conformational changes to bind to an appropriate partner. A suite of predictors is developed to identify the residue-patterns of Peptide-Recognition Domains from protein sequence that can recognize and bind to the peptide-motifs and phospho-peptides with post-translational-modifications (PTMs) of amino acid, responsible for critical human diseases, using the stacked generalization ensemble technique. The involved biologically relevant case-studies demonstrate possibilities of discovering new knowledge using the developed tools

    Molecular evolution of the LNX gene family

    BACKGROUND: LNX (Ligand of Numb Protein-X) proteins typically contain an amino-terminal RING domain adjacent to either two or four PDZ domains, a domain architecture that is unique to the LNX family. LNX proteins function as E3 ubiquitin ligases and their domain organisation suggests that their ubiquitin ligase activity may be targeted to specific substrates or subcellular locations by PDZ domain-mediated interactions. Indeed, numerous interaction partners for LNX proteins have been identified, but the in vivo functions of most family members remain largely unclear. RESULTS: To gain insights into their function we examined the phylogenetic origins and evolution of the LNX gene family. We find that a LNX1/LNX2-like gene arose in an early metazoan lineage by gene duplication and fusion events that combined a RING domain with four PDZ domains. These PDZ domains are closely related to the four carboxy-terminal domains from multiple PDZ domain containing protein-1 (MUPP1). Duplication of the LNX1/LNX2-like gene and subsequent loss of PDZ domains appears to have generated a gene encoding a LNX3/LNX4-like protein, with just two PDZ domains. This protein has novel carboxy-terminal sequences that include a potential modular LNX3 homology domain. The two ancestral LNX genes are present in some, but not all, invertebrate lineages. They were, however, maintained in the vertebrate lineage, with further duplication events giving rise to five LNX family members in most mammals. In addition, we identify novel interactions of LNX1 and LNX2 with three known MUPP1 ligands using yeast two-hybrid assays. This demonstrates conservation of binding specificity between LNX and MUPP1 PDZ domains. CONCLUSIONS: The LNX gene family has an early metazoan origin with a LNX1/LNX2-like protein likely giving rise to a LNX3/LNX4-like protein through the loss of PDZ domains. The absence of LNX orthologs in some lineages indicates that LNX proteins are not essential in invertebrates. In contrast, the maintenance of both ancestral LNX genes in the vertebrate lineage suggests the acquisition of essential vertebrate specific functions. The revelation that the LNX PDZ domains are phylogenetically related to domains in MUPP1, and have common binding specificities, suggests that LNX and MUPP1 may have similarities in their cellular functions.

    Using Text Mining of PubMed Abstracts As An Evidence Source in Computational Predictions of WW Domain-Mediated Protein-Protein Interactions

    Protein-protein interactions (PPIs) are a key regulatory mechanism in coordinating a multitude of processes vital to normal cellular function. There exist a number of wet-lab small-scale and high-throughput methods for accurately identifying PPIs; however, despite their accuracy, these methods are expensive both in terms of time and finances. Complementing experimental methods with computational predictions increases the effectiveness of wet-lab small scale methodologies in identifying high quality protein interaction networks. Computational predictions are made by applying bioinformatics and machine-learning algorithms to large-scale training sets obtained from wet-lab experiments, or by extracting information on PPIs from high volumes of published data that do not directly identify protein interactions but are nonetheless correlated with them. A disadvantage of computational predictions is their high degree of inaccuracy, namely too many false positives and false negatives. To improve the accuracy of computational predictions, it is important to consider interactions that are likely to occur in vivo under certain biological conditions, termed context. One technique for improving prediction accuracy is analyzing data obtained via different types of experiments that consider different features of the co-occurring proteins, such as co-localization, co-expression, correlated mutations, or semantic similarity. These experimental sources and their resulting data are called sources of evidence. Integrating data from multiple independent supporting evidence sources improves prediction accuracy. In this work, I used text mining of PubMed abstracts as an evidence source for protein interactions. I hypothesized that proteins whose names are frequently mentioned in the same abstract are more likely to interact in vivo compared to randomly chosen proteins. 
A comparison of three text mining techniques (gene name co-occurrence, MeSH term indexing, and co-occurrence with a controlled vocabulary) shows that co-occurrence with a controlled vocabulary yields the highest precision and recall. I concluded that gene name co-occurrence with a controlled vocabulary can, therefore, be used as a novel evidence source for prediction of WW domain-mediated PPIs
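The simplest of the three techniques named above, gene name co-occurrence, can be sketched as counting how many abstracts mention both members of a gene pair. The Python below is an illustrative assumption rather than the thesis's pipeline: the function name `cooccurrence_counts`, the whole-token matching rule, and the placeholder gene names are all hypothetical, and no controlled vocabulary or MeSH indexing is modelled.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, gene_names):
    """Count, for each gene pair, the number of abstracts naming both genes.

    Matching is case-insensitive and on whole alphanumeric tokens, so a
    gene name only counts when it appears as a separate word.
    Pairs are returned as alphabetically sorted tuples.
    """
    names = {g.upper() for g in gene_names}
    counts = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        present = sorted(names & tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Placeholder abstracts and gene names, invented for illustration.
    toy_abstracts = [
        "GENEA binds GENEB through a WW domain.",
        "GENEB is regulated by GENEA.",
        "GENEC is studied on its own.",
    ]
    print(cooccurrence_counts(toy_abstracts, ["GENEA", "GENEB", "GENEC"]))
```

Raw counts like these favour frequently studied genes, so a real evidence source would likely normalise them against each gene's overall mention frequency.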

    A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity

    MOTIVATION: Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein-protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain-peptide interactions. RESULTS: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimension'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data
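The abstract above notes that SH3 domains typically bind proline-rich sequences with the PxxP consensus. As a minimal sketch, candidate PxxP sites can be located with a regular expression; this shows only the motif scan, not the structure-based encoding the authors propose, and the function name `find_pxxp` and the example sequence are hypothetical.

```python
import re

# A lookahead with a capture group lets overlapping PxxP sites all be reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (0-based start, matched 4-mer) for every PxxP site, overlaps included."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptide used only to illustrate the scan.
    print(find_pxxp("AAPPLPPRAA"))
```

A consensus match is only a coarse filter; per the abstract, specificity also depends on the conformation of the domain surface.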