
    Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

    Background: While there are many methods for predicting protein-protein interaction, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data which is not available at the proteome-scale. Here we present PIPE-Sites, a novel method of protein-specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome-scale. Results: PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting protein pairs with experimentally-determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2347 yeast and 14,438 human novel interacting protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs. Conclusions: PIPE-Sites is an accurate method for predicting protein binding sites and is applicable to the proteome-scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online a

    Transcriptional Regulation of Cell-type Specific Expression in the Arabidopsis Root

    Characterizing transcription factor interactions with their corresponding binding sites is crucial for understanding how gene expression is regulated by DNA sequence. A more comprehensive understanding of this process could have benefits in synthetic promoter design and creation of genetically modified organisms. Herein, the promoters of genes exhibiting cell-type specific expression within a single layer of the Arabidopsis root are analyzed to identify cis-regulatory motifs implicated in cell-type specific expression. De novo motif prediction identifies multiple motif candidates over-represented in the promoter sequences of co-expressed genes specific for epidermal, cortex, and endodermal expression. Several endodermal-specific putative motifs are further analyzed for positional biases and tested in planta. A priori mapping of known cis-regulatory motifs catalogued in publicly available databases is also performed. Results show that cell types contain different statistically significant enrichment patterns of both predicted and known cis-regulatory motifs. These results will help future research in designing cell-type specific synthetic promoters

    Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second, alternatively expressed viral protein, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We have now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11–16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role, and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins

    Flexibility and small pockets at protein-protein interfaces: New insights into druggability.

    The transient assembly of multiprotein complexes mediates many aspects of cell regulation and signalling in living organisms. Modulation of the formation of these complexes through targeting protein-protein interfaces can offer greater selectivity than the inhibition of protein kinases, proteases or other post-translational regulatory enzymes using substrate, co-factor or transition state mimetics. However, capitalising on protein-protein interaction interfaces as drug targets has been hindered by the nature of interfaces that tend to offer binding sites lacking the well-defined large cavities of classical drug targets. In this review we posit that interfaces formed by concerted folding and binding (disorder-to-order transitions on binding) of one partner and other examples of interfaces where a protein partner is bound through a continuous epitope from a surface-exposed helix, flexible loop or chain extension may be more tractable for the development of "orthosteric", competitive chemical modulators; these interfaces tend to offer small-volume but deep pockets and/or larger grooves that may be bound tightly by small chemical entities. We discuss examples of such protein-protein interaction interfaces for which successful chemical modulators are being developed. We thank our colleagues Alicia Higueruelo, Douglas Pires, Bernardo Ochoa and Chris Radoux for helpful comments and discussions. D.B.A. is the recipient of a C. J. Martin Research Fellowship from the National Health and Medical Research Council of Australia (APP1072476). H.J. is supported by a CASE Studentship from the UCB and the Biotechnology and Biological Sciences Research Council (BBSRC) (Grant: BB/J500574/1). T.L.B. receives funding from University of Cambridge and The Wellcome Trust for facilities and support. This is the accepted manuscript of a paper published in Progress in Biophysics and Molecular Biology (Jubb H, Blundell TL, Ascher DB, Progress in Biophysics and Molecular Biology 2015, doi:10.1016/j.pbiomolbio.2015.01.009). The final version is available at http://dx.doi.org/10.1016/j.pbiomolbio.2015.01.009

    Structural Annotation of Mycobacterium tuberculosis Proteome

    Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well

    Alien Domains Shaped the Modular Structure of Plant NLR Proteins

    Plant innate immunity mostly relies on nucleotide-binding (NB) and leucine-rich repeat (LRR) intracellular receptors to detect pathogen-derived molecules and to induce defense responses. A multitaxa reconstruction of NB-domain associations allowed us to identify the first NB-LRR arrangement in the Chlorophyta division of the Viridiplantae. Our analysis points out that the basic NOD-like receptor (NLR) unit emerged in Chlorophytes by horizontal transfer and its diversification started from Toll/interleukin receptor-NB-LRR members. The operon-based genomic structure of Chromochloris zofingiensis NLR copies suggests a functional origin of NLR clusters. Moreover, the transmembrane signatures of NLR proteins in the unicellular alga C. zofingiensis support the hypothesis that the NLR-based immunity system of plants derives from a cell-surface surveillance system. Taken together, our findings suggest that NLRs originated in unicellular algae and may have a common origin with cell-surface LRR receptors

    Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

    Over the last decades a revolution in novel measurement techniques has permeated the biological sciences filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. In order to extract insights from the vast quantity of data, computational and statistical methods are nowadays crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov Models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape-readout and alternative binding modes. In addition to giving access to our methods in an easy-to-use, intuitive web-interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs which cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct out the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods have revolutionized the protein-structure prediction field more than 10 years ago, and, until very recently, have retained their importance as crucial input features to deep neural networks. 
As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected out in a principled way using a variation of Felsenstein's tree-pruning algorithm, applied in combination with an independent-pair assumption, to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield a competitive performance. 2021-09-2

    Multicofactor proteins: structure, prediction, function

    EThOS - Electronic Theses Online Service, GB, United Kingdo

    Discovery and Analysis of Aligned Pattern Clusters from Protein Family Sequences

    Protein sequences are essential for encoding molecular structures and functions. Consequently, biologists invest substantial resources and time discovering functional patterns in proteins. Using high-throughput technologies, biologists are generating an increasing amount of data. Thus, the major challenge in biosequencing today is the ability to conduct data analysis in an efficient and productive manner. Conserved amino acids in proteins reveal important functional domains within protein families. Conversely, less conserved amino acid variations within these protein sequence patterns reveal areas of evolutionary and functional divergence. Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive, thus pattern search is used. However, at present, combinatorial methods of pattern search generate a large set of solutions, and probabilistic methods require richer representations. They require biological ground truth of the input sequences, such as gene name or taxonomic species, as class labels based on traditional classification practice to train a model for predicting unknown sequences. However, these algorithms are inherently biased by mislabelling and may not be able to reveal class characteristics in a detailed and succinct manner. A novel pattern representation called an Aligned Pattern Cluster (AP Cluster), as developed in this dissertation, is compact yet rich. It captures conservations and variations of amino acids, covers more sequences with lower entropy, and greatly reduces the number of patterns. AP Clusters contain statistically significant patterns with variations; their importance has been confirmed by the following biological evidence: 1) Most of the discovered AP Clusters correspond to binding segments while their aligned columns correspond to binding sites as verified by Pfam, PROSITE, and the three-dimensional structure. 2) By compacting strongly correlated functional information together, AP Clusters are able to reveal class characteristics for taxonomical classes, gene classes and other functional classes, or incorrect class labelling. 3) Co-occurring AP Clusters on the same homologous protein sequences are spatially close in the protein's three-dimensional structure. These results demonstrate the power and usefulness of AP Clusters. They bring similar statistically significant patterns with variation together and align them to reveal protein regional functionality, class characteristics, and binding and interacting sites for the study of protein-protein and protein-drug interactions, for differentiation of cancer tumour types, for targeted gene therapy, and for drug target discovery. 1 yea