Search CORE

9,688 research outputs found

Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

Author: Amos-Binks Adam
Dehne Frank
Golshani Ashkan
Green James R
Gui Yuan
Patulea Catalin
Pitre Sylvain
Schoenrock Andrew
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: While there are many methods for predicting protein-protein interaction, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data which is not available at the proteome scale. Here we present PIPE-Sites, a novel method of protein-specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome scale. Results: PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting protein pairs with experimentally determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2347 yeast and 14,438 human novel protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs. Conclusions: PIPE-Sites is an accurate method for predicting protein binding sites and is applicable at the proteome scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online a

CiteSeerX

Crossref

Carleton University's Institutional Repository

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Transcriptional Regulation of Cell-type Specific Expression in the Arabidopsis Root

Author: Leckie Keegan M
Publication venue: Scholarship@Western
Publication date: 15/08/2017

Characterizing transcription factor interactions with their corresponding binding sites is crucial for understanding how gene expression is regulated by DNA sequence. A more comprehensive understanding of this process could have benefits in synthetic promoter design and creation of genetically modified organisms. Herein, the promoters of genes exhibiting cell-type specific expression within a single layer of the Arabidopsis root are analyzed to identify cis-regulatory motifs implicated in cell-type specific expression. De novo motif prediction identifies multiple motif candidates over-represented in the promoter sequences of co-expressed genes specific for epidermal, cortex, and endodermal expression. Several endodermal-specific putative motifs are further analyzed for positional biases and tested in planta. A priori mapping of known cis-regulatory motifs catalogued in publicly available databases is also performed. Results show that cell types contain different statistically significant enrichment patterns of both predicted and known cis-regulatory motifs. These results will help future research in designing cell-type specific synthetic promoters.

Scholarship@Western

Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins

Author: Belshaw Robert
Karlin David
Publication venue: Public Library of Science
Publication date: 05/03/2012

Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second, alternatively expressed viral protein, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We have now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11–16 aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role, and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins.

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Flexibility and small pockets at protein-protein interfaces: New insights into druggability.

Author: Ascher David B
Blundell Tom L
Jubb Harry
Publication venue: Prog Biophys Mol Biol
Publication date: 07/02/2015

The transient assembly of multiprotein complexes mediates many aspects of cell regulation and signalling in living organisms. Modulation of the formation of these complexes through targeting protein-protein interfaces can offer greater selectivity than the inhibition of protein kinases, proteases or other post-translational regulatory enzymes using substrate, co-factor or transition state mimetics. However, capitalising on protein-protein interaction interfaces as drug targets has been hindered by the nature of interfaces that tend to offer binding sites lacking the well-defined large cavities of classical drug targets. In this review we posit that interfaces formed by concerted folding and binding (disorder-to-order transitions on binding) of one partner and other examples of interfaces where a protein partner is bound through a continuous epitope from a surface-exposed helix, flexible loop or chain extension may be more tractable for the development of "orthosteric", competitive chemical modulators; these interfaces tend to offer small-volume but deep pockets and/or larger grooves that may be bound tightly by small chemical entities. We discuss examples of such protein-protein interaction interfaces for which successful chemical modulators are being developed. We thank our colleagues Alicia Higueruelo, Douglas Pires, Bernardo Ochoa and Chris Radoux for helpful comments and discussions. D.B.A. is the recipient of a C. J. Martin Research Fellowship from the National Health and Medical Research Council of Australia (APP1072476). H.J. is supported by a CASE Studentship from the UCB and the Biotechnology and Biological Sciences Research Council (BBSRC) (Grant: BB/J500574/1). T.L.B. receives funding from University of Cambridge and The Wellcome Trust for facilities and support. This is the accepted manuscript of a paper published in Progress in Biophysics and Molecular Biology (Jubb H, Blundell TL, Ascher DB, Progress in Biophysics and Molecular Biology 2015, doi:10.1016/j.pbiomolbio.2015.01.009). The final version is available at http://dx.doi.org/10.1016/j.pbiomolbio.2015.01.009

Elsevier - Publisher Connector

PubMed Central

Apollo (Cambridge)

University of Melbourne Institutional Repository

Structural Annotation of Mycobacterium tuberculosis Proteome

Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding-site-based ligand association. New algorithms for binding site detection and genome-scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold combinations that make up multi-domain proteins, are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Open Access Repository of IISc Research Publications

Alien Domains Shaped the Modular Structure of Plant NLR Proteins

Author: Andolfo Giuseppe
Chiaiese Pasquale
De Natale Antonino
Di Donato Antimo
Ercolano Maria Raffaella
Frusciante Luigi
Jones Jonathan D G
Pollio Antonino
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2019

Plant innate immunity mostly relies on nucleotide-binding (NB) and leucine-rich repeat (LRR) intracellular receptors to detect pathogen-derived molecules and to induce defense responses. A multitaxa reconstruction of NB-domain associations allowed us to identify the first NB-LRR arrangement in the Chlorophyta division of the Viridiplantae. Our analysis points out that the basic NOD-like receptor (NLR) unit emerged in Chlorophytes by horizontal transfer and its diversification started from Toll/interleukin receptor-NB-LRR members. The operon-based genomic structure of Chromochloris zofingiensis NLR copies suggests a functional origin of NLR clusters. Moreover, the transmembrane signatures of NLR proteins in the unicellular alga C. zofingiensis support the hypothesis that the NLR-based immunity system of plants derives from a cell-surface surveillance system. Taken together, our findings suggest that NLRs originated in unicellular algae and may have a common origin with cell-surface LRR receptors

Archivio della ricerca - Università degli studi di Napoli Federico II

University of East Anglia digital repository

Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts

Author: Roth Christian
Publication venue: University Goettingen Repository
Publication date: 06/09/2021

Over the last decades a revolution in novel measurement techniques has permeated the biological sciences, filling the databases with unprecedented amounts of data ranging from genomics, transcriptomics, proteomics and metabolomics to structural and ecological data. In order to extract insights from the vast quantity of data, computational and statistical methods are nowadays crucial tools in the toolbox of every biological researcher. In this thesis I summarize my contributions in two data-rich fields in biological sciences: transcription factor binding to DNA and protein structure prediction from protein sequences with shared evolutionary ancestry. In the first part of my thesis I introduce our work towards a web server for analysing transcription factor binding data with Bayesian Markov Models. In contrast to classical PWM or di-nucleotide models, Bayesian Markov models can capture complex inter-nucleotide dependencies that can arise from shape-readout and alternative binding modes. In addition to giving access to our methods in an easy-to-use, intuitive web interface, we provide our users with novel tools and visualizations to better evaluate the biological relevance of the inferred binding motifs. We hope that our tools will prove useful for investigating weak and complex transcription factor binding motifs which cannot be predicted accurately with existing tools. The second part discusses a statistical attempt to correct for the phylogenetic bias arising in co-evolution methods applied to the contact prediction problem. Co-evolution methods revolutionized the protein-structure prediction field more than 10 years ago and, until very recently, retained their importance as crucial input features to deep neural networks. As the co-evolution information is extracted from evolutionarily related sequences, we investigated whether the phylogenetic bias in the signal can be corrected in a principled way using a variation of Felsenstein's tree-pruning algorithm, applied in combination with an independent-pair assumption, to derive pairwise amino acid counts that are corrected for the evolutionary history. Unfortunately, the contact prediction derived from our corrected pairwise amino acid counts did not yield a competitive performance.

Georg-August-University Göttingen

Multicofactor proteins: structure, prediction, function

Author: Hearnshaw Stephen J
Publication venue
Publication date: 01/01/2011

EThOS - Electronic Theses Online Service, GB, United Kingdom

University of East Anglia digital repository

OpenGrey Repository

A yeast-based assay for protein tyrosine kinase substrate specificity and inhibitor resistance

Author: Taft Joseph Michael
Publication venue
Publication date: 26/03/2020

Phosphorylation of tyrosines by protein kinases is a fundamental mode of signal transduction in all eukaryotic cells, leading to a wide variety of cellular outcomes, including proliferation, differentiation, transcriptional activation, and programmed cell death. Perturbation of tyrosine kinase signaling networks by activation, overexpression, or mutation is the driving factor in many diseases, most notably cancers. The development of tyrosine kinase inhibitors, 37 of which are currently FDA-approved, has led to a revolution in cancer treatment. Imatinib, the first FDA-approved kinase inhibitor, has drastically improved prognosis for patients with Bcr-abl-positive leukemias. Despite this unprecedented success, however, up to one-third of patients lose response to imatinib due to mutations within the tyrosine kinase domain of Bcr-abl. Subsequent generations of Bcr-abl inhibitors, including dasatinib and ponatinib, have been developed to overcome these resistance mutations, but in each case, novel resistance mutations have arisen. We present a high-throughput yeast-based assay for the prediction of dasatinib- and ponatinib-resistant mutations in the ABL1 kinase domain. Our results not only recapitulate all known dasatinib-resistant mutations, but also confirm recent patient data emphasizing the importance of compound mutations in ponatinib resistance. Furthermore, with hundreds of kinase inhibitors in development for the treatment of a wide range of diseases, understanding the cellular pathway of each kinase is critically important to the selection of ideal drug targets and the avoidance of potentially toxic side effects. Discovery of novel tyrosine kinase substrates is hindered by the presence of 90 human tyrosine kinases, which are often active in the same pathways. Phosphoproteomics, chemical genetics, and in vitro assays have been used to great success, yet only 30% of phosphorylated tyrosines in the human proteome have been assigned to a specific kinase. Recent advances in predicting tyrosine kinase substrates have been made by combining large data sets on kinase domain specificity, cellular localization, and protein-protein interactions in probabilistic algorithms. However, the high-quality data sets required for accurate predictions are often lacking. In chapter 2, we present a high-throughput yeast-based assay for screening millions of putative kinase substrates, which we then use to build a probabilistic model to accurately predict the in vitro phosphorylation of candidate substrates.

Texas ScholarWorks

Discovery and Analysis of Aligned Pattern Clusters from Protein Family Sequences

Author: Lee En-Shiun Annie
Publication venue: University of Waterloo
Publication date: 01/01/2014

Protein sequences are essential for encoding molecular structures and functions. Consequently, biologists invest substantial resources and time discovering functional patterns in proteins. Using high-throughput technologies, biologists are generating an increasing amount of data. Thus, the major challenge in biosequencing today is the ability to conduct data analysis in an efficient and productive manner. Conserved amino acids in proteins reveal important functional domains within protein families. Conversely, less conserved amino acid variations within these protein sequence patterns reveal areas of evolutionary and functional divergence. Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive, thus pattern search is used. However, at present, combinatorial methods of pattern search generate a large set of solutions, and probabilistic methods require richer representations. They require biological ground truth of the input sequences, such as gene name or taxonomic species, as class labels based on traditional classification practice to train a model for predicting unknown sequences. However, these algorithms are inherently biased by mislabelling and may not be able to reveal class characteristics in a detailed and succinct manner. A novel pattern representation called an Aligned Pattern Cluster (AP Cluster), developed in this dissertation, is compact yet rich. It captures conservations and variations of amino acids, covers more sequences with lower entropy, and greatly reduces the number of patterns. AP Clusters contain statistically significant patterns with variations; their importance has been confirmed by the following biological evidence: 1) Most of the discovered AP Clusters correspond to binding segments, while their aligned columns correspond to binding sites as verified by Pfam, PROSITE, and the three-dimensional structure. 2) By compacting strongly correlated functional information together, AP Clusters are able to reveal class characteristics for taxonomical classes, gene classes and other functional classes, or incorrect class labelling. 3) AP Clusters that co-occur on the same homologous protein sequences are spatially close in the protein's three-dimensional structure. These results demonstrate the power and usefulness of AP Clusters. They bring similar statistically significant patterns with variation together and align them to reveal protein regional functionality, class characteristics, and binding and interacting sites for the study of protein-protein and protein-drug interactions, for differentiation of cancer tumour types, for targeted gene therapy, and for drug target discovery.

University of Waterloo's Institutional Repository