266 research outputs found

A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity

Author: Ausiello G
Ferraro E
Helmer-Citterich M
Via A
Publication venue: Oxford University Press
Publication date: 01/01/2006

MOTIVATION: Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains which focus their binding targets on short protein sequences and play a key role in the frame of protein-protein interactions. High-throughput techniques permit the whole proteome scanning of each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for the development of in silico experiments to validate experimental results and of computational tools for the inference of domain-peptide interactions. RESULTS: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind proline-rich short sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique aimed at representing binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimension'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method, moreover, shows a generalization capability, inferring specificity of unknown SH3 domains displaying some degree of similarity with the known data

ART

Archivio della ricerca- Università di Roma La Sapienza
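
The abstract above notes that SH3 domains typically bind proline-rich peptides matching the PxxP consensus. As a rough illustration of that consensus only (this is not the structure-based encoding the authors propose), the following minimal sketch scans a sequence for every PxxP occurrence. The function name `find_pxxp` and the example peptide are assumptions made for illustration.

```python
import re

# Zero-width lookahead so that overlapping PxxP occurrences are all reported.
# "." stands for any residue (the "x" positions of the consensus).
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (start_index, motif) pairs for every PxxP match (0-based)."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]
```

For example, `find_pxxp("APPLPPRNRPRL")` returns `[(1, 'PPLP'), (2, 'PLPP')]`. A plain motif scan like this only flags candidate sites; it says nothing about which SH3 domain binds them, which is the problem the paper addresses.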

Bayesian machine learning methods for predicting protein-peptide interactions and detecting mosaic structures in DNA sequences alignments

Author: Lehrach Wolfgang
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

Short well-defined domains known as peptide recognition modules (PRMs) regulate many important protein-protein interactions involved in the formation of macromolecular complexes and biochemical pathways. High-throughput experiments like yeast two-hybrid and phage display are expensive and intrinsically noisy, therefore it would be desirable to target informative interactions and pursue in silico approaches. We propose a probabilistic discriminative approach for predicting PRM-mediated protein-protein interactions from sequence data. The model suffered from over-fitting, so Laplacian regularisation was found to be important in achieving a reasonable generalisation performance. A hybrid approach yielded the best performance, where the binding site motifs were initialised with the predictions of a generative model. We also propose another discriminative model which can be applied to all sequences present in the organism at a significantly lower computational cost. This is due to its additional assumption that the underlying binding sites tend to be similar. It is difficult to distinguish between the binding site motifs of the PRM due to the small number of instances of each binding site motif. However, closely related species are expected to share similar binding sites, which would be expected to be highly conserved. We investigated rate variation along DNA sequence alignments, modelling confounding effects such as recombination. Traditional approaches to phylogenetic inference assume that a single phylogenetic tree can represent the relationships and divergences between the taxa. However, taxa sequences exhibit varying levels of conservation, e.g. due to regulatory elements and active binding sites, and certain bacteria and viruses undergo interspecific recombination. We propose a phylogenetic factorial hidden Markov model to infer recombination and rate variation.
We examined the performance of our model and inference scheme on various synthetic alignments, and compared it to state-of-the-art breakpoint models. We investigated three DNA sequence alignments: one of maize actin genes, one bacterial (Neisseria), and the other of HIV-1. Inference is carried out in the Bayesian framework, using Reversible Jump Markov Chain Monte Carlo.

Edinburgh Research Archive

Satisfiability, sequence niches, and molecular codes in cellular signaling

Author: Bijlsma
Brannetti
Burack
Burger
C.R. Myers
Castagnoli
Cesareni
Correale
Djordjevic
Edelman
Figge
Friedgut
Gerland
Gomes
Gravner
Hellingwerf
Itzkovitz
Kirkpatrick
Landgraf
Lau
Marathe
Mayer
McClean
Mezard
Mezard
Mitchell
Monasson
Morrison
Noirel
Pawson
Percus
Poelwijk
Ramani
Sear
Sengupta
Shannon
Shannon
Tlusty
van Nimwegen
Zarrinpar
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 18/12/2007

Biological information processing as implemented by regulatory and signaling networks in living cells requires sufficient specificity of molecular interaction to distinguish signals from one another, but much of regulation and signaling involves somewhat fuzzy and promiscuous recognition of molecular sequences and structures, which can leave systems vulnerable to crosstalk. This paper examines a simple computational model of protein-protein interactions which reveals both a sharp onset of crosstalk and a fragmentation of the neutral network of viable solutions as more proteins compete for regions of sequence space, revealing intrinsic limits to reliable signaling in the face of promiscuity. These results suggest connections to both phase transitions in constraint satisfaction problems and coding theory bounds on the size of communication codes

arXiv.org e-Print Archive

Crossref

Using genome-wide measurements for computational prediction of SH2–peptide interactions

Author: Altuvia
Altuvia
Bergamin
Berman
Bock
Brannetti
Chen
Deeds
DeLano
Diella
Djordjevic
Donald
Edgar
Endres
Ferraro
Frese
Goldstein
Gomez
Grigoryan
Grucza
Havranek
Henriques
Hou
Hu
Jones
Kaplan
Kinney
Kolesov
Kuriyan
Lee
Lehrach
Leonid A. Mirny
Li
Liu
Liu
Lundegaard
Mandel-Gutfreund
McLaughlin
Mirny
Miyazawa
Morozov
Moult
Murzin
Obenauer
Pazos
Poy
Reiss
Sanchez
Sayle
Schleinkofer
Sheinerman
Songyang
Stiffler
Suenaga
Vendruscolo
Vendruscolo
Vendruscolo
Waksman
Wiedemann
Wollacott
Yaffe
Zeba Wunderlich
Zhang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/04/2009

Peptide-recognition modules (PRMs) are used throughout biology to mediate protein–protein interactions, and many PRMs are members of large protein domain families. Recent genome-wide measurements describe networks of peptide–PRM interactions. In these networks, very similar PRMs recognize distinct sets of peptides, raising the question of how peptide-recognition specificity is achieved using similar protein domains. The analysis of individual protein complex structures often gives answers that are not easily applicable to other members of the same PRM family. Bioinformatics-based approaches, on the other hand, may be difficult to interpret physically. Here we integrate structural information with a large, quantitative data set of SH2 domain–peptide interactions to study the physical origin of domain–peptide specificity. We develop an energy model, inspired by protein folding, based on interactions between the amino-acid positions in the domain and peptide. We use this model to successfully predict which SH2 domains and peptides interact and uncover the positions in each that are important for specificity. The energy model is general enough that it can be applied to other members of the SH2 family or to new peptides, and the cross-validation results suggest that these energy calculations will be useful for predicting binding interactions. It can also be adapted to study other PRM families, predict optimal peptides for a given SH2 domain, or study other biological interactions, e.g. protein–DNA interactions. Funding: National Institutes of Health, National Centers for Biomedical Computing (Informatics for Integrating Biology and the Bedside); National Institutes of Health (U.S.) (grant U54LM008748).

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

eScholarship - University of California
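
The abstract above describes an energy model built from interactions between amino-acid positions in the domain and in the peptide. One simple way to picture such a model is an additive contact energy: sum a residue-pair term over every domain-peptide position pair assumed to be in contact. The sketch below is an assumption-laden illustration of that general idea, not the authors' model; the function name `binding_energy`, the contact list, and the energy values are all hypothetical.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]        # (domain position, peptide position), 0-based
ResiduePair = Tuple[str, str]    # (domain residue, peptide residue)


def binding_energy(
    domain: str,
    peptide: str,
    contacts: List[Contact],
    pair_energy: Dict[ResiduePair, float],
    default: float = 0.0,
) -> float:
    """Sum hypothetical pairwise energies over the given contact positions.

    Residue pairs missing from pair_energy contribute `default`.
    Lower totals would correspond to more favourable predicted binding.
    """
    total = 0.0
    for d_pos, p_pos in contacts:
        pair = (domain[d_pos], peptide[p_pos])
        total += pair_energy.get(pair, default)
    return total
```

With the made-up inputs `binding_energy("RK", "YE", [(0, 0), (1, 1)], {("R", "Y"): -1.0, ("K", "E"): -0.5})`, the result is `-1.5`. In a real setting the contact list would come from known complex structures and the pair energies would be fitted to interaction data.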

Integrating structure-based machine learning and co-evolution to investigate specificity in plant sesquiterpene synthases

Author: Beekwilder J.
Bouwmeester H.J.
de Ridder D.
Durairaj J.
Melillo E.
van Dijk A.D.J.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 22/03/2021

International Migration, Integration and Social Cohesion online publications

Phosphoproteomics Analyses to Identify the Candidate Substrates and Signaling Intermediates of the Non-Receptor Tyrosine Kinase, SRMS

Author: Goel Raghuveera Kumar 1988-
Publication venue: 'University of Saskatchewan Library'
Publication date: 23/08/2018

SRMS (Src-related kinase lacking C-terminal regulatory tyrosine and N-terminal myristoylation sites) is a non-receptor tyrosine kinase that belongs to the BRK family kinases (BFKs) and is evolutionarily related to the Src family kinases (SFKs). Like SFKs and BFKs, the SRMS protein comprises two domains involved in protein-protein interactions, namely, the Src-homology 3 domain (SH3) and Src-homology 2 domain (SH2) and one catalytic kinase domain. Unlike members of the BFKs and SFKs, the biochemical and cellular role of SRMS is poorly understood primarily due to the lack of information on the substrates and signaling intermediates regulated by the kinase. Previous biochemical studies have shown that wild type SRMS is enzymatically active and leads to the tyrosine-phosphorylation of several proteins, when expressed exogenously in mammalian cells. These tyrosine-phosphorylated proteins represent the candidate cellular substrates of SRMS which are largely unknown. Further, previous studies have determined that the SRMS protein displays a characteristic punctate cytoplasmic localization pattern in mammalian cells. These SRMS cytoplasmic puncta are uncharacterized and may provide insights into the biochemical and cellular role of the kinase. Here, we utilized mass spectrometry-based quantitative label-free phosphoproteomics to (a) identify the candidate SRMS cellular substrates and (b) candidate signaling intermediates regulated by SRMS, in HEK293 cells expressing ectopic SRMS. Specifically, using a phosphotyrosine enrichment strategy we identified 663 candidate SRMS substrates and consensus substrate-motifs of SRMS. We used customized peptide arrays and performed the high-throughput validation of a subset of the identified candidate SRMS substrates. Further, we independently validated Vimentin and Sam68 as bona fide SRMS substrates. Next, using titanium dioxide (TiO2)-based phosphopeptide enrichment columns, we identified multiple signaling intermediates of SRMS.
Functional gene enrichment analyses revealed several common and unique cellular processes regulated by the candidate SRMS substrates and signaling intermediates. Overall, these studies led to the identification of a significant number of novel and biologically relevant SRMS candidate substrates and signaling intermediates, which mapped to a number of cellular and biological processes primarily involved in cell cycle regulation, apoptosis, RNA processing, DNA repair and protein synthesis. These findings provide an important resource for future mechanistic studies to investigate the cellular and physiological functions of the SRMS. Studies towards characterizing the SRMS cytoplasmic puncta showed that the SRMS punctate structures do not colocalize with some of the major cellular organelles investigated, such as the mitochondria, endoplasmic reticulum, Golgi bodies and lysosomes. However, studies investigating the involvement of the SRMS domains in puncta-localization revealed that the SRMS SH2 domain partly regulates this localization pattern. These results highlight the potential role of the SRMS SH2 domain in the localization of SRMS to these cytoplasmic sites and lay important groundwork for future characterization studies.

eCommons@USASK

University of Saskatchewan Research Archive

Caretta – A multiple protein structure alignment and feature extraction suite

Author: Akdel Mehmet
Dijk Aalt D.J., van
Durairaj Janani
Ridder Dick, de

The vast number of protein structures currently available opens exciting opportunities for machine learning on proteins, aimed at predicting and understanding functional properties. In particular, in combination with homology modelling, it is now possible to not only use sequence features as input for machine learning, but also structure features. However, in order to do so, robust multiple structure alignments are imperative. Here we present Caretta, a multiple structure alignment suite meant for homologous but sequentially divergent protein families which consistently returns accurate alignments with a higher coverage than current state-of-the-art tools. Caretta is available as a GUI and command-line application and additionally outputs an aligned structure feature matrix for a given set of input structures, which can readily be used in downstream steps for supervised or unsupervised machine learning. We show Caretta's performance on two benchmark datasets, and present an example application of Caretta in predicting the conformational state of cyclin-dependent kinases.

Wageningen University & Research Publications

Quantitative Approaches to the Genomics of Clonal Evolution

Author: Zairis Sakellarios
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Many problems in the biological sciences reduce to questions of genetic evolution. Entire classes of medical pathology, such as malignant neoplasia or infectious disease, can be viewed in the light of Darwinian competition of genomes. With the benefit of today's maturing sequencing technologies we can observe and quantify genetic evolution with nucleotide resolution. This provides a molecular view of genetic material that has adapted, or is in the process of adapting, to its local selection pressures. A series of problems will be discussed in this thesis, all involving the mathematical modeling of genomic data derived from clonally evolving populations. We use a variety of computational approaches to characterize over-represented features in the data, with the underlying hypothesis that we may be detecting fitness-conferring features of the biology. In Part I we consider the cross-sectional sampling of human tumors via RNA-sequencing, and devise computational pipelines for detecting oncogenic gene fusions and oncovirus infections. Genomic translocation and oncovirus infection can each be a highly penetrant alteration in a tumor's evolutionary history, with famous examples of both populating the cancer biology literature. In order to exert a transforming influence over the host cell, gene fusions and viral genetic programs need to be expressed and thus can be detected via whole transcriptome sequencing of a malignant cell population. We describe our approaches to predicting oncogenic gene fusions (Chapter 2) and quantifying host-viral interactions (Chapter 3) in large panels of human tumor tissue. The alterations that we characterize prompt the larger question of how the genetics of tumors and viruses might vary in time, leading us to the study of serially sampled populations. In Part II we consider longitudinal sampling of a clonally evolving population. 
Phylogenetic trees are the standard representation of a clonal process, an evolutionary picture as old as Darwin's voyages on the Beagle. Chapter 4 first reviews phylogenetic inference and then introduces a certain phylogenetic tree space that forms the starting point of our work on the topic. Specifically, Chapter 4 describes the construction of our projective tree space along with an explicit implementation for visualizing point clouds of rescaled trees. The Chapter finishes by defining a method for stable dimensionality reduction of large phylogenies, which is useful for analyzing long genomic time series. In Chapter 5 we consider medically relevant instances of clonal evolution and the longitudinal genetic data sets to which they give rise. We analyze data from (i) the sequencing of cancers along their therapeutic course, (ii) the passaging of a xenografted tumor through a mouse model, and (iii) the seasonal surveillance of H3N2 influenza's hemagglutinin segment. A novel approach to predicting influenza vaccine effectiveness is demonstrated using statistics of point clouds in tree spaces. Our investigations into clonal processes may be extended beyond naturally occurring genomes. In Part III we focus on the directed clonal evolution of populations of synthetic RNAs in vitro. Analogous to the selection pressures exerted upon malignant cells or viral particles, these synthetic RNA genomes can be evolved against a desired fitness objective. We investigate fitness objectives related to reprogramming ribosomal translation. Chapter 6 identifies high fitness RNA pseudoknot geometries capable of inducing ribosomal frameshift, while Chapter 7 takes an unbiased approach to evolving sequence and structural elements that promote stop codon readthrough

Columbia University Academic Commons

Interpretability-oriented data-driven modelling of bladder cancer via computational intelligence

Author: De Alejandro Montalvo Julio Cesar
Publication venue: 'University of Sheffield Conference Proceedings'
Publication date: 13/02/2015

White Rose E-theses Online

Computational identification of new structured cis-regulatory elements in the 3'-untranslated region of human protein coding genes

Author: Brown C.
Chen X.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Messenger ribonucleic acids (RNAs) contain a large number of cis-regulatory RNA elements that function in many types of post-transcriptional regulation. These cis-regulatory elements are often characterized by conserved structures and/or sequences. Although some classes are well known, given the wide range of RNA-interacting proteins in eukaryotes, it is likely that many new classes of cis-regulatory elements are yet to be discovered. An approach to this is to use computational methods that have the advantage of analysing genomic data, particularly comparative data on a large scale. In this study, a set of structural discovery algorithms was applied followed by support vector machine (SVM) classification. We trained a new classification model (CisRNA-SVM) on a set of known structured cis-regulatory elements from 3′-untranslated regions (UTRs) and successfully distinguished these, and groups of cis-regulatory elements that it had not been trained on, from control genomic and shuffled sequences. The new method outperformed previous methods in classification of cis-regulatory RNA elements. This model was then used to predict new elements from cross-species conserved regions of human 3′-UTRs. Clustering of these elements identified new classes of potential cis-regulatory elements. The model, training and testing sets and novel human predictions are available at: http://mRNA.otago.ac.nz/CisRNA-SVM

PubMed Central

MPG.PuRe