Search CORE

2,638 research outputs found

A structure filter for the Eukaryotic Linear Motif Resource

Author: A Salsmann
A Stein
AG Murzin
Allegra Via
AW Fenton
B Brannetti
B Petersen
C Chica
Cathryn M Gould
Christine Gemünd
CJ Sigrist
CM Gould
D Durocher
E Faraggi
E Gasteiger
E Petsalaki
ED Lowe
EK Hui
F Diella
H Dinkel
H Naderi-Manesh
HM Berman
J Kadlec
K Machida
K Roovers
M Fuxreiter
M Sheng
Manuela Helmer-Citterich
MB Yaffe
MC Rodriguez
MJ Macias
NE Davey
O Hantschel
P Puntervoll
R Apweiler
R Linding
RJ Edwards
S Balla
S Miller
SE Brenner
SF Altschul
SS Shapiro
SW Cowan-Jacob
T Pawson
Team RDC
TJ Gibson
Toby J Gibson
V Neduva
W Kabsch
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Many proteins are highly modular, being assembled from globular domains and segments of natively disordered polypeptides. Linear motifs, short sequence modules functioning independently of protein tertiary structure, are most abundant in natively disordered polypeptides but are also found in accessible parts of globular domains, such as exposed loops. The prediction of novel occurrences of known linear motifs attempts the difficult task of distinguishing functional matches from stochastically occurring non-functional matches. Although functionality can only be confirmed experimentally, confidence in a putative motif increases if it exhibits attributes associated with functional instances, such as occurrence in the correct taxonomic range and cellular compartment, conservation in homologues, and accessibility to interacting partners. Several tools now use these attributes to classify putative motifs by confidence of functionality.

Results: Current methods for assessing motif accessibility ignore much of the available information, either predicting accessibility from the primary sequence or regarding any motif occurring in a globular region as low confidence. We present a method that rectifies this by considering accessibility and secondary structural context derived from experimentally solved protein structures. Putatively functional motif occurrences are mapped onto a representative domain, provided that a high-quality reference SCOP domain structure is available for the protein itself or a close relative. Candidate motifs can then be scored for solvent accessibility and secondary structure context. The scores are calibrated on a benchmark set of experimentally verified motif instances compared with a set of random matches. A combined score yields 3-fold enrichment for functional motifs assigned to high-confidence classifications and 2.5-fold enrichment for random motifs assigned to low-confidence classifications. The structure filter is implemented as a pipeline, available both through a graphical interface via the ELM resource (http://elm.eu.org/) and through a Web Service protocol.

Conclusion: New occurrences of known linear motifs require experimental validation, as current bioinformatics tools have limited reliability. The ELM structure filter will aid users in assessing candidate motifs that occur in globular structural regions. Most importantly, it will help users decide whether to expend their valuable time and resources on experimental testing of interesting motif candidates.
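The abstract reports fold-enrichment figures without defining them. The sketch below assumes one plausible reading: enrichment is the fraction of benchmark (true) motifs landing in the high-confidence class divided by the fraction of random matches landing there, and the mirror ratio for the low-confidence class. The hit fields, the equal weighting of the two scores and the threshold are hypothetical stand-ins, not the ELM filter's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class MotifHit:
    # Hypothetical per-hit scores; the real ELM structure filter scores may differ.
    accessibility: float  # relative solvent accessibility in [0, 1]
    structure: float      # secondary-structure context score in [0, 1]


def combined_score(hit: MotifHit, weight: float = 0.5) -> float:
    """Weighted average of the two scores (the weighting is an assumption)."""
    return weight * hit.accessibility + (1 - weight) * hit.structure


def fraction_in_class(hits, threshold, high=True):
    """Fraction of hits whose combined score is in the high (>= threshold) or low class."""
    if not hits:
        return 0.0
    n = sum(1 for h in hits if (combined_score(h) >= threshold) == high)
    return n / len(hits)


def enrichment(true_hits, random_hits, threshold, high=True):
    """High class: true fraction / random fraction. Low class: random fraction / true fraction."""
    t = fraction_in_class(true_hits, threshold, high)
    r = fraction_in_class(random_hits, threshold, high)
    num, den = (t, r) if high else (r, t)
    return num / den if den else float("inf")
```

With four true hits of which three score high, and four random matches of which one scores high, both enrichment ratios come out at 3.0 for a threshold of 0.5.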

Crossref

Directory of Open Access Journals

PubMed Central

ART

Archivio della ricerca- Università di Roma La Sapienza

In silico discovery of transcription regulatory elements in Plasmodium falciparum

Author: Benner Chris
Chen Kaisheng
Johnson Jeffery R
Le Roch Karine G
Winzeler Elizabeth A
Yan S Frank
Young Jason A
Zhou Yingyao
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: With the sequence of the Plasmodium falciparum genome and several global mRNA and protein life cycle expression profiling projects now completed, elucidating the underlying networks of transcriptional control important for the progression of the parasite life cycle is highly pertinent to the development of new antimalarials. To date, relatively little is known about the specific mechanisms the parasite employs to regulate gene expression at the mRNA level, and studies of the P. falciparum genome sequence have revealed few cis-regulatory elements and associated transcription factors. Although the parasite may use mechanisms of transcriptional control drastically different from those of other eukaryotic organisms, the extremely AT-rich nature of P. falciparum intergenic regions (~90% AT) presents significant challenges to in silico cis-regulatory element discovery.

Results: We have developed an algorithm called Gene Enrichment Motif Searching (GEMS) that uses a hypergeometric-based scoring function and a position-weight matrix optimization routine to identify, with high confidence, regulatory elements in the nucleotide-biased and repeat-sequence-rich P. falciparum genome. When applied to the promoter regions of genes in 21 co-expression gene clusters, generated from P. falciparum life cycle microarray data using the semi-supervised clustering algorithm Ontology-based Pattern Identification, GEMS identified 34 putative cis-regulatory elements associated with a variety of parasite processes, including sexual development, cell invasion, antigenic variation and protein biosynthesis. These candidates included novel motifs as well as many elements for which experimental evidence already exists in the Plasmodium literature. To provide evidence for the biological relevance of a cell invasion-related element predicted by GEMS, reporter gene and electrophoretic mobility shift assays were conducted.

Conclusion: This GEMS analysis demonstrates that in silico regulatory element discovery can be successfully applied to challenging repeat-sequence-rich, base-biased genomes such as that of P. falciparum. The prediction of regulatory elements from a diverse range of functional gene clusters supports the hypothesis that cis-regulatory elements play a role in the transcriptional control of many P. falciparum biological processes. The putative regulatory elements described are promising candidates for future investigation of the transcriptional control mechanisms underlying gene regulation in malaria parasites.
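GEMS is described as using a hypergeometric-based scoring function, but its exact form is not given here. The sketch below shows the general idea under stated assumptions: it scores a motif by the hypergeometric upper-tail probability of seeing at least k cluster promoters containing it, given how many promoters genome-wide contain it. A plain substring stands in for the position-weight matrix, and the function names are hypothetical.

```python
from math import comb, log10


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def motif_enrichment_score(promoters, cluster_ids, motif):
    """-log10 tail p-value for motif presence in a gene cluster versus all promoters.

    promoters: dict gene_id -> promoter sequence.
    motif: a plain substring, a hypothetical simplification of a PWM match.
    """
    has_motif = {g for g, seq in promoters.items() if motif in seq}
    N = len(promoters)             # all promoters
    K = len(has_motif)             # promoters containing the motif
    n = len(cluster_ids)           # promoters in the co-expression cluster
    k = len(has_motif & set(cluster_ids))  # cluster promoters containing the motif
    p = hypergeom_tail(N, K, n, k)
    return -log10(p) if p > 0 else float("inf")
```

Higher scores mean the motif is over-represented in the cluster relative to chance; a motif absent from the cluster scores 0.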

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Molecular recognition of intrinsically disordered proteins

Author: Mészáros Bálint
Publication date: 01/01/2012

ELTE Digital Institutional Repository (EDIT)

The FEATURE framework for protein function annotation: modeling new functions, improving performance, and extending to novel applications

Author: Altman Russ B
Glazer Dariya S
Halperin Inbal
Wu Shirley
Publication venue: BioMed Central
Publication date: 01/01/2008

Structural genomics efforts contribute new protein structures that often lack significant sequence and fold similarity to known proteins. Traditional sequence- and structure-based methods may not be sufficient to annotate the molecular functions of these structures. Techniques that combine structural and functional modeling can be valuable for functional annotation. FEATURE is a flexible framework for modeling and recognizing functional sites in macromolecular structures. Here, we present an overview of the main components of the FEATURE framework and describe recent developments in its use. These include automating training-set selection to increase functional coverage, coupling FEATURE to methods that generate structural diversity, such as molecular dynamics simulations and loop modeling, to improve performance, and using FEATURE in large-scale modeling and structure determination efforts.
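Published descriptions of FEATURE characterize a site's microenvironment by counting physicochemical properties in concentric radial shells around the site centre. The sketch below shows only that shell-count representation, not FEATURE's scoring model. The shell count and width defaults, the property names and the input format are assumptions for illustration.

```python
import math


def shell_counts(center, atoms, n_shells=6, shell_width=1.25):
    """Count property occurrences in concentric shells around `center`.

    atoms: list of ((x, y, z), set_of_property_names); property names are hypothetical.
    Returns a dict mapping (shell_index, property) -> count. Atoms beyond the
    outermost shell are ignored.
    """
    counts = {}
    for pos, props in atoms:
        d = math.dist(center, pos)
        shell = int(d // shell_width)
        if shell >= n_shells:
            continue
        for p in props:
            counts[(shell, p)] = counts.get((shell, p), 0) + 1
    return counts
```

Flattening these counts in a fixed (shell, property) order gives a feature vector that a site-recognition model can be trained on.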

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Analysis of class C G-protein coupled receptors using supervised classification methods

Author: König Caroline Leonore
Publication venue: Universitat Politècnica de Catalunya
Publication date: 30/10/2018

G protein-coupled receptors (GPCRs) are cell membrane proteins with a key role in regulating cell function. This role stems from their ability to transmit extracellular signals, which makes them relevant to pharmacology and has led, over the last decade, to active research in the field of proteomics. This thesis specifically targets class C GPCRs, which are relevant to therapies for various central nervous system disorders, such as Alzheimer's disease, anxiety, Parkinson's disease and schizophrenia. The investigation of protein function often relies on knowledge of three-dimensional (3-D) crystal structures, which determine the receptor's ability to bind the ligands responsible for activating certain protein functions. Structural information is therefore paramount, but it is not always known or easily unravelled, as is the case for eukaryotic cell membrane proteins such as GPCRs. Without 3-D structural information, research is often limited to analysing the primary amino acid sequences of the proteins, which are commonly known and available from curated databases. Much research on sequence analysis has focused on the quantitative analysis of aligned sequences, although alternative approaches that apply machine learning techniques to alignment-free sequences have recently been proposed. In this thesis, we focus on differentiating class C GPCRs into functionally and structurally related subgroups, based on the alignment-free analysis of their sequences using supervised classification models. The first part of the thesis is mainly concerned with constructing supervised classification models for unaligned protein sequences, based on physicochemical transformations and n-gram representations of their amino acid sequences.
These models are useful for assessing the internal data quality of the externally labeled dataset and for managing the label noise problem from a data curation perspective. In its second part, the thesis focuses on analysing the sequences to discover subtype- and region-specific sequence motifs. To that end, we carry out a systematic analysis of the topological sequence segments with supervised classification models and evaluate the subtype discrimination capability of each region. In addition, we apply different types of feature selection techniques to the n-gram representations of the amino acid sequence segments to find subtype- and region-specific motifs. Finally, we compare the findings of this motif search with the partially known 3-D crystallographic structures of class C GPCRs.
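The thesis classifies unaligned sequences using n-gram representations. A minimal sketch of that kind of featurization, assuming overlapping n-grams over the 20 standard amino acids normalized to relative frequencies (the thesis's exact transformations are not described here), turns any sequence into a fixed-length vector regardless of its length:

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_frequencies(seq, n=2, alphabet=AMINO_ACIDS):
    """Relative frequencies of overlapping n-grams as a fixed-order vector over alphabet^n.

    n-grams containing symbols outside `alphabet` (e.g. 'X') are skipped.
    Returns (keys, values) so each vector position has an explicit label.
    """
    seq = seq.upper()
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    valid = [g for g in grams if all(c in alphabet for c in g)]
    counts = Counter(valid)
    total = len(valid) or 1  # avoid division by zero for very short sequences
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    return keys, [counts[k] / total for k in keys]
```

Because every sequence maps to the same 400 dimensions for n = 2, the vectors can be fed directly to a supervised classifier, and feature selection over them points back to specific candidate motifs.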

Tesis Doctorals en Xarxa

Prediction of Cyclin-Dependent Kinase Phosphorylation Substrates

Author: Begum Rashida
Chait Brian T.
Chang Emmanuel J.
Gaasterland Terry
Publication venue: Public Library of Science
Publication date: 01/08/2007

Protein phosphorylation, mediated by a family of enzymes called cyclin-dependent kinases (Cdks), plays a central role in the cell-division cycle of eukaryotes. Phosphorylation by Cdks directs the cell cycle by modifying the function of regulators of key processes such as DNA replication and mitotic progression. Here, we present a novel computational procedure to predict substrates of the cyclin-dependent kinase Cdc28 (Cdk1) in Saccharomyces cerevisiae. Currently, most computational phosphorylation site prediction procedures focus solely on local sequence characteristics. In the present procedure, we model Cdk substrates based on both local and global characteristics. Specifically, we define the local sequence motifs that represent Cdc28 phosphorylation sites and then model the clustering of these motifs within protein sequences. This constraint reflects the observation that many known Cdk substrates contain multiple clustered phosphorylation sites. The present strategy defines a subset of the proteome that is highly enriched for Cdk substrates, as validated by comparison with a set of bona fide, published, experimentally characterized Cdk substrates that was, to our knowledge, comprehensive at the time of writing. To corroborate our model, we compared its predictions with three independent experimental Cdk proteomic datasets and found significant overlap. Finally, we directly detected in vivo phosphorylation at Cdk motifs in selected putative substrates using mass spectrometry.
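The local-motif-plus-clustering idea can be sketched as follows. The patterns are the commonly cited Cdk consensus sites (full [S/T]-P-x-[K/R], minimal [S/T]-P); the paper's own motif definitions may differ. The `min_sites` and `window` parameters are hypothetical, not values from the paper.

```python
import re

# Commonly cited Cdk consensus sites; lookaheads allow overlapping matches.
FULL_SITE = re.compile(r"(?=[ST]P.[KR])")
MINIMAL_SITE = re.compile(r"(?=[ST]P)")


def site_positions(seq, pattern=MINIMAL_SITE):
    """0-based positions of the phosphoacceptor S/T for each match."""
    return [m.start() for m in pattern.finditer(seq)]


def has_site_cluster(seq, min_sites=3, window=50, pattern=MINIMAL_SITE):
    """True if at least `min_sites` sites start within fewer than `window` residues.

    min_sites and window are hypothetical illustration parameters.
    """
    pos = site_positions(seq, pattern)
    for i in range(len(pos) - min_sites + 1):
        if pos[i + min_sites - 1] - pos[i] < window:
            return True
    return False
```

Requiring several sites close together, rather than scoring each site alone, is what lets the clustering constraint filter out isolated S/T-P matches, which occur by chance in most proteins.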

Public Library of Science (PLOS)

City University of New York

Directory of Open Access Journals

PubMed Central

Computational Discovery of Structured Non-coding RNA Motifs in Bacteria

Author: Brewer Kenneth Ivan
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/04/2021

This dissertation describes a range of computational efforts to discover novel structured non-coding RNA (ncRNA) motifs in bacteria and to generate hypotheses regarding their potential functions. This includes an introductory description of key advances in comparative genomics and RNA structure prediction, as well as some of the most commonly found ncRNA candidates. Beyond that, I describe efforts toward the comprehensive discovery of ncRNA candidates in 25 bacterial genomes and a catalog of the various functions hypothesized for these new motifs. Finally, I describe the Discovery of Intergenic Motifs PipeLine (DIMPL), a new computational toolset that uses support vector machine (SVM) classifiers to identify the bacterial intergenic regions most likely to contain novel structured ncRNAs, and that automates most of the subsequent analysis steps required to predict function. Taken together, this body of work will enable large-scale discovery of novel structured ncRNA motifs at a far greater pace than was previously possible.
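DIMPL ranks intergenic regions with SVM classifiers, but the features and kernel it uses are not described here. The sketch below assumes each region is already encoded as a numeric feature vector. It trains a linear SVM with Pegasos-style subgradient descent on the regularized hinge loss, using a fixed pass order instead of random sampling, then ranks regions by decision score. All names are hypothetical, and this is not DIMPL's implementation.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style linear SVM training.

    X: list of feature lists; y: labels in {-1, +1}.
    A constant 1.0 is appended to each vector as a (regularized) bias feature.
    """
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, label in data:  # deterministic pass order
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1:  # hinge-loss subgradient step
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Linear score; positive means 'likely contains structured ncRNA' in this sketch."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def rank_regions(w, regions):
    """Sort (region_id, features) pairs by descending SVM score."""
    return sorted(regions, key=lambda r: decision(w, r[1]), reverse=True)
```

Ranking by score, rather than only thresholding, matches the stated goal of prioritizing the regions most likely to contain a novel ncRNA for the downstream analysis steps.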

Yale University

Contextual Specificity in Peptide-Mediated Protein Interactions

Author: Aloy Patrick
Stein Amelie
Publication venue: Public Library of Science
Publication date: 02/07/2008

Most biological processes are regulated through complex networks of transient protein interactions in which a globular domain in one protein recognizes a linear peptide from another, creating a relatively small contact interface. Although sufficient to ensure binding, these linear motifs alone are usually too short to achieve the high specificity observed, and additional contacts are often encoded in the residues surrounding the motif (i.e. the context). Here, we systematically identified all instances of peptide-mediated protein interactions of known three-dimensional structure and used them to investigate the individual contributions of motif and context to the global binding energy. We found that, on average, the context is responsible for roughly 20% of the binding and plays a crucial role in determining interaction specificity, either by improving the affinity for the native partner or by impeding non-native interactions. We also studied and quantified the topological and energetic variability of interaction interfaces, finding much higher heterogeneity in the context residues than in the consensus binding motifs. Our analysis partially reveals the molecular mechanisms responsible for the dynamic nature of peptide-mediated interactions, and suggests a global evolutionary mechanism to maximise binding specificity. Finally, we investigated the viability of non-native interactions and highlighted cases of potential cross-reaction that might compensate for individual protein failure and establish backup circuits to increase the robustness of cell networks.
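The "roughly 20% of the binding" figure is a ratio of context energy to total binding energy, averaged over complexes. A minimal sketch of that arithmetic, assuming per-residue energy contributions are already available from some energy decomposition (the energy function used in the study is not specified here, and the inputs are hypothetical):

```python
def context_fraction(residue_energies, motif_positions):
    """Fraction of total interface binding energy contributed by non-motif residues.

    residue_energies: dict position -> energy contribution (e.g. kcal/mol, negative = favourable).
    motif_positions: set of positions belonging to the consensus motif.
    """
    total = sum(residue_energies.values())
    if total == 0:
        raise ValueError("total binding energy is zero")
    context = sum(e for pos, e in residue_energies.items() if pos not in motif_positions)
    return context / total


def mean_context_fraction(complexes):
    """Average the per-complex context fraction over (energies, motif_positions) pairs."""
    fracs = [context_fraction(e, m) for e, m in complexes]
    return sum(fracs) / len(fracs)
```

For a peptide whose four residues contribute -2.0, -3.0, -3.0 and -2.0 kcal/mol, with the middle two forming the motif, the context accounts for 0.4 of the binding energy.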

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Transcriptome analysis of Alzheimer's disease identifies links to cardiovascular disease

Author: Ray Monika
Ruan Jianhua
Zhang Weixiong
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2008

Washington University St. Louis: Open Scholarship