Search CORE

37,295 research outputs found

Mining protein loops using a structural alphabet and statistical exceptionality

Author: A Dembo
A Efimov
A Golovin
A Sacan
A Via
AC Camproux
AC Camproux
AC Camproux
Anne-Claude Camproux
AR Panchenko
AR Panchenko
B Oliva
BJ Polacco
BL Sibanda
BL Sibanda
BL Sibanda
BW Matthews
C Kiss
CG Hunter
CM Venkatachalam
D Leader
D Stuart
DF Burke
E Rocha
EG Hutchinson
EJ Milner-White
EJ Milner-White
F den Hollander
G Ausiello
G Ausiello
G Nuel
G Nuel
G Nuel
G Pugalenthi
GD Rose
Gregory Nuel
J Espadaler
J Martin
J Martin
J van Helden
J Wojcik
JF Leszczynski
JM Kwasigroch
JS Fetrow
JS Richardson
Juliette Martin
JW Sammon
JW Torrance
KC Chou
L Regad
LE Donate
Leslie Regad
LN Johnson
LR Rabiner
LS Bernstein
M Hollander
M Mönnigmann
M Saraste
MY Leung
N Colloc'h
N Fernandez-Fuentes
N Fernandez-Fuentes
O Sander
P Fuchs
PA Rice
PN Lewis
R Kolodny
S Karlin
S Kim
S Kullback
S Sourice
SA Benner
SA Benner
SD Rufino
V Pavone
W Kabsch
W Li
W Li
WL DeLano
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at <url>http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/</url>.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

A structural classification of protein-protein interactions for detection of convergently evolved motifs and for prediction of protein binding sites on sequence level

Author: Henschel Andreas
Publication venue: Technische Universität Dresden
Publication date: 17/10/2008
Field of study

BACKGROUND: A long-standing challenge in the post-genomic era of Bioinformatics is the prediction of protein-protein interactions, and ultimately the prediction of protein functions. The problem is intrinsically harder, when only amino acid sequences are available, but a solution is more universally applicable. So far, the problem of uncovering protein-protein interactions has been addressed in a variety of ways, both experimentally and computationally. MOTIVATION: The central problem is: How can protein complexes with solved threedimensional structure be utilized to identify and classify protein binding sites and how can knowledge be inferred from this classification such that protein interactions can be predicted for proteins without solved structure? The underlying hypothesis is that protein binding sites are often restricted to a small number of residues, which additionally often are well-conserved in order to maintain an interaction. Therefore, the signal-to-noise ratio in binding sites is expected to be higher than in other parts of the surface. This enables binding site detection in unknown proteins, when homology based annotation transfer fails. APPROACH: The problem is addressed by first investigating how geometrical aspects of domain-domain associations can lead to a rigorous structural classification of the multitude of protein interface types. The interface types are explored with respect to two aspects: First, how do interface types with one-sided homology reveal convergently evolved motifs? Second, how can sequential descriptors for local structural features be derived from the interface type classification? Then, the use of sequential representations for binding sites in order to predict protein interactions is investigated. The underlying algorithms are based on machine learning techniques, in particular Hidden Markov Models. RESULTS: This work includes a novel approach to a comprehensive geometrical classification of domain interfaces. Alternative structural domain associations are found for 40% of all family-family interactions. Evaluation of the classification algorithm on a hand-curated set of interfaces yielded a precision of 83% and a recall of 95%. For the first time, a systematic screen of convergently evolved motifs in 102.000 protein-protein interactions with structural information is derived. With respect to this dataset, all cases related to viral mimicry of human interface bindings are identified. Finally, a library of 740 motif descriptors for binding site recognition - encoded as Hidden Markov Models - is generated and cross-validated. Tests for the significance of motifs are provided. The usefulness of descriptors for protein-ligand binding sites is demonstrated for the case of &quot;ATP-binding&quot;, where a precision of 89% is achieved, thus outperforming comparable motifs from PROSITE. In particular, a novel descriptor for a P-loop variant has been used to identify ATP-binding sites in 60 protein sequences that have not been annotated before by existing motif databases

Technische Universität Dresden: Qucosa

Graph Theory and Networks in Biology

Author: Mason Oliver
Verwoerd Mark
Publication venue
Publication date: 06/04/2006
Field of study

In this paper, we present a survey of the use of graph theoretical techniques in Biology. In particular, we discuss recent work on identifying and modelling the structure of bio-molecular networks, as well as the application of centrality measures to interaction networks and research on the hierarchical structure of such networks and network motifs. Work on the link between structural network properties and dynamics is also described, with emphasis on synchronization and disease propagation.Comment: 52 pages, 5 figures, Survey Pape

arXiv.org e-Print Archive

CiteSeerX

MURAL - Maynooth University Research Archive Library

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive

Genome-wide analysis of 30 -untranslated regions supports the existence of post-transcriptional regulons controlling gene expression in trypanosomes

Author: Agüero Fernan Gonzalo
Carmona Santiago Javier
de Gaudenzi Javier Gerardo
Frasch Alberto Carlos C.
Publication venue: 'PeerJ'
Publication date: 01/07/2013
Field of study

In eukaryotic cells, a group of messenger ribonucleic acids (mRNAs) encoding functionally interrelated proteins together with the trans-acting factors that coordinately modulate their expression is termed a post-transcriptional regulon, due to their partial analogy to a prokaryotic polycistron. This mRNA clustering is organized by sequence-specific RNA-binding proteins (RBPs) that bind cis-regulatory elements in the noncoding regions of genes, and mediates the synchronized control of their fate. These recognition motifs are often characterized by conserved sequences and/or RNA structures, and it is likely that various classes of cis-elements remain undiscovered. Current evidence suggests that RNA regulons govern gene expression in trypanosomes, unicellular parasites which mainly use post-transcriptional mechanisms to control protein synthesis. In this study, we used motif discovery tools to test whether groups of functionally related trypanosomatid genes contain a common cis-regulatory element. We obtained conserved structured RNA motifs statistically enriched in the noncoding region of 38 out of 53 groups of metabolically related transcripts in comparison with a random control. These motifs have a hairpin loop structure, a preferred sense orientation and are located in close proximity to the open reading frames. We found that 15 out of these 38 groups represent unique motifs in which most 30 -UTR signature elements were group-specific. Two extensively studied Trypanosoma cruzi RBPs, TcUBP1 and TcRBP3 were found associated with a few candidate RNA regulons. Interestingly, 13 motifs showed a strong correlation with clusters of developmentally co-expressed genes and six RNA elements were enriched in gene clusters affected after hyperosmotic stress. Here we report a systematic genome-wide in silico screen to search for novel RNA-binding sites in transcripts, and describe an organized network of several coordinately regulated cohorts of mRNAs in T. cruzi. Moreover, we found that structured RNA elements are also conserved in other human pathogens. These results support a model of regulation of gene expression by multiple post-transcriptional regulons in trypanosomes.Fil: de Gaudenzi, Javier Gerardo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); ArgentinaFil: Carmona, Santiago Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); ArgentinaFil: Agüero, Fernan Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); ArgentinaFil: Frasch, Alberto Carlos C.. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentin

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Directory of Open Access Journals

PubMed Central

Protein Repeats from First Principles

Author: Becher Veronica Andrea
Espada Rocío
Ferreiro Diego
Parra Rodrigo Gonzalo
Turjanski Pablo Guillermo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2016
Field of study

Some natural proteins display recurrent structural patterns. Despite being highly similar at the tertiary structure level, repeating patterns within a single repeat protein can be extremely variable at the sequence level. We use a mathematical definition of a repetition and investigate the occurrences of these in sequences of different protein families. We found that long stretches of perfect repetitions are infrequent in individual natural proteins, even for those which are known to fold into structures of recurrent structural motifs. We found that natural repeat proteins are indeed repetitive in their families, exhibiting abundant stretches of 6 amino acids or longer that are perfect repetitions in the reference family. We provide a systematic quantification for this repetitiveness. We show that this form of repetitiveness is not exclusive of repeat proteins, but also occurs in globular domains. A by-product of this work is a fast quantification of the likelihood of a protein to belong to a family.Fil: Turjanski, Pablo Guillermo. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; ArgentinaFil: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; ArgentinaFil: Espada, Rocío. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; ArgentinaFil: Becher, Veronica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; ArgentinaFil: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentin

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Author: Abeel
Afflerbach
Angarica
Bailey
Bart Hooghe
Bauer
Benos
Breiman
Bulyk
Burden
Calladine
Camenisch
Chen
Cho
Cordell
Davis
Dickerson
Ehret
Ernst
Frans van Roy
Friedel
Fujii
Fulton
Gama-Castro
Gardiner
Gartenberg
Gershenzon
Goodsell
Gorin
Gowrisankar
Greenbaum
Gunewardena
Hall
Hendrickson
Hu
Juo
Kajimura
Kaplan
Karas
Kel
Kim
Lavery
Lewis
Liu
Liu
Liu
Long
Lu
Lu
Lu
Lunetta
Man
Marco
Marinescu
Martinez-Hackert
Matys
Medina-Rivera
Meysman
Michel
Mokry
Morozov
Narang
Naughton
O'Flanagan
Olson
Paillard
Pan
Parker
Parvin
Pieter De Bleser
Ponomarenko
Portales-Casamar
Powell
Pudimat
Ramsey
Rohs
Rohs
Rohs
Ruiz
Satchwell
Schneider
Shakked
Sharon
Shi
Spolar
Stefan Broos
Stormo
Svozil
Thayer
Tomovic
Toro-Roman
Travers
Tullius
Wunderlich
Zhang
Zhang
Zhu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012
Field of study

Transcription factor binding sites (TFBSs) are DNA sequences of 6-15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding

Crossref

Ghent University Academic Bibliography

PubMed Central

iLIR : a web resource for prediction of Atg8-family interacting proteins

Author: Johansen Terje
Kalvari Ioanna
Mulakkal Nitha Charles
Nezis I. P.
Osgood Richard
Promponas Vasilis J.
Tsompanis Stelios
Publication venue: 'Informa UK Limited'
Publication date: 26/02/2014
Field of study

Macroautophagy was initially considered to be a nonselective process for bulk breakdown of cytosolic material. However, recent evidence points toward a selective mode of autophagy mediated by the so-called selective autophagy receptors (SARs). SARs act by recognizing and sorting diverse cargo substrates (e.g., proteins, organelles, pathogens) to the autophagic machinery. Known SARs are characterized by a short linear sequence motif (LIR-, LRS-, or AIM-motif) responsible for the interaction between SARs and proteins of the Atg8 family. Interestingly, many LIR-containing proteins (LIRCPs) are also involved in autophagosome formation and maturation and a few of them in regulating signaling pathways. Despite recent research efforts to experimentally identify LIRCPs, only a few dozen of this class of—often unrelated—proteins have been characterized so far using tedious cell biological, biochemical, and crystallographic approaches. The availability of an ever-increasing number of complete eukaryotic genomes provides a grand challenge for characterizing novel LIRCPs throughout the eukaryotes. Along these lines, we developed iLIR, a freely available web resource, which provides in silico tools for assisting the identification of novel LIRCPs. Given an amino acid sequence as input, iLIR searches for instances of short sequences compliant with a refined sensitive regular expression pattern of the extended LIR motif (xLIR-motif) and retrieves characterized protein domains from the SMART database for the query. Additionally, iLIR scores xLIRs against a custom position-specific scoring matrix (PSSM) and identifies potentially disordered subsequences with protein interaction potential overlapping with detected xLIR-motifs. Here we demonstrate that proteins satisfying these criteria make good LIRCP candidates for further experimental verification. Domain architecture is displayed in an informative graphic, and detailed results are also available in tabular form. We anticipate that iLIR will assist with elucidating the full complement of LIRCPs in eukaryotes

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Colored Motifs Reveal Computational Building Blocks in the C. elegans Brain

Author: AL Barabasi
AL Barabási
Arend Hintze
Christoph Adami
D Chase
D Hall
E Niebur
EL White
J Karbowski
J Richmond
J White
JG White
Jifeng Qian
JJ Rice
JJ Tyson
JW Lichtman
LH Hartwell
LR Varshney
M Newman
M Reigl
MEJ Newman
O Sporns
Olaf Sporns
P Westfall
PJ Ingram
R Milo
R Milo
RJ Prill
S Dudoit
S Song
S Wernicke
S Wernicke
S Wuchty
SS Shen-Orr
TB Achacoso
U Alon
W Callebaut
WP Lee
Y Yoshimura
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/12/2010
Field of study

Background: Complex networks can often be decomposed into less complex sub-networks whose structures can give hints about the functional organization of the network as a whole. However, these structural motifs can only tell one part of the functional story because in this analysis each node and edge is treated on an equal footing. In real networks, two motifs that are topologically identical but whose nodes perform very different functions will play very different roles in the network. Methodology/Principal Findings: Here, we combine structural information derived from the topology of the neuronal network of the nematode C. elegans with information about the biological function of these nodes, thus coloring nodes by function. We discover that particular colorations of motifs are significantly more abundant in the worm brain than expected by chance, and have particular computational functions that emphasize the feed-forward structure of information processing in the network, while evading feedback loops. Interneurons are strongly over-represented among the common motifs, supporting the notion that these motifs process and transduce the information from the sensor neurons towards the muscles. Some of the most common motifs identified in the search for significant colored motifs play a crucial role in the system of neurons controlling the worm's locomotion. Conclusions/Significance: The analysis of complex networks in terms of colored motifs combines two independent data sets to generate insight about these networks that cannot be obtained with either data set alone. The method is general and should allow a decomposition of any complex networks into its functional (rather than topological) motifs as long as both wiring and functional information is available

arXiv.org e-Print Archive

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Caltech Authors

Dalarna University College Electronic Archive

Digitala Vetenskapliga Arkivet - Academic Archive On-line