
Predicting protein linkages in bacteria: Which method is best depends on task

Author: A Karimpour-Fard
AJ Enright
AK Ramani
Anis Karimpour-Fard
B Rost
BP Westover
C von Mering
CM Fraser
D Barker
D Eisenberg
DJ Watts
E Nabieva
EM Marcotte
G Kolesov
G Moreno-Hagelsieb
H Salgado
I Shah
I Yanai
J Bockhorst
J Sun
JC Mellor
L Wang
Lawrence E Hunter
M Craven
M Huynen
M Pellegrini
M Strong
MA Huynen
MD Ermolaeva
OG Troyanskaya
OX Cordero
P Shannon
PD Karp
PM Bowers
PR Romero
R Jansen
R Jothi
R Overbeek
RL Tatusov
Ryan T Gill
S Leach
S Tsoka
SC Janga
Sonia M Leach
SV Date
T Dandekar
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and the potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations.

Results: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability than when using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction.

Conclusion: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences in reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, and we provided guidelines as to which method is most appropriate for a given prediction task.
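
The contrast drawn in the Results between a simple unweighted union of linkage sets and a reliability-weighted combination can be illustrated with a minimal sketch. This is not the paper's implementation: the protein pairs, the per-method reliability weights, and the additive scoring rule below are all hypothetical, chosen only to show why a weighted scheme can rank well-supported linkages above weakly supported ones.

```python
from collections import defaultdict

# Hypothetical linkage sets: each method proposes unordered protein pairs.
predictions = {
    "gene_cluster": {("thrA", "thrB"), ("thrB", "thrC")},
    "gene_neighbor": {("thrA", "thrB"), ("lacZ", "lacY")},
    "rosetta_stone": {("trpC", "trpF")},
    "phylogenetic_profile": {("thrA", "thrB"), ("lacZ", "lacY"), ("trpC", "trpF")},
}

# Hypothetical reliability weights (for example, benchmark precision).
# These numbers are illustrative assumptions, not values from the paper.
reliability = {
    "gene_cluster": 0.8,
    "gene_neighbor": 0.6,
    "rosetta_stone": 0.5,
    "phylogenetic_profile": 0.4,
}


def normalize(pair):
    """Store each unordered pair in a canonical (sorted) order."""
    return tuple(sorted(pair))


def unweighted_union(predictions):
    """Every linkage proposed by any method counts equally."""
    return {normalize(p) for pairs in predictions.values() for p in pairs}


def weighted_combination(predictions, reliability):
    """Score each linkage by summing the weights of the methods that propose it."""
    scores = defaultdict(float)
    for method, pairs in predictions.items():
        for pair in {normalize(p) for p in pairs}:
            scores[pair] += reliability[method]
    return dict(scores)


union = unweighted_union(predictions)
scores = weighted_combination(predictions, reliability)
# Highest score first; ties broken alphabetically for a stable order.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

for pair, score in ranked:
    print(pair, round(score, 2))
```

In the union, all four pairs are indistinguishable; in the weighted version, the pair supported by three methods ranks first, so a score cutoff can trade coverage for precision.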

Fusion and Fission of Genes Define a Metric between Fungal Genomes

Author: A Courseaux
A Kamburov
AC Darby
AJ Enright
Anton James Enright
B Dujon
B Snel
BJ Loftus
C d'Enfert
C Hall
C Notredame
C Vogel
CA Paulding
D Sherman
DA Fitzpatrick
David Sherman
DM Burnst
EE Eichler
EM Marcotte
EV Koonin
G Bourque
I Yanai
IJ Davis
J Schacherer
J Söding
JE Galagan
JL Souciet
L Bonen
L Hermida
M Carapeti
M Krzywinski
M Nikolski
Macha Nikolski
P Akiva
P Cliften
Pascal Durrens
R Balakrishnan
RD Finn
S Hua
S Pasek
S Tsoka
SA Tomlins
SF Altschul
SK Kummerfeld
V Wood
VJ Promponas
W Wang
Z Zhang
Publication venue: Public Library of Science
Publication date: 01/10/2008

Gene fusion and fission events are key mechanisms in the evolution of gene architecture, whose effects are visible in protein architecture when they occur in coding sequences. Until now, the detection of fusion and fission events has been performed at the level of protein sequences with a post facto removal of supernumerary links due to paralogy, and often did not include looking for events defined only in single genomes. We propose a method for the detection of these events, defined on groups of paralogs to compensate for the gene redundancy of eukaryotic genomes, and apply it to the proteomes of 12 fungal species. We collected an inventory of 1,680 elementary fusion and fission events. In half the cases, both composite and element genes are found in the same species. Per-species counts of events correlate with the species genome size, suggesting a random mechanism of occurrence. Some biological functions of the genes involved in fusion and fission events are slightly over- or under-represented. As already noted in previous studies, the genes involved in an event tend to belong to the same functional category. We inferred the position of each event in the evolutionary tree of the 12 fungal species. The event localization counts for all the segments of the tree provide a metric that depicts the “recombinational” phylogeny among fungi. A possible interpretation of this metric as distance in adaptation space is proposed.

INRIA a CCSD electronic archive server

Stratification of co-evolving genomic groups using ranked phylogenetic profiles

Author: A Karimpour-Fard
A Muller
A Tsirigos
AC McHardy
AJ Enright
Assaf Gottlieb
C Nieto
C Ouzounis
CA Ouzounis
Christos A Ouzounis
CM Fraser
DC Krakauer
DP Kreil
Eric Blanc
ES Snitkin
EV Koonin
GS Chang
H Teeling
I Cases
J Reidl
J Wu
L Goldovsky
LB Koski
Leon Goldovsky
M Pellegrini
MA Ragan
MR Graham
P Hugenholtz
R Chenna
RJ Case
RL Tatusov
S Cokus
S Freilich
S Garcia-Vallve
S Karlin
S Podell
SA Shelburne
SF Altschul
SG Tringe
Shiri Freilich
Sophia Tsoka
T Abe
TZ DeSantis
V Kunin
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract

Background: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database.

Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach (a combination of sequence searches, statistical estimation and clustering) analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples.

Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.

Public Library of Science (PLOS)

A Systems Model for Immune Cell Interactions Unravels the Mechanism of Inflammation in Human Skin

Inflammation is characterized by altered cytokine levels produced by cell populations in a highly interdependent manner. To elucidate the mechanism of an inflammatory reaction, we have developed a mathematical model for immune cell interactions via the specific, dose-dependent cytokine production rates of cell populations. The model describes the criteria required for normal and pathological immune system responses and suggests that alterations in the cytokine production rates can lead to various stable levels which manifest themselves in different disease phenotypes. The model predicts that pairs of interacting immune cell populations can maintain homeostatic and elevated extracellular cytokine concentration levels, enabling them to operate as an immune system switch. The concept described here is developed in the context of psoriasis, an immune-mediated disease, but it can also offer mechanistic insights into other inflammatory pathologies, as it explains how interactions between immune cell populations can lead to disease phenotypes.

Kazan Federal University Digital Repository

Digital.CSIC

Coverage of whole proteome by structural genomics observed through protein homology modeling database

Author: A Andreeva
A Krogh
A McPherson
A Stark
A Yamaguchi
AE Todd
Akihiro Yamaguchi
B Contreras-Moreira
B John
C Chothia
CA Orengo
CJ Oldfield
D Baker
D Petrey
D Vitkup
DD Leipe
E Dobrovetsky
Editorial Board
EV Koonin
FS Domingues
G Liu
HJ Dyson
HM Berman
IM Wallace
J Kopp
J Kyte
J-M Chandonia
K Kinoshita
K Lundstrom
Kei Yura
L Lo Conte
L Stein
L Xie
LJ DeLucas
M Iwadate
M Ota
MA Marti-Renom
Mitiko Go
MJ Sippl
MO Dayhoff
N O’Toole
O Lichtarge
P Walian
R Linding
R Service
RA Laskowski
RF Doolittle
RR Copley
S Goldsmith-Fischman
S Tsoka
S Yokoyama
S-H Kim
SE Brenner
SJ Campbell
SJ Wodak
SK Burley
T Hirokawa
T Kawabata
U Pieper
V Serre
Y Kyogoku
Publication venue: Kluwer Academic Publishers
Publication date: 01/01/2006

We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from the genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers the whole of an ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins increased by 5% for archaebacteria and by 7% for eubacteria in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes, excluding the putative disordered regions, will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures available at those times will not be the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting the spatial arrangement of structural domains in an ORF will then be the next issue for structural genomics.
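
The headline figures in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; the variable names are illustrative. It confirms that the four genome categories sum to the stated 276 and that 368,724 modeled ORFs out of 734,193 predicted ORFs is indeed approximately 50%.

```python
# Genome counts by category, as quoted in the abstract.
genomes = {
    "archaebacterial": 17,
    "eubacterial": 130,
    "eukaryotic": 18,
    "phage": 111,
}
total_genomes = sum(genomes.values())

# ORF counts, as quoted in the abstract.
modeled_orfs = 368_724
predicted_orfs = 734_193
coverage = modeled_orfs / predicted_orfs

print(f"genomes: {total_genomes}")
print(f"ORFs with a modeled 3D structure: {coverage:.1%}")
```

The 15-year and 25-year projections cannot be reproduced this way, because the abstract does not state the baseline fraction of soluble proteins already modeled.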

Protein coalitions in a core mammalian biochemical network linked by rapidly evolving proteins

Abstract

Background: Cellular ATP levels are generated by glucose-stimulated mitochondrial metabolism and determine metabolic responses, such as glucose-stimulated insulin secretion (GSIS) from the β-cells of pancreatic islets. We describe an analysis of the evolutionary processes affecting the core enzymes involved in glucose-stimulated insulin secretion in mammals. The proteins involved in this system belong to ancient enzymatic pathways: glycolysis, the TCA cycle and oxidative phosphorylation.

Results: We identify two sets of proteins, or protein coalitions, in this group of 77 enzymes with distinct evolutionary patterns. Members of the glycolysis, TCA cycle, metabolite transport, pyruvate and NADH shuttles have low rates of protein sequence evolution, as inferred from a human-mouse comparison, and relatively high rates of evolutionary gene duplication. Respiratory chain and glutathione pathway proteins evolve faster, exhibiting lower rates of gene duplication. A small number of proteins in the system evolve significantly faster than co-pathway members and may serve as rapidly evolving adapters, linking groups of co-evolving genes.

Conclusions: Our results provide insights into the evolution of the involved proteins. We find evidence for two coalitions of proteins and the role of co-adaptation in protein evolution is identified and could be used in future research within a functional context.

Institute of Cancer Research Repository

Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants.

Author: Alexandra C. Nica
Alfonso Buil
Alicja Wilk
Amy Barrett
André Tchernof
Antigone S. Dimas
Bing Ge
Catherine Ingle
Cecilia M. Lindgren
Christopher E. Lowe
Chrysanthi Ainali
Daniel Burgess
Daniel Glass
David Knowles
Elin Grundberg
Emmanouil T. Dermitzakis
Eshwar Meduri
Fiona Allum
Frank O. Nestle
Frédéric Guénard
Gabriela Surdulescu
James Nisbet
Johanna Sandling
John Lambourne
Jordana T. Bell
Josine L. Min
Julie Lessard
Karolina Tandre
Kerrin S. Small
Kourosh R. Ahmadi
Krina T. Zondervan
Lars Rönnblom
Leopold Parts
Loukia Tsaprouni
Magdalena Sekowska
Maria Krestyaninova
Marie-Claude Vohl
Marie-Michelle Simon
Mark I. McCarthy
Mark Lathrop
Mary E. Travers
Maxime Caron
Neelam Hassanali
Nicole Soranzo
Panos Deloukas
Paola di Meglio
Richard Durbin
Simon Marceau
Simon Potter
So-Youn Shin
Sophia Tsoka
Stephan Busche
Stephen B. Montgomery
Stephen O'Rahilly
Timothy D. Spector
Todd Richmond
Tomi Pastinen
Tony Kwan
Tsun-Po Yang
Veronique Bataille
Xiaojian Shao
Åsa K. Hedman
Publication venue: Nat Commun
Publication date: 01/01/2015

Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS.

This work was supported by a Canadian Institute of Health Research (CIHR) team grant awarded to E.G., A.T., M.C.V. and M.L. (TEC-128093) and the CIHR funded Epigeneome Mapping Centre at McGill University (EP1-120608) awarded to T.P. and M.L., and the Swedish Research Council, Knut and Alice Wallenberg Foundation and the Torsten Söderberg Foundation awarded to L.R. F.A. holds a studentship from The Research Institute of the McGill University Health Center (MUHC). F.G. is a recipient of a research fellowship award from the Heart and Stroke Foundation of Canada. A.T. is the director of a Research Chair in Bariatric and Metabolic Surgery. M.C.V. is the recipient of the Canada Research Chair in Genomics Applied to Nutrition and Health (Tier 1). E.G. and T.P. are recipients of a Canada Research Chair Tier 2 award. The MuTHER Study was funded by a programme grant from the Wellcome Trust (081917/Z/07/Z) and core funding for the Wellcome Trust Centre for Human Genetics (090532). TwinsUK was funded by the Wellcome Trust; European Community's Seventh Framework Programme (FP7/2007-2013). The study also receives support from the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. T.D.S. is a holder of an ERC Advanced Principal Investigator award. SNP genotyping was performed by The Wellcome Trust Sanger Institute and National Eye Institute via NIH/CIDR. Finally, we thank the NIH Roadmap Epigenomics Consortium and the Mapping Centers (http://nihroadmap.nih.gov/epigenomics/) for the production of publicly available reference epigenomes. Specifically, we thank the mapping centre at MGH/BROAD for generation of human adipose reference epigenomes used in this study.

This is the final version. It was first published by NPG at http://www.nature.com/ncomms/2015/150529/ncomms8211/full/ncomms8211.html#abstrac

OA@INAF - Istituto Nazionale di Astrofisica

CorpusUL

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Spiral - Imperial College Digital Repository

arXiv.org e-Print Archive

Birmingham City University Open Access Repository

Publikationer från Uppsala Universitet

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

BCU Open Access

Apollo (Cambridge)

The Complete Genome Sequence of Thermoproteus tenax: A Physiologically Versatile Member of the Crenarchaeota

Author: A Hiller
A Schramm
A Swiatek
A Veith
Andrea Rosinus
André Plagens
Arnulf Kletzin
B Boeckmann
B Eikmanns
B Ewing
B Linke
B Siebers
Bettina Siebers
Britta Tjaden
C Baar
C Brochier-Armanet
CB Walker
Cecile Fairhead
CH Verhees
Christa Lanz
D Gordon
DG Ahn
DH Haft
DJ Naether
DR Smith
E Waters
ER Barry
EV Koonin
F Fischer
F Li
F Meyer
F Sanger
F Werner
Fabian Blombach
Guenter Raddatz
H Huber
H Neumann
Hans-Peter Klenk
HP Klenk
I Anderson
I Orita
J Easter Jr
J Van der Oost
JG Elkins
JH Badger
JN Reeve
K Julenius
K Lewalter
K Liolios
KD Pruitt
Kira S. Makarova
KS Makarova
L Aravind
L Craig
LA Marraffini
LA Sazanov
M Csurös
M Eppinger
M Graupner
M Haering
M Kanehisa
M Selig
M Sumper
M Tsubaki
M Vaupel
M Zaparty
Markus Rampp
Mathias von Jan
Melanie Zaparty
MGL Elferink
MT Facciotti
N Marinsek
N Yutin
Nikos Kyrpides
NP Robinson
NS Baliga
O Emanuelsson
P Rice
P Yarza
PP Chan
PP Gardner
Q Ren
R Barrangou
R Dirmeier
R Hedderich
R Jansen
RD Fleischmann
Reinhard Hensel
RH White
RK Lillestol
RL Tatusov
RW Rose
RY Samson
S Higuchi
S Kurtz
S Laska
S Paytubi
S Schaefer
S Tsoka
SA Qureshi
SD Bell
SF Altschul
SJ Hallam
SJJ Brouns
SL Salzberg
Sonja-Verena Albers
ST Fitz-Gibbon
Stephan C. Schuster
Steve D. Bell
SV Albers
T Coenye
T Lowe
T Mogi
T Soderberg
TL Born
TM Bandeiras
TM Zabriskie
U Jahn
V Müller
VM Markowitz
W Baumeister
W Wildhaber
W Zillig
WH Ramos-Vera
X Luo
YM Drozdowicz
Z Szabó
Publication venue: Public Library of Science
Publication date: 01/01/2011

Here, we report the complete genome sequence of the hyperthermophilic crenarchaeon Thermoproteus tenax (strain Kra 1, DSM 2078T), a type strain of the crenarchaeotal order Thermoproteales. Its circular 1.84-megabase genome harbors no extrachromosomal elements, and 2,051 open reading frames are identified, covering 90.6% of the complete sequence, which represents a high coding density. Based on its gene content, T. tenax is a representative member of the Crenarchaeota. The organism is strictly anaerobic and sulfur-dependent, with optimal growth at 86 °C and pH 5.6. One particular feature is its great metabolic versatility, which is not accompanied by a distinct increase in genome size or information density compared to other Crenarchaeota. T. tenax is able to grow chemolithoautotrophically (CO2/H2) as well as chemoorganoheterotrophically in the presence of various organic substrates. All pathways for synthesizing the 20 proteinogenic amino acids are present. In addition, two presumably complete gene sets for NADH:quinone oxidoreductase (complex I) were identified in the genome, and there is evidence that either NADH or reduced ferredoxin might serve as electron donor. Besides the typical archaeal A0A1-ATP synthase, a membrane-bound pyrophosphatase is found, which might contribute to energy conservation. Surprisingly, all genes required for dissimilatory sulfate reduction are present, which is confirmed by growth experiments. Also notable is the presence of two proteins (ParA family ATPase, actin-like protein) that might be involved in cell division in Thermoproteales, where the ESCRT system is absent, and of genes involved in genetic competence (DprA, ComF), which is so far unique within Archaea.

Wageningen University & Research Publications

MPG.PuRe

CiteSeerX

Public Library of Science (PLOS)

TUbiblio

University of Regensburg Publication Server

UCL Discovery