Search CORE

14 research outputs found

SIMAP—structuring the network of protein similarities

Author: B. Wachinger
Bendtsen
Benson
Deshpande
Emanuelsson
Enright
F. Hamberger
Henikoff
J. Krebs
J. Krumsiek
Kaplan
Kriventseva
Krogh
Mulder
P. Tischler
Pruitt
R. Arnold
Schmidt
T. Rattei
V. Stumpflen
W. Mewes
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2008

Protein sequences are the most important source of evolutionary and functional information for new proteins. In order to facilitate the computationally intensive tasks of sequence analysis, the Similarity Matrix of Proteins (SIMAP) database aims to provide a comprehensive and up-to-date dataset of the pre-calculated sequence similarity matrix and sequence-based features like InterPro domains for all proteins contained in the major public sequence databases. As of September 2007, SIMAP covers ∼17 million proteins and more than 6 million non-redundant sequences and provides a complete annotation based on InterPro 16. Novel features of SIMAP include a new, portlet-based web portal providing multiple, structured views on retrieved proteins and integration of protein clusters and a unique search method for similar domain architectures. Access to SIMAP is freely provided for academic use through the web portal for individuals at http://mips.gsf.de/simap/ and through Web Services for programmatic access at http://mips.gsf.de/webservices/services/SimapService2.0?wsdl

Crossref

University of Birmingham Research Portal

PubMed Central

PuSH

Carotid Plaque Age Is a Feature of Plaque Stability Inversely Related to Levels of Plasma Insulin

[...] 14C-declination curve (a result of the atomic bomb tests in the 1950s and 1960s) to determine the average biological age of carotid plaques. [...] 14C content by accelerator mass spectrometry. The average plaque age (i.e. formation time) was 9.6±3.3 years. All but two plaques had formed within 5–15 years before surgery. Plaque age was not associated with the chronological ages of the patients but was inversely related to plasma insulin levels (p = 0.0014). Most plaques were echo-lucent rather than echo-rich (2.24±0.97, range 1–5). However, plaques in the lowest tercile of plaque age (most recently formed) were characterized by further instability with a higher content of lipids and macrophages (67.8±12.4 vs. 50.4±6.2, p = 0.00005; 57.6±26.1 vs. 39.8±25.7, p<0.0005, respectively), less collagen (45.3±6.1 vs. 51.1±9.8, p<0.05), and fewer smooth muscle cells (130±31 vs. 141±21, p<0.05) than plaques in the highest tercile. Microarray analysis of plaques in the lowest tercile also showed increased activity of genes involved in immune responses and oxidative phosphorylation. [...] 14C, can improve our understanding of carotid plaque stability and therefore risk for clinical complications. Our results also suggest that levels of plasma insulin might be involved in determining carotid plaque age.

Public Library of Science (PLOS)

Crossref

PubMed Central

Semantic integration to identify overlapping functional modules in protein interaction networks

Author: A Barrat
A Tanay
A-C Gavin
A-L Barabási
AD King
Aidong Zhang
AW Rives
C von Mering
CA Ball
CM Deane
D Bu
E Ravasz
G Palla
H Jeong
HW Mewes
L Salwinski
LH Hartwell
M Girvan
MP Samanta
Murali Ramanathan
P Pei
P Resnik
P Uetz
R Dunn
S Tornow
T Ideker
T Ito
The Gene Ontology Consortium
TR Hvidsten
V Arnau
V Spirin
Woochang Hwang
Y Ho
Y-R Cho
Young-Rae Cho
Z Fang
Z Lubovac
Publication venue: BioMed Central
Publication date: 01/07/2007

Background: The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results: We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion: The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification.
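The weighting step described in this abstract (assigning each interaction a reliability value derived from GO annotations) can be illustrated with a minimal sketch. The abstract does not give the definitions of semantic similarity or semantic interactivity, so the sketch below substitutes a plain Jaccard index over annotation sets as a stand-in; the function names, protein IDs, and annotation sets are all hypothetical and only show the shape of the computation.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two annotation sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weight_interactions(
    edges: Iterable[Tuple[str, str]],
    annotations: Dict[str, Set[str]],
) -> Dict[FrozenSet[str], float]:
    """Turn an unweighted interaction list into a weighted graph.

    The weight is a stand-in reliability score (Jaccard overlap of the
    two proteins' annotation sets), NOT the metric from the paper.
    """
    weighted = {}
    for u, v in edges:
        terms_u = annotations.get(u, set())
        terms_v = annotations.get(v, set())
        weighted[frozenset((u, v))] = jaccard(terms_u, terms_v)
    return weighted


# Hypothetical proteins with illustrative annotation term sets.
annotations = {
    "P1": {"GO:0006412", "GO:0003735"},
    "P2": {"GO:0006412", "GO:0003735", "GO:0005840"},
    "P3": {"GO:0006281"},
}
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
weights = weight_interactions(edges, annotations)
```

Under this stand-in score, interactions between proteins that share no annotation receive weight 0 and could be treated as less reliable by a downstream module-detection step.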

Crossref

Directory of Open Access Journals

PubMed Central

CLUSS: Clustering of protein sequences based on a new similarity measure

Author: A Krause
Abdellali Kelil
AJ Enright
Alain Fleury
C Notredame
D Higgins
ELL Sonnhammer
F Titgemeyer
G Reinert
G Yona
H Lodish
IV Tetko
J Felsenstein
J Heringa
J Rocha
JD Thompson
JH Ward
JS Varré
K Katoh
K Sjölander
M Ike
M Kimura
MO Dayhoff
MY Leung
N Côté
N Wicker
P Pipenbacher
R Jothi
RC Edgar
RO Duda
Ryszard Brzezinski
S Fanning
S Henikoff
S Karlin
S Vinga
SF Altschul
Shengrui Wang
T Fukamizo
T Ishimizu
V Batagelj
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions". Results: To show the effectiveness of CLUSS, we performed an extensive clustering on the COG database. To demonstrate its ability to deal with hard-to-align sequences, we tested it on the GH2 family. In addition, we carried out experimental comparisons of CLUSS with a variety of mainstream algorithms. These comparisons were made on hard-to-align and easy-to-align protein sequences. The results of these experiments show the superiority of CLUSS in yielding clusters of proteins with similar functional activity. Conclusion: We have developed an effective method and tool for clustering protein sequences to meet the needs of biologists in terms of phylogenetic analysis and prediction of biological functions. Compared to existing clustering methods, CLUSS more accurately highlights the functional characteristics of the clustered families. It provides biologists with a new and plausible instrument for the analysis of protein sequences, especially those that cause problems for the alignment-dependent algorithms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data

Author: Linard Benjamin
Nguyen Ngoc Hoan
Poch Olivier
Prosdocimi Francisco
Thompson Julie D.
Publication venue: Libertas Academica
Publication date: 01/12/2011

Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large-scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (e.g., sequence conservation, orthology, synteny, ...) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes

CiteSeerX

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

Genome-Wide Comparative Gene Family Classification

Author: A Barriere
A Heger
A Jaccard
A Kelil
A Krause
A Paccanaro
AJ Enright
AJ Vilella
C-Y Chen
CF Higgins
CH Wu
Christian Frech
CP Ponting
D Lee
E Bolten
E Jacoby
ER Troemel
ES Lander
EV Kriventseva
F Abascal
F Tekaia
G Yona
H Li
HM Robertson
IV Tetko
J Huerta-Cepas
J Schultz
JA Sheps
JC Venter
JD Thompson
JH Thomas
JP Demuth
K Tamura
LD Stein
MO Dayhoff
N Chen
N Hulo
N Kaplan
Nansheng Chen
P Pipenbacher
P Sperisen
PK Wall
Q Ma
RD Finn
Robert DeSalle
S Aftab
S Kim
S Nakanishi
SA Rahman
T Meinel
T Wittkop
TJ Harlow
Y Chen
Y Loewenstein
Z Zhao
Publication venue: Public Library of Science
Publication date: 01/01/2010

Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species.
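The comparative strategy in this abstract (use curated families of a reference species to find suitable clustering parameters, then reuse them in related genomes) can be sketched as a parameter sweep. The sketch below is only an illustration under stated assumptions: `threshold_clusters` is a toy single-linkage stand-in and not TRIBE-MCL or MC-UPGMA, the score simply counts curated families reproduced exactly, and the genes, similarities, and candidate cutoffs are hypothetical.

```python
def threshold_clusters(genes, sim, cutoff):
    """Toy stand-in for a clustering program: connected components of the
    graph that keeps only gene pairs with similarity >= cutoff."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the component root, shortening the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), s in sim.items():
        if s >= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return {frozenset(members) for members in groups.values()}


def recovered(clusters, reference):
    """Number of curated reference families reproduced exactly."""
    return sum(1 for family in reference if frozenset(family) in clusters)


def best_cutoff(genes, sim, reference, candidates):
    """Pick the candidate parameter that best reproduces the curation."""
    return max(
        candidates,
        key=lambda c: recovered(threshold_clusters(genes, sim, c), reference),
    )


# Hypothetical reference species: two curated families, one weak cross-link.
genes = ["a", "b", "c", "d"]
sim = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.4}
reference = [{"a", "b"}, {"c", "d"}]
chosen = best_cutoff(genes, sim, reference, candidates=[0.3, 0.5, 0.95])
```

In this toy data the lowest cutoff merges everything and the highest splits every gene into its own cluster, so only the middle value recovers both curated families. The chosen parameter would then be applied to the similarity data of a related genome.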

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

Multi-Organ Expression Profiling Uncovers a Gene Module in Coronary Artery Disease Involving Transendothelial Migration of Leukocytes and LIM Domain Binding 2: The Stockholm Atherosclerosis Gene Expression (STAGE) Study

Author: A Bateman
A Moise
A Zernecke
AD Agulnick
AD Hauer
AH Berg
AJ Lusis
Anders Franco-Cereceda
Anders Hamsten
Angela Silveira
Ann Samnegård
B Efron
BH Mecham
Björn Brinne
Bruna Gigante
C von Mering
CC Zielinski
CJ Storbeck
CJ Thompson
D Stengel
EE Schadt
Eric E. Schadt
FH Sims
G Dennis Jr
G Getz
G Gibson
GK Hansson
GP Fazio
GS Ginsburg
H Kojima
H Mizunuma
HD Lieu
HK Wu
Hua Zhong
HY Stevens
HZ Sheng
I Wendelhag
IV Tetko
J Eeckhoute
J Palmblad
J Skogsberg
J Tegner
Jan Liska
Jesper Lundström
Jesper Tegnér
Johan Björkegren
Josefin Skogsberg
JR Bradley
Karin Leander
Kathleen Kerr
Lee M. Kaplan
LW Jurata
M Blatt
Maria Bradshaw
MB Eisen
Ming-Mei Shang
ML Bots
PA Koni
Peri Noori
Peter Konrad
Q Cai
RA Irizarry
Rabbe Takolander
Roland Nilsson
S Gredmark
S Hallerstam
Sara Hägg
Shohreh Maleki
Stefan Rosfors
T Mazurek
Torbjörn Ivert
TW Mak
Ulf de Faire
Ulf Lockowandt
V Braunersreuther
V Matys
Vladimir B. Bajic
WG Austen
Y Adler
Y Xu
Y Yamada
Publication venue: Public Library of Science
Publication date: 01/12/2009

Environmental exposures filtered through the genetic make-up of each individual alter the transcriptional repertoire in organs central to metabolic homeostasis, thereby affecting arterial lipid accumulation, inflammation, and the development of coronary artery disease (CAD). The primary aim of the Stockholm Atherosclerosis Gene Expression (STAGE) study was to determine whether there are functionally associated genes (rather than individual genes) important for CAD development. To this end, two-way clustering was used on 278 transcriptional profiles of liver, skeletal muscle, and visceral fat (n = 66/tissue) and atherosclerotic and unaffected arterial wall (n = 40/tissue) isolated from CAD patients during coronary artery bypass surgery. The first step, across all mRNA signals (n = 15,042/12,621 RefSeqs/genes) in each tissue, resulted in a total of 60 tissue clusters (n = 3958 genes). In the second step (performed within tissue clusters), one atherosclerotic lesion (n = 49/48) and one visceral fat (n = 59) cluster segregated the patients into two groups that differed in the extent of coronary stenosis (P = 0.008 and P = 0.00015). The associations of these clusters with coronary atherosclerosis were validated by analyzing carotid atherosclerosis expression profiles. Remarkably, in one cluster (n = 55/54) relating to carotid stenosis (P = 0.04), 27 genes in the two clusters relating to coronary stenosis were confirmed (n = 16/17, P<10^−27 and P<10^−30). Genes in the transendothelial migration of leukocytes (TEML) pathway were overrepresented in all three clusters, referred to as the atherosclerosis module (A-module). In a second validation step, using three independent cohorts, the A-module was found to be genetically enriched with CAD risk by 1.8-fold (P<0.004). The transcription co-factor LIM domain binding 2 (LDB2) was identified as a potential high-hierarchy regulator of the A-module, a notion supported by subnetwork analysis, by cellular and lesion expression of LDB2, and by the expression of 13 TEML genes in Ldb2-deficient arterial wall. Thus, the A-module appears to be important for atherosclerosis development and, together with LDB2, merits further attention in CAD research.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Detecting Network Communities: An Application to Phylogenetic Analysis

This paper proposes a new method to identify communities in generally weighted complex networks and applies it to phylogenetic analysis. In this case, weights correspond to the similarity indexes among protein sequences, which can be used for network construction so that the network structure can be analyzed to recover phylogenetically useful information from its properties. The analyses discussed here are mainly based on the modular character of protein similarity networks, explored through the Newman-Girvan algorithm, with the help of the neighborhood matrix. The most relevant networks are found when the network topology changes abruptly, revealing distinct modules related to the sets of organisms to which the proteins belong. Sound biological information can be retrieved by the computational routines used in the network approach, without using biological assumptions other than those incorporated by BLAST. Usually, all the main bacterial phyla and, in some cases, also some bacterial classes corresponded totally (100%) or to a great extent (>70%) to the modules. We checked for internal consistency in the obtained results, and we scored close to 84% of matches for community pertinence when comparisons between the results were performed. To illustrate how to use the network-based method, we employed data for enzymes involved in the chitin metabolic pathway that are present in more than 100 organisms from an original data set containing 1,695 organisms, downloaded from GenBank on May 19, 2007. A preliminary comparison between the outcomes of the network-based method and the results of methods based on Bayesian, distance, likelihood, and parsimony criteria suggests that the former is as reliable as these commonly used methods. We conclude that the network-based method can be used as a powerful tool for retrieving modularity information from weighted networks, which is useful for phylogenetic analysis.
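Community divisions produced by Newman-Girvan style procedures are commonly scored with Newman's modularity Q, which for a weighted undirected network is the sum over communities of (intra-community weight / m) minus (community strength / 2m) squared, with m the total edge weight. The abstract does not say how the paper scores its partitions, so the sketch below is only an assumed illustration of that standard quantity; it does not implement the edge-removal step or the neighborhood matrix, and the toy network is hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of a weighted undirected graph.

    edges: list of (u, v, weight), each undirected edge listed once,
           no self-loops assumed.
    community: dict mapping every node to a community label.
    """
    m = sum(w for _, _, w in edges)  # total edge weight
    if m == 0:
        return 0.0
    intra = {}     # weight of edges inside each community
    strength = {}  # summed weighted degree of each community
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            intra[cu] = intra.get(cu, 0.0) + w
    return sum(
        intra.get(c, 0.0) / m - (s / (2 * m)) ** 2
        for c, s in strength.items()
    )


# Hypothetical network: two triangles joined by a single bridge edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the toy network at the bridge gives a higher Q than keeping all nodes in one community, which is the kind of comparison used to choose among candidate divisions. In a protein similarity network the weights would be the similarity indexes.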

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital.CSIC