Search CORE

17 research outputs found

InParanoid 6: eukaryotic ortholog clusters with inparalogs

Author: A.-C. Berglund
Dessimoz
E. L. L. Sonnhammer
E. Sjolund
Fitch
G. Ostlund
Hulsen
O'Brien
Sonnhammer
Tatusov
Publication venue: Oxford University Press

The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available ‘complete’ eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed, both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.
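
The figure of 595 species pairs follows from taking 35 species two at a time: C(35, 2) = 35 × 34 / 2 = 595. A minimal Python sketch of that count is given here; the species labels and the names `species`, `pairs` and `pair_count` are placeholders for illustration and are not taken from InParanoid.

```python
from itertools import combinations
from math import comb

n_species = 35  # species count stated in the abstract

# Placeholder labels; the real database uses actual proteome names.
species = [f"species_{i:02d}" for i in range(1, n_species + 1)]

# Every unordered pair of distinct species is one pairwise comparison.
pairs = list(combinations(species, 2))
pair_count = len(pairs)

# Cross-check against the closed-form binomial coefficient.
print(pair_count, comb(n_species, 2))
```

Because the pairs are unordered, adding one more species to a set of n adds n new comparisons, so the number of pairwise runs grows quadratically with the number of species.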

Crossref

PubMed Central

Algorithm of OMA for large-scale orthology inference

Author: A Alexeyenko
A Bateman
A Schneider
AC Berglund-Sonnhammer
AK Bjorklund
Alexander CJ Roth
AM Altenhoff
AR Mushegian
C Dessimoz
CEV Storm
Christophe Dessimoz
CM Zmasek
D Fulton
DA Benson
DP Wall
ELL Sonnhammer
Gaston H Gonnet
K Chen
L Jensen
L Li
M Dayhoff
M Farrar
M Gil
M Remm
P Flicek
R Balasubramanian
RA Notebaart
RL Tatusov
RTJMvan der Heijden
TF DeLuca
TF Smith
WM Fitch
Publication venue: BioMed Central
Publication date: 01/12/2008

Since the publication of our article (Roth, Gonnet, and Dessimoz: BMC Bioinformatics 2008 9: 518), we have noticed several errors, which we correct in the following.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

Author: A. Roth
Altschul
Aurrecoechea
Berglund
C. von Mering
D. Szklarczyk
Datta
Edgar
Eyre
Felsenstein
Finn
Fitch
Gilbert
Guindon
Harris
Hubbard
Huerta-Cepas
I. Letunic
J. Muller
Jensen
Kanehisa
Katoh
Koonin
Kriventseva
Kuhn
Kuzniar
L. J. Jensen
Letunic
Li
Loytynoja
M. Kuhn
Makarova
P. Bork
P. Julien
Pruitt
Roth
S. Powell
Saebo
Sonnhammer
Swarbreck
T. Doerks
Tatusov
Thompson
Uchiyama
van der Heijden
Vilella
Wapinski
Waterhouse
Zmasek
Publication venue: Oxford University Press
Publication date: 01/01/2010

The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), which is a 2-fold increase relative to the previous version. The pipeline yielded 224 847 OGs, including 9724 extended versions of the original COG and KOG. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains, and functional categories as defined initially for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2 242 035 proteins (built from 2 590 259 proteins) and provides a broad functional description for at least 1 966 709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de
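
The reciprocal-best-match step mentioned in the abstract can be sketched in a few lines of Python. This is an illustrative toy, not the eggNOG pipeline: the function names `best_hits` and `reciprocal_best_hits`, the input layout (a dict mapping (query, subject) pairs to a similarity score) and the example scores are all assumptions, and the triangular linkage clustering that follows in the real procedure is not shown.

```python
def best_hits(scores):
    """Return {query: subject} keeping the highest-scoring subject per query.

    `scores` maps (query, subject) tuples to a similarity score
    (hypothetical values here; higher means more similar).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores for proteins a1..a3 (genome A) and b1..b2 (genome B).
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 110.0}

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

In this toy data a3's best hit is b2, but b2's best hit is a2, so the a3/b2 pair is not reciprocal and is left out.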

Crossref

PubMed Central

UCL Discovery

Copenhagen University Research Information System

ZORA

MDC Repository

Ortho2ExpressMatrix—a web server that interprets cross-species gene expression data by gene family information

Author: A Krause
A Valencia
AC Berglund
AJ Enright
AJ Vilella
Andreas H Ludewig
BY Liao
C Frech
EL Sonnhammer
EV Koonin
G Ostlund
H Edwards
H Parkinson
HS Le
I Rivals
J Michaud
KI Goh
L Huminiecki
M Kanehisa
M Kapushesky
M Pellegrini
M Remm
Michal R Schweiger
P Flicek
Ralf Herwig
Ramu Chenna
RC Friedman
RD Finn
RL Tatusov
S Abhiman
S Griffiths-Jones
S Haider
SF Altschul
Sylvia Krobitsch
T Barrett
T Domazet-Loso
T Meinel
Thomas Meinel
TJ Hubbard
TW Harris
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The study of gene families is pivotal for understanding gene evolution across different organisms, and this phylogenetic background is often used to infer the biochemical functions of genes. Modern high-throughput experiments make it possible to analyze the entire transcriptome of an organism; however, it is often difficult to deduce functional information from those data. Results: To improve the functional interpretation of gene expression, we introduce Ortho2ExpressMatrix, a novel tool that integrates complex gene family information, computed from sequence similarity, with comparative gene expression profiles of two pre-selected biological objects: gene families are displayed as two-dimensional matrices. Parameters of the tool are object type (two organisms, two individuals, two tissues, etc.), type of computational gene family inference, experimental meta-data, microarray platform, gene annotation level and genome build. Family information in Ortho2ExpressMatrix is based on computationally different protein family approaches such as EnsemblCompara, InParanoid, SYSTERS and Ensembl Family. Currently, the respective all-against-all associations are available for five species: human, mouse, worm, fruit fly and yeast. Additionally, microRNA expression can be examined with respect to miRBase or TargetScan families. The visualization typical of Ortho2ExpressMatrix is a matrix view that displays functional traits of genes (differential expression) as well as sequence similarity of protein family members (BLAST e-values) in colour codes. Such translations are intended to facilitate the user's perception of the research object. Conclusions: Ortho2ExpressMatrix integrates gene family information with genome-wide expression data in order to enhance the functional interpretation of high-throughput analyses on diseases, environmental factors, genetic modification or compound treatment experiments. The tool explores differential gene expression in the light of orthology, paralogy and structure of gene families up to the point of ambiguity analyses. Results can be used for filtering and prioritization in functional genomic, biomedical and systems biology applications. The web server is freely accessible at http://bioinf-data.charite.de/o2em/cgi-bin/o2em.pl.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

A Computational Study of Elongation Factor G (EFG) Duplicated Genes: Diverged Nature Underlying the Innovation on the Same Structural Template

BACKGROUND: Elongation factor G (EFG) is a core translational protein that catalyzes the elongation and recycling phases of translation. A more complex picture of EFG's evolution and function than previously accepted is emerging from analyses of heterogeneous EFG family members. Whereas gene duplication is postulated to be a prominent factor creating functional novelty, the striking divergence between EFG paralogs can be interpreted in terms of innovation in gene function. METHODOLOGY/PRINCIPAL FINDINGS: We present a computational study of the EFG protein family to cover the role of gene duplication in the evolution of protein function. Using phylogenetic methods, genome context conservation and insertion/deletion (indel) analysis, we demonstrate that the EFG gene copies form four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. These ancient gene families differ in their indispensability, degree of divergence and number of indels. We show the distribution of EFG subfamilies and describe evidence for lateral gene transfer and recent duplications. Extended studies of the EFG II subfamily concern its diverged nature. Remarkably, EFG II appears to be a widely distributed and much-diversified subfamily whose subdivisions correlate with phylum or class borders. The EFG II subfamily-specific characteristics are low conservation of the GTPase domain and of domains II and III; absence of the trGTPase-specific G2 consensus motif "RGITI"; and twelve conserved positions common to the whole subfamily. The EFG II-specific functional changes could be related to changes in the properties of nucleotide binding and hydrolysis and to strengthened ionic interactions between EFG II and the ribosome, particularly between parts of the decoding site and loop I of domain IV. CONCLUSIONS/SIGNIFICANCE: Our work, for the first time, comprehensively identifies and describes EFG subfamilies and improves our understanding of the function and evolution of EFG duplicated genes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A quality metric for homology modeling: the H-factor

Author: A Berglund
A Ilari
A Kolinski
A Sali
A Tramontano
A Wlodawer
AC Paiva
AE Keating
AE Torda
AG Murzin
AR Subramanian
AT Brunger
B Wallner
BW Matthews
C Chothia
C Venclovas
CG Roessler
CM Summa
D Baker
D Cozzetto
D Frishman
D Petrey
DH Ohlendorf
DT Jones
E di Luccio
E Di Luccio
E Saccenti
EL Sonnhammer
EN Brown
Eric di Luccio
G Chopra
G Vriend
GJ Kleywegt
H Yang
HW van Vlijmen
I Friedberg
IY Koh
J Kopp
J Moult
J Warringer
J Zhu
JC Kendrew
JD Thompson
JW Ponder
K Fidelis
K Wuthrich
KM Misura
KR Acharya
LJ McGuffin
M Levitt
M Tress
M Vasquez
M Wiederstein
MA Hanson
MA Olson
MJ Sippl
MY Shen
N Eswar
N Guex
N Siew
NV Buchete
ON Jensen
P Benkert
P Koehl
PA Alexander
Patrice Koehl
Q Fang
RA Laskowski
RC Edgar
RL Dunbrack Jr
S Grzesiek
SC Lovell
SR Eddy
T Schwede
WJ Browne
X Yu
X Zhang
Publication venue: BioMed Central
Publication date: 01/02/2011

Background: The analysis of protein structures provides fundamental insight into most biochemical functions and consequently into the cause and possible treatment of diseases. As the structures of most known proteins cannot be solved experimentally, for technical reasons or sometimes simply because of time constraints, in silico protein structure prediction is expected to step in and generate a more complete picture of the protein structure universe. Molecular modeling of protein structures is a fast-growing field, and a tremendous amount of work has been done since the publication of the very first model. The growth of modeling techniques, and more specifically of those that rely on existing experimental knowledge of protein structures, is intimately linked to the development of high-resolution experimental techniques such as NMR, X-ray crystallography and electron microscopy. This strong connection between experimental and in silico methods is, however, not devoid of criticisms and concerns among modelers as well as among experimentalists. Results: In this paper, we focus on homology modeling; more specifically, we review how it is perceived by the structural biology community and what can be done to impress on experimentalists that it can be a valuable resource to them. We review the common practices and provide a set of guidelines for building better models. For that purpose, we introduce the H-factor, a new indicator for assessing the quality of homology models, mimicking the R-factor in X-ray crystallography. The method for computing the H-factor is fully described and validated on a series of test cases. Conclusions: We have developed a web service for computing the H-factor for models of a protein structure. This service is freely accessible at http://koehllab.genomecenter.ucdavis.edu/toolkit/h-factor.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The human phylome

Author: A Meyer
A Rokas
AC Berglund-Sonnhammer
AM Aguinaldo
C Roth
C Seoighe
C Vogel
CG Kurland
CM Zmasek
D Penny
DT Jones
ES Lander
EV Koonin
F Delsuc
F Ronquist
FD Ciccarelli
G Panopoulou
G Ricard
H Akaike
H Dopazo
H Philippe
Hernán Dopazo
I Humphery-Smith
J Adachi
J Nielsen
J Zhang
JA Bailey
JA Eisen
Jaime Huerta-Cepas
JC Chiu
JC Venter
JD McPherson
JE Blair
JO Andersson
Joaquín Dopazo
JW Thomas
K Misawa
L Arvestad
L Bromham
L Duret
L Li
M Hallet
M Kullberg
M Pruess
MA Huynen
MR Goldsmith
N Alvarez
NW Blackstone
O Gascuel
O Jeffroy
PJ Keeling
PS Dehal
RC Edgar
RL Tatusov
S Guindon
S Henikoff
S Ohno
S Whelan
SA Benner
SE Fisher
SL Salzberg
T Blomme
T Cavalier-Smith
T Dagan
T Gabaldón
T Hulsen
T Müller
T Ohta
T Sicheritz-Ponten
TF Smith
TK Gandhi
TM Keane
Toni Gabaldón
TR Buckley
U Bergthorsson
V van Noort
WJ Bruno
WJ Murphy
WM Fitch
Y Suzuki
YI Wolf
Publication venue: BioMed Central
Publication date: 01/01/2007

The human phylome, which includes evolutionary relationships of all human proteins and their homologs among thirty-nine fully sequenced eukaryotes, is reconstructed.

CiteSeerX

Crossref

PubMed Central