1,035 research outputs found

Three-Dimensional Phylogeny Explorer: Distinguishing paralogs, lateral transfer, and violation of "molecular clock" assumption with 3D visualization

Author: AJ Saldanha
Christopher Lee
CM Zmasek
CS Parr
DL Swofford
DL Wheeler
EV Koonin
G Trooskens
JD Retief
M Stallmann
MJ Sanderson
Namshin Kim
PL Lott
R Chenna
RD Page
RL Tatusov
RL Tatusov
RL Tatusov
RL Tatusov
S Kumar
SW Graham
Y Zhai
Z Du
Publication venue: BioMed Central
Publication date: 01/06/2007

Abstract. Background: Construction and interpretation of phylogenetic trees have been a major research topic in understanding the evolution of genes. Increases in sequence data and complexity are creating a need for more powerful and insightful tree visualization tools. Results: We have developed 3D Phylogeny Explorer (3DPE), a novel phylogenetic tree viewer that maps trees onto three spatial axes (species on the X-axis; paralogs on Z; evolutionary distance on Y), enabling one to distinguish at a glance evolutionary features such as speciation; gene duplication and paralog evolution; lateral gene transfer; and violation of the "molecular clock" assumption. Users can input any tree on the online 3DPE, then rotate, scroll, rescale, and explore it interactively as "live" 3D views. All objects in 3DPE are clickable to display subtrees, connectivity path highlighting, sequence alignments, gene summary views, and so on. To illustrate the value of this visualization approach for microbial genomes, we also generated 3D phylogeny analyses for all clusters from the public COG database. We constructed tree views using well-established methods and graph algorithms. We used Scientific Python to generate VRML2 3D views viewable in any web browser. Conclusion: 3DPE provides a novel method for projecting phylogenetic trees into 3D space, together with a web-based implementation with live 3D features, applied to reconstructing phylogenetic trees for the COG database.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Towards validating the hypothesis of phylogenetic profiling

Author: D Lin
EM Marcotte
J Handl
J Jäkel
J Seo
J Sun
J Sun
J Wu
M Pellegrini
Mazen Atwi
N Bolshakova
P Resnik
R Loganantharaj
Raja Loganantharaj
RL Tatusov
RL Tatusov
SF Altschul
SV Date
Publication venue: BioMed Central
Publication date: 01/11/2007

Crossref

Springer - Publisher Connector

PubMed Central

Automatically extracting functionally equivalent proteins from SwissProt

Author: A Amores
A Meyer
A Wagner
AA Akindahunsi
Andrew CR Martin
CH Wu
E Kretschmann
EJ Stellwag
EV Koonin
F Chen
GX Yu
II Artamonova
JM Hurst
KP O'Brien
LB Koski
Lisa EM McMillan
MC Lill
MY Galperin
RA Notebaart
RL Tatusov
RL Tatusov
S Shibata
SB Rice
SF Altschul
T Hulsen
T Hulsen
V Kunin
V van Noort
WM Fitch
Y Lee
Y Yaron
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot that enables groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations of and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Enlighten

Syntenator: Multiple gene order alignments with a gene-specific scoring function

Author: A Alexeyenko
BJ Haas
C Grasso
C Lee
Christian Rödelsperger
Christoph Dieterich
F Boyer
FA Kondrashov
L Goodstadt
LD Stein
M Brudno
P Pevzner
RA Notebaart
RL Tatusov
RL Tatusov
S Schwartz
TF Smith
W Miller
WJ Murphy
X Wang
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Identification of homologous regions or conserved syntenies across genomes is a crucial step in comparative genomics. This task is usually performed by genome alignment software such as WABA or blastz. In the case of conserved syntenies, such regions are defined as conserved gene orders. At the gene order level, homologous regions can be found even between distantly related genomes that do not align at the nucleotide sequence level. Results: We present a novel approach to identify regions of conserved synteny across multiple genomes. Syntenator represents genomes and alignments thereof as partial order graphs (POGs). These POGs are aligned by a dynamic programming approach employing a gene-specific scoring function. The scoring function reflects the level of protein sequence similarity for each possible gene pair. Our method consistently defines larger homologous regions in pairwise gene order alignments than nucleotide-level comparisons. Our method is superior to methods that work on predefined homology gene sets (as implemented in Blockfinder). Syntenator successfully reproduces 80% of the EnsEMBL man-mouse conserved syntenic blocks. The full potential of our method becomes visible when comparing remotely related genomes and multiple genomes. Gene order alignments potentially resolve up to 75% of the EnsEMBL 1:many orthology relations and 27% of the many:many orthology relations. Conclusion: We propose Syntenator as a software solution to reliably infer conserved syntenies among distantly related genomes. The software is available from http://www2.tuebingen.mpg.de/abt4/plone.
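The idea of aligning gene orders by dynamic programming with a gene-specific score can be sketched as follows. This is a simplified illustration, not Syntenator itself: it assumes two linear gene orders rather than partial order graphs, uses a plain local (Smith-Waterman style) recurrence, and takes the per-pair similarity values from a hypothetical lookup table. The function name, the gap and mismatch penalties, and the example genes are all assumptions made for the sketch.

```python
def align_gene_orders(genes_a, genes_b, similarity, gap=-1.0, mismatch=-1.0):
    """Local alignment of two gene orders.

    similarity maps (gene_a, gene_b) to a protein-similarity score;
    gene pairs missing from the table receive the mismatch penalty.
    Returns the best score and the list of aligned gene pairs.
    """
    rows, cols = len(genes_a) + 1, len(genes_b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    best, best_cell = 0.0, (0, 0)

    def pair_score(i, j):
        return similarity.get((genes_a[i - 1], genes_b[j - 1]), mismatch)

    # Fill the dynamic programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0.0,
                score[i - 1][j - 1] + pair_score(i, j),  # align the two genes
                score[i - 1][j] + gap,                   # skip a gene in A
                score[i][j - 1] + gap,                   # skip a gene in B
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the best cell to recover the conserved block.
    i, j = best_cell
    block = []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            block.append((genes_a[i - 1], genes_b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    block.reverse()
    return best, block


# Hypothetical example: genome B carries one extra gene (bx) inside the block.
genes_a = ["a1", "a2", "a3", "a4"]
genes_b = ["b1", "b2", "bx", "b3"]
similarity = {("a1", "b1"): 2.0, ("a2", "b2"): 1.5, ("a3", "b3"): 2.5}
best_score, block = align_gene_orders(genes_a, genes_b, similarity)
```

Because the score depends on the specific gene pair rather than on a fixed match reward, strongly similar pairs can carry the alignment across an inserted gene, as with `bx` in the example.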

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

MDC Repository

MPG.PuRe

Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes

Author: A Alexeyenko
A Hadgu
Aaron J. Mackey
AJ Enright
AJ Enright
CE Storm
CE Storm
Cecile Fairhead
CG Elsik
CM Zmasek
CM Zmasek
David S. Roos
DP Wall
EL Sonnhammer
EV Koonin
EV Koonin
F Chen
Feng Chen
H Hegyi
J Gouzy
J Magidson
JD Thompson
Jeroen K. Vermunt
JK Vermunt
JK Vermunt
KP O'Brien
L Li
LB Koski
M Remm
RF Doolittle
RL Tatusov
RL Tatusov
RL Tatusov
RL Tatusov
S Bandyopadhyay
S Henikoff
S Van Dongen
SF Altschul
SL Hui
T Hulsen
TF Deluca
WM Fitch
WM Fitch
Y Lee
Y Qu
Publication venue: Public Library of Science
Publication date: 01/01/2007

Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale ‘gold standard’ orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity above 80%: INPARANOID identifies orthologs across two species, while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than either the (manually curated) KOG database or the homolog clustering algorithm TribeMCL. Using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and to evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guidance for method selection, tuning and development for different applications.
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Tilburg University Repository

Statistically validated networks in bipartite complex systems

Author: A McCallum
AL Barabási
AL Barabási
CM Song
DJ Watts
DY Kenett
Eshel Ben-Jacob
F Reed-Tsochas
F Schweitzer
Fabrizio Lillo
FD Ciccarelli
G Bonanno
J Bascompte
JP Onnela
JP Onnela
Jyrki Piilo
M Girvan
M Rosvall
M Tumminello
M Tumminello
M Tumminello
M Tumminello
MEJ Newman
MEJ Newman
MEJ Newman
Michele Tumminello
R Guimera
RG Miller
RL Tatusov
RL Tatusov
RN Mantegna
Rosario N. Mantegna
S Fortunato
Salvatore Miccichè
V Colizza
W Feller
Y Benjamini
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2010

Many complex systems present an intrinsic bipartite nature and are often described and modeled in terms of networks [1-5]. Examples include movies and actors [1, 2, 4], authors and scientific papers [6-9], email accounts and emails [10], and plants and the animals that pollinate them [11, 12]. Bipartite networks are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set. When one constructs a projected network with nodes from only one set, the system heterogeneity makes it very difficult to identify preferential links between the elements. Here we introduce an unsupervised method to statistically validate each link of the projected network against a null hypothesis that takes into account the heterogeneity of the system. We apply our method to three different systems, namely the set of clusters of orthologous genes (COG) in completely sequenced genomes [13, 14], a set of daily returns of 500 US financial stocks, and the set of world movies of the IMDb database [15]. In all these systems, which differ in both size and level of heterogeneity, we find that our method is able to detect network structures that are informative about the system and are not simply an expression of its heterogeneity. Specifically, our method (i) identifies the preferential relationships between the elements, (ii) naturally highlights the clustered structure of the investigated systems, and (iii) allows links to be classified according to the type of statistically validated relationship between the connected nodes. Comment: Main text: 13 pages, 3 figures, and 1 table. Supplementary information: 15 pages, 3 figures, and 2 tables.
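One way to validate a projected link against a heterogeneity-aware null hypothesis can be sketched as follows. The abstract does not spell out the null model, so this sketch makes two explicit assumptions: the number of shared neighbours of two nodes is tested against a hypergeometric distribution, and multiple testing is handled with a Bonferroni correction. The function names, the `alpha` value, and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb


def hypergeom_upper_tail(n_total, n_a, n_b, n_ab):
    """P(X >= n_ab) for X ~ Hypergeometric(n_total, n_a, n_b).

    n_total: size of the opposite set; n_a, n_b: degrees of the two
    nodes; n_ab: number of neighbours they share.
    """
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(n_ab, upper + 1)
    )
    return favourable / comb(n_total, n_b)


def validated_links(memberships, n_total, alpha=0.01):
    """Return projected links whose co-occurrence is statistically significant.

    memberships maps each node of the projected set to the set of
    opposite-set elements it is connected to (assumed input format).
    """
    pairs = list(combinations(sorted(memberships), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction (assumption)
    links = []
    for a, b in pairs:
        shared = len(memberships[a] & memberships[b])
        if shared == 0:
            continue  # no link in the projected network at all
        p_value = hypergeom_upper_tail(
            n_total, len(memberships[a]), len(memberships[b]), shared
        )
        if p_value < threshold:
            links.append((a, b, p_value))
    return links


# Toy data: A and B share all ten neighbours; C shares one with each.
memberships = {"A": set(range(10)), "B": set(range(10)), "C": {0, 15}}
links = validated_links(memberships, n_total=20)
```

In the toy data every pair is linked in the naive projection, but only the A-B link survives the test, because sharing a single neighbour with a high-degree node is expected by chance.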

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Archivio istituzionale della Ricerca - Scuola Normale Superiore

PubMed Central

Archivio istituzionale della ricerca - Università di Palermo

On strongly chordal graphs that are not leaf powers

Author: A Brandstädt
A Brandstädt
B Shutters
D Fulkerson
E Bibelnieks
H-J Bandelt
JP Spinrad
L Li
M Farber
M Lafond
M Steel
N Nishimura
R Nevries
R Nevries
R Paige
RL Tatusov
T Calamoneri
V Berry
W Kennedy
Publication venue
Publication date: 02/07/2017

A common task in phylogenetics is to find an evolutionary tree representing proximity relationships between species. This motivates the notion of leaf powers: a graph G = (V, E) is a leaf power if there exist a tree T on leaf set V and a threshold k such that uv is an edge if and only if the distance between u and v in T is at most k. Characterizing leaf powers is a challenging open problem, along with determining the complexity of their recognition. This is in part because few graphs are known not to be leaf powers, as such graphs are difficult to construct. Recently, Nevries and Rosenke asked whether leaf powers could be characterized by strong chordality and a finite set of forbidden subgraphs. In this paper, we provide a negative answer to this question by exhibiting an infinite family $\mathcal{G}$ of (minimal) strongly chordal graphs that are not leaf powers. In the process, we establish a connection between leaf powers, alternating cycles and quartet compatibility. We also show that deciding whether a chordal graph is $\mathcal{G}$-free is NP-complete, which may provide insight into the complexity of the leaf power recognition problem.
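The leaf power definition can be checked directly when a candidate tree is already given. The sketch below assumes an unweighted tree stored as an adjacency dictionary, with distance counted in edges; the function names and the example tree are illustrative and do not come from the paper. It only verifies a proposed tree and threshold, and does not address the recognition problem itself.

```python
from collections import deque


def leaf_power_edges(tree, leaves, k):
    """Return the edges {u, v} between leaves at tree distance at most k."""

    def distances_from(source):
        # Breadth-first search gives edge-count distances in a tree.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in tree[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        return dist

    edges = set()
    for u in leaves:
        dist = distances_from(u)
        for v in leaves:
            if u < v and dist[v] <= k:
                edges.add(frozenset((u, v)))
    return edges


def is_leaf_root(graph_edges, tree, leaves, k):
    """Check whether the tree and threshold k reproduce exactly the graph."""
    return {frozenset(edge) for edge in graph_edges} == leaf_power_edges(tree, leaves, k)


# Hypothetical tree: cherries (a, b) and (c, d) joined by the internal edge x-y.
tree = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
    "x": ["a", "b", "y"], "y": ["c", "d", "x"],
}
leaves = ["a", "b", "c", "d"]
two_cliques = [("a", "b"), ("c", "d")]
```

With k = 2 the tree yields two disjoint edges, and with k = 3 it yields the complete graph on the four leaves, so the same tree is a leaf root of different graphs depending on the threshold.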

arXiv.org e-Print Archive

Crossref

GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics

Author: AD Smith
B Langmead
H Li
H Li
JC Wooley
JC Wootton
JP Walters
K Kurokawa
Ken Kurokawa
M Kanehisa
M Kanehisa
M Kanehisa
Narcis Fernandez-Fuentes
PD Vouzis
PJ Turnbaugh
RD Finn
RL Tatusov
RL Tatusov
SF Altschul
SF Altschul
SF Altschul
Shuji Suzuki
Takashi Ishida
TF Smith
W Liu
WJ Kent
WR Pearson
Y Liu
Y Liu
Yutaka Akiyama
Publication venue: Public Library of Science
Publication date: 04/05/2012

A large number of sensitive homology searches are required for mapping DNA sequence fragments to known protein sequences in public and private databases during metagenomic analysis. BLAST is currently used for this purpose, but its calculation speed is insufficient, especially for analyzing the large quantities of sequence data obtained from a next-generation sequencer. Faster search tools, such as BLAT, do not have sufficient search sensitivity for metagenomic analysis. Thus, a sensitive and efficient homology search tool is in high demand for this type of analysis. We developed a new, highly efficient homology search algorithm suitable for graphics processing unit (GPU) calculations and implemented it as a GPU system that we called GHOSTM. The system first searches for candidate alignment positions for a sequence from the database using pre-calculated indexes and then calculates local alignments around the candidate positions before calculating alignment scores. We implemented both of these processes on GPUs. The system achieved calculation speeds that were 130 and 407 times faster than BLAST with 1 GPU and 4 GPUs, respectively. The system also showed higher search sensitivity than BLAT and had a calculation speed that was 4 and 15 times faster than BLAT with 1 GPU and 4 GPUs, respectively. We developed a GPU-optimized algorithm to perform sensitive sequence homology searches and implemented the system as GHOSTM. Sequencing technology continues to improve, and sequencers are producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We developed GHOSTM, which is a cost-efficient tool, and offer it as a potential solution to this problem.
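The two-stage search described above (index lookup for candidate positions, then scoring around each candidate) can be sketched on the CPU as follows. This is not GHOSTM's algorithm or its GPU implementation: the sketch assumes an exact k-mer index and replaces the local alignment step with a simple ungapped score over a fixed window. The function names, the parameters `k` and `window`, the scoring values, and the example sequences are all assumptions.

```python
from collections import defaultdict


def build_index(database, k):
    """Pre-calculate an index from each k-mer to its (sequence id, position) hits."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def search(query, database, index, k, window=4, match=1, mismatch=-1):
    """Find candidate positions via the index and score the region around each.

    Returns ((sequence id, diagonal), score) pairs, best score first.
    """
    hits = {}
    for q_pos in range(len(query) - k + 1):
        for seq_id, d_pos in index.get(query[q_pos:q_pos + k], ()):
            seq = database[seq_id]
            # Score the seed plus up to `window` residues on either side,
            # clipped so both sequences stay in range (ungapped, an assumption).
            start = -min(window, q_pos, d_pos)
            end = min(k + window, len(query) - q_pos, len(seq) - d_pos)
            score = sum(
                match if query[q_pos + o] == seq[d_pos + o] else mismatch
                for o in range(start, end)
            )
            key = (seq_id, d_pos - q_pos)  # seeds on one diagonal are merged
            hits[key] = max(score, hits.get(key, score))
    return sorted(hits.items(), key=lambda item: -item[1])


# Hypothetical protein database and query fragment.
database = {"p1": "MKTAYIAKQR", "p2": "GGSGGSGGSG"}
index = build_index(database, k=3)
results = search("TAYIAK", database, index, k=3)
```

Building the index once and reusing it for every query is what keeps the candidate stage cheap; each candidate is scored independently of the others, which is the kind of workload that maps naturally onto parallel hardware.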

Public Library of Science (PLOS)

Crossref

PubMed Central

Partial Homology Relations - Satisfiability in terms of Di-Cographs

Author: A Brandstädt
AM Altenhoff
AM Altenhoff
AM Altenhoff
C Crespelle
C Dessimoz
DG Corneil
DG Corneil
F Chen
F Gurski
G Östlund
J Engelfriet
J Sukumaran
JG Lawrence
K Hartmann
K Trachana
M Hellmuth
M Hellmuth
M Hellmuth
M Hellmuth
M Lafond
M Lafond
M Lafond
M Lechner
M Lechner
M Ravenhall
R Dondi
RL Tatusov
RM McConnell
S Böcker
WM Fitch
Y Gao
Y Liu
Publication venue
Publication date: 03/05/2018

Directed cographs (di-cographs) play a crucial role in the reconstruction of evolutionary histories of genes based on homology relations, which are binary relations between genes. A variety of methods based on pairwise sequence comparisons can be used to infer such homology relations (e.g. orthology, paralogy, xenology). They are satisfiable if the relations can be explained by an event-labeled gene tree, i.e., they can simultaneously co-exist in an evolutionary history of the underlying genes. Every gene tree is equivalently interpreted as a so-called cotree that entirely encodes the structure of a di-cograph. Thus, satisfiable homology relations must necessarily form a di-cograph. The inferred homology relations might not cover each pair of genes and thus provide only partial knowledge of the full set of homology relations. Moreover, for particular pairs of genes, it might be known with a high degree of certainty that they are not orthologs (resp. paralogs, xenologs), which yields forbidden pairs of genes. Motivated by this observation, we characterize (partial) satisfiable homology relations with or without forbidden gene pairs, and provide a quadratic-time algorithm for their recognition and for the computation of a cotree that explains the given relations.
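The necessary condition stated above can be illustrated in the symmetric special case. When only symmetric relations are considered (for example orthology without directed xenology), a di-cograph is an ordinary cograph, and cographs are exactly the graphs with no induced path on four vertices. The sketch below tests that condition by brute force; it is not the paper's quadratic-time algorithm, it ignores partial relations and forbidden pairs, and the function name and gene labels are hypothetical.

```python
from itertools import combinations, permutations


def has_induced_p4(vertices, edges):
    """Return True if the undirected graph contains an induced path a-b-c-d."""
    adjacency = {frozenset(edge) for edge in edges}

    def linked(u, v):
        return frozenset((u, v)) in adjacency

    for quad in combinations(vertices, 4):
        for a, b, c, d in permutations(quad):
            path_present = linked(a, b) and linked(b, c) and linked(c, d)
            chords_absent = not (linked(a, c) or linked(a, d) or linked(b, d))
            if path_present and chords_absent:
                return True
    return False


genes = ["g1", "g2", "g3", "g4"]

# A path of orthology relations: not a cograph, so not satisfiable.
path_relation = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]

# g1 orthologous to every other gene, the rest mutually unrelated: a cograph.
star_relation = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]
```

The brute-force check runs in time proportional to the fourth power of the number of genes, so it is only suitable for small examples.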

arXiv.org e-Print Archive

Crossref

Syddansk Universitets Forskerportal

Algorithm of OMA for large-scale orthology inference

Author: A Alexeyenko
A Bateman
A Schneider
AC Berglund-Sonnhammer
AK Bjorklund
Alexander CJ Roth
AM Altenhoff
AR Mushegian
C Dessimoz
C Dessimoz
C Dessimoz
CEV Storm
Christophe Dessimoz
CM Zmasek
D Fulton
DA Benson
DP Wall
ELL Sonnhammer
Gaston H Gonnet
K Chen
L Jensen
L Li
M Dayhoff
M Farrar
M Gil
M Remm
P Flicek
R Balasubramanian
RA Notebaart
RL Tatusov
RL Tatusov
RTJMvan der Heijden
TF DeLuca
TF Smith
WM Fitch
Publication venue: BioMed Central
Publication date: 01/12/2008

Since the publication of our article (Roth, Gonnet, and Dessimoz: BMC Bioinformatics 2008 9: 518), we have noticed several errors, which we correct in the following.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery