Proteinortho: Detection of (Co-)orthologs in large-scale analysis

Author: A Alexeyenko
A Force
A Nakabachi
A Schneider
AE Hirsh
AJ Enright
C Lanczos
D Cornaz
DM Kristensen
E Pruesse
EV Koonin
IK Jordan
J Hopcroft
JP McCutcheon
L Li
Lydia Steiner
M Fiedler
M Remm
M Sikdar
Manja Marz
Marcus Lechner
MC Rivera
P Bork
Peter F Stadler
RL Tatusov
S Guattery
SM van Dongen
Sonja J Prohaska
Sven Findeiß
TJ Hubbard
WM Fitch
Z Fu
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract. Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions: Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Fraunhofer-ePrints

PubMed Central

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

Carleton University's Institutional Repository