352 research outputs found

FastBLAST: Homology Relationships for Millions of Proteins

Author: A Marchler-Bauer
AA Schaffer
Adam P. Arkin
BE Suzek
Cecile Fairhead
CH Wu
CM Zmasek
D Wilson
F Pearl
H Mi
I Letunic
JD Selengut
LB Koski
M Remm
MN Price
Morgan N. Price
NJ Mulder
Paramvir S. Dehal
PS Dehal
R Durbin
RD Finn
RL Tatusov
S Yooseph
SF Altschul
W Gish
W Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. Methodology/principal findings: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. Conclusions/significance: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

MGEScan-non-LTR: computational identification and classification of autonomous non-LTR retrotransposons in eukaryotic genomes

Author: Adams
Altschul
Blesa
Blesa
Burke
Burke
Dehal
Eddy
Edgar
H. Tang
Hessa
Hizer
Krogh
Kyte
Lander
Lovsin
Luan
M. Rho
Malik
McCarthy
Novikova
Permanyer
Sea Urchin Genome Sequencing Consortium
Tu
Unge
Volff
Wimley
Publication venue: Oxford University Press

Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs that have been found in most eukaryotic genomes, sometimes in extremely high numbers. In this article, we present a computational tool, MGEScan-non-LTR, for the identification of non-LTR retrotransposons in genomic sequences, following a computational approach inspired by a generalized hidden Markov model (GHMM). Three different states represent two different protein domains and inter-domain linker regions encoded in the non-LTR retrotransposons, and their scores are evaluated by using profile hidden Markov models (for protein domains) and Gaussian Bayes classifiers (for linker regions), respectively. In order to classify the non-LTR retrotransposons into one of the 12 previously characterized clades using the same model, we defined separate states for different clades. MGEScan-non-LTR was tested on the genome sequences of four eukaryotic organisms, Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. For the D. melanogaster genome, MGEScan-non-LTR found all known ‘full-length’ elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon has been annotated, MGEScan-non-LTR found a significantly larger number of elements than did RepeatMasker, using the current version of the RepBase Update library. We also identified novel elements in the other two genomes, which have only been partially studied for non-LTR retrotransposons

Crossref

PubMed Central

MicrobesOnline: an integrated portal for comparative and functional genomics

Author: A. P. Arkin
Alm
Badger
Berman
Bland
D. Chivian
E. J. Alm
G. D. Friedland
I. L. Dubchak
J. K. Baumohl
J. T. Bates
K. H. Huang
K. Keller
Kanehisa
Lowe
M. N. Price
M. P. Joachimiak
Mi
Nikolskaya
P. S. Dehal
P. S. Novichkov
Price
Tatusov
Publication venue: Oxford University Press (OUP)
Publication date: 01/09/2009

Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html. United States. Dept. of Energy (Genomics: GTL program, grant DE-AC02-05CH11231)

PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Author: A Kasprzyk
C Minguillon
DA Natale
DL Wheeler
E Birney
F Al-Shahrour
F Chen
GP Wagner
H Li
Jacob de Vlieg
JF Dufayard
JO Korbel
K Reichard
M Ashburner
Peter MA Groenen
PS Dehal
R Fredriksson
RC Edgar
S Guindon
T Hulsen
TA Eyre
Tim Hulsen
V Matys
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat, which allows the complete Ensembl gene database to be queried using phylogenetic patterns. DESCRIPTION: PhyloPat is an easy-to-use webserver, which can be used to query the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even single species. We found in total 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using every one of the orthologies provided by Ensembl. PhyloPat provides the possibility of querying with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included. CONCLUSION: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.
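The idea of querying binary phylogenetic patterns with regular expressions, as the abstract above describes, can be illustrated with a minimal sketch. This is not PhyloPat's code: the species order, the lineage names, the pattern data and the function `query_patterns` are all hypothetical, and the encoding (one character per species, `1` for present and `0` for absent) is an assumption made for illustration.

```python
import re

# Hypothetical species order; position i of a pattern refers to SPECIES[i].
SPECIES = ["yeast", "fly", "zebrafish", "mouse", "human"]

# Invented presence/absence patterns: '1' = gene present, '0' = absent.
LINEAGES = {
    "lineage_A": "11111",
    "lineage_B": "00111",
    "lineage_C": "00011",
    "lineage_D": "01000",
}


def query_patterns(lineages, regex):
    """Return the sorted names of lineages whose full pattern matches regex."""
    compiled = re.compile(regex)
    return sorted(
        name for name, pattern in lineages.items() if compiled.fullmatch(pattern)
    )


# Absent in yeast and fly, present in at least one of the three vertebrates.
vertebrate_specific = query_patterns(LINEAGES, r"00(?!000)[01]{3}")

# Present in both mouse and human, regardless of the other species.
in_mammals = query_patterns(LINEAGES, r"[01]{3}11")
```

With the invented data above, `vertebrate_specific` contains `lineage_B` and `lineage_C`, and `in_mammals` additionally contains `lineage_A`.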

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Radboud Repository

2R and remodeling of vertebrate signal transduction engine

Author: A Ma'ayan
A McLysaght
A Méjat
AI Su
AJ Vilella
AW Murray
B Papp
BA Hug
C Simillion
Carl Henrik Heldin
CM Zmasek
DM Krylov
F Pontén
GC Conant
GC Conant
H Chen
H Jaaro
H Li
JA Birchler
JH Kim
K Kuida
KH Wolfe
L Giot
L Huminiecki
L Huminiecki
Lukasz Huminiecki
M Freeling
M Kasahara
NH Putnam
O Jaillon
P Dehal
P Rakic
Q Cui
R Kafri
RB Vega
RD Emes
S Chang
S Falcon
S Kuraku
T Pawson
VR Chintapalli
Y Nakatani
Publication venue: BioMed Central
Publication date: 01/12/2010

Background: Whole genome duplication (WGD) is a special case of gene duplication, observed rarely in animals, whereby all genes duplicate simultaneously through polyploidisation. Two rounds of WGD (2R-WGD) occurred at the base of vertebrates, giving rise to an enormous wave of genetic novelty, but a systematic analysis of functional consequences of this event has not yet been performed. Results: We show that 2R-WGD affected an overwhelming majority (74%) of signalling genes, in particular developmental pathways involving receptor tyrosine kinases, Wnt and transforming growth factor-β ligands, G protein-coupled receptors and the apoptosis pathway. 2R-retained genes, in contrast to tandem duplicates, were enriched in protein interaction domains and multifunctional signalling modules of Ras and mitogen-activated protein kinase cascades. 2R-WGD had a fundamental impact on the cell-cycle machinery, redefined molecular building blocks of the neuronal synapse, and was formative for vertebrate brains. We investigated 2R-associated nodes in the context of the human signalling network, as well as in an inferred ancestral pre-2R (AP2R) network, and found that hubs (particularly involving negative regulation) were preferentially retained, with high connectivity driving retention. Finally, microarrays and proteomics demonstrated a trend for gradual paralog expression divergence independent of the duplication mechanism, but inferred ancestral expression states suggested preferential subfunctionalisation among 2R-ohnologs (2ROs). Conclusions: The 2R event left an indelible imprint on vertebrate signalling and the cell cycle. We show that 2R-WGD preferentially retained genes are associated with higher organismal complexity (for example, locomotion, nervous system, morphogenesis), while genes associated with basic cellular functions (for example, translation, replication, splicing, recombination; with the notable exception of cell cycle) tended to be excluded.
2R-WGD set the stage for the emergence of key vertebrate functional novelties (such as complex brains, circulatory system, heart, bone, cartilage, musculature and adipose tissue). A full explanation of the impact of 2R on evolution, function and the flow of information in vertebrate signalling networks is likely to have practical consequences for regenerative medicine, stem cell therapies and cancer treatment.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Finished genome of the fungal wheat pathogen Mycosphaerella graminicola Reveals dispensome structure, chromosome plasticity, and stealth pathogenesis.

201

Repository Open Access to Scientific Information from Embrapa

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RCAAP - Repositório Científico de Acesso Aberto de Portugal

TreeFam: 2008 Update

Author: A. Coghlan
A. J. Vilella
A. Moses
A. Ureta-Vidal
Brown
Chen
Dehal
Edgar
Fitch
Guindon
H. Li
Haas
Haas
Hertz-Fowler
Huerta-Cepas
J. Qin
J. Ruan
J. Wang
J.-K. Heriche
K. Kristiansen
Koonin
Krishnamurthy
L. Bolund
L. J. M. Coin
Li
Meinel
O'Brien
Povey
R. Durbin
R. Li
S. Vang
T. Liu
Tatusov
Wu
Y. Guo
Y. Hu
Yu
Z. Chen
Publication venue: Oxford University Press
Publication date: 01/01/2008

TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new perl API lets users easily extract data from the TreeFam mysql database

Crossref

PubMed Central

University of Southern Denmark Research Output

University of Melbourne Institutional Repository

University of Queensland eSpace

Ultra-fast sequence clustering from similarity networks with SiLiX

Author: A Krishnamurthy
AJ Enright
AJ Vilella
AY Signorovitch
F Servant
H Li
HJ Atkinson
I Katriel
J Ruan
JL Boore
JM Joseph
KD Pruitt
Laurent Duret
MH Alsuwaiyel
PK Wall
PS Dehal
R Petryszak
R Tarjan
RD Finn
RE Tarjan
S Hartmann
S Hunter
S Penel
S Vishwanathan
SF Altschul
Simon Penel
SK Das
T Meinel
T Wittkop
Vincent Miele
Y Bramoulle
Y Han
Y Loewenstein
Y Tian
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The number of gene sequences that are available for comparative genomics approaches is increasing extremely quickly. A current challenge is to be able to handle this huge amount of sequences in order to build families of homologous sequences in a reasonable time. Results: We present the software package SiLiX that implements a novel method which reconsiders single linkage clustering with a graph theoretical approach. A parallel version of the algorithms is also presented. As a demonstration of the ability of our software, we clustered more than 3 million sequences from about 2 billion BLAST hits in 7 minutes, with a high clustering quality, both in terms of sensitivity and specificity. Conclusions: Compared with state-of-the-art software, SiLiX presents the best up-to-date capabilities to face the problem of clustering large collections of sequences. SiLiX is freely available at http://lbbe.univ-lyon1.fr/SiLiX.
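Single linkage clustering, mentioned in the abstract above, amounts to finding the connected components of the graph whose edges are the accepted similarity hits. The sketch below shows that general idea with a union-find structure; it is not SiLiX's actual algorithm or its parallel version, and the function name `single_linkage` and the sequence identifiers are hypothetical. It also assumes every pair passed in has already been filtered by whatever similarity threshold applies.

```python
def single_linkage(pairs):
    """Group sequence ids into families: two ids share a family whenever a
    chain of similarity pairs connects them (connected components)."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = {}
    for seq in list(parent):
        families.setdefault(find(seq), set()).add(seq)
    return sorted(sorted(members) for members in families.values())


# Invented similarity hits between hypothetical sequence ids.
hits = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
families = single_linkage(hits)
```

Here `s1` and `s3` land in the same family even though no hit links them directly, which is the defining behaviour of single linkage.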

Crossref

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Expansion of voltage-dependent Na+ channel gene family in early tetrapods coincided with the emergence of terrestriality and increased brain complexity

Author: Akopian
Angelino
Blomme
Caldwell
Clack
Cummins
Cummins
Dehal
Dib-Hajj
Dorward
Duflocq
Farré
Glenner
Goldin
Hains
Harold H. Zakon
Hasenstaub
Hedges
Hellsten
Herzog
Hill
Hoegg
Hu
Hunt
Hurley
Jackson
Jarnot
Kellis
Kuraku
Lopreato
Lorincz
Madsen
Maeda
Maier
Manda C. Jost
Meyer
Milinkovitch
Northcutt
Novak
Ogiwara
Okamura
Panopoulou
Piontkivska
Plummer
Proske
Ross
Saito
Sallan
Schmidt-Hieber
Shedlock
Sneddon
Sneddon
Soares
van Wart
von Düring
Watanabe
Westenbroek
Whitaker
Whitaker
Ying Lu
Publication venue: Oxford University Press (OUP)
Publication date: 29/11/2010

Author Posting. © The Authors, 2010. This is the author's version of the work. It is posted here by permission of Oxford University Press for personal use, not for redistribution. The definitive version was published in Molecular Biology and Evolution 28 (2011): 1415-1424, doi:10.1093/molbev/msq325. Mammals have 10 voltage-dependent sodium (Nav) channel genes. Nav channels are expressed in different cell types with different sub-cellular distributions and are critical for many aspects of neuronal processing. The last common ancestor of teleosts and tetrapods had four Nav channel genes presumably on four different chromosomes. In the lineage leading to mammals a series of tandem duplications on two of these chromosomes more than doubled the number of Nav channel genes. It is unknown when these duplications occurred, whether they occurred against a backdrop of duplication of flanking genes on their chromosomes, or as an expansion of ion channel genes in general. We estimated key dates of the Nav channel gene family expansion by phylogenetic analysis using teleost, elasmobranch, lungfish, amphibian, avian, lizard, and mammalian Nav channel sequences, as well as chromosomal synteny for tetrapod genes. We tested, and exclude, the null hypothesis that Nav channel genes reside in regions of chromosomes prone to duplication by demonstrating the lack of duplication or duplicate retention of surrounding genes. We also find no comparable expansion in other voltage dependent ion channel gene families of tetrapods following the teleost-tetrapod divergence. We posit a specific expansion of the Nav channel gene family in the Devonian and Carboniferous periods when tetrapods evolved, diversified, and invaded the terrestrial habitat. During this time the amniote forebrain evolved greater anatomical complexity and novel tactile sensory receptors appeared.
The duplication of Nav channel genes allowed for greater regional specialization in Nav channel expression, variation in sub-cellular localization, and enhanced processing of somatosensory input. This work was funded by the National Science Foundation (IBN 0236147 to H.H.Z and M.C.J), and the National Institutes of Health (R01GM084879 to H.H.Z)

Crossref

Woods Hole Open Access Server

PubMed Central

Cross-validated methods for promoter/transcription start site mapping in SL trans-spliced genes, established using the Ciona intestinalis troponin I gene

Author: Blumenthal
Bucher
C. L. Cleto
Carninci
Conrad
Conrad
Conrad
Corbo
Dehal
Deng
Graham
Hastings
Hikosaka
Juven-Gershon
K. E. M. Hastings
K. Nakai
K. Okamura
Krause
Kusakabe
Kusakabe
MacLean
Matthews
Nilsen
P. Khare
Park
Park
Patikoglou
Reese
S. I. Mortimer
Satou
Satou
T. H. Meedel
T. Kusakabe
Tawe
The FANTOM Consortium
Vandenberghe
Y. Suzuki
Publication venue: Oxford University Press

In conventionally-expressed eukaryotic genes, transcription start sites (TSSs) can be identified by mapping the mature mRNA 5′-terminal sequence onto the genome. However, this approach is not applicable to genes that undergo pre-mRNA 5′-leader trans-splicing (SL trans-splicing) because the original 5′-segment of the primary transcript is replaced by the spliced leader sequence during the trans-splicing reaction and is discarded. Thus TSS mapping for trans-spliced genes requires different approaches. We describe two such approaches and show that they generate precisely agreeing results for an SL trans-spliced gene encoding the muscle protein troponin I in the ascidian tunicate chordate Ciona intestinalis. One method is based on experimental deletion of trans-splice acceptor sites and the other is based on high-throughput mRNA 5′-RACE sequence analysis of natural RNA populations in order to detect minor transcripts containing the pre-mRNA’s original 5′-end. Both methods identified a single major troponin I TSS located ∼460 nt upstream of the trans-splice acceptor site. Further experimental analysis identified a functionally important TATA element 31 nt upstream of the start site. The two methods employed have complementary strengths and are broadly applicable to mapping promoters/TSSs for trans-spliced genes in tunicates and in trans-splicing organisms from other phyla

Crossref

PubMed Central