63 research outputs found

Improving pan-genome annotation using whole genome multiple alignment

Author: Angiuoli Samuel V
Dunning Hotopp Julie C
Salzberg Steven L
Tettelin Herve
Publication date: 30/06/2011

Background: Rapid annotation and comparison of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes. Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review. Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency. https://doi.org/10.1186/1471-2105-12-27

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Using Sybil for interactive comparative genomics of microbes on the web

Author: Altschul
Angiuoli
Arnaud
Bendtsen
Blanchette
Crabtree
Darling
David R. Riley
Eddy
Frazer
Hervé Tettelin
Hsiao
Jaccard
Jonathan Crabtree
Julie C. Dunning Hotopp
Kent
Kolaskar
Krogh
Krzywinski
Larsen
McKay
Mera
Mungall
Nielsen
O'Brien
Orvis
Pan
Samuel V. Angiuoli
Sette
Smedley
Smit
Stajich
Stein
Wang
Yu
Publication venue: Oxford University Press

Motivation: Analysis of multiple genomes requires sophisticated tools that provide search, visualization, interactivity and data export. Comparative genomics datasets tend to be large and complex, making development of these tools difficult. In addition to scalability, comparative genomics tools must also provide user-friendly interfaces such that the research scientist can explore complex data with minimal technical expertise.

Crossref

PubMed Central

Towards a Library of Standard Operating Procedures (SOPs) for (meta)genomic annotation

Author: Angiuoli Samuel V.
Cochrane Guy
Field Dawn
Garrity George
Gussman Aaron
Klimke William
Kodira Chinnappa D.
Kyrpides Nikos
Madupu Ramana
Markowitz Victor
Tatusova Tatiana
Thomson Nick
White Owen
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 01/04/2008

Genome annotations describe the features of genomes and accompany sequences in genome databases. The methodologies used to generate genome annotation are diverse and typically vary amongst groups. Descriptions of the annotation procedure are helpful in interpreting genome annotation data. Standard Operating Procedures (SOPs) for genome annotation describe the processes that generate genome annotations. Some groups are currently documenting procedures, but standards are lacking for the structure and content of annotation SOPs. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse a central online repository of SOPs.

UNT Digital Library

Mugsy: fast multiple alignment of closely related whole genomes

Author: Ahn
Batzoglou
Blanchette
Bourque
Bradley
Bray
Chen
Corel
Darling
Deloger
Dewey
Doring
Dubchak
Edgar
Edmonds
Elias
Ford
Gusfield
Hohl
IHGSC
Jacobson
Kent
Kurtz
Levy
Li
Margulies
Medini
Notredame
Paten
Pevzner
Raphael
Rausch
Rosenbloom
Samuel V. Angiuoli
Schwartz
Sherry
Steven L. Salzberg
Thompson
Treangen
Wang
Wheeler
Zhang
Publication venue: Oxford University Press

Motivation: The relative ease and low cost of current generation sequencing technologies have led to a dramatic increase in the number of sequenced genomes for species across the tree of life. This increasing volume of data requires tools that can quickly compare multiple whole-genome sequences, millions of base pairs in length, to aid in the study of populations, pan-genomes, and genome evolution.

Crossref

PubMed Central

CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

Author: A Bateman
A Tridgell
Aaron Gussman
AC Stewart
AL Delcher
B Langmead
BE Suzek
C Hemmerich
C Rapier
Cesar Arze
D Field
D Hull
David R Riley
DL Wheeler
DR Zerbino
E Afgan
EE Schadt
F Meyer
J Dean
J Goecks
J Orvis
J White
James R White
JD Selengut
JG Caporaso
JP Mesirov
JR Cole
JR Miller
JR White
JT Dudley
K Galens
K Keahey
K Lagesen
Kevin Galens
LD Stein
M Reich
Mahesh Vangala
Malcolm Matalka
MC Schatz
O Trelles
Owen White
PD Schloss
RC Edgar
RK Aziz
RL Tatusov
S Angiuoli
Samuel V Angiuoli
SD Kahn
SF Altschul
SR Eddy
TM Lowe
W Florian Fricke
Publication venue: BioMed Central
Publication date: 01/01/2011

Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. https://doi.org/10.1186/1471-2105-12-35

Crossref

Springer - Publisher Connector

PubMed Central

Digital Repository at the University of Maryland

Draft Genome of the Filarial Nematode Parasite Brugia malayi

Author: al et
Allen Jonathan E.
Amedeo Paolo
Angiuoli Samuel V.
Barton Geoffrey J.
Caler Elisabet
Carlow Clotilde K.S.
Crabtree Jonathan
Crawford Michael J.
Creasy Todd
Daub Jennifer
Delcher Arthur L.
El-Sayed Najib M.
Feldblyum Tamara
Ghedin Elodie
Guiliano David B.
Haas Brian
Koo Hean
Miranda-Saavedra Diego
Pertea Mihaela
Pop Mihai
Salzberg Steven L.
Schatz Michael
Schobel Seth
Shumway Martin
Spiro David
Tallon Luke
Wang Shiliang
White Owen
Williams Steven A.
Wortman Jennifer R.
Zhao Qi
Publication venue: Smith ScholarWorks
Publication date: 21/09/2007

Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the ∼90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict ∼11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during ∼350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.

Smith College: Smith ScholarWorks

Correction: Comparative Genomics of Emerging Human Ehrlichiosis Agents

Crossref

Directory of Open Access Journals

PubMed Central

Comparative Genomics of Emerging Human Ehrlichiosis Agents

Anaplasma (formerly Ehrlichia) phagocytophilum, Ehrlichia chaffeensis, and Neorickettsia (formerly Ehrlichia) sennetsu are intracellular vector-borne pathogens that cause human ehrlichiosis, an emerging infectious disease. We present the complete genome sequences of these organisms along with comparisons to other organisms in the Rickettsiales order. Ehrlichia spp. and Anaplasma spp. display a unique large expansion of immunodominant outer membrane proteins facilitating antigenic variation. All Rickettsiales have a diminished ability to synthesize amino acids compared to their closest free-living relatives. Unlike members of the Rickettsiaceae family, these pathogenic Anaplasmataceae are capable of making all major vitamins, cofactors, and nucleotides, which could confer a beneficial role in the invertebrate vector or the vertebrate host. Further analysis identified proteins potentially involved in vacuole confinement of the Anaplasmataceae, a life cycle involving a hematophagous vector, vertebrate pathogenesis, human pathogenesis, and lack of transovarial transmission. These discoveries provide significant insights into the biology of these obligate intracellular pathogens.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The Princeton Protein Orthology Database (P-POD): A Comparative Genomics Analysis Tool for Biologists

Author: A Alexeyenko
A Chatterjee
A Shaag
AB Clark
AJ Herr
AK Agarwal
AS Payne
B Garavaglia
B Samuel Lattimore
Berend Snel
C Garbers
C Srinivasan
CA Hu
CE Grubenmann
CG Frank
CH Kocken
Charles Lu
CJ Loewen
CJ Penkett
DA Benson
DA Pearce
David Botstein
DC Gowda
DJ Kelleher
DL Wheeler
E Catoni
EI Boyle
EJ Vonarx
EV Koonin
F Chen
F Liang
Fan Kang
G Hsi
G Schaffar
GF Xu
H Bussey
HS Feiler
I Mayordomo
J Archambault
J Brzeski
J Gecz
J Jantti
J Lenffer
JD Thompson
JF Mercer
JJ Heinisch
K Lai
K Lillard-Wetherell
K Okada
K Yamagata
Kara Dolinski
KP O'Brien
L Covic
L Desmyter
L Li
M Forsgren
M Geisler
M Raymond
M Schiott
M Schwarz
M Takeuchi
ME Lucas
MH Kedees
Michael S. Livstone
MM Lanterman
MT Geraghty
N Mamiya
N Raben
N Wagner
NF Neff
O Johnstone
Owen White
P Cavadini
P Poullet
P Sung
PA Colussi
PG Morgante
PJ Keeling
PJ Schmidt
PM Krumpelman
R Ballester
R Boyum
R Jothi
R Kellermayer
R Mancini
R Portmann
R Tommasini
RD Saunders
RK McEwen
RK Raymond
RL Tatusov
Rose Oughtred
S Hofmann
S Nomoto
S Roje
S Tomita
S van Wilpe
S Willingham
Samuel V. Angiuoli
SJ Kron
SK Dutcher
SN Guzder
SS Dwight
Sven Heinicke
T Kataoka
T Kleinow
T Kulikova
T Morita
T Sone
U Rothbauer
V Lumbreras
VK Ton
WK Schmidt
WY Song
XD Gao
Y Chen
Y Kida
Y Lee
Y Onodera
Y Sambongi
Z Peng
Publication venue: Public Library of Science
Publication date: 01/08/2007

Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species

Author: A Muzzi
A Polissi
A Stamatakis
A. B. Dalia
AB Brueggemann
AC Darling
AJ Drummond
Alessandro Muzzi
Antonello Covacci
B Henriques-Normark
BA Green
BR Levin
C Fraser
C Giefing
C Obert
C Steinfort
Claudio Donati
D Chiavolini
D Rolo
DA Baltrus
David R Riley
DE Briles
DH Huson
DL Hava
E Paradis
E Richard Moxon
E Swiatlo
EJ Feil
F Bagnoli
F Chi
F Ding
F Iannelli
Fen Z Hu
G McVean
Garth D Ehrlich
GD Ehrlich
H Shimodaira
H Tettelin
Hervé Tettelin
J Dagerhamn
J Feng
J Hamel
J Hein
J Hoskins
JA Lanie
JC Paton
JG Lawrence
JS Hogg
Julie C Dunning Hotopp
K Overweg
K Tamura
KA Jolley
M Friendly
M Kilian
M Moschioni
M Touchon
Marco Oggioni
MC Enright
Morgens Kilian
N Luisa Hiller
Nicholas J Croucher
NJ Croucher
NL Hiller
R Development Core Team
R Hakenbeck
R Suzuki
RC Lewontin
Rino Rappuoli
RJ Redfield
Samuel V Angiuoli
SJ King
SK Hollingshead
SS Tai
Stephen D Bentley
T Lefebure
TF Cooper
Tim J Mitchell
Vega Masignani
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Streptococcus pneumoniae is one of the most important causes of microbial diseases in humans. The genomes of 44 diverse strains of S. pneumoniae were analyzed and compared with strains of non-pathogenic streptococci of the Mitis group. Results: Despite evidence of extensive recombination, the S. pneumoniae phylogenetic tree revealed six major lineages. With the exception of serotype 1, the tree correlated poorly with capsular serotype, geographical site of isolation and disease outcome. The distribution of dispensable genes (genes present in more than one strain, but not in all) was consistent with phylogeny, although horizontal gene transfer events attenuated this correlation in the case of ancient lineages. Homologous recombination, involving short stretches of DNA, was the dominant evolutionary process of the core genome of S. pneumoniae. Genetic exchange occurred both within and across the borders of the species, and S. mitis was the main reservoir of genetic diversity of S. pneumoniae. The pan-genome size of S. pneumoniae increased logarithmically with the number of strains and linearly with the number of polymorphic sites of the sampled genomes, suggesting that acquired genes accumulate proportionately to the age of clones. Most genes associated with pathogenicity were shared by all S. pneumoniae strains, but were also present in S. mitis, S. oralis and S. infantis, indicating that these genes are not sufficient to determine virulence. Conclusion: Genetic exchange with related species sharing the same ecological niche is the main mechanism of evolution of S. pneumoniae. The open pan-genome guarantees the species a quick and economical response to diverse environments.

Crossref

Springer - Publisher Connector

University of Birmingham Research Portal

PubMed Central

Enlighten