
207 research outputs found

GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics

Author: AD Smith
B Langmead
H Li
H Li
JC Wooley
JC Wootton
JP Walters
K Kurokawa
Ken Kurokawa
M Kanehisa
M Kanehisa
M Kanehisa
Narcis Fernandez-Fuentes
PD Vouzis
PJ Turnbaugh
RD Finn
RL Tatusov
RL Tatusov
SF Altschul
SF Altschul
SF Altschul
Shuji Suzuki
Takashi Ishida
TF Smith
W Liu
WJ Kent
WR Pearson
Y Liu
Y Liu
Yutaka Akiyama
Publication venue: Public Library of Science
Publication date: 04/05/2012

A large number of sensitive homology searches are required for mapping DNA sequence fragments to known protein sequences in public and private databases during metagenomic analysis. BLAST is currently used for this purpose, but its calculation speed is insufficient, especially for analyzing the large quantities of sequence data obtained from a next-generation sequencer. However, faster search tools, such as BLAT, do not have sufficient search sensitivity for metagenomic analysis. Thus, a sensitive and efficient homology search tool is in high demand for this type of analysis. We developed a new, highly efficient homology search algorithm suitable for graphics processing unit (GPU) calculations and implemented it as a GPU system that we called GHOSTM. The system first searches for candidate alignment positions for a sequence from the database using pre-calculated indexes and then calculates local alignments around the candidate positions before calculating alignment scores. We implemented both of these processes on GPUs. The system achieved calculation speeds that were 130 and 407 times faster than BLAST with 1 GPU and 4 GPUs, respectively. The system also showed higher search sensitivity and had a calculation speed that was 4 and 15 times faster than BLAT with 1 GPU and 4 GPUs, respectively. We developed a GPU-optimized algorithm to perform sensitive sequence homology searches and implemented the system as GHOSTM. Sequencing technology continues to improve, and sequencers are producing ever larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We developed GHOSTM, which is a cost-efficient tool, and offer this tool as a potential solution to this problem.
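The two-stage idea described in the abstract above (index lookup for candidate positions, then local alignment around each candidate) can be illustrated with a small CPU-only sketch. This is not GHOSTM's actual algorithm or its GPU implementation: the exact k-mer index, the Smith-Waterman scoring values, and the names `build_kmer_index`, `candidate_positions`, `smith_waterman`, and `search` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_kmer_index(db, k):
    """Pre-compute an index from each k-mer to its start offsets in the database."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def candidate_positions(query, index, k):
    """Database offsets where the query would start, implied by exact k-mer hits."""
    starts = set()
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], ()):
            starts.add(d - q)
    return sorted(starts)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (illustrative scores)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search(query, db, k=3, pad=2):
    """Align the query only inside small windows around candidate positions."""
    index = build_kmer_index(db, k)
    best = (0, None)  # (score, window start in db)
    for start in candidate_positions(query, index, k):
        lo = max(0, start - pad)
        hi = min(len(db), start + len(query) + pad)
        score = smith_waterman(query, db[lo:hi])
        if score > best[0]:
            best = (score, lo)
    return best


result = search("ACGTACGG", "TTTTACGTACGGTTTT")
```

The point of the design is that the expensive dynamic-programming step runs only on short windows selected by the cheap index lookup, rather than on the whole database.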

Public Library of Science (PLOS)

Crossref

PubMed Central

Interactive metagenomic visualization in a Web browser

Author: A Brady
A Brady
A Dix
Adam M Phillippy
B Johnson
B Shneiderman
Brian D Ondov
DH Huson
EW Sayers
F Meyer
G Draper
GW Tyson
J Goecks
J Goll
J Qin
J Stasko
J Yang
JC Wooley
K Andrews
Nicholas H Bergman
Q Wang
S Mitra
SF Altschul
VD Pham
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results: Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions: Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license from http://krona.sourceforge.net.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Useful Web sites for researchers studying proteins

Author: Appel RD
Brenner SE
C.L. Mafra
Franco GR
Goostein M
Harper R
J.H. Patarroyo
Jacobson D
L.S. Ozaki
Peters R
S.O. Paula
Sikorski RS
Wooley JC
Publication venue: FapUNIFESP (SciELO)

Crossref

Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics

Author: A Schwartzman
A Sjögren
B Beszteri
B Rodriguez-Brito
C Soneson
CW Law
DH Haft
DH Parks
DH Parks
DJ McCarthy
E Dugat-Bony
E Kristiansson
EL Lehmann
EM Ross
Erik Kristiansson
F Boulund
F Meyer
FH Karlsson
FH Karlsson
FJ Anscombe
G Casella
GK Smyth
I Nookaew
J Alneberg
J Handelsman
J Qin
J Qin
JA Frank
JC Wooley
JC Wooley
JD Storey
JN Paulson
JP Brooks
JR White
K Sanli
M Chafee
MB Sohn
MD Robinson
MD Robinson
MD Robinson
MI Love
MR Rondon
N Segata
Olle Nerman
P McCullagh
PD Schloss
PJ McMurdie
R Knight
R Liu
S Anders
SR Eddy
ST Kelley
T Fawcett
T Yatsunenko
TO Delmont
Tobias Österlund
Viktor Jonsson
VM Markowitz
WJ Kent
Y Benjamini
Publication venue: Springer Science and Business Media LLC

Crossref

Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data

Author: A Brady
A Charuvaka
A Lopez-Bueno
AE Darling
Andrés Moya
D Hernandez
DB Jaffe
DB Rusch
DC Richter
DD Sommer
DH Haft
DH Huson
DR Zerbino
EW Myers
F Meyer
GG Sutton
GW Tyson
I Letunic
I Maccallum
J Laserson
J Qin
JA Huber
JC Dohm
JC Dohm
JC Wooley
JO Korbel
Jonathan H. Badger
JR Miller
JR Miller
JT Simpson
K Liolios
K Mavromatis
KE Wommack
L Krause
M de la Bastide
M Margulies
M Pop
M Stark
M Wu
Miguel Pignatelli
MJ Chaisson
NN Diaz
OU Nalbantoglu
PJ Turnbaugh
PJ Turnbaugh
R Li
R Seshadri
RD Finn
RL Tatusov
RL Warren
S Batzoglou
S Levy
S Yooseph
SM Huse
SR Gill
T Schoenfeld
TS Ghosh
VM Markowitz
WJ Kent
WR Jeck
X Huang
X Huang
Y Ye
Publication venue: Public Library of Science
Publication date: 23/05/2011

A frequent step in metagenomic data analysis is the assembly of the sequenced reads. Many assembly tools targeting data from next-generation sequencing (NGS) technologies have been published in recent years, but these assemblers have not been designed for or tested in the multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies, demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios, demonstrating that these difficulties, which often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

An Efficient Rank Based Approach for Closest String and Closest Substring

Author: A Ben-Dor
A Dinu
AS Fraser
AS Fraser
AWC Liew
C de la Higuera
Chuhsing Kate Hsiao
DJ States
EV Koonin
F Nicolas
F Nicolas
J Gramm
J Palmer
JC Wooley
K Lanctot
L Schmitt
L Wang
Liviu P. Dinu
LP Dinu
LP Dinu
LP Dinu
LP Dinu
LP Dinu
LP Dinu
M Chimani
M Frances
M Karpovsky
M Li
P Diaconis
R Holmquist
Radu Ionescu
S Roman
VI Levenshtein
VY Popov
W Banzhaf
X Deng
X Liu
Publication venue: Public Library of Science
Publication date: 01/01/2011

This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance achieve the best results.
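As a rough illustration of the distance measure named in the abstract above, the sketch below computes rank distance under one common formulation, which is assumed here rather than taken from the listing: each character is tagged with its occurrence number, matched tagged characters contribute the absolute difference of their 1-based positions, and unmatched ones contribute their own position. The helper `closest_string_cost` is a hypothetical example of a cost a genetic algorithm might minimize; it is not presented as the paper's fitness function.

```python
from collections import defaultdict


def indexed_positions(s):
    """Map (character, occurrence number) to its 1-based position in s."""
    counts = defaultdict(int)
    positions = {}
    for i, ch in enumerate(s, start=1):
        counts[ch] += 1
        positions[(ch, counts[ch])] = i
    return positions


def rank_distance(u, v):
    """Rank distance between two strings (assumed formulation, see lead-in)."""
    pu, pv = indexed_positions(u), indexed_positions(v)
    total = 0
    for key in pu.keys() | pv.keys():
        if key in pu and key in pv:
            total += abs(pu[key] - pv[key])          # shared tagged character
        else:
            total += pu.get(key, 0) + pv.get(key, 0)  # present in only one string
    return total


def closest_string_cost(candidate, strings):
    """Hypothetical cost: largest rank distance from the candidate to any input."""
    return max(rank_distance(candidate, s) for s in strings)


d1 = rank_distance("abca", "baca")
d2 = rank_distance("ab", "a")
cost = closest_string_cost("abca", ["baca", "abca"])
```

Unlike Hamming distance, this measure is defined for strings of different lengths, which is one reason it can be compared directly with Levenshtein distance.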

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Efficacy of topical cobalt chelate CTC-96 against adenovirus in a cell culture model and against adenovirus keratoconjunctivitis in a rabbit model

Author: A Böttcher
AA Carmine
AH Wander
CBR de Oliveira
Charlie Srivilasa
DA Lennette
David Gershon
EG Romanoswki
EG Romanowski
EG Romanowski
GB Zamansky
Irene Winicov
IuF Maichuk
J Hillenkamp
J Hillenkamp
JA Schwartz
JC Hierhozer
JS Gordon
Katarina J Kristic
MD Trousdale
PA Asbell
PA Asbell
Penny A Asbell
PH Hoel
PH Hoel
PH Wooley
S Siegel
S Siegel
S Siegel
S Siegel
SD Cook
Seth P Epstein
T Takeuchi
WJZ Dixon
WP Rowe
WP Rowe
Yevgenia Y Pashinsky
YJ Gordon
YJ Gordon
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Adenovirus (Ad), associated with significant morbidity, has no topical treatment. A leading CTC compound (CTC-96), a Co(III) chelate, was found to have potent in vitro and in vivo antiviral efficacy against herpes viruses. In this study CTC-96 is being tested for possible anti-adenovirus activity. METHODS: The biological anti-adenovirus activity of CTC-96 in concentrations from 5 to 250 ug/ml was evaluated initially by viral inactivation (viral exposure to CTC-96 followed by dilution and inoculation of cells), virucidal (viral exposure to CTC-96 and inoculation of cells without dilution) and antiviral (effect of CTC-96 on previously adsorbed virus) plaque assays on HeLa (human cervical carcinoma), A549 (human lung carcinoma) and SIRC (rabbit corneal) cells. After verifying the antiviral activity, New Zealand White rabbits were infected with Ad-5 into: 1) the anterior cul-de-sac scarifying the conjunctiva (Group "C+"); 2) the anterior cul-de-sac scarifying the conjunctiva and cornea (Group "CC+"); 3) the stroma (Group "CI+"). Controls were sham-infected ("C-", "CC-", "CI-"). Other rabbits, after "CC", were treated for 21 days with: 1) placebo, 9x/day ("-"); 2) CTC-96, 50 ug/ml, 9x/day ("50/9"); 3) CTC-96, 50 ug/ml, 6x/day ("50/6"); 4) CTC-96, 25 ug/ml, 6x/day ("25/6"). All animals were monitored via examination and plaque assays. RESULTS: In vitro viral inactivation, virucidal and antiviral assays all demonstrated CTC-96 to be effective against Adenovirus type 5 (Ad-5). The in vivo model of Ad keratoconjunctivitis most similar to human disease and producing highest viral yield was "CC". All eyes (6/6) developed acute conjunctivitis. "CI" yielded more stromal involvement (1/6) and iritis (5/6), but lower clinical scores (area × severity). Infection via "C" was inconsistent (4/6). Fifty (50) ug/ml was effective against Ad-5 at 6x, 9x dosings while 25 ug/ml (6x) was only marginally effective.
CONCLUSION: CTC-96 demonstrated virucidal activity against Ad-5 in tissue culture with HeLa, A549 and SIRC cell lines. Animal Model Development: 1) "CC" produced conjunctival infection with occasional keratitis similar to human disease; "CI" yielded primarily stromal involvement; 2) "C" consistently produced neither conjunctivitis nor keratitis. CTC Testing: 1) Conjunctivitis in all eyes; 2) Resolution fastest in "50/9" ("50/9". "50/6" > "25/6" > "-"); 3) Efficacy in "50/6" was not statistically different from "50/9"; 4) Conjunctival severity was lower in treatment groups than controls; 5) Little corneal or intra-ocular changes were noted.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

Author: A McCarthy
A Morgulis
AD Smith
B Langmead
C Camacho
D Willner
D Willner
DA Wheeler
DJ Turner
DR Bentley
EA Dinsdale
ES Lander
F Meyer
Francisco Rodriguez-Valera
FS Collins
GL Rosen
H Li
H Li
H Li
H Li
J Eid
J Peterson
J Qin
J Wang
JC Venter
JC Wooley
JM Kidd
K Mavromatis
KA Frazer
ML Metzker
N Homer
P Ferragina
P Flicek
P Hugenholtz
PJ Turnbaugh
PJ Turnbaugh
PJA Cock
R Li
R Li
R Schmieder
R Schmieder
Robert Edwards
Robert Schmieder
RP Alexander
S Ahn
S Huse
S Kurtz
S Levy
SF Altschul
SF Altschul
SG Tringe
TF Smith
V Kunin
WJ Kent
WJ Kent
Y Li
Z Ning
Publication venue: Public Library of Science
Publication date: 09/03/2011

High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Such sequence contamination is a serious concern for the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes, with as much as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Predicted Relative Metabolomic Turnover (PRMT): determining metabolic turnover from a coastal marine metagenomic dataset

Author: A Paytan
AJ Southward
AL Svitil
AN Kulakova
C Jeuniaux
C-Y Lin
D Field
DJ Repeta
F Meyer
Falkowski G Paul
FO Glöckner
GW Gooday
H Petković
H Petković
HW Ma
JA Gilbert
JA Gilbert
JA Gilbert
JA Gilbert
JC Wooley
JD Selengut
JG Bundy
JH Martin
JH Martin
JH Martin
JH Street
JP Quinn
JY Cho
K Motohashi
KB Heidelberg
KO Buesseler
M Kanehisa
MR Viant
MR Viant
MT Cottrell
MT Cottrell
MT Cottrell
NO Keyhani
P Shannon
PM Sivakumar
R Overbeek
S Blain
S Mitra
TA Gianoulis
VS Mikhail
Y Rao
Publication venue: BioMed Central
Publication date: 01/01/2011

Queen's University Belfast Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

Author: A Krogh
A Lupas
A Sali
AC McHardy
Adam Godzik
AJ Enright
AJ Enright
B Rodriguez-Brito
BE Suzek
David Jones
DB Rusch
DH Huson
EF DeLong
FE Angly
G Yona
GW Tyson
J Park
JA Cuff
JC Venter
JD Bendtsen
JD Thompson
John C. Wooley
K Mavromatis
L Holm
L Krause
L Rychlewski
ML Tress
O Sasson
P Pipenbacher
PD Schloss
R Apweiler
RL Tatusov
S Mika
S Yooseph
SF Altschul
SG Tringe
SR Eddy
SR Gill
U Hobohm
W Li
W Li
W Li
W Li
Weizhong Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.
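The clustering strategy mentioned in the abstract above builds on greedy incremental clustering, the general scheme CD-HIT is known for: sequences are processed longest first, and each one either joins the first existing representative it resembles or becomes a new representative. The sketch below shows that scheme only. The shared k-mer fraction used as `similarity`, the threshold value, and all function names are simplifying assumptions; the modified CD-HIT algorithm and the hierarchical analysis described in the abstract are not reproduced here.

```python
def kmer_set(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=2):
    """Stand-in similarity: fraction of the smaller k-mer set shared by both."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def greedy_cluster(seqs, threshold=0.8, k=2):
    """Greedy incremental clustering: longest sequences become representatives."""
    clusters = []  # list of (representative, members)
    for s in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if similarity(rep, s, k) >= threshold:
                members.append(s)  # join the first sufficiently similar cluster
                break
        else:
            clusters.append((s, [s]))  # no match: s starts a new cluster
    return clusters


clusters = greedy_cluster(["MKVLAAGIT", "MKVLAAGI", "QQQWWWEEE"])
```

Because each sequence is compared only against cluster representatives and not against every other sequence, the number of comparisons stays far below an all-against-all search when many sequences are redundant.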

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California