
    GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics

    A large number of sensitive homology searches are required for mapping DNA sequence fragments to known protein sequences in public and private databases during metagenomic analysis. BLAST is currently used for this purpose, but its calculation speed is insufficient, especially for analyzing the large quantities of sequence data obtained from a next-generation sequencer. However, faster search tools, such as BLAT, do not have sufficient search sensitivity for metagenomic analysis. Thus, a sensitive and efficient homology search tool is in high demand for this type of analysis. We developed a new, highly efficient homology search algorithm suitable for graphics processing unit (GPU) calculations that was implemented as a GPU system that we called GHOSTM. The system first searches for candidate alignment positions for a sequence from the database using pre-calculated indexes and then calculates local alignments around the candidate positions before calculating alignment scores. We implemented both of these processes on GPUs. The system achieved calculation speeds that were 130 and 407 times faster than BLAST with 1 GPU and 4 GPUs, respectively. The system also showed higher search sensitivity and had a calculation speed that was 4 and 15 times faster than BLAT with 1 GPU and 4 GPUs. We developed a GPU-optimized algorithm to perform sensitive sequence homology searches and implemented the system as GHOSTM. Currently, sequencing technology continues to improve, and sequencers are increasingly producing larger and larger quantities of data. This explosion of sequence data makes computational analysis with contemporary tools more difficult. We developed GHOSTM, which is a cost-efficient tool, and offer this tool as a potential solution to this problem.
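
The two-stage search described in the GHOSTM abstract (index lookup for candidate positions, then local alignment around them) can be illustrated with a small CPU-only sketch. This is not GHOSTM's implementation: the function names, the k-mer length, the scoring values, the toy sequences, and the use of an ungapped scan in place of a full local alignment are all assumptions made for illustration. The sketch also compares protein to protein and omits the DNA translation step and any GPU code.

```python
from collections import defaultdict

def build_kmer_index(database, k):
    """Pre-calculate an index: k-mer -> list of (sequence id, offset)."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index

def candidate_positions(query, index, k):
    """Count shared k-mers per (sequence id, diagonal).

    The diagonal is the database offset minus the query offset, so seeds
    that belong to the same ungapped alignment share one key.
    """
    hits = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for seq_id, dpos in index.get(query[qpos:qpos + k], ()):
            hits[(seq_id, dpos - qpos)] += 1
    return hits

def ungapped_score(query, target, diagonal, match=1, mismatch=-1):
    """Best ungapped local segment score along one diagonal.

    A stand-in (assumption) for the local alignment step: a running sum
    that resets to zero whenever it would become negative.
    """
    best = running = 0
    for qpos, qchar in enumerate(query):
        tpos = qpos + diagonal
        if 0 <= tpos < len(target):
            step = match if qchar == target[tpos] else mismatch
            running = max(0, running + step)
            best = max(best, running)
    return best

# Hypothetical toy database and query, for illustration only.
database = {"p1": "MKTAYIAKQRQISFVK", "p2": "GGGGSGGGGS"}
query = "AYIAKQ"

index = build_kmer_index(database, k=3)
hits = candidate_positions(query, index, k=3)
(best_id, best_diag), seed_count = max(hits.items(), key=lambda kv: kv[1])
score = ungapped_score(query, database[best_id], best_diag)
print(best_id, best_diag, seed_count, score)
```

Building the index once and reusing it for every query is what makes the candidate stage cheap; only the few diagonals that collect seeds reach the more expensive scoring stage.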

    Interactive metagenomic visualization in a Web browser

    BACKGROUND: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. RESULTS: Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. CONCLUSIONS: Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

    Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data

    A frequent step in metagenomic data analysis is the assembly of the sequenced reads. Many assembly tools targeting data from next-generation sequencing (NGS) technologies have been published in recent years, but these assemblers have not been designed for or tested in the multi-genome scenarios that characterize metagenomic studies. Here we provide a critical assessment of current de novo short-read assembly tools in multi-genome scenarios using complex simulated metagenomic data. With this approach we tested the fidelity of different assemblers in metagenomic studies, demonstrating that even under the simplest compositions the number of chimeric contigs involving different species is noticeable. We further showed that the assembly process reduces the accuracy of the functional classification of the metagenomic data and that these errors can be overcome by raising the coverage of the studied metagenome. The results presented here highlight the particular difficulties that de novo genome assemblers face in multi-genome scenarios, demonstrating that these difficulties, which often compromise the functional classification of the analyzed data, can be overcome with a high sequencing effort.

    An Efficient Rank Based Approach for Closest String and Closest Substring

    This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results.
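
To make the distance measures in this abstract concrete, the sketch below computes rank distance and Hamming distance and uses either one as a closest-string objective. It assumes the commonly cited definition of rank distance for strings: each character is tagged with its occurrence number, shared tagged characters contribute the absolute difference of their 1-based positions, and unshared ones contribute their own position. The abstract does not spell out the fitness function, so `closest_string_fitness` (largest distance from a candidate to any input string, lower is better) is an assumption based on the standard statement of the closest string problem, and all function names are hypothetical.

```python
from collections import defaultdict

def annotate(s):
    """Map each occurrence-tagged character, e.g. ('A', 2), to its 1-based position."""
    seen = defaultdict(int)
    ranks = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        ranks[(ch, seen[ch])] = pos
    return ranks

def rank_distance(u, v):
    """Rank distance under the definition assumed in the lead-in."""
    ru, rv = annotate(u), annotate(v)
    total = 0
    for key in ru.keys() | rv.keys():
        if key in ru and key in rv:
            # Shared tagged character: difference of positions.
            total += abs(ru[key] - rv[key])
        else:
            # Present in only one string: its position in that string.
            total += ru.get(key, 0) + rv.get(key, 0)
    return total

def hamming_distance(u, v):
    """Number of mismatching positions; defined only for equal lengths."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(u, v))

def closest_string_fitness(candidate, strings, distance=rank_distance):
    """Assumed objective: worst-case distance from candidate to the inputs."""
    return max(distance(candidate, s) for s in strings)

# Hypothetical toy input, for illustration only.
reads = ["ACGT", "CAGT", "AGCT"]
print(closest_string_fitness("ACGT", reads))
print(closest_string_fitness("ACGT", reads, distance=hamming_distance))
```

Passing the distance as a parameter keeps the objective identical across measures, which is the kind of like-for-like comparison the abstract describes.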

    Efficacy of topical cobalt chelate CTC-96 against adenovirus in a cell culture model and against adenovirus keratoconjunctivitis in a rabbit model

    BACKGROUND: Adenovirus (Ad), associated with significant morbidity, has no topical treatment. A leading CTC compound (CTC-96), a Co(III) chelate, was found to have potent in vitro and in vivo antiviral efficacy against herpes viruses. In this study CTC-96 is tested for possible anti-adenovirus activity. METHODS: The biological anti-adenovirus activity of CTC-96 in concentrations from 5 to 250 µg/ml was evaluated initially by viral inactivation (viral exposure to CTC-96 followed by dilution and inoculation of cells), virucidal (viral exposure to CTC-96 and inoculation of cells without dilution) and antiviral (effect of CTC-96 on previously adsorbed virus) plaque assays on HeLa (human cervical carcinoma), A549 (human lung carcinoma) and SIRC (rabbit corneal) cells. After verifying the antiviral activity, New Zealand White rabbits were infected with Ad-5 into: 1) the anterior cul-de-sac, scarifying the conjunctiva (Group "C+"); 2) the anterior cul-de-sac, scarifying the conjunctiva and cornea (Group "CC+"); 3) the stroma (Group "CI+"). Controls were sham-infected ("C-", "CC-", "CI-"). Other rabbits, after "CC", were treated for 21 days with: 1) placebo, 9x/day ("-"); 2) CTC-96, 50 µg/ml, 9x/day ("50/9"); 3) CTC-96, 50 µg/ml, 6x/day ("50/6"); 4) CTC-96, 25 µg/ml, 6x/day ("25/6"). All animals were monitored via examination and plaque assays. RESULTS: In vitro viral inactivation, virucidal and antiviral assays all demonstrated CTC-96 to be effective against adenovirus type 5 (Ad-5). The in vivo model of Ad keratoconjunctivitis most similar to human disease and producing the highest viral yield was "CC". All eyes (6/6) developed acute conjunctivitis. "CI" yielded more stromal involvement (1/6) and iritis (5/6), but lower clinical scores (area × severity). Infection via "C" was inconsistent (4/6). Fifty (50) µg/ml was effective against Ad-5 at 6x and 9x daily dosing, while 25 µg/ml (6x) was only marginally effective. CONCLUSION: CTC-96 demonstrated virucidal activity against Ad-5 in tissue culture with HeLa, A549 and SIRC cell lines. Animal Model Development: 1) "CC" produced conjunctival infection with occasional keratitis similar to human disease; "CI" yielded primarily stromal involvement; 2) "C" consistently produced neither conjunctivitis nor keratitis. CTC Testing: 1) Conjunctivitis in all eyes; 2) Resolution fastest in "50/9" ("50/9". "50/6" > "25/6" > "-"); 3) Efficacy in "50/6" was not statistically different from "50/9"; 4) Conjunctival severity was lower in treatment groups than controls; 5) Few corneal or intra-ocular changes were noted.

    Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

    High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Those sequence contaminations are a serious concern to the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary and required step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host. In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/

    Probing Metagenomics by Rapid Cluster Analysis of Very Large Datasets

    BACKGROUND: The scale and diversity of metagenomic sequencing projects challenge both our technical and conceptual approaches in gene and genome annotations. The recent Sorcerer II Global Ocean Sampling (GOS) expedition yielded millions of predicted protein sequences, which significantly altered the landscape of known protein space by more than doubling its size and adding thousands of new families (Yooseph et al., 2007 PLoS Biol 5, e16). Such datasets, not only by their sheer size, but also by many other features, defy conventional analysis and annotation methods. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we describe an approach for rapid analysis of the sequence diversity and the internal structure of such very large datasets by advanced clustering strategies using the newly modified CD-HIT algorithm. We performed a hierarchical clustering analysis on the 17.4 million Open Reading Frames (ORFs) identified from the GOS study and found over 33 thousand large predicted protein clusters comprising nearly 6 million sequences. Twenty percent of these clusters did not match known protein families by sequence similarity search and might represent novel protein families. Distributions of the large clusters were illustrated on organism composition, functional class, and sample locations. CONCLUSION/SIGNIFICANCE: Our clustering took about two orders of magnitude less computational effort than the similar protein family analysis of the original GOS study. This approach will help to analyze other large metagenomic datasets in the future. A Web server with our clustering results and annotations of predicted protein clusters is available online at http://tools.camera.calit2.net/gos under the CAMERA project.