19 research outputs found

    Indexing arbitrary-length k-mers in sequencing reads

    We propose a lightweight data structure for indexing and querying collections of NGS read data in main memory. The data structure supports the interface proposed in the pioneering work by Philippe et al. for counting and locating k-mers in sequencing reads. Our solution, PgSA (pseudogenome suffix array), is based on finding overlapping reads and is competitive with existing algorithms in space use, query times, or both. The main applications of our index include variant calling, error correction, and analysis of reads from RNA-seq experiments.
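
    The count and locate queries mentioned in this abstract can be illustrated with a toy suffix array. This is a minimal sketch, not PgSA itself: it simply concatenates the reads instead of building a pseudogenome from overlapping reads, and the function names (`build_index`, `locate`, `count`) are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_index(reads):
    """Build a naive suffix array over the concatenated reads (toy sizes only)."""
    # '$' separates reads; it never occurs in a DNA query, so no match spans two reads.
    text = "$".join(reads) + "$"
    starts, pos = [], 0
    for r in reads:
        starts.append(pos)        # text position where each read begins
        pos += len(r) + 1
    # O(n^2 log n) construction: acceptable for illustration, not for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts

def locate(index, kmer):
    """Return sorted (read_id, offset) pairs for every occurrence of kmer."""
    text, sa, starts = index
    k = len(kmer)
    # Length-k prefixes of sorted suffixes are themselves sorted,
    # so the matches form one contiguous block found by binary search.
    prefixes = [text[i:i + k] for i in sa]
    lo, hi = bisect_left(prefixes, kmer), bisect_right(prefixes, kmer)
    hits = []
    for p in sorted(sa[lo:hi]):
        rid = bisect_right(starts, p) - 1
        hits.append((rid, p - starts[rid]))
    return hits

def count(index, kmer):
    """Number of occurrences of kmer across all reads."""
    return len(locate(index, kmer))

index = build_index(["ACGTAC", "GTACGT"])
print(count(index, "GTAC"), locate(index, "ACG"))
```

    Because the query length is not fixed at build time, the same index answers queries for any k, which is the "arbitrary-length" property the title refers to.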

    CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers.

    Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/
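
    The idea of classifying with discriminative k-mers can be sketched as follows. This is an illustrative toy under stated assumptions, not CLARK's implementation: a k-mer is treated as discriminative when it occurs in exactly one target, a read is assigned to the target with the most matching k-mers, and all names (`build_discriminative_index`, `classify`) and sequences are hypothetical.

```python
from collections import Counter

def kmers(seq, k):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's name."""
    owner, shared = {}, set()
    for name, seq in targets.items():
        for km in kmers(seq, k):
            if km in shared:
                continue                  # already known to be non-discriminative
            if km in owner and owner[km] != name:
                del owner[km]             # seen in a second target: discard it
                shared.add(km)
            else:
                owner[km] = name
    return owner

def classify(read, index, k):
    """Return the target with the most discriminative k-mer hits, or None."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    ranked = hits.most_common(2)
    if not ranked:
        return None                       # no discriminative k-mer matched
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None                       # ambiguous: two targets tie
    return ranked[0][0]

targets = {"A": "AAAACCCC", "B": "GGGGCCCC"}   # made-up reference sequences
index = build_discriminative_index(targets, 4)
print(classify("AAACC", index, 4), classify("CCCC", index, 4))
```

    Dropping shared k-mers up front means a lookup hit is direct evidence for a single target, so classification reduces to hash lookups and a vote.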

    Nucleic acid detection techniques for adventitious agent testing

    The availability of safe and effective animal vaccines is critical for the prevention of animal disease. Adventitious agent testing is done on master seed viruses prior to vaccine licensure to ensure that no biological contaminants were introduced during manufacture. Traditional adventitious agent testing is performed using a variety of cell culture lines and a panel of polymerase chain reaction tests. The purpose of this research was to determine whether newer technologies such as DNA microarray and next-generation sequencing (NGS) could benefit adventitious agent testing. A literature review describes the state of the field and the challenges that will have to be addressed to use these technologies in a regulatory environment. Both techniques were tested on a panel of mammalian and avian viruses, and each virus was tested individually and in combination with other viruses. NGS was found to be a more reliable method of screening for adventitious agents than microarray.

    BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS

    Background: Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the taxonomic structure of the microbiome, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to suitable computational systems for the large-scale bioinformatic analysis of HTS data is currently one of the major challenges in advanced Meta-barcoding projects. Results: BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities. It provides a completely automated workflow covering all the fundamental steps required in an appropriately designed Meta-barcoding HTS-based experiment, from raw sequence data upload and cleaning to final taxonomic identification. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following a separate path for each platform. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). Conclusion: BioMaS is a user-friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. A comparative benchmark, carried out on a simulated dataset designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.

    Identifying accurate metagenome and amplicon software via a meta-analysis of sequence to taxonomy benchmarking studies

    Metagenomic and meta-barcode DNA sequencing has rapidly become a widely used technique for investigating a range of questions, particularly related to health and environmental monitoring. There has also been a proliferation of bioinformatic tools for analysing metagenomic and amplicon datasets, which makes selecting adequate tools a significant challenge. A number of benchmark studies have been undertaken; however, these can present conflicting results. To address this issue, we have applied a robust Z-score ranking procedure and a network meta-analysis method to identify software tools that are consistently accurate for mapping DNA sequences to taxonomic hierarchies. Based upon these results, we have identified some tools and computational strategies that produce robust predictions.
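
    A robust Z-score ranking of the kind mentioned in this abstract might look like the sketch below. The abstract does not give the exact procedure, so this assumes one common definition (deviation from the median, scaled by 1.4826 times the median absolute deviation) and an assumed aggregation (median Z per tool across benchmarks). The tool names and scores are hypothetical, not results from the study.

```python
from statistics import median

def robust_z(values):
    """Robust Z-scores: (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]      # no spread: every score is typical
    scale = 1.4826 * mad                  # makes MAD comparable to a std. dev.
    return [(v - med) / scale for v in values]

def rank_tools(benchmarks):
    """Rank tools by their median robust Z-score across benchmarks (best first)."""
    per_tool = {}
    for scores in benchmarks:             # one dict of tool -> accuracy per study
        names = list(scores)
        for name, z in zip(names, robust_z([scores[n] for n in names])):
            per_tool.setdefault(name, []).append(z)
    summary = {name: median(zs) for name, zs in per_tool.items()}
    return sorted(summary, key=summary.get, reverse=True)

# Hypothetical accuracy scores from two made-up benchmark studies.
benchmarks = [
    {"toolA": 0.9, "toolB": 0.7, "toolC": 0.5},
    {"toolA": 0.9, "toolB": 0.85, "toolC": 0.4},
]
print(rank_tools(benchmarks))
```

    Standardising within each benchmark before aggregating puts studies with different score ranges on a common scale, and using the median and MAD limits the influence of a single outlying tool.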

    Analysis of different workflows for virus detection

    Viruses are small intracellular parasites that need a host organism in order to reproduce, whether that host is unicellular or multicellular. Viruses are highly diverse, and new viruses are continually being discovered. The aim of this work is to analyse different programs and workflows that have been created for discovering new viruses and for detecting known viruses in samples. Setting out their respective strengths and weaknesses helps to find the most suitable workflow for virus detection. Detecting viruses in different environments helps us gather additional data on viruses that could also be applied in medicine.