SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology
Background The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes. Completing an assembly project properly also requires installing or modifying various software. Users therefore need significant expertise in these programs and command-line scripting experience on Unix platforms, in addition to basic expertise in the methodologies and techniques of genome assembly. These difficulties often delay the completion of genome assembly projects.
Results To overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so that bioinformaticians, even those with little computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process with simple, interactive pages. The SIMBA workflow is divided into three modules: (i) projects: gives an overview of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies: allows de novo assemblies with the software Mira, Minia, Newbler and SPAdes, as well as assembly quality validation using QUAST; and (iii) curation: provides methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validates the efficacy of SIMBA in managing bacterial assembly projects sequenced using Ion Torrent PGM.
Conclusion Besides being a web tool for genome assembly, SIMBA is a complete genome assembly project management system, which can be useful for managing several projects in laboratories. The SIMBA source code is available to download and install on local web servers at http://ufmg-simba.sourceforge.net
Viral pathogen discovery.
Viral pathogen discovery is of critical importance to clinical microbiology, infectious diseases, and public health. Genomic approaches for pathogen discovery, including consensus polymerase chain reaction (PCR), microarrays, and unbiased next-generation sequencing (NGS), have the capacity to comprehensively identify novel microbes present in clinical samples. Although numerous challenges remain to be addressed, including the bioinformatics analysis and interpretation of large datasets, these technologies have been successful in rapidly identifying emerging outbreak threats, screening vaccines and other biological products for microbial contamination, and discovering novel viruses associated with both acute and chronic illnesses. Downstream studies such as genome assembly, epidemiologic screening, and a culture system or animal model of infection are necessary to establish an association of a candidate pathogen with disease.
BamView: visualizing and interpretation of next-generation sequencing read alignments.
So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms that have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790-6] at different levels of magnification, from nucleotide level, where the base qualities can be seen, to genome or chromosome level, where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. As well as being a standalone application, it can be used as an integrated part of the Artemis genome browser, where BamView allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application will also calculate simple analyses of the read mapping, including reporting the read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user.
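The RPKM measure named in the abstract above can be illustrated with a short calculation. The sketch below uses the standard definition (reads mapped to a gene, divided by the gene length in kilobases and by the total mapped reads in millions); it is not BamView's own code, and the function name `rpkm` and the example numbers are assumptions made for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads (standard definition).

    Hypothetical helper for illustration; not taken from BamView.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the read count by 1e9 and dividing by length * total.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2,000 bp gene, 10 million mapped reads overall.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

Because the value is normalised by both gene length and sequencing depth, it lets read counts be compared across genes of different sizes and across runs of different depth.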
Reevaluating Assembly Evaluations with Feature Response Curves: GAGE and Assemblathons
In just the last decade, a multitude of bio-technologies and software
pipelines have emerged to revolutionize genomics. To further their central
goal, they aim to accelerate and improve the quality of de novo whole-genome
assembly starting from short DNA reads. However, the performance of each of
these tools is contingent on the length and quality of the sequencing data, the
structure and complexity of the genome sequence, and the resolution and quality
of long-range information. Furthermore, in the absence of any metric that
captures the most fundamental "features" of a high-quality assembly, there is
no obvious recipe for users to select the most desirable assembler/assembly.
International competitions such as Assemblathons or GAGE tried to identify the
best assembler(s) and their features. Somewhat circuitously, the only
available approach to gauge de novo assemblies and assemblers relies solely on
the availability of a high-quality fully assembled reference genome sequence.
Still worse, reference-guided evaluations are often difficult to analyze,
leading to conclusions that are difficult to interpret. In this paper, we
circumvent many of these issues by relying upon a tool, dubbed FRCbam, which is
capable of evaluating de novo assemblies from the read-layouts even when no
reference exists. We extend the FRCurve approach to cases where layout
information may have been obscured, as is true in many de Bruijn-graph-based
algorithms. As a by-product, FRCurve now expands its applicability to a much
wider class of assemblers -- thus, identifying higher-quality members of this
group, their inter-relations as well as sensitivity to carefully selected
features, with or without the support of a reference sequence or layout for the
reads. The paper concludes by reevaluating several recently conducted assembly
competitions and the datasets that have resulted from them.
Comment: Submitted to PLoS One. Supplementary material available at
http://www.nada.kth.se/~vezzi/publications/supplementary.pdf and
http://cs.nyu.edu/mishra/PUBLICATIONS/12.supplementaryFRC.pd
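The abstract above does not spell out how a feature response curve is built, so the following is only a minimal sketch of the idea as it is usually described: contigs are sorted from longest to shortest, and for each feature threshold the contigs are tallied from the longest downward while their cumulative feature count stays within the threshold; the genome fraction covered by the tallied contigs gives one point of the curve. The function name, the input layout of (length, feature count) pairs, the choice to stop at the first contig that would exceed the threshold, and the example data are all assumptions for illustration, not FRCbam's implementation.

```python
def feature_response_curve(contigs, genome_size, thresholds):
    """Return (threshold, covered_fraction) points of a feature response curve.

    contigs: list of (length_bp, feature_count) pairs (hypothetical layout).
    genome_size: estimated genome length in bp.
    thresholds: feature thresholds at which to evaluate the curve.
    """
    # Longest contigs first.
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    for phi in thresholds:
        total_features = 0
        covered_bp = 0
        for length, features in ordered:
            # Assumption: stop at the first contig that would push the
            # cumulative feature count above the threshold.
            if total_features + features > phi:
                break
            total_features += features
            covered_bp += length
        curve.append((phi, covered_bp / genome_size))
    return curve


# Made-up contigs: (length in bp, number of suspicious features).
example_contigs = [(500, 2), (300, 0), (200, 5)]
example_curve = feature_response_curve(example_contigs, 1000, [0, 2, 7])
print(example_curve)
```

Under this reading, a curve that rises steeply means an assembly reaches high coverage while accumulating few suspicious features, and no reference sequence is needed to compute it.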