
    BamView: visualizing and interpretation of next-generation sequencing read alignments.

    So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms that have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790-6] at different levels of magnification, from nucleotide level, where the base qualities can be seen, to genome or chromosome level, where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. As well as being a standalone application, it can be used as an integrated part of the Artemis genome browser; BamView allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application also performs simple analyses of the read mapping, including reporting the read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user.
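The RPKM measure mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition of RPKM (reads mapped to a gene, normalised by gene length in kilobases and by total mapped reads in millions). The function name `rpkm` and the example numbers are hypothetical and are not taken from BamView itself.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads (standard definition).

    read_count         -- reads mapped to the gene of interest
    gene_length_bp     -- gene length in base pairs
    total_mapped_reads -- all mapped reads in the alignment file
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb gene, 10 million mapped reads in total
example = rpkm(500, 2000, 10_000_000)
print(example)
```

Normalising by both gene length and sequencing depth is what makes read counts comparable between genes of different sizes and between experiments of different depth.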

    Circlator: automated circularization of genome assemblies using long sequencing reads

    The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/

    Identification, variation and transcription of pneumococcal repeat sequences.

    BACKGROUND: Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. RESULTS: Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were demonstrated as being expressed using directional RNA-seq and RT-PCR. CONCLUSIONS: BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/

    Genome Sequencing of a Historic Staphylococcus aureus Collection Reveals New Enterotoxin Genes and Sheds Light on the Evolution and Genomic Organization of This Key Virulence Gene Family.

    We take advantage of a historic collection of 133 Staphylococcus aureus strains accessioned between 1924 and 2016, whose genomes have been long-read sequenced as part of a major National Collection of Type Cultures (NCTC) initiative, to conduct a gene family-wide computational analysis of enterotoxin genes. We identify two novel staphylococcal enterotoxin (pseudo)genes (sel29p and sel30), the former of which has not been observed in any contemporary strain to date. We provide further information on five additional enterotoxin genes or gene variants that either have recently entered the literature or for which the nomenclature or description is currently unclear (selz, sel26, sel27, sel28, and ses-2p). An examination of over 11,000 RefSeq genomes in search of wider support for these seven (pseudo)genes led to the identification of an additional three novel enterotoxin gene family members (sel31, sel32, and sel33) plus two new variants (seh-2p and ses-3p). We cast light on the genomic distribution of the enterotoxin genes, further defining their arrangement in gene clusters. Finally, we show that cooccurrence of enterotoxin genes is prevalent, with individual NCTC strains possessing as many as 18 enterotoxin genes and pseudogenes, and that clonal complex membership rather than time of isolation is the key factor in determining enterotoxin load. IMPORTANCE: Staphylococcus aureus strains pose a significant health risk to both human and animal populations. Key among this species' virulence factors is the staphylococcal enterotoxin gene family. Certain enterotoxin forms can induce a potentially life-threatening immune response, while others are implicated in less fatal though often severe conditions such as food poisoning. Genetic characterization of staphylococcal enterotoxin gene family members has steadily accumulated over recent decades, with over 20 genes now established in the literature. Despite the current wealth of knowledge on this important gene family, questions remain about the presence of additional enterotoxin genes and the genomic composition of family members. This study further expands knowledge of the staphylococcal enterotoxins while shedding light on their evolution over the last century.

    BamView: viewing mapped read alignment data in the context of the reference sequence

    Summary: BamView is an interactive Java application for visualizing the large amounts of data stored for sequence reads which are aligned against a reference genome sequence. It supports the BAM (Binary Alignment/Map) format. It can be used in a number of contexts including SNP calling and structural annotation. BamView has also been integrated into Artemis so that the reads can be viewed in the context of the nucleotide sequence and genomic features

    Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence

    BACKGROUND: Campylobacter jejuni is the leading bacterial cause of human gastroenteritis in the developed world. To improve our understanding of this important human pathogen, the C. jejuni NCTC11168 genome was sequenced and published in 2000. The original annotation was a milestone in Campylobacter research, but is outdated. We now describe the complete re-annotation and re-analysis of the C. jejuni NCTC11168 genome using current database information, novel tools and annotation techniques not used during the original annotation. RESULTS: Re-annotation was carried out using sequence database searches such as FASTA, along with programs such as TMHMM for additional support. The re-annotation also utilises sequence data from additional Campylobacter strains and species not available during the original annotation. Re-annotation was accompanied by a full literature search that was incorporated into the updated EMBL file [EMBL: AL111168]. The C. jejuni NCTC11168 re-annotation reduced the total number of coding sequences from 1654 to 1643, of which 90.0% have additional information regarding the identification of new motifs and/or relevant literature. Re-annotation has led to 18.2% of coding sequence product functions being revised. CONCLUSIONS: Major updates were made to genes involved in the biosynthesis of important surface structures such as lipooligosaccharide, capsule and both O- and N-linked glycosylation. This re-annotation will be a key resource for Campylobacter research and will also provide a prototype for the re-annotation and re-interpretation of other bacterial genomes

    Expression of Cellulosome Components and Type IV Pili within the Extracellular Proteome of Ruminococcus flavefaciens 007

    Funding: The Rowett Institute receives funding from SG-RESAS (Scottish Government Rural and Environmental Science and Analysis Service). Visit of M.V. was supported by research grants from FEMS and Slovene human resources development and scholarship funds. Parts of this work were funded by grants from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel – BSF Energy Research grant to E.A.B. and B.A.W. and Regular BSF Research grants to R.L. and B.A.W. – and by the Israel Science Foundation (grant nos 966/09 and 159/07 291/08). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer reviewed.

    Evaluation of PacBio sequencing for full-length bacterial 16S rRNA gene classification.

    BACKGROUND: Currently, bacterial 16S rRNA gene analyses are based on sequencing of individual variable regions of the 16S rRNA gene (Kozich, et al Appl Environ Microbiol 79:5112-5120, 2013). This short read approach can introduce biases. Thus, full-length bacterial 16S rRNA gene sequencing is needed to reduce biases. A new alternative for full-length bacterial 16S rRNA gene sequencing is offered by PacBio single molecule, real-time (SMRT) technology. The aim of our study was to validate PacBio P6 sequencing chemistry using three approaches: 1) sequencing the full-length bacterial 16S rRNA gene from a single bacterial species Staphylococcus aureus to analyze error modes and to optimize the bioinformatics pipeline; 2) sequencing the full-length bacterial 16S rRNA gene from a pool of 50 different bacterial colonies from human stool samples to compare with full-length bacterial 16S rRNA capillary sequence; and 3) sequencing the full-length bacterial 16S rRNA genes from 11 vaginal microbiome samples to compare with the in silico selected bacterial 16S rRNA V1V2 gene region and with bacterial 16S rRNA V1V2 gene regions sequenced using the Illumina MiSeq. RESULTS: Our optimized bioinformatics pipeline for PacBio sequence analysis was able to achieve an error rate of 0.007% on the Staphylococcus aureus full-length 16S rRNA gene. Capillary sequencing of the full-length bacterial 16S rRNA gene from the pool of 50 colonies from stool identified 40 bacterial species of which up to 80% could be identified by PacBio full-length bacterial 16S rRNA gene sequencing. Analysis of the human vaginal microbiome using the bacterial 16S rRNA V1V2 gene region on MiSeq generated 129 operational taxonomic units (OTUs) from which 70 species could be identified. For the PacBio, 36,000 sequences from over 58,000 raw reads could be assigned to a barcode, and the in silico selected bacterial 16S rRNA V1V2 gene region generated 154 OTUs grouped into 63 species, of which 62% were shared with the MiSeq dataset. The PacBio full-length bacterial 16S rRNA gene datasets generated 261 OTUs, which were grouped into 52 species, of which 54% were shared with the MiSeq dataset. The alpha diversity index reported a higher diversity in the MiSeq dataset. CONCLUSION: The PacBio sequencing error rate is now in the same range as that of the previously widely used Roche 454 sequencing platform and the current MiSeq platform. Species-level microbiome analysis revealed some inconsistencies between the full-length bacterial 16S rRNA gene capillary sequencing and PacBio sequencing.

    Fundamental differences in physiology of Bordetella pertussis dependent on the two-component system Bvg revealed by gene essentiality studies.

    The identification of genes essential for a bacterium's growth reveals much about its basic physiology under different conditions. Bordetella pertussis, the causative agent of whooping cough, adopts both virulent and avirulent states through the activity of the two-component system, Bvg. The genes essential for B. pertussis growth in vitro were defined using transposon sequencing, for different Bvg-determined growth states. In addition, comparison of the insertion indices of each gene between Bvg phases identified those genes whose mutation exerted a significantly different fitness cost between phases. As expected, many of the genes identified as essential for growth in other bacteria were also essential for B. pertussis. However, the essentiality of some genes was dependent on Bvg. In particular, a number of key cell wall biosynthesis genes, including the entire mre/mrd locus, were essential for growth of the avirulent (Bvg minus) phase but not the virulent (Bvg plus) phase. In addition, cell wall biosynthesis was identified as a fundamental process that when disrupted produced greater fitness costs for the Bvg minus phase compared to the Bvg plus phase. Bvg minus phase growth was more susceptible than Bvg plus phase growth to the cell wall-disrupting antibiotic ampicillin, demonstrating the increased susceptibility of the Bvg minus phase to disruption of cell wall synthesis. This Bvg-dependent conditional essentiality was not due to Bvg-regulation of expression of cell wall biosynthesis genes, suggesting that this fundamental process differs between the Bvg phases in B. pertussis and is more susceptible to disruption in the Bvg minus phase. The ability of a bacterium to modify its cell wall synthesis is important when considering the action of antibiotics, particularly if developing novel drugs targeting cell wall synthesis.