    Assessing the Quality of Whole Genome Alignments in Bacteria

    Comparing genomes is an essential preliminary step in solving many problems in biology. Matching long similar segments between two genomes is a precondition for their evolutionary, genetic, and genome rearrangement analyses. Though various comparison methods have been developed in recent years, a quantitative assessment of their performance is lacking. Here, we describe two families of assessment measures whose purpose is to evaluate bacteria-oriented comparison tools. The first measure is based on how well the genome segmentation fits the gene annotation of the studied organisms; the second uses the number of segments created by the segmentation and the percentage of the two genomes that is conserved. The effectiveness of the two measures is demonstrated by applying them to the results of genome comparison tools obtained on 41 pairs of bacterial species. Despite the difference in the nature of the two types of measurements, both show consistent results, providing insights into the subtle differences between the mapping tools.
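
    The second family of measures (segment count and percentage of each genome conserved) can be illustrated with a minimal sketch. The segment representation, the function names, and the half-open coordinate convention below are assumptions made for illustration; they are not taken from the paper, which may define the measures differently.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: one matched segment is
# (start_in_genome1, end_in_genome1, start_in_genome2, end_in_genome2),
# with half-open coordinates. Reversed matches may have start > end.
Segment = Tuple[int, int, int, int]


def covered_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Length of the union of half-open intervals (overlaps counted once)."""
    total = 0
    current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from everything seen so far.
            total += end - start
            current_end = end
        elif end > current_end:
            # Overlapping or adjacent: count only the new part.
            total += end - current_end
            current_end = end
    return total


def segmentation_summary(segments: List[Segment],
                         genome1_len: int,
                         genome2_len: int) -> Dict[str, float]:
    """Segment count and fraction of each genome covered by segments."""
    on_g1 = [(min(s1, e1), max(s1, e1)) for s1, e1, _, _ in segments]
    on_g2 = [(min(s2, e2), max(s2, e2)) for _, _, s2, e2 in segments]
    return {
        "n_segments": len(segments),
        "coverage_genome1": covered_length(on_g1) / genome1_len,
        "coverage_genome2": covered_length(on_g2) / genome2_len,
    }
```

    Under these assumptions, a tool that reports few long segments covering most of both genomes would show a low segment count together with high coverage on both genomes, which is the kind of trade-off the second measure is meant to expose.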

    Systematic identification of gene families for use as markers for phylogenetic and phylogeny-driven ecological studies of bacteria and archaea and their major subgroups

    Given the astonishing rate at which genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as marker genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities. We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identifying the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, the ability to produce robust phylogenetic trees that reflect as closely as possible the evolution of the species from which the genes come, and low variation in copy number across taxa. We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels, including 40 for all bacteria and archaea, 114 for all bacteria, and many more for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than was previously possible.
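
    Two of the criteria named above, universality and low copy-number variation, lend themselves to a simple filter over a table of per-genome copy counts. The sketch below is only an illustration: the function name, the input layout, and the 0.9 thresholds are assumptions, not values from the paper, and the third criterion (robust phylogenetic trees) is not modelled here.

```python
from typing import Dict, List


def candidate_markers(copy_counts: Dict[str, Dict[str, int]],
                      genomes: List[str],
                      min_universality: float = 0.9,
                      min_single_copy: float = 0.9) -> List[str]:
    """Return gene families that are near-universal and mostly single-copy.

    copy_counts maps family name -> {genome name -> number of copies};
    genomes absent from the inner dict are treated as having zero copies.
    Both thresholds are hypothetical defaults chosen for illustration.
    """
    n_genomes = len(genomes)
    if n_genomes == 0:
        return []
    selected = []
    for family, counts in copy_counts.items():
        # Universality: fraction of genomes with at least one copy.
        present = sum(1 for g in genomes if counts.get(g, 0) >= 1)
        # Low copy-number variation: fraction with exactly one copy.
        single = sum(1 for g in genomes if counts.get(g, 0) == 1)
        if (present / n_genomes >= min_universality
                and single / n_genomes >= min_single_copy):
            selected.append(family)
    return sorted(selected)
```

    A family present in every genome but duplicated in many of them would pass the universality test and fail the single-copy test, so it would be excluded under these assumed thresholds.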

    Ion torrent-based transcriptional assessment of a Corynebacterium pseudotuberculosis equi strain reveals denaturing high-performance liquid chromatography a promising rRNA depletion method

    Corynebacterium pseudotuberculosis equi is a Gram-positive pathogenic bacterium which affects a variety of hosts. Besides the great economic losses it causes to horse breeders, this organism is also known to be an important infectious agent of cattle and buffaloes. As an outcome of the efforts to characterize the molecular basis of its virulence, several complete genome sequences have been made available in recent years, enabling the large-scale assessment of genes across distinct isolates. Meanwhile, RNA-seq has stood out as the technology of choice for comprehensive transcriptome studies, which may bring valuable information regarding active genomic regions, despite the still prohibitive associated costs. In an attempt to increase the proportion of useful reads generated per instrument run, by effectively eliminating unwanted rRNAs from total RNA samples without relying on any commercially available kits, we applied denaturing high-performance liquid chromatography (DHPLC) as an alternative method to assess the transcriptional profile of C. pseudotuberculosis. We have found that the DHPLC depletion method, combined with Ion Torrent sequencing, allows comprehensive mapping of transcripts and the identification of novel transcripts when a de novo approach is used. These data encourage us to use DHPLC in future transcriptional evaluations of C. pseudotuberculosis.

    An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies

    Background: Cost-effective next-generation sequencing technologies now enable the production of genomic datasets for many novel planktonic eukaryotes, representing an understudied reservoir of genetic diversity. O. tauri is the smallest free-living photosynthetic eukaryote known to date, a coccoid green alga that was first isolated in 1995 in a lagoon by the Mediterranean Sea. Its simple features, ease of culture and the sequencing of its 13 Mb haploid nuclear genome have promoted this microalga as a new model organism for cell biology. Here, we investigated the quality of genome assemblies of Illumina GAIIx 75 bp paired-end reads from Ostreococcus tauri, thereby also improving the existing assembly and showing the genome to be stably maintained in culture. Results: The 3 assemblers used, ABySS, CLCBio and Velvet, produced 95% complete genomes in 1402 to 2080 scaffolds with a very low rate of misassembly. Reciprocally, these assemblies improved the original genome assembly by filling in 930 gaps. Combined with additional analysis of raw reads and a PCR sequencing effort, 1194 gaps have been resolved in total, adding up to 460 kb of sequence. Mapping of Illumina RNA-seq data onto this updated genome led to a twofold reduction in the proportion of multi-exon protein coding genes, which now represent 19% of the total 7699 protein coding genes. The comparison of DNA extracted in 2001 and 2009 revealed the fixation of 8 single nucleotide substitutions and 2 deletions during the approximately 6000 generations in the lab. The deletions either knocked out or truncated two predicted transmembrane proteins, including a glutamate-receptor-like gene. Conclusion: High coverage (>80 fold) paired-end Illumina sequencing enables a high quality 95% complete genome assembly of a compact ~13 Mb haploid eukaryote. This genome sequence has remained stable for 6000 generations of lab culture.

    Two intracellular and cell type-specific bacterial symbionts in the placozoan Trichoplax H2

    Placozoa is an enigmatic phylum of simple, microscopic, marine metazoans(1,2). Although intracellular bacteria have been found in all members of this phylum, almost nothing is known about their identity, location and interactions with their host(3-6). We used metagenomic and metatranscriptomic sequencing of single host individuals, plus metaproteomic and imaging analyses, to show that the placozoan Trichoplax sp. H2 lives in symbiosis with two intracellular bacteria. One symbiont forms an undescribed genus in the Midichloriaceae (Rickettsiales)(7,8) and has a genomic repertoire similar to that of rickettsial parasites(9,10), but does not seem to express key genes for energy parasitism. Correlative image analyses and three-dimensional electron tomography revealed that this symbiont resides in the rough endoplasmic reticulum of its host's internal fibre cells. The second symbiont belongs to the Margulisbacteria, a phylum without cultured representatives and not known to form intracellular associations(11-13). This symbiont lives in the ventral epithelial cells of Trichoplax, probably metabolizes algal lipids digested by its host and has the capacity to supplement the placozoan's nutrition. Our study shows that one of the simplest animals has evolved highly specific and intimate associations with symbiotic, intracellular bacteria and highlights that symbioses can provide access to otherwise elusive microbial dark matter.

    Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method utilizes a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and a total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.
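
    To make the notion of a rearrangement breakpoint concrete, the following sketch counts breakpoints between genomes represented as signed orders of shared blocks and sums the counts over all genome pairs. This is a deliberately simplified illustration, not the sum-of-pairs breakpoint score or the accuracy metrics defined by the authors: the signed-integer block representation, the function names, and the treatment of chromosomes as linear with no telomere adjacencies are all assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def adjacencies(order: Sequence[int]) -> Set[Tuple[int, int]]:
    """All adjacencies of a signed block order, in both reading directions.

    Block i in forward orientation is written i, in reverse orientation -i.
    The adjacency (a, b) read on the opposite strand is (-b, -a).
    """
    adj = set()
    for left, right in zip(order, order[1:]):
        adj.add((left, right))
        adj.add((-right, -left))
    return adj


def breakpoints(order_a: Sequence[int], order_b: Sequence[int]) -> int:
    """Number of adjacencies in order_a that are not preserved in order_b."""
    adj_b = adjacencies(order_b)
    return sum(1 for pair in zip(order_a, order_a[1:]) if pair not in adj_b)


def total_pairwise_breakpoints(orders: List[Sequence[int]]) -> int:
    """Sum of breakpoint counts over every unordered pair of genomes."""
    return sum(breakpoints(a, b) for a, b in combinations(orders, 2))
```

    For example, under this representation a single inversion of blocks 2 and 3 turns the order [1, 2, 3, 4] into [1, -3, -2, 4] and breaks the two adjacencies that flank the inverted region.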

    Specificity Between Lactobacilli And Hymenopteran Hosts Is The Exception Rather Than The Rule

    Lactobacilli (Lactobacillales: Lactobacillaceae) are well known for their roles in food fermentation, as probiotics, and in human health, but they can also be dominant members of the microbiota of some species of Hymenoptera (ants, bees, and wasps). Honey bees and bumble bees associate with host-specific lactobacilli, and some evidence suggests that these lactobacilli are important for bee health. Social transmission helps maintain associations between these bees and their respective microbiota. To determine whether lactobacilli associated with social hymenopteran hosts are generally host specific, we gathered publicly available Lactobacillus 16S rRNA gene sequences, along with Lactobacillus sequences from 454 pyrosequencing surveys of six other hymenopteran species (three sweat bees and three ants). We determined the comparative secondary structural models of 16S rRNA, which allowed us to accurately align the entire 16S rRNA gene, including fast-evolving regions. BLAST searches and maximum-likelihood phylogenetic reconstructions confirmed that honey and bumble bees have host-specific Lactobacillus associates. Regardless of colony size or within-colony oral sharing of food (trophallaxis), sweat bees and ants associate with lactobacilli that are closely related to those found in vertebrate hosts or in diverse environments. Why honey and bumble bees associate with host-specific lactobacilli while other social Hymenoptera do not remains an open question. Lactobacilli are known to inhibit the growth of other microbes and can be beneficial whether they are coevolved with their host or are recruited by the host from environmental sources through mechanisms of partner choice.

    Funding: National Science Foundation PRFB-1003133, DEB-0919519; Texas Higher Education Coordinating Board 01923; National Institutes of Health GM067317.