
    progressiveMauve: Multiple Genome Alignment with Gene Gain, Loss and Rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46Mbp conserved among all taxa and a pan-genome of 15.2Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriaceae may exhibit regulatory divergence. The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve
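
As a rough illustration of the core- and pan-genome sizes quoted in the abstract above, the sketch below sums the lengths of aligned segments present in every genome (core) and of all segments present in at least one genome (pan). The segment representation, the function name `core_and_pan_size`, and the example data are assumptions made for illustration; they are not the output format or the method of the Mauve software.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: (segment length in bp, genomes containing it).
Segment = Tuple[int, FrozenSet[str]]


def core_and_pan_size(segments: Iterable[Segment],
                      genomes: Iterable[str]) -> Tuple[int, int]:
    """Return (core_bp, pan_bp) for a collection of aligned segments."""
    all_genomes = frozenset(genomes)
    core_bp = 0
    pan_bp = 0
    for length, present in segments:
        if present:
            # Counted once in the pan-genome if any genome carries it.
            pan_bp += length
        if all_genomes <= present:
            # Counted in the core only if every genome carries it.
            core_bp += length
    return core_bp, pan_bp


# Made-up example data, not taken from the paper.
example_segments: List[Segment] = [
    (1000, frozenset({"A", "B", "C"})),  # conserved in all three genomes
    (400, frozenset({"A", "B"})),        # present in a subset
    (250, frozenset({"C"})),             # lineage-specific
]
core_bp, pan_bp = core_and_pan_size(example_segments, ["A", "B", "C"])
print(core_bp, pan_bp)
```

Because segments are defined by the alignment rather than by gene annotation, the same counting covers non-coding regions as well, which is the extension the abstract describes.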

    inGeno – an integrated genome and ortholog viewer for improved genome to genome comparisons

    BACKGROUND: Systematic genome comparisons are an important tool to reveal gene functions, pathogenic features, metabolic pathways and genome evolution in the era of post-genomics. Furthermore, such comparisons provide important clues for vaccines and drug development. Existing genome comparison software often lacks accurate information on orthologs, the function of similar genes identified and genome-wide reports and lists on specific functions. All these features and further analyses are provided here in the context of a modular software tool "inGeno" written in Java with Biojava subroutines. RESULTS: InGeno provides a user-friendly interactive visualization platform for sequence comparisons (comprehensive reciprocal protein – protein comparisons) between complete genome sequences and all associated annotations and features. The comparison data can be acquired from several different sequence analysis programs in flexible formats. Automatic dot-plot analysis includes output reduction, filtering, ortholog testing and linear regression, followed by smart clustering (local collinear blocks; LCBs) to reveal similar genome regions. Further, the system provides genome alignment and visualization editor, collinear relationships and strain-specific islands. Specific annotations and functions are parsed, recognized, clustered, logically concatenated and visualized and summarized in reports. CONCLUSION: As shown in this study, inGeno can be applied to study and compare in particular prokaryotic genomes against each other (gram positive and negative as well as close and more distantly related species) and has been proven to be sensitive and accurate. This modular software is user-friendly and easily accommodates new routines to meet specific user-defined requirements

    Precise detection of rearrangement breakpoints in mammalian chromosomes

    Background: Genomes undergo large structural changes that alter their organisation. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. We developed a method to precisely delimit rearrangement breakpoints on a genome by comparison with the genome of a related species. Contrary to current methods which search for synteny blocks and simply return what remains in the genome as breakpoints, we propose to go further and to investigate the breakpoints themselves in order to refine them. Results: Given some reliable and non-overlapping synteny blocks, the core of the method consists in refining the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Since this method requires as input synteny blocks with some properties which, though they appear natural, are not verified by current methods for detecting such blocks, we further give a formal definition and provide an algorithm to compute them. The whole method is applied to delimit breakpoints on the human genome when compared to the mouse and dog genomes. Among the 355 human-mouse and 240 human-dog breakpoints, 168 and 146 respectively span less than 50 Kb. We compared the resulting breakpoints with some publicly available ones and show that we achieve a better resolution. Furthermore, we suggest that breakpoints are rarely reduced to a point, and instead consist of often large regions that can be distinguished from the surrounding sequences in terms of segmental duplications, similarity with related species, and transposable elements. Conclusion: Our method leads to smaller breakpoints than already published ones and allows for a better description of their internal structure. In the majority of cases, our refined breakpoint regions exhibit specific biological properties (no similarity, presence of segmental duplications and of transposable elements). We hope that this new result may provide some insight into the mechanism and evolutionary properties of chromosomal rearrangements.
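
The abstract above contrasts its refinement step with the simpler baseline of returning whatever lies between synteny blocks. A minimal sketch of that baseline only, assuming half-open `(start, end)` coordinates on one chromosome and a hypothetical function name `breakpoint_regions`, might look like the following. It does not implement the paper's refinement or segmentation algorithm, and it does not check whether neighbouring blocks are also adjacent in the other species.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on a single chromosome


def breakpoint_regions(blocks: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive non-overlapping synteny blocks."""
    ordered = sorted(blocks)
    gaps: List[Interval] = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            # Sequence not covered by any block is a candidate breakpoint region.
            gaps.append((prev_end, next_start))
    return gaps


# Made-up coordinates for illustration.
example_blocks = [(150, 300), (0, 100), (300, 450), (520, 600)]
print(breakpoint_regions(example_blocks))
```

Narrowing each returned region would then require aligning it against its orthologous sequence in the other species, as the abstract describes.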

    Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

    Background: Gene discovery algorithms typically examine sequence data for low level patterns. A novel method to computationally discover higher order DNA structures is presented, using a context sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette, increasing performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion: Using grammars we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems which operate at the sequence level.

    Dynamics of Genome Rearrangement in Bacterial Populations

    Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. 
Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes
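
The abstract above defines symmetric inversions as inversions whose endpoints are equally distant from the origin of chromosomal replication. A small sketch of that definition on a circular chromosome follows; the function names, the coordinate convention, and the `tolerance` parameter are assumptions made for illustration and are not taken from the paper's statistical methods.

```python
def circular_distance(pos: int, origin: int, genome_length: int) -> int:
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(pos - origin) % genome_length
    return min(d, genome_length - d)


def is_symmetric_inversion(start: int, end: int, origin: int,
                           genome_length: int, tolerance: int = 0) -> bool:
    """True if both inversion endpoints lie (almost) equally far from the origin.

    `tolerance` is a hypothetical slack in bp; exact equality is rarely
    observed in real coordinates.
    """
    d_start = circular_distance(start, origin, genome_length)
    d_end = circular_distance(end, origin, genome_length)
    return abs(d_start - d_end) <= tolerance


# Made-up coordinates: a 1000 bp circle with the origin at position 0.
print(is_symmetric_inversion(900, 100, origin=0, genome_length=1000))
print(is_symmetric_inversion(900, 300, origin=0, genome_length=1000, tolerance=50))
```

An inversion that passes this check swaps equal lengths between the two replichores, which is consistent with the preference for replichore balance reported in the abstract.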

    Comparative Geno-Plasticity Analysis of Mycoplasma bovis HB0801 (Chinese Isolate)

    Mycoplasma bovis pneumonia in cattle has been epidemic in China since 2008. To investigate M. bovis pathogenesis, we completed genome sequencing of strain HB0801 isolated from a lesioned bovine lung from Hubei, China. The genomic plasticity was determined by comparing HB0801 with M. bovis strain ATCC® 25523™/PG45 from cow mastitis milk, Chinese strain Hubei-1 from lesioned lung tissue, and 16 other Mycoplasma species. Compared to PG45, the genome size of HB0801 was reduced by 11.7 kb. Furthermore, a large chromosome inversion (580 kb) was confirmed in all Chinese isolates including HB0801, HB1007, a strain from cow mastitis milk, and Hubei-1. In addition, the variable surface lipoproteins (vsp) gene cluster existed in HB0801, but contained less than half of the genes, and had poor identity to that in PG45, but they had conserved structures. Further inter-strain comparisons revealed other mechanisms of gene acquisition and loss in HB0801 that primarily involved insertion sequence (IS) elements, integrative conjugative element, restriction and modification systems, and some lipoproteins and transmembrane proteins. Subsequently, PG45 and HB0801 virulence in cattle was compared. Results indicated that both strains were pathogenic to cattle. The scores of gross pathological assessment for the control group, and the PG45- and HB0801-infected groups were 3, 13 and 9, respectively. Meanwhile the scores of lung lesion for these three groups were 36, 70, and 69, respectively. In addition, immunohistochemistry detection demonstrated that both strains were similarly distributed in lungs and lymph nodes. Although PG45 showed slightly higher virulence in calves than HB0801, there was no statistical difference between the strains (P>0.05). Compared to Hubei-1, a total of 122 SNP loci were disclosed in HB0801. In conclusion, although genomic plasticity was thought to be an evolutionary advantage, it did not apparently affect virulence of M. bovis strains in cattle

    Phage Encoded H-NS: A Potential Achilles Heel in the Bacterial Defence System

    The relationship between phage and their microbial hosts is difficult to elucidate in complex natural ecosystems. Engineered systems performing enhanced biological phosphorus removal (EBPR) offer stable, lower complexity communities for studying phage-host interactions. Here, metagenomic data from an EBPR reactor dominated by Candidatus Accumulibacter phosphatis (CAP) led to the recovery of three complete and six partial phage genomes. Heat-stable nucleoid structuring (H-NS) protein, a global transcriptional repressor in bacteria, was identified in one of the complete phage genomes (EPV1), and was most similar to a homolog in CAP. We infer that EPV1 is a CAP-specific phage and has the potential to repress up to 6% of host genes based on the presence of putative H-NS binding sites in the CAP genome. These genes include CRISPR associated proteins and a Type III restriction-modification system, which are key host defense mechanisms against phage infection. Further, EPV1 was the only member of the phage community found in an EBPR microbial metagenome collected seven months prior. We propose that EPV1 laterally acquired H-NS from CAP providing it with a means to reduce bacterial defenses, a selective advantage over other phage in the EBPR system. Phage encoded H-NS could constitute a previously unrecognized weapon in the phage-host arms race

    Metabolic Versatility and Antibacterial Metabolite Biosynthesis Are Distinguishing Genomic Features of the Fire Blight Antagonist Pantoea vagans C9-1

    Smits THM, Rezzonico F, Kamber T, et al. Metabolic Versatility and Antibacterial Metabolite Biosynthesis Are Distinguishing Genomic Features of the Fire Blight Antagonist Pantoea vagans C9-1. PLoS ONE. 2011;6(7): e22247. Background: Pantoea vagans is a commercialized biological control agent used against the pome fruit bacterial disease fire blight, caused by Erwinia amylovora. Compared to other biocontrol agents, relatively little is currently known regarding Pantoea genetics. Better understanding of antagonist mechanisms of action and ecological fitness is critical to improving efficacy. Principal Findings: Genome analysis indicated two major factors contribute to biocontrol activity: competition for limiting substrates and antibacterial metabolite production. Pathways for utilization of a broad diversity of sugars and acquisition of iron were identified. Metabolism of sorbitol by P. vagans C9-1 may be a major metabolic feature in biocontrol of fire blight. Biosynthetic genes for the antibacterial peptide pantocin A were found on a chromosomal 28-kb genomic island, and for dapdiamide E on the plasmid pPag2. There was no evidence of potential virulence factors that could enable an animal or phytopathogenic lifestyle and no indication of any genetic-based biosafety risk in the antagonist. Conclusions: Identifying key determinants contributing to disease suppression allows the development of procedures to follow their expression in planta and the genome sequence contributes to rational risk assessment regarding the use of the biocontrol strain in agricultural systems

    Sequence of the hyperplastic genome of the naturally competent Thermus scotoductus SA-01

    Background: Many strains of Thermus have been isolated from hot environments around the world. Thermus scotoductus SA-01 was isolated from fissure water collected 3.2 km below surface in a South African gold mine. The isolate is capable of dissimilatory iron reduction, growth with oxygen and nitrate as terminal electron acceptors and the ability to reduce a variety of metal ions, including gold, chromate and uranium, was demonstrated. The genomes from two different Thermus thermophilus strains have been completed. This paper represents the completed genome from a second Thermus species - T. scotoductus. Results: The genome of Thermus scotoductus SA-01 consists of a chromosome of 2,346,803 bp and a small plasmid which together are about 11% larger than the Thermus thermophilus genomes. The T. thermophilus megaplasmid genes are part of the T. scotoductus chromosome and extensive rearrangement, deletion of nonessential genes and acquisition of gene islands have occurred, leading to a loss of synteny between the chromosomes of T. scotoductus and T. thermophilus. At least nine large inserts, of which seven were identified as alien, were found, the most remarkable being a denitrification cluster and two operons relating to the metabolism of phenolics which appear to have been acquired from Meiothermus ruber. The majority of acquired genes are from closely related species of the Deinococcus-Thermus group, and many of the remaining genes are from microorganisms with a thermophilic or hyperthermophilic lifestyle. The natural competence of Thermus scotoductus was confirmed experimentally as expected as most of the proteins of the natural transformation system of Thermus thermophilus are present. Analysis of the metabolic capabilities revealed an extensive energy metabolism with many aerobic and anaerobic respiratory options. An abundance of sensor histidine kinases, response regulators and transporters for a wide variety of compounds are indicative of an oligotrophic lifestyle. Conclusions: The genome of Thermus scotoductus SA-01 shows remarkable plasticity with the loss, acquisition and rearrangement of large portions of its genome compared to Thermus thermophilus. Its ability to naturally take up foreign DNA has helped it adapt rapidly to a subsurface lifestyle in the presence of a dense and diverse population which acted as source of nutrients. The genome of Thermus scotoductus illustrates how rapid adaptation can be achieved by a highly dynamic and plastic genome.