31 research outputs found

    A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera

    The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes
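    The abstract above does not spell out how a MUMi value is computed. As a rough illustration only, the sketch below assumes the index takes the form 1 - Lmum / Lav, where Lmum is the total length of non-overlapping maximal unique matches and Lav is the average length of the two genomes. The function names, the half-open interval representation, and the overlap-merging step are hypothetical simplifications, not the authors' implementation, and MUM detection itself is assumed to have been done by an external tool.

```python
from typing import List, Tuple


def covered_length(intervals: List[Tuple[int, int]]) -> int:
    """Count bases covered by half-open [start, end) intervals, overlaps counted once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block, if any, and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def mum_index(mums: List[Tuple[int, int]], len_a: int, len_b: int) -> float:
    """Assumed form of the index: 1 - Lmum / Lav (0 = identical, 1 = nothing shared)."""
    l_av = (len_a + len_b) / 2
    return 1.0 - covered_length(mums) / l_av


if __name__ == "__main__":
    # Hypothetical MUM coordinates on one genome; both genomes are 100 bp long.
    print(mum_index([(0, 30), (20, 50)], 100, 100))
```

    Under this assumed definition, closely related genomes give values near 0 and unrelated genomes give values near 1.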

    Small variable segments constitute a major type of diversity of bacterial genomes at the species level.

    BACKGROUND: Analysis of large scale diversity in bacterial genomes has mainly focused on elements such as pathogenicity islands, or more generally, genomic islands. These comprise numerous genes and confer important phenotypes, which are present or absent depending on strains. We report that despite this widely accepted notion, most diversity at the species level is composed of much smaller DNA segments, 20 to 500 bp in size, which we call microdiversity. RESULTS: We performed a systematic analysis of the variable segments detected by multiple whole genome alignments at the DNA level on three species for which the greatest number of genomes have been sequenced: Escherichia coli, Staphylococcus aureus, and Streptococcus pyogenes. Among the numerous sites of variability, 62 to 73% were loci of microdiversity, many of which were located within genes. They contribute to phenotypic variations, as 3 to 6% of all genes harbor microdiversity, and 1 to 9% of total genes are located downstream from a microdiversity locus. Microdiversity loci are particularly abundant in genes encoding membrane proteins. In-depth analysis of the E. coli alignments shows that most of the diversity does not correspond to known mobile or repeated elements, and it is likely that they were generated by illegitimate recombination. An intriguing class of microdiversity includes small blocks of highly diverged sequences, whose origin is discussed. CONCLUSIONS: This analysis uncovers the importance of this small-sized genome diversity, which we expect to be present in a wide range of bacteria, and possibly also in many eukaryotic genomes

    Identification of DNA Motifs Implicated in Maintenance of Bacterial Core Genomes by Predictive Modeling

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the “crossover hotspot instigator,” or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions

    Quantitative genomic analysis of RecA protein binding during DNA double-strand break repair reveals RecBCD action in vivo

    Understanding molecular mechanisms in the context of living cells requires the development of new methods of in vivo biochemical analysis to complement established in vitro biochemistry. A critically important molecular mechanism is genetic recombination, required for the beneficial reassortment of genetic information and for DNA double-strand break repair (DSBR). Central to recombination is the RecA (Rad51) protein that assembles into a spiral filament on DNA and mediates genetic exchange. Here we have developed a method that combines chromatin immunoprecipitation with next-generation sequencing (ChIP-Seq) and mathematical modeling to quantify RecA protein binding during the active repair of a single DSB in the chromosome of Escherichia coli. We have used quantitative genomic analysis to infer the key in vivo molecular parameters governing RecA loading by the helicase/nuclease RecBCD at recombination hot-spots, known as Chi. Our genomic analysis has also revealed that DSBR at the lacZ locus causes a second RecBCD-mediated DSBR event to occur in the terminus region of the chromosome, over 1 Mb away. Keywords: homologous recombination | mechanistic modelling | DNA repair | RecA

    The MatP/matS Site-Specific System Organizes the Terminus Region of the E. coli Chromosome into a Macrodomain

    The organization of the Escherichia coli chromosome into insulated macrodomains influences the segregation of sister chromatids and the mobility of chromosomal DNA. Here, we report that organization of the Terminus region (Ter) into a macrodomain relies on the presence of a 13 bp motif called matS repeated 23 times in the 800-kb-long domain. matS sites are the main targets in the E. coli chromosome of a newly identified protein designated MatP. MatP accumulates in the cell as a discrete focus that colocalizes with the Ter macrodomain. The effects of MatP inactivation reveal its role as main organizer of the Ter macrodomain: in the absence of MatP, DNA is less compacted, the mobility of markers is increased, and segregation of Ter macrodomain occurs early in the cell cycle. Our results indicate that a specific organizational system is required in the Terminus region for bacterial chromosome management during the cell cycle

    Co-evolution of segregation guide DNA motifs and the FtsK translocase in bacteria: identification of the atypical Lactococcus lactis KOPS motif

    Bacteria use the global bipolarization of their chromosomes into replichores to control the dynamics and segregation of their genome during the cell cycle. This involves the control of protein activities by recognition of specific short DNA motifs whose orientation along the chromosome is highly skewed. The KOPS motifs act in chromosome segregation by orienting the activity of the FtsK DNA translocase towards the terminal replichore junction. KOPS motifs have been identified in γ-Proteobacteria and in Bacillus subtilis as closely related G-rich octamers. We have identified the KOPS motif of Lactococcus lactis, a model bacterium of the Streptococcaceae family harbouring a compact and low GC% genome. This motif, 5′-GAAGAAG-3′, was predicted in silico using the occurrence and skew characteristics of known KOPS motifs. We show that it is specifically recognized by L. lactis FtsK in vitro and controls its activity in vivo. L. lactis KOPS is thus an A-rich heptamer motif. Our results show that KOPS-controlled chromosome segregation is conserved in Streptococcaceae but that KOPS may show important variation in sequence and length between bacterial families. This suggests that FtsK adapts to its host genome by selecting motifs with convenient occurrence frequencies and orientation skews to orient its activity
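    The occurrence and orientation-skew criteria mentioned in the abstract above can be illustrated with a small counting sketch. It tallies a motif on the given strand and on the opposite strand (via its reverse complement) and reports the fraction found on the given strand. The function names and the toy sequence are hypothetical, and this is only a simplified view of the idea, not the prediction procedure used by the authors. A real analysis would treat each replichore separately and assess statistical significance.

```python
from typing import Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def strand_skew(seq: str, motif: str) -> Tuple[int, int, float]:
    """Return (forward count, reverse count, forward fraction) for a motif."""
    forward = count_overlapping(seq, motif)
    reverse = count_overlapping(seq, reverse_complement(motif))
    total = forward + reverse
    fraction = forward / total if total else 0.0
    return forward, reverse, fraction


if __name__ == "__main__":
    # Toy sequence: two copies of the motif on this strand, one on the other.
    toy = "GAAGAAGTTTGAAGAAGCCCCTTCTTC"
    print(strand_skew(toy, "GAAGAAG"))
```

    A forward fraction close to 1 on one replichore and close to 0 on the other would correspond to the strongly skewed orientation described for KOPS motifs.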

    MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level

    BACKGROUND: The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. DESCRIPTION: Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. CONCLUSION: The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic

    Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths

    The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-) annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli related species), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could render difficult the development of a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome