143 research outputs found

    A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera

    The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes
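The abstract above does not state the MUMi formula. The following is a minimal sketch under the assumption that the index is one minus the fraction of the average genome length covered by maximal unique matches, with overlapping matches counted once. The function names and the `(start, length)` input format are hypothetical, and finding the MUMs themselves (for example with a genome aligner) is outside the sketch.

```python
def covered_length(matches):
    """Total bases covered by (start, length) matches, counting overlaps once.

    Assumes non-negative start coordinates on one reference genome.
    """
    total = 0
    current_end = 0
    for start, length in sorted(matches):
        end = start + length
        if end > current_end:
            # Only add the part of this match not already covered.
            total += end - max(start, current_end)
            current_end = end
    return total


def mum_index(matches, genome_a_len, genome_b_len):
    """Assumed definition: 1 - (MUM-covered length / average genome length)."""
    average_len = (genome_a_len + genome_b_len) / 2
    return 1 - covered_length(matches) / average_len


# Toy example: two overlapping matches covering 150 bases of 1000-base genomes.
toy_matches = [(0, 100), (50, 100)]
print(mum_index(toy_matches, 1000, 1000))
```

Under this assumed definition, identical genomes give a value near 0 and genomes sharing no unique matches give 1, so the index behaves as a distance.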

    Small variable segments constitute a major type of diversity of bacterial genomes at the species level.

    BACKGROUND: Analysis of large scale diversity in bacterial genomes has mainly focused on elements such as pathogenicity islands, or more generally, genomic islands. These comprise numerous genes and confer important phenotypes, which are present or absent depending on strains. We report that despite this widely accepted notion, most diversity at the species level is composed of much smaller DNA segments, 20 to 500 bp in size, which we call microdiversity. RESULTS: We performed a systematic analysis of the variable segments detected by multiple whole genome alignments at the DNA level on three species for which the greatest number of genomes have been sequenced: Escherichia coli, Staphylococcus aureus, and Streptococcus pyogenes. Among the numerous sites of variability, 62 to 73% were loci of microdiversity, many of which were located within genes. They contribute to phenotypic variations, as 3 to 6% of all genes harbor microdiversity, and 1 to 9% of total genes are located downstream from a microdiversity locus. Microdiversity loci are particularly abundant in genes encoding membrane proteins. In-depth analysis of the E. coli alignments shows that most of the diversity does not correspond to known mobile or repeated elements, and it is likely that they were generated by illegitimate recombination. An intriguing class of microdiversity includes small blocks of highly diverged sequences, whose origin is discussed. CONCLUSIONS: This analysis uncovers the importance of this small-sized genome diversity, which we expect to be present in a wide range of bacteria, and possibly also in many eukaryotic genomes.

    The MatP/matS Site-Specific System Organizes the Terminus Region of the E. coli Chromosome into a Macrodomain

    The organization of the Escherichia coli chromosome into insulated macrodomains influences the segregation of sister chromatids and the mobility of chromosomal DNA. Here, we report that organization of the Terminus region (Ter) into a macrodomain relies on the presence of a 13 bp motif called matS repeated 23 times in the 800-kb-long domain. matS sites are the main targets in the E. coli chromosome of a newly identified protein designated MatP. MatP accumulates in the cell as a discrete focus that colocalizes with the Ter macrodomain. The effects of MatP inactivation reveal its role as main organizer of the Ter macrodomain: in the absence of MatP, DNA is less compacted, the mobility of markers is increased, and segregation of Ter macrodomain occurs early in the cell cycle. Our results indicate that a specific organizational system is required in the Terminus region for bacterial chromosome management during the cell cycle

    Low Efficiency of Homology-Facilitated Illegitimate Recombination during Conjugation in Escherichia coli

    Homology-facilitated illegitimate recombination has been described in three naturally competent bacterial species. It permits integration of small linear DNA molecules into the chromosome by homologous recombination at one end of the linear DNA substrate, and illegitimate recombination at the other end. We report that homology-facilitated illegitimate recombination also occurs in Escherichia coli during conjugation with small non-replicative plasmids, but at a low frequency of 3×10⁻¹⁰ per recipient cell. The fate of linear DNA in E. coli is either RecBCD-dependent degradation, or circularisation by ligation, and integration into the chromosome by single crossing-over. We also report that the observed single crossing-overs are recA-dependent, but essentially recBCD- and recFOR-independent. This suggests that other, still unknown, proteins may act as mediators for the loading of RecA on DNA during single crossing-over recombination in E. coli.

    MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level

    BACKGROUND: The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. DESCRIPTION: Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. CONCLUSION: The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic

    Organised Genome Dynamics in the Escherichia coli Species Results in Highly Diverse Adaptive Paths

    The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-) annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli related species), including seven that we sequenced to completion. Within the ∼18,000 families of orthologous genes, we found ∼2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could render difficult the development of a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite a very high gene flow, genes co-exist in an organised genome

    Detection of novel recombinases in bacteriophage genomes unveils Rad52, Rad51 and Gp2.5 remote homologs

    Homologous recombination is a key contributor to bacteriophage genome repair, circularization and replication. No less than six kinds of recombinase genes have been reported so far in bacteriophage genomes, two (UvsX and Gp2.5) from virulent, and four (Sak, Redβ, Erf and Sak4) from temperate phages. Using profile–profile comparisons, structure-based modelling and gene-context analyses, we provide new views on the global landscape of recombinases in 465 bacteriophages. We show that Sak, Redβ and Erf belong to a common large superfamily adopting a shortcut Rad52-like fold. Remote homologs of Sak4 are predicted to adopt a shortcut Rad51/RecA fold and are discovered widespread among phage genomes. Unexpectedly, within temperate phages, gene-context analyses also pinpointed the presence of distant Gp2.5 homologs, believed to be restricted to virulent phages. All in all, three major superfamilies of phage recombinases emerged either related to Rad52-like, Rad51-like or Gp2.5-like proteins. For two newly detected recombinases belonging to the Sak4 and Gp2.5 families, we provide experimental evidence of their recombination activity in vivo. Temperate versus virulent lifestyle together with the importance of genome mosaicism is discussed in the light of these novel recombinases. Screening for these recombinases in genomes can be performed at http://biodev.extra.cea.fr/virfam

    The λ Red Proteins Promote Efficient Recombination between Diverged Sequences: Implications for Bacteriophage Genome Mosaicism

    Genome mosaicism in temperate bacterial viruses (bacteriophages) is so great that it obscures their phylogeny at the genome level. However, the precise molecular processes underlying this mosaicism are unknown. Illegitimate recombination has been proposed, but homeologous recombination could also be at play. To test this, we have measured the efficiency of homeologous recombination between diverged oxa gene pairs inserted into λ. High yields of recombinants between 22% diverged genes have been obtained when the virus Red Gam pathway was active, and 100-fold less when the host Escherichia coli RecABCD pathway was active. The recombination editing proteins, MutS and UvrD, showed only marginal effects on λ recombination. Thus, escape from host editing contributes to the high proficiency of virus recombination. Moreover, our bioinformatics study suggests that homeologous recombination between similar lambdoid viruses has created part of their mosaicism. We therefore propose that the remarkable propensity of the λ-encoded Red and Gam proteins to recombine diverged DNA is effectively contributing to mosaicism, and more generally, that a correlation may exist between virus genome mosaicism and the presence of Red/Gam-like systems.

    16p11.2 600 kb Duplications confer risk for typical and atypical Rolandic epilepsy

    Rolandic epilepsy (RE) is the most common idiopathic focal childhood epilepsy. Its molecular basis is largely unknown and a complex genetic etiology is assumed in the majority of affected individuals. The present study tested whether six large recurrent copy number variants at 1q21, 15q11.2, 15q13.3, 16p11.2, 16p13.11 and 22q11.2 previously associated with neurodevelopmental disorders also increase risk of RE. Our association analyses revealed a significant excess of the 600 kb genomic duplication at the 16p11.2 locus (chr16: 29.5-30.1 Mb) in 393 unrelated patients with typical (n = 339) and atypical (ARE; n = 54) RE compared with the prevalence in 65 046 European population controls (5/393 cases versus 32/65 046 controls; Fisher's exact test P = 2.83 × 10⁻⁶, odds ratio = 26.2, 95% confidence interval: 7.9-68.2). In contrast, the 16p11.2 duplication was not detected in 1738 European epilepsy patients with either temporal lobe epilepsy (n = 330) or genetic generalized epilepsies (n = 1408), suggesting a selective enrichment of the 16p11.2 duplication in idiopathic focal childhood epilepsies (Fisher's exact test P = 2.1 × 10⁻⁴). In a subsequent screen among children carrying the 16p11.2 600 kb rearrangement we identified three patients with RE-spectrum epilepsies in 117 duplication carriers (2.6%) but none in 202 carriers of the reciprocal deletion. Our results suggest that the 16p11.2 duplication represents a significant genetic risk factor for typical and atypical R
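The odds ratio and carrier percentage quoted in this abstract can be checked directly from the reported counts. The sketch below is only an arithmetic check, assuming the odds ratio is the plain sample cross-product ratio of the 2×2 table; the function name is hypothetical, and the Fisher's exact P value and confidence interval are not recomputed here.

```python
from fractions import Fraction


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Sample odds ratio (cross-product ratio) from a 2x2 carrier table."""
    case_non_carriers = case_total - case_carriers
    control_non_carriers = control_total - control_carriers
    return Fraction(
        case_carriers * control_non_carriers,
        case_non_carriers * control_carriers,
    )


# Counts as reported: 5/393 RE cases versus 32/65 046 population controls.
duplication_or = odds_ratio(5, 393, 32, 65046)
print(round(float(duplication_or), 1))  # 26.2, matching the reported value

# Three RE-spectrum patients among 117 duplication carriers.
print(f"{3 / 117:.1%}")  # 2.6%
```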

    Effects of eight neuropsychiatric copy number variants on human brain structure

    Many copy number variants (CNVs) confer risk for the same range of neurodevelopmental symptoms and psychiatric conditions including autism and schizophrenia. Yet, to date neuroimaging studies have typically been carried out one mutation at a time, showing that CNVs have large effects on brain anatomy. Here, we aimed to characterize and quantify the distinct brain morphometry effects and latent dimensions across 8 neuropsychiatric CNVs. We analyzed T1-weighted MRI data from clinically and non-clinically ascertained CNV carriers (deletion/duplication) at the 1q21.1 (n = 39/28), 16p11.2 (n = 87/78), 22q11.2 (n = 75/30), and 15q11.2 (n = 72/76) loci as well as 1296 non-carriers (controls). Case-control contrasts of all examined genomic loci demonstrated effects on brain anatomy, with deletions and duplications showing mirror effects at the global and regional levels. Although CNVs mainly showed distinct brain patterns, principal component analysis (PCA) loaded subsets of CNVs on two latent brain dimensions, which explained 32 and 29% of the variance of the 8 Cohen’s d maps. The cingulate gyrus, insula, supplementary motor cortex, and cerebellum were identified by PCA and multi-view pattern learning as top regions contributing to latent dimension shared across subsets of CNVs. The large proportion of distinct CNV effects on brain morphology may explain the small neuroimaging effect sizes reported in polygenic psychiatric conditions. Nevertheless, latent gene brain morphology dimensions will help subgroup the rapidly expanding landscape of neuropsychiatric variants and dissect the heterogeneity of idiopathic conditions