    Leaner and meaner genomes in Escherichia coli

    A 'better' Escherichia coli K-12 genome has recently been engineered in which about 15% of the genome has been removed by planned deletions. Comparison with related bacterial genomes that have undergone natural genome reduction suggests that there is plenty of scope for further deletions.

    Bioinformatics 2000

    A report from the Bioinformatics 2000 conference, held in Elsinore, Denmark, 27-30 April 2000.

    Standard operating procedure for computing pangenome trees

    We present the pan-genome tree as a tool for visualizing similarities and differences between closely related microbial genomes within a species or genus. The distance between genomes is computed as a weighted relative Manhattan distance based on gene family presence/absence. The weights can be chosen to emphasize groups of gene families conserved to various degrees within the pan-genome. The software is freely available as an R package.
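
    The distance described above can be illustrated with a small Python sketch. It computes a weighted Manhattan distance between genomes from a 0/1 gene family presence/absence matrix. This is not the R package's code. The function names, the toy data, and the choice to make the distance "relative" by dividing by the sum of the weights are assumptions made for illustration.

```python
from itertools import combinations


def weighted_relative_manhattan(a, b, weights):
    """Weighted Manhattan distance between two 0/1 presence/absence profiles,
    divided by the total weight so the result lies in [0, 1].
    Hypothetical helper: the normalization is an assumption, not the package's."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("profiles and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # A gene family adds its weight only when it is present in exactly one genome.
    return sum(w * abs(x - y) for x, y, w in zip(a, b, weights)) / total


def distance_matrix(pan_matrix, weights):
    """Symmetric genome-by-genome distance matrix.
    Rows of pan_matrix are genomes; columns are gene families."""
    n = len(pan_matrix)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = weighted_relative_manhattan(pan_matrix[i], pan_matrix[j], weights)
        dist[i][j] = dist[j][i] = d
    return dist


# Toy pan-matrix (made-up data): 3 genomes x 4 gene families.
pan = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]
uniform = [1, 1, 1, 1]
print(distance_matrix(pan, uniform))
# [[0.0, 0.25, 0.5], [0.25, 0.0, 0.75], [0.5, 0.75, 0.0]]
```

    Changing `weights` shifts the emphasis between groups of gene families, for example by down-weighting rarely shared families. A tree could then be built from the resulting matrix with any standard hierarchical clustering routine.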

    Prediction of highly expressed genes in microbes based on chromatin accessibility

    BACKGROUND: It is well known that gene expression depends on chromatin structure in eukaryotes, and it is likely that chromatin can play a role in bacterial gene expression as well. Here, we use a nucleosomal position preference measure of anisotropic DNA flexibility to predict highly expressed genes in microbial genomes. We compare these predictions with those based on codon adaptation index (CAI) values and with experimental data for 6 different microbial genomes, with particular interest in experimental data from Escherichia coli. Position preference is further examined in 328 sequenced microbial genomes. RESULTS: We find that absolute gene expression levels are correlated with position preference in many microbial genomes. It is postulated that in the regions encoding highly expressed genes, the DNA may be more accessible to the transcriptional machinery. Moreover, in fast-replicating microbes, ribosomal proteins and ribosomal RNA are encoded by DNA with significantly lower position preference values than other genes. CONCLUSION: This insight into DNA structure-dependent gene expression in microbes may be exploited to predict the expression of non-translated genes, such as non-coding RNAs, which are beyond the reach of conventional codon usage bias approaches.

    Microbial comparative pan-genomics using binomial mixture models

    BACKGROUND: The sizes of the core- and pan-genome of bacterial species are a topic of increasing interest due to the growing number of sequenced prokaryote genomes, many from the same species. Attempts have been made to estimate these quantities using regression methods or mixture models. We extend the latter approach by using statistical ideas developed for capture-recapture problems in ecology and epidemiology. RESULTS: We estimate core- and pan-genome sizes for 16 different bacterial species. The results reveal a complex dependency structure for most species, manifested as heterogeneous detection probabilities. Estimated pan-genome sizes range from small (around 2600 gene families) in Buchnera aphidicola to large (around 43000 gene families) in Escherichia coli. Results for Escherichia coli show that as more data become available, a larger diversity is estimated, indicating an extensive pool of rarely occurring genes in the population. CONCLUSION: Analyzing pan-genomics data with binomial mixture models is a way to handle dependencies between genomes, which we find are always present. A bottleneck in the estimation procedure is the annotation of rarely occurring genes.
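
    The capture-recapture idea behind this estimate can be illustrated through its final step alone. The sketch assumes that a binomial mixture has already been fitted; the fitting itself is not shown. In the model, a gene family in component k is detected in each of G genomes with probability p_k, and pi_k is the share of all gene families in that component, including families never observed. The expected fraction of families missed by every genome is then sum_k pi_k (1 - p_k)^G. Dividing the observed number of families by one minus that fraction gives a pan-genome size estimate. Families in a component with p_k = 1 estimate the core. The function name and all parameter values below are hypothetical.

```python
def pan_and_core_size(n_observed, genomes, detect_probs, mix_weights):
    """Hypothetical helper: pan- and core-genome size from an already fitted
    binomial mixture.

    n_observed   -- number of gene families seen in at least one genome
    genomes      -- number of sequenced genomes G
    detect_probs -- detection probability p_k of each mixture component
    mix_weights  -- share pi_k of ALL gene families (seen or unseen) in
                    component k; assumed to sum to 1
    """
    if abs(sum(mix_weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    # Probability that a random gene family is absent from all G genomes.
    p_unseen = sum(w * (1.0 - p) ** genomes
                   for p, w in zip(detect_probs, mix_weights))
    if p_unseen >= 1.0:
        raise ValueError("model never detects any gene family")
    pan_size = n_observed / (1.0 - p_unseen)
    # Components with detection probability 1 are present in every genome.
    core_share = sum(w for p, w in zip(detect_probs, mix_weights) if p == 1.0)
    return pan_size, pan_size * core_share


# Made-up parameters: 10 genomes, 3000 observed gene families, 3 components.
pan_size, core_size = pan_and_core_size(3000, 10, [1.0, 0.5, 0.05], [0.3, 0.3, 0.4])
print(round(pan_size), round(core_size))
```

    A component with a small detection probability (here 0.05) is what pushes the estimate well above the observed count. This mirrors how a pool of rarely occurring genes inflates pan-genome size estimates.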

    Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes

    BACKGROUND: The increasing number of sequenced prokaryotic genomes provides a wealth of genomic data that needs to be analysed effectively. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned with here are mainly used to examine similarities between archaeal and bacterial DNA from different genomes. These methods compare observed genomic frequencies of fixed-size oligonucleotides with expected values. The expected values can be determined from genomic nucleotide content or from smaller oligonucleotide frequencies, or be based on specific statistical distributions. Advantages of these statistical methods include measuring phylogenetic relationships with relatively small pieces of DNA sampled from almost anywhere within genomes, detecting foreign/conserved DNA, and homology searches. Our aim was to explore the reliability and best-suited applications of some popular methods. These include relative oligonucleotide frequencies (ROF), di- to hexanucleotide zeroth-order Markov methods (ZOM) and the second-order Markov chain method (MCM). Tests were performed on distant homology searches with large DNA sequences, detection of foreign/conserved DNA, and plasmid-host similarity comparisons. Additionally, the reliability of the methods was tested by comparing both real and random genomic DNA. RESULTS: Our findings show that the optimal method is context dependent. ROFs were best suited for distant homology searches, whilst the hexanucleotide ZOM and MCM measures were more reliable measures of phylogeny. The dinucleotide ZOM method produced high correlation values when used to compare real genomes to an artificially constructed random genome with similar %GC, and should therefore be used with care. The tetranucleotide ZOM measure was good at detecting horizontally transferred regions. When it was used to compare the phylogenetic relationships between plasmids and hosts, a significant correlation (R² = 0.4) was found with genomic GC content and intra-chromosomal homogeneity. CONCLUSION: The statistical methods examined are fast, easy to implement, and powerful for a number of different applications involving genomic sequence comparisons. However, none of the measures examined was superior in all tests, and therefore the choice of statistical method should depend on the task at hand.
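
    A zeroth-order Markov (ZOM) comparison of the kind described above can be sketched in Python. For every oligonucleotide of length k, the sketch divides the observed frequency by the frequency expected from the sequence's own base composition. It then compares two sequences by the Pearson correlation of these ratios. This is an illustration under stated assumptions, not the authors' implementation. Details such as reverse-strand handling, treatment of ambiguous bases and the exact normalization may differ in the paper, and the function names and sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def zom_ratios(seq, k):
    """Observed/expected ratio for every k-mer over ACGT (in itertools.product
    order). Expected frequency = product of the sequence's base frequencies
    (zeroth-order Markov model). k-mers with non-ACGT symbols are skipped."""
    seq = seq.upper()
    base_counts = Counter(c for c in seq if c in ALPHABET)
    n_bases = sum(base_counts.values())
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    kmers = [w for w in kmers if all(c in ALPHABET for c in w)]
    if n_bases == 0 or not kmers:
        raise ValueError("sequence too short or contains no ACGT k-mers")
    kmer_counts = Counter(kmers)
    ratios = []
    for letters in product(ALPHABET, repeat=k):
        expected = 1.0
        for c in letters:
            expected *= base_counts[c] / n_bases
        observed = kmer_counts["".join(letters)] / len(kmers)
        # A k-mer containing a base absent from the sequence has expected 0.
        ratios.append(observed / expected if expected > 0 else 0.0)
    return ratios


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy sequences (made up); real comparisons use much longer DNA and e.g. k = 4.
s1 = "ATGCGCGATATTAGCGCGGCTAATCGATCGGCTA"
s2 = "ATGCGCGTTATTAGCGAGGCTTATCGATCGCCTA"
print(pearson(zom_ratios(s1, 2), zom_ratios(s2, 2)))
```

    Setting k = 2 corresponds to the dinucleotide variant that the abstract cautions against. Because the expectation is conditioned on base composition alone, sequences with similar %GC can still score highly.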

    Comparative Genomics of Bifidobacterium, Lactobacillus and Related Probiotic Genera

    Six bacterial genera containing species commonly used as probiotics for human consumption or as starter cultures for food fermentation were compared and contrasted, based on publicly available complete genome sequences. The analysis included 19 Bifidobacterium genomes, 21 Lactobacillus genomes, 4 Lactococcus and 3 Leuconostoc genomes, as well as a selection of Enterococcus (11) and Streptococcus (23) genomes. The latter two genera included genomes from probiotic or commensal as well as pathogenic organisms. This made it possible to investigate whether their non-pathogenic members shared more genes with the other probiotic genomes than their pathogenic members did. The pan- and core genome of each genus was defined. Pairwise BLASTP genome comparison was performed within and between genera. It turned out that the pathogenic Streptococcus and Enterococcus genomes shared more gene families than did the non-pathogenic genomes. In silico multilocus sequence typing was carried out for all genomes per genus, and the variable gene content of genomes was compared within the genera. Informative BLAST Atlases were constructed to visualize genomic variation within genera. The clusters of orthologous groups (COG) classes of all genes in the pan- and core genome of each genus were compared. In addition, it was investigated whether pathogenic genomes contain different COG classes compared with the probiotic or fermentative organisms, again comparing their pan- and core genomes. The results were compared with published data from the literature. This study illustrates how over 80 genomes can be broadly compared using simple bioinformatic tools, leading both to confirmation of known information and to novel observations.
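
    In its simplest form, the pan-genome of a genus is the set of gene families found in at least one of its genomes. The core genome is the set found in all of them. The Python sketch below expresses this with set union and intersection. The genome names and gene family labels are made up. It also assumes that genes have already been grouped into families, for example from pairwise BLASTP hits.

```python
def pan_and_core(genomes):
    """genomes: dict mapping a genome name to the set of gene family IDs it carries.
    Returns (pan_genome, core_genome) as sets."""
    families = list(genomes.values())
    if not families:
        return set(), set()
    pan = set().union(*families)          # present in at least one genome
    core = set.intersection(*families)    # present in every genome
    return pan, core


# Hypothetical genus with three genomes.
genus = {
    "genome_A": {"fam1", "fam2", "fam3", "fam5"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam2", "fam3", "fam6"},
}
pan, core = pan_and_core(genus)
print(len(pan), sorted(core))  # 6 ['fam1', 'fam2']
```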

    The qacC Gene Has Recently Spread between Rolling Circle Plasmids of Staphylococcus, Indicative of a Novel Gene Transfer Mechanism

    Resistance of Staphylococcus species to quaternary ammonium compounds, frequently used as disinfectants and biocides, can be attributed to qac genes. These qac gene products belong to the Small Multidrug Resistant (SMR) protein family and are often encoded by rolling-circle (RC) replicating plasmids. Four classes of SMR-type qac gene families have been described in Staphylococcus species: qacC, qacG, qacJ and qacH. Genes within each class are highly conserved, but qacC genes are extremely conserved, even though they are found in variable plasmid backgrounds. These plasmids share a lower degree of sequence identity than the strictly conserved qacC genes they carry, which indicates that this gene has spread recently. In the absence of insertion sequences or other genetic elements that could explain this mobility, we sought an explanation for the mobilization through sequence comparison. Publicly available sequences of qac genes, their flanking genes and the replication gene that is invariably present in RC-plasmids were compared to reconstruct the evolutionary history of these plasmids and to explain the recent spread of qacC. Here we propose a new model that explains how qacC is mobilized and transferred to acceptor RC-plasmids without the assistance of other genes. The mechanism depends on the location of qacC between the Double-Strand replication Origin (DSO) and the Single-Strand replication Origin (SSO). The proposed mobilization model of this DSO-qacC-SSO element represents a novel mechanism of gene mobilization in RC-plasmids, which has also been employed by other genes, such as lnuA (conferring lincomycin resistance). The proposed mode of gene mobility has contributed to the wide spread of clinically relevant resistance genes in Staphylococcus populations.