87 research outputs found

    Adding complexity to complexity: Gene family evolution in polyploids

    Comparative genomics of non-model organisms has resurrected whole genome duplication (WGD) from being viewed as a somewhat obscure process that happens in plants to a primary driver of eukaryotic diversification. The shadow of past ploidy increases has left a strong signature of duplicated genes organized into gene families, even in small genomes that have undergone effectively complete rediploidization. Nevertheless, despite continually advancing technologies and bioinformatics pipelines, resolving the fate of duplicate genes remains a substantial challenge. For example, many important recognition processes are driven not only by allelic expansion through retention of duplicates but also by diversification and copy number variation. This creates technical difficulties with assembly to reference genomes and accurate interpretation of homology. Thus, relatively little is known about the impacts of recent polyploidization and hybridization on the evolution of gene families under selective forces that maintain diversity, such as balancing selection. Here we use a complex of species and ploidy levels in the genus Arabidopsis (A. lyrata and A. arenosa) as a model to investigate the evolutionary dynamics of a large and complicated gene family known to be under strong balancing selection: the receptor-like kinases, which include the female component of genetically controlled self-incompatibility. Specifically, we question: (1) How does diversity of S-receptor kinase (SRK) alleles in tetraploids compare to that in their close diploid relatives? (2) Is there increased trans-specific polymorphism (i.e., sharing of alleles that transcend speciation, characteristic of balancing selection) in tetraploids compared to diploids due to the higher number of copies they carry? (3) Do these highly variable loci show evidence of introgression among extant species/ploidy levels within or outside known zones of hybridization? (4) Is there evidence for copy number variation among paralogs? We use this example to highlight specific issues to consider when interpreting gene family evolution, particularly in relation to polyploids but also more generally in diploids. We conclude with recommendations for strategies to address the challenges of resolving such complex loci in the future, using advances in deep sequencing approaches.

    Assessing diversity of the female urine microbiota by high throughput sequencing of 16S rDNA amplicons

    BACKGROUND: Urine within the urinary tract is commonly regarded as "sterile" in cultivation terms. Here, we present a comprehensive in-depth study of bacterial 16S rDNA sequences associated with urine from healthy females by means of culture-independent high-throughput sequencing techniques. RESULTS: Sequencing of the V1V2 and V6 regions of the 16S ribosomal RNA gene using the 454 GS FLX system was performed to characterize the possible bacterial composition in 8 culture-negative (<100,000 CFU/ml) healthy female urine specimens. Sequences were compared to 16S rRNA databases and showed significant diversity, with the predominant genera detected being Lactobacillus, Prevotella and Gardnerella. The bacterial profiles in the female urine samples studied were complex; considerable variation between individuals was observed and a common microbial signature was not evident. Notably, a significant number of sequences belonging to bacteria with a known pathogenic potential was observed. The number of operational taxonomic units (OTUs) for individual samples varied substantially, ranging from 20 to 500. CONCLUSIONS: Normal female urine displays a noticeable and variable bacterial 16S rDNA sequence richness, which includes fastidious and anaerobic bacteria previously shown to be associated with female urogenital pathology.

    Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods

    BACKGROUND: Classification of bacteria within the genus Brucella has been difficult due in part to considerable genomic homogeneity between the different species and biovars, in spite of clear differences in phenotypes. Therefore, many different methods have been used to assess Brucella taxonomy. In the current work, we examine 32 sequenced genomes from genus Brucella representing the six classical species, as well as more recently described species, using bioinformatic methods. Comparisons were made at the level of genomic DNA using oligonucleotide based methods (Markov chain based genomic signatures, genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses). RESULTS: We found that the oligonucleotide based methods gave different results from those of the proteome based methods. Differences were also found between the oligonucleotide based methods used. Whilst the Markov chain based genomic signatures grouped the different species in genus Brucella according to host preference, the codon and amino acid frequencies based methods reflected small differences between the Brucella species. Only minor differences could be detected between all genera included in this study using the codon and amino acid frequencies based methods. Proteome comparisons were found to be in strong accordance with current Brucella taxonomy, indicating a remarkable association between gene gain or loss on the one hand and mutations in marker genes on the other. The proteome based methods found greater similarity between Brucella species and Ochrobactrum species than between species within genus Agrobacterium compared to each other. In other words, proteome comparisons of species within genus Agrobacterium were found to be more diverse than proteome comparisons between species in genus Brucella and genus Ochrobactrum. Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be limited. CONCLUSIONS: While both the proteome based methods and the Markov chain based genomic signatures were able to reflect environmental diversity between the different species and strains of genus Brucella, the genomic codon and amino acid frequencies based comparisons were not found adequate for such comparisons. The proteome comparison based phylogenies of the species in genus Brucella showed a surprising consistency with current Brucella taxonomy.

    Custom Design and Analysis of High-Density Oligonucleotide Bacterial Tiling Microarrays

    Custom-made high-density oligonucleotide microarrays have only recently become available at an affordable price. The aim of this thesis was to design microarrays and analysis algorithms for DNA repair and DNA damage detection, and to apply the methods in real experiments. Thomassen et al. have used their custom designed whole genome-tiling microarrays for detection of transcriptional changes in Escherichia coli after exposure to DNA damaging reagents. The transcriptional changes in E. coli treated with UV light or the methylating reagent MNNG were shown to be larger and to include far more genes than previously reported. To optimize the data analysis for the custom made arrays, Thomassen and coworkers designed their own normalization and analysis algorithms, and showed these to be more suitable than established methods that are currently applied on custom tiling arrays. Among other findings, several novel stress-induced transcripts were detected, of which one is predicted to be a UV-induced short transmembrane protein. Additionally, no upregulation of the previously described UV-inducible aidB is shown. In the MNNG study several genes are shown as downregulated in response to DNA damage although having upstream regulatory sequences similar to the established LexA box A and B. This indicates that the LexA regulon also might control gene repression and that the box A and B sequence cannot alone account for the LexA controlled gene regulation. Thomassen et al. have also custom designed a microarray for oncogenic fusion gene detection. Cancer specific fusion genes are often used to subgroup cancers and to define the optimal treatment, but currently the laboratory detection procedure is both laborious and tedious. In a blinded study on six cancer cell lines, proof of principle was shown by detection of six out of six positive controls. The design and analysis methods for this microarray are now being refined to make a diagnostic fusion gene detection tool.

    Analysis of intra-genomic GC content homogeneity within prokaryotes

    BACKGROUND: Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome), but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than the average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity, GCVAR, the intra-genomic GC content variability with respect to the average GC content of the total genome. A low GCVAR indicates intra-genomic GC homogeneity and a high GCVAR indicates heterogeneity. RESULTS: The regression analyses indicated that GCVAR was significantly associated with domain (i.e. archaea or bacteria), phylum, and oxygen requirement. GCVAR was significantly higher among anaerobes than among both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content was also found but appears to be non-linear and varies greatly among phyla. CONCLUSIONS: Our findings show that GCVAR is linked with oxygen requirement, while mean genomic GC content is not. We therefore suggest that GCVAR should be used as a complement to mean GC content.
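The idea of GC content varying along a chromosome relative to the genome-wide mean can be illustrated with a small sketch. The abstract does not give the formula for GCVAR, so the measure below (root-mean-square deviation of per-window GC from the genome-wide GC) is an assumption chosen for illustration, not the authors' definition; the function names and the window size are likewise hypothetical.

```python
from math import sqrt


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases in seq (other symbols ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc_variability(genome: str, window: int = 100) -> float:
    """RMS deviation of per-window GC from genome-wide GC.

    Illustrative stand-in only: NOT the published GCVAR definition.
    Uses non-overlapping windows; a trailing partial window is dropped.
    """
    mean_gc = gc_fraction(genome)
    windows = [genome[i:i + window]
               for i in range(0, len(genome) - window + 1, window)]
    if not windows:
        raise ValueError("genome shorter than one window")
    squared = sum((gc_fraction(w) - mean_gc) ** 2 for w in windows)
    return sqrt(squared / len(windows))


# A perfectly homogeneous toy sequence versus a two-block heterogeneous one.
homogeneous = "ACGT" * 50
heterogeneous = "GGGG" + "AAAA"
print(windowed_gc_variability(homogeneous, window=4))    # 0.0
print(windowed_gc_variability(heterogeneous, window=4))  # 0.5
```

Under this stand-in measure, a low value corresponds to intra-genomic GC homogeneity and a high value to heterogeneity, matching the qualitative reading given in the abstract.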

    Tenacibaculosis in Norwegian Atlantic salmon (Salmo salar) cage-farmed in cold sea water is primarily associated with Tenacibaculum finnmarkense genomovar finnmarkense

    Skin conditions associated with Tenacibaculum spp. constitute a significant threat to the health and welfare of sea-farmed Atlantic salmon (Salmo salar L.) in Norway. Fifteen presumptive tenacibaculosis outbreaks distributed along the Norwegian coast during the late winter and spring of 2018 were investigated. Bacteriological culture confirmed the presence of Tenacibaculum spp. Seventy-six isolates cultured from individual fish were selected and subjected to whole-genome sequencing and MALDI-TOF MS analysis. Average nucleotide identity and MALDI-TOF analyses confirmed the presence of T. finnmarkense and T. dicentrarchi, with further division of T. finnmarkense into genomovars (gv.) finnmarkense and ulcerans. Core genome multilocus sequence typing (cgMLST) and single-nucleotide polymorphism (SNP) analyses identified the presence of a genetically conserved cluster of gv. finnmarkense isolates against a background of relatively genetically diverse gv. finnmarkense and gv. ulcerans isolates in 13 of the 15 studied cases. This clustering strongly suggests a link between T. finnmarkense gv. finnmarkense and development of clinical tenacibaculosis in sea-farmed Norwegian salmon in the late winter and spring. Analysis of 25 Tenacibaculum isolates collected during the spring of 2019 from similar cases identified a similar distribution of genotypes. Low water temperatures were common to all cases, and most incidences involved relatively small fish shortly after sea transfer, suggesting that these fish are particularly predisposed to Tenacibaculum infection.

    Analysis of evolutionary patterns of genes in Campylobacter jejuni and C. coli

    BACKGROUND: The thermophilic Campylobacter jejuni and Campylobacter coli are considered weakly clonal populations where incongruences between genetic markers are assumed to be due to random horizontal transfer of genomic DNA. In order to investigate the population genetic structure we extracted a set of 1180 core gene families (CGF) from 27 sequenced genomes of C. jejuni and C. coli. We adopted a principal component analysis (PCA) on the normalized evolutionary distances in order to reveal any patterns in the evolutionary signals contained within the various CGFs. RESULTS: The analysis indicates that the conserved genes in Campylobacter show at least two, possibly five, distinct patterns of evolutionary signals, seen as clusters in the score-space of our PCA. The dominant underlying factor separating the core genes is the ability to distinguish C. jejuni from C. coli. The genes in the clusters outside the main gene group have a strong tendency to be chromosomal neighbors, which is natural if they share a common evolutionary history. Also, the most distinct cluster outside the main group is enriched with genes under positive selection and displays larger than average recombination rates. CONCLUSIONS: The Campylobacter genomes investigated here show that subsets of conserved genes differ from each other in a more systematic way than expected by random horizontal transfer, and this is consistent with differences in selection pressure acting on different genes. These findings are indications of a population of bacteria characterized by genomes with a mixture of evolutionary patterns.

    Genome dynamics in major bacterial pathogens

    Pathogenic bacteria continuously encounter multiple forms of stress in their hostile environments, which leads to DNA damage. With the new insight into biology offered by genome sequences, the elucidation of the gene content encoding proteins provides clues toward understanding the microbial lifestyle related to habitat and niche. Campylobacter jejuni, Haemophilus influenzae, Helicobacter pylori, Mycobacterium tuberculosis, the pathogenic Neisseria, Streptococcus pneumoniae, Streptococcus pyogenes and Staphylococcus aureus are major human pathogens causing detrimental morbidity and mortality on a global scale. An algorithm for the clustering of orthologs was established in order to identify whether orthologs of selected genes were present or absent in the genomes of the pathogenic bacteria under study. Based on the known genes for the various functions and their orthologs in selected pathogenic bacteria, an overview of the presence of the different types of genes was created. In this context, we focus on selected processes enabling genome dynamics in these particular pathogens, namely DNA repair, recombination and horizontal gene transfer. An understanding of the precise molecular functions of the enzymes participating in DNA metabolism and their importance in the maintenance of bacterial genome integrity has also, in recent years, indicated a future role for these enzymes as targets for therapeutic intervention.

    Genomic Characterization of Campylobacter jejuni Strain M1

    Campylobacter jejuni strain M1 (laboratory designation 99/308) is a rarely documented case of direct transmission of C. jejuni from chicken to a person, resulting in enteritis. We have sequenced the genome of C. jejuni strain M1, and compared this to 12 other C. jejuni sequenced genomes currently publicly available. Compared to these, M1 is closest to strain 81116. Based on the 13 genome sequences, we have identified the C. jejuni pan-genome, as well as the core genome, the auxiliary genes, and genes unique between strains M1 and 81116. The pan-genome contains 2,427 gene families, whilst the core genome comprises 1,295 gene families, or about two-thirds of the gene content of the average of the sequenced C. jejuni genomes. Various comparison and visualization tools were applied to the 13 C. jejuni genome sequences, including a species pan- and core genome plot, a BLAST Matrix and a BLAST Atlas. Trees based on 16S rRNA sequences and on the total gene families in each genome are presented. The findings are discussed against the background of the proven virulence potential of M1.
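The distinction between a pan-genome and a core genome can be shown with plain set operations. This is a minimal sketch that assumes each genome has already been reduced to a set of gene family identifiers (for example by an upstream clustering step); the strain names and family labels are invented toy data, not values from the study.

```python
def pan_and_core(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (pan-genome, core genome) for a mapping of strain -> gene families.

    Pan-genome: families found in at least one genome (union).
    Core genome: families found in every genome (intersection).
    """
    if not genomes:
        raise ValueError("need at least one genome")
    families = list(genomes.values())
    pan = set().union(*families)
    core = set(families[0]).intersection(*families[1:])
    return pan, core


# Hypothetical toy input: three strains sharing two gene families.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5"},
}
pan, core = pan_and_core(toy)
print(len(pan), len(core))  # 5 2
```

The families in the pan-genome but not in the core (here fam3, fam4 and fam5) correspond to what the abstract calls auxiliary genes.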

    RNAmmer: consistent and rapid annotation of ribosomal RNA genes

    The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.