11 research outputs found
Candidate genes for shell colour polymorphism in Cepaea nemoralis
The characteristic ground colour and banding patterns on shells of the land snail Cepaea nemoralis form a classic study system for genetics and adaptation. We use RNAseq analysis to identify candidate genes underlying this polymorphism. We sequenced cDNA from the body and the mantle (the shell-producing tissue) of four individuals of two phenotypes and produced a de novo transcriptome of 147,397 contigs. Differential expression analysis identified a set of 1,961 transcripts that were upregulated in mantle tissue. Sequence variant analysis resulted in a set of 2,592 transcripts with single nucleotide polymorphisms (SNPs) that differed consistently between the phenotypes. Combining these results yielded a set of 197 candidate transcripts, of which 38 were annotated. Four of these transcripts are involved in production of the shell's nacreous layer. Comparison with morph-associated RAD-tags from a published study yielded seven transcripts that were annotated as metallothionein, a protein that is thought to inhibit the production of melanin in melanocytes. These results thus provide an excellent starting point for the elucidation of the genetic regulation of the Cepaea nemoralis shell colour polymorphism.
Metagenomic assembly is the main bottleneck in the identification of mobile genetic elements
Antimicrobial resistance genes (ARGs) are commonly found on acquired mobile genetic elements (MGEs) such as plasmids or transposons. Understanding the spread of resistance genes associated with mobile elements (mARGs) across different hosts and environments requires linking ARGs to the existing mobile reservoir within bacterial communities. However, reconstructing mARGs in metagenomic data from diverse ecosystems poses computational challenges, including genome fragment reconstruction (assembly), high-throughput annotation of MGEs, and identification of their association with ARGs. Recently, several bioinformatics tools have been developed to identify assembled fragments of plasmids, phages, and insertion sequence (IS) elements in metagenomic data. These methods can help in understanding the dissemination of mARGs. To streamline the process of identifying mARGs in multiple samples, we combined these tools in an automated high-throughput open-source pipeline, MetaMobilePicker, that identifies ARGs associated with plasmids, IS elements and phages, starting from short metagenomic sequencing reads. This pipeline was used to identify these three elements in a simplified simulated metagenome dataset, comprising whole genome sequences from seven clinically relevant bacterial species containing 55 ARGs, nine plasmids and five phages. The results demonstrated moderate precision for the identification of plasmids (0.57) and phages (0.71), and moderate sensitivity for the identification of IS elements (0.58) and ARGs (0.70). In this study, we aim to assess the main causes of this moderate performance of the MGE prediction tools in a comprehensive manner. We conducted a systematic benchmark, considering metagenomic read coverage and contig length cutoffs, and investigating the performance of the classification algorithms.
Our analysis revealed that the metagenomic assembly process, rather than the identification of ARGs and MGEs by the different tools, is the primary bottleneck when linking ARGs to identified MGEs in short-read metagenomic sequencing experiments.
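The precision and sensitivity figures quoted in this abstract follow the standard definitions, which can be sketched as below. This is a minimal illustration only: the contig identifiers and counts are hypothetical, and the function names are assumptions rather than part of MetaMobilePicker.

```python
def confusion_counts(predicted: set[str], truth: set[str]) -> tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = len(predicted & truth)   # predicted and truly an MGE
    fp = len(predicted - truth)   # predicted but not an MGE
    fn = len(truth - predicted)   # true MGE that was missed
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    """Fraction of predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true elements that were recovered (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical contig IDs, not taken from the study.
predicted_plasmid_contigs = {"contig_1", "contig_2", "contig_3", "contig_4", "contig_5"}
true_plasmid_contigs = {"contig_1", "contig_2", "contig_3", "contig_9"}

tp, fp, fn = confusion_counts(predicted_plasmid_contigs, true_plasmid_contigs)
print(f"precision={precision(tp, fp):.2f} sensitivity={sensitivity(tp, fn):.2f}")
```

Precision penalises false calls, while sensitivity penalises missed elements, so an element that is never assembled into a recognisable contig lowers sensitivity regardless of how good the classifier is.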
Fungal metabarcoding data integration framework for the MycoDiversity DataBase (MDDB)
Fungi have crucial roles in ecosystems and are important associates of many organisms. They are adapted to a wide variety of habitats; however, their global distribution and diversity remain poorly documented. The exponential growth of DNA barcode information retrieved from the environment considerably assists the traditional ways of unraveling fungal diversity and detection. The raw DNA data of metabarcoding studies, together with their environmental descriptors, are made available in public sequence read archives. While this is potentially a valuable source of information for the investigation of Fungi across diverse environmental conditions, the annotation used to describe the environment is heterogeneous. Moreover, a uniform processing pipeline still needs to be applied to the available raw DNA data. Hence, a comprehensive framework to analyse these data in a large context is still lacking. We introduce the MycoDiversity DataBase, a database that includes public fungal metabarcoding data of environmental samples for the study of biodiversity patterns of Fungi. The framework we propose will contribute to our understanding of fungal biodiversity and aims to become a valuable source for large-scale analyses of patterns in space and time, in addition to assisting evolutionary and ecological research on Fungi.
PlasmidEC and gplas2: an optimized short-read approach to predict and reconstruct antibiotic resistance plasmids in Escherichia coli.
Accurate reconstruction of Escherichia coli antibiotic resistance gene (ARG) plasmids from Illumina sequencing data has proven to be a challenge with current bioinformatic tools. In this work, we present an improved method to reconstruct E. coli plasmids using short reads. We developed plasmidEC, an ensemble classifier that identifies plasmid-derived contigs by combining the output of three different binary classification tools. We showed that plasmidEC is especially suited to classify contigs derived from ARG plasmids, with a high recall of 0.941. Additionally, we optimized gplas, a graph-based tool that bins plasmid-predicted contigs into distinct plasmid predictions. Gplas2 is more effective at recovering plasmids with large sequencing coverage variations and can be combined with the output of any binary classifier. The combination of plasmidEC with gplas2 showed a high completeness (median=0.818) and F1-Score (median=0.812) when reconstructing ARG plasmids and exceeded the binning capacity of the reference-based method MOB-suite. In the absence of long-read data, our method offers an excellent alternative to reconstruct ARG plasmids in E. coli.
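One way to combine the output of three binary classifiers is a simple majority vote per contig, sketched below. The abstract does not state the combination rule, so majority voting is an assumption here, and the tool names, contig IDs and labels are hypothetical placeholders rather than plasmidEC's actual interface.

```python
def ensemble_label(votes: list[str]) -> str:
    """Label a contig as 'plasmid' when a strict majority of tools say so."""
    plasmid_votes = sum(1 for vote in votes if vote == "plasmid")
    return "plasmid" if 2 * plasmid_votes > len(votes) else "chromosome"


def classify_contigs(tool_calls: dict[str, dict[str, str]]) -> dict[str, str]:
    """Combine per-tool calls (tool -> contig -> label) into one label per contig."""
    # Only contigs classified by every tool are considered in this sketch.
    shared = set.intersection(*(set(calls) for calls in tool_calls.values()))
    return {
        contig: ensemble_label([calls[contig] for calls in tool_calls.values()])
        for contig in sorted(shared)
    }


# Hypothetical outputs of three binary classification tools.
tool_calls = {
    "tool_a": {"contig_1": "plasmid", "contig_2": "chromosome", "contig_3": "plasmid"},
    "tool_b": {"contig_1": "plasmid", "contig_2": "chromosome", "contig_3": "chromosome"},
    "tool_c": {"contig_1": "chromosome", "contig_2": "plasmid", "contig_3": "chromosome"},
}

labels = classify_contigs(tool_calls)
print(labels)
```

With an odd number of tools a strict majority always exists, so no tie-breaking rule is needed.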
Antimicrobial resistance on the move: Computational methods to identify and reconstruct mobile genetic elements contributing to AMR dissemination
Antimicrobial resistance (AMR) is a growing challenge for public health. In 2019 alone, 1.27 million people died of causes directly linked to infections caused by resistant bacteria. This problem is not limited to human clinical health, but also affects other aspects of human and animal public health. Resistant bacteria, as well as genes conferring AMR, can be transferred to humans via direct animal contact, the food chain or the environment. AMR genes can arise through mutations, but can also be transferred between cells through horizontal gene transfer by mobile genetic elements (MGEs). Plasmids are an important vehicle for the dissemination of AMR genes. These extrachromosomal DNA molecules can move between bacterial cells, not necessarily respecting species barriers. This capability makes plasmids an important contributing factor to the spread of AMR genes. Therefore, to investigate the spread of AMR genes, it is essential to investigate the spread of plasmids. Two techniques to investigate the DNA of plasmids are whole genome sequencing (WGS) and metagenomics. In WGS, the genome of a cultured bacterial colony is read, whereas in metagenomics, a large part of the genetic material in environmental samples is sequenced, without prior filtering or culturing of bacteria. This makes the resulting data much more complex and generates much more raw data than WGS experiments. In Chapters 2 and 3, I focus on plasmids in WGS data. I describe a collection of software tools: plasmidEC, plasmidCC, and gplas2 versions Bilbao and Flevo. PlasmidEC and plasmidCC are capable of identifying sequences originating from plasmids, based on six species-specific plasmid databases and one species-agnostic plasmid database. Gplas2 is an extension of the gplas algorithm that uses the assembly graph to reconstruct plasmid fragments into plasmid ‘bins’ that are likely to originate from the same plasmid.
In Chapters 4 and 5, I focus on metagenomics data to assess the spread of mobile AMR genes. In Chapter 4, I present a software pipeline called MetaMobilePicker, which uses existing tools to assemble metagenomic reads, identify MGEs and annotate AMR genes. By validating this pipeline using simulated metagenomics data with known MGEs, I show that the metagenomic assembly step is the bottleneck for the identification of MGEs such as plasmids. I show that not all reads originating from MGEs are assembled correctly, and that some are assembled without enough context to be correctly identified as an MGE. In Chapter 5, I focus on the composition of the microbiome, the collection of AMR genes (the resistome), and the collection of MGEs (the mobilome) in the caecum and the faeces of broiler chickens, and compare these between two interventions to prevent coccidiosis. I show that there are measurable differences in the microbiome and resistome between these intervention methods. Additionally, I identify AMR genes located on plasmids that are present in both the caecum and the faeces on the farmhouse floor, which makes them interesting starting points for further research into the way mobile AMR genes can spread between environments.
Candidate genes for shell colour polymorphism in Cepaea nemoralis
The characteristic ground colour and banding patterns on shells of the land snail Cepaea nemoralis form a classic study system for genetics and adaptation, as they vary widely between individuals. We use RNAseq analysis to identify candidate genes underlying this polymorphism. We sequenced cDNA from the foot and the mantle (the shell-producing tissue) of four individuals of two phenotypes and produced a de novo transcriptome of 147,397 contigs. Differential expression analysis identified a set of 1,961 transcripts that were upregulated in mantle tissue. Sequence variant analysis resulted in a set of 2,592 transcripts with single nucleotide polymorphisms (SNPs) that differed consistently between the phenotypes. Inspection of the overlap between the differential expression analysis and SNP analysis yielded a set of 197 candidate transcripts, of which 38 were annotated. Four of these transcripts are thought to be involved in production of the shell's nacreous layer. Comparison with morph-associated Restriction-site Associated DNA (RAD)-tags from a published study yielded eight transcripts that were annotated as metallothionein, a protein that is thought to inhibit the production of melanin in melanocytes. These results thus provide an excellent starting point for the elucidation of the genetic regulation of the Cepaea nemoralis shell colour polymorphism.
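The overlap step described in this abstract amounts to a set intersection between the mantle-upregulated transcripts and the transcripts carrying phenotype-consistent SNPs. A minimal sketch follows, assuming both analyses report transcript identifiers; the IDs and the function name are hypothetical and do not come from the study.

```python
def candidate_transcripts(upregulated: set[str], snp_consistent: set[str]) -> set[str]:
    """Transcripts that are both upregulated in mantle and carry consistent SNPs."""
    return upregulated & snp_consistent


# Hypothetical transcript IDs for illustration only.
upregulated_in_mantle = {"tx_001", "tx_002", "tx_003", "tx_004"}
consistent_snp_transcripts = {"tx_003", "tx_004", "tx_005"}

candidates = candidate_transcripts(upregulated_in_mantle, consistent_snp_transcripts)
print(sorted(candidates))
```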