Draft genome sequence of the anaerobic intestinal parasite Blastocystis subtype 4 (ST4)

Abstract

Blastocystis is a highly prevalent anaerobic eukaryotic parasite found in the intestinal tract of humans and various animals. Although the role of Blastocystis as a human pathogen remains unclear, it can cause acute or chronic digestive disorders, and some studies have suggested an association with irritable bowel syndrome. Seventeen subtypes (ST1-ST17), of which the first nine are found in humans, have been identified based on the gene coding for the small-subunit ribosomal RNA. We previously sequenced the first whole genome of Blastocystis, that of an ST7 isolate (Denoeud et al., 2011). It consists of an 18.8 Mb nuclear genome with more than 6000 genes and a circular genome of 29 kbp located within mitochondria-like organelles (MLO). Here we report the sequencing and annotation of the genome of a Blastocystis ST4 isolate. Genome sequencing was done with the Illumina HiSeq 2000 system, generating more than 43 million 100-bp paired-end reads. The sequence reads were assembled de novo using the IDBA-UD algorithm. In total, 3996 scaffolds longer than 200 bp were obtained, with a scaffold N50 of 20,431 bp. The draft genome sequence of Blastocystis ST4 has a total length of 13.36 Mbp. As expected, the assembly also yielded a circular genome of ~27 kb corresponding to the whole MLO genome sequence. Genes were predicted using the Maker gene annotation pipeline. A first run of Maker was performed using the ab initio predictor Augustus trained with 413 gene models manually built from the ST4 scaffolds, the ~6000 annotated genes of the ST7 genome, and available EST data from both ST7 and ST1. Genes determined from this first run were then used to train another ab initio gene prediction program, SNAP. A second run of Maker, similar to the first but including the newly trained SNAP predictor, was finally performed. This led to significant improvements in gene prediction accuracy, with a final annotation set of 6046 genes.
The same pipeline was also used to complete and correct the previous annotation of the ST7 genome (Denoeud et al., 2011). Functions of the ST4 predicted genes were annotated by Blast2GO and BLASTP analyses against the NCBI, Swiss-Prot/UniProt and KEGG databases. Finally, OrthoMCL was applied to compare the ST4 and ST7 genomes. This led to the identification of new candidate genes, in particular some potential virulence factors that may be involved in the pathophysiology of this parasite. The sequencing of other ST genomes is in progress and should be very helpful for a better understanding of the genetic diversity, pathogenesis, metabolic potential and genome evolution of this neglected human parasite.
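
For readers unfamiliar with the scaffold N50 statistic quoted in the abstract, the following is a minimal Python sketch of how such a value is commonly computed. It is not the authors' script. The function name scaffold_n50, the min_length parameter and the toy lengths are assumptions made for illustration, and the strict "longer than 200 bp" filter is only one reading of the cutoff stated above.

```python
def scaffold_n50(lengths, min_length=200):
    """Return the N50 of the scaffolds strictly longer than min_length.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    # Keep only scaffolds above the cutoff (assumed to be a strict ">"),
    # sorted from longest to shortest.
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no scaffolds pass the length filter")

    half_total = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        # The first scaffold that brings the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return n


if __name__ == "__main__":
    # Hypothetical scaffold lengths in bp, not data from the ST4 assembly.
    toy_lengths = [100, 300, 400, 500, 800]
    print(scaffold_n50(toy_lengths))
```

In practice the lengths would be read from the assembler's scaffold FASTA file. An N50 computed this way depends on the length cutoff, since discarding short scaffolds reduces the total assembly size used in the calculation.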
