Using whole genome sequence data to study genomic diversity and develop molecular barcodes to profile Plasmodium malaria parasites

Abstract

Malaria is a major threat to human health, causing over 300 million clinical cases and approximately 500,000 deaths per year. Countries attempting malaria elimination are increasingly concerned with identifying pockets of transmission and outbreaks arising from imported cases, and there is a need to establish molecular barcodes for implementation in the field. The genetic diversity and non-recombining properties of mitochondrial and apicoplast sequences can be powerfully exploited for geographic genetic profiling of P. falciparum malaria at an inter-continental level. However, this approach provides limited insight into drug resistance and intra-regional geographical differentiation, and it ignores malaria caused by other Plasmodium spp. (P. vivax and P. knowlesi). To overcome these limitations, this project proposes to study the genomic diversity found in the nuclear and organellar genomes of the Plasmodium species causing human malaria and to establish robust ways to create SNP barcodes. In this study, the current libraries of genomic sequence data across the species P. falciparum, P. vivax and P. knowlesi were assessed, and the genetic diversity in the different populations was evaluated using a range of bioinformatics approaches. For this, a new high-quality reference for the A1-H.1 P. knowlesi strain was generated and its methylome was characterized. Using this reference, the first evidence of genetic exchange events between the three subpopulations of P. knowlesi in Malaysia was found. Furthermore, the structural and genetic diversity of the hypervariable vaccine candidate gene var2csa in P. falciparum, and its potential geographical signal associated with Malaria in Pregnancy (MiP), were studied. Finally, a genetic diversity study of global P. vivax isolates was carried out, and the insights obtained from this analysis allowed the development of a 71-SNP barcode to predict the geographical origin of P. vivax isolates.
The resulting barcode was tested using prospectively and retrospectively collected datasets, particularly from endemic settings with complex mixed infections and from near-elimination settings. The identification of SNP barcodes using this methodology can inform future rapid diagnostics and promote the application of field-based sequencing.
