90 research outputs found

    GENtle, a free multi-purpose molecular biology tool

    A result of modern techniques in molecular biology, especially DNA sequencing, is the exponentially growing amount of available data. Besides giant, specialized databases, which are accessible over the Internet, all work groups in the field of molecular biology today need to handle, modify, analyze and store sequence information. This trend notwithstanding, general purpose software for these tasks often suffers from severe drawbacks. Free software exists, but is often hard to set up and operate for users on today's point-and-click interfaces, and usually leads to the application of a patch-work of multiple, only partially compatible tools and web services. Commercial software often covers only parts of the required functions, and tends to lock the user into proprietary formats. In my thesis, I have developed GENtle, a free, multi-purpose bioinformatics software, seamlessly integrating diverse applications for every-day lab use in a single package. It was designed for easy and intuitive use, while providing many powerful functions. This C++ application runs on multiple platforms, is optimized for performance, and includes database interfaces for easy sequence management. It features DNA and protein sequence management and analysis, virtual cloning, gels, and PCR, primer design, alignment generation and layout, chromatogram and image display, as well as many related functions. GENtle strives to satisfy the need for an easy and comfortable, yet powerful multi-purpose tool. One design goal of GENtle was "instant responsiveness". Likewise, consistent display and handling are of great importance. GENtle has been outfitted with modules for DNA and protein sequence management, editing, and analysis, primer design, virtual PCR, alignments, virtual gels, a plethora of import and export formats, integrated database management, internet search functionality, an auto-update mechanism, and a number of integrated tools. 
GENtle is free software licensed under the GPL and available for Windows, Mac OSX, and Linux in several languages. As such, it is already in use in research groups worldwide

    Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    Background: Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. Results: We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. Conclusion: We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of both extremes of base composition. This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of DNA starting material.

    Efficient depletion of host DNA contamination in malaria clinical sequencing.

    The cost of whole-genome sequencing (WGS) is decreasing rapidly as next-generation sequencing technology continues to advance, and the prospect of making WGS available for public health applications is becoming a reality. So far, a number of studies have demonstrated the use of WGS as an epidemiological tool for typing and controlling outbreaks of microbial pathogens. Success of these applications is hugely dependent on efficient generation of clean genetic material that is free from host DNA contamination for rapid preparation of sequencing libraries. The presence of large amounts of host DNA severely affects the efficiency of characterizing pathogens using WGS and is therefore a serious impediment to clinical and epidemiological sequencing for health care and public health applications. We have developed a simple enzymatic treatment method that takes advantage of the methylation of human DNA to selectively deplete host contamination from clinical samples prior to sequencing. Using malaria clinical samples with over 80% human host DNA contamination, we show that the enzymatic treatment enriches Plasmodium falciparum DNA up to ∼9-fold and generates high-quality, nonbiased sequence reads covering >98% of 86,158 catalogued typeable single-nucleotide polymorphism loci

    Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification

    BACKGROUND: Translating genomic technologies into healthcare applications for the malaria parasite Plasmodium falciparum has been limited by the technical and logistical difficulties of obtaining high quality clinical samples from the field. Sampling by dried blood spot (DBS) finger-pricks can be performed safely and efficiently with minimal resource and storage requirements compared with venous blood (VB). Here, the use of selective whole genome amplification (sWGA) to sequence the P. falciparum genome from clinical DBS samples was evaluated, and the results compared with current methods that use leucodepleted VB. METHODS: Parasite DNA with high (>95%) human DNA contamination was selectively amplified by Phi29 polymerase using short oligonucleotide probes of 8-12 mers as primers. These primers were selected on the basis of their differential frequency of binding the desired (P. falciparum DNA) and contaminating (human) genomes. RESULTS: Using the sWGA method, clinical samples from 156 malaria patients, including 120 paired samples for head-to-head comparison of DBS and leucodepleted VB, were sequenced. Greater than 18-fold enrichment of P. falciparum DNA was achieved from DBS extracts. The parasitaemia threshold to achieve >5× coverage for 50% of the genome was 0.03% (40 parasites per 200 white blood cells). Over 99% SNP concordance between VB and DBS samples was achieved after excluding missing calls. CONCLUSION: The sWGA methods described here provide a reliable and scalable way of generating P. falciparum genome sequence data from DBS samples. The current data indicate that it will be possible to get good quality sequence on most if not all drug resistance loci from the majority of symptomatic malaria patients. This technique overcomes a major limiting factor in P. falciparum genome sequencing from field samples, and paves the way for large-scale epidemiological applications.
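
The abstract equates a parasitaemia of 0.03% with 40 parasites per 200 white blood cells without showing the conversion. The sketch below reproduces that arithmetic under two assumptions that are not stated in the abstract: a nominal white cell count of 8,000 per microlitre and a nominal red cell count of 5,000,000 per microlitre. The function names are illustrative only.

```python
# Assumed nominal blood counts (not given in the abstract).
ASSUMED_WBC_PER_UL = 8_000
ASSUMED_RBC_PER_UL = 5_000_000


def parasites_per_ul(parasites_counted, wbc_counted, wbc_per_ul=ASSUMED_WBC_PER_UL):
    """Convert a count of parasites against white cells into parasites per microlitre."""
    if wbc_counted <= 0:
        raise ValueError("wbc_counted must be positive")
    return parasites_counted * wbc_per_ul / wbc_counted


def parasitaemia_percent(density_per_ul, rbc_per_ul=ASSUMED_RBC_PER_UL):
    """Express a parasite density as a percentage of red cells (one parasite per infected cell assumed)."""
    if rbc_per_ul <= 0:
        raise ValueError("rbc_per_ul must be positive")
    return 100.0 * density_per_ul / rbc_per_ul


density = parasites_per_ul(40, 200)      # 40 parasites seen while counting 200 WBC
percent = parasitaemia_percent(density)  # roughly 0.03% under the assumed counts
```

With different assumed blood counts the percentage shifts proportionally, so the figure is best read as an order-of-magnitude threshold.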

    An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations

    We present an optimized probe design for copy number variation (CNV) and SNP genotyping in the Plasmodium falciparum genome. We demonstrate that variable length and isothermal probes are superior to static length probes. We show that sample preparation and hybridization conditions mitigate the effects of host DNA contamination in field samples. The microarray and workflow presented can be used to identify CNVs and SNPs with 95% accuracy in a single hybridization, in field samples containing up to 92% human DNA contamination

    Drug-resistant genotypes and multi-clonality in Plasmodium falciparum analysed by direct genome sequencing from peripheral blood of malaria patients.

    Naturally acquired blood-stage infections of the malaria parasite Plasmodium falciparum typically harbour multiple haploid clones. The apparent number of clones observed in any single infection depends on the diversity of the polymorphic markers used for the analysis, and the relative abundance of rare clones, which frequently fail to be detected among PCR products derived from numerically dominant clones. However, minority clones are of clinical interest as they may harbour genes conferring drug resistance, leading to enhanced survival after treatment and the possibility of subsequent therapeutic failure. We deployed new generation sequencing to derive genome data for five non-propagated parasite isolates taken directly from 4 different patients treated for clinical malaria in a UK hospital. Analysis of depth of coverage and length of sequence intervals between paired reads identified both previously described and novel gene deletions and amplifications. Full-length sequence data was extracted for 6 loci considered to be under selection by antimalarial drugs, and both known and previously unknown amino acid substitutions were identified. Full mitochondrial genomes were extracted from the sequencing data for each isolate, and these are compared against a panel of polymorphic sites derived from published or unpublished but publicly available data. Finally, genome-wide analysis of clone multiplicity was performed, and the number of infecting parasite clones estimated for each isolate. Each patient harboured at least 3 clones of P. falciparum by this analysis, consistent with results obtained with conventional PCR analysis of polymorphic merozoite antigen loci. We conclude that genome sequencing of peripheral blood P. falciparum taken directly from malaria patients provides high quality data useful for drug resistance studies, genomic structural analyses and population genetics, and also robustly represents clonal multiplicity

    An Effective Method to Purify Plasmodium falciparum DNA Directly from Clinical Blood Samples for Whole Genome High-Throughput Sequencing

    Highly parallel sequencing technologies permit cost-effective whole genome sequencing of hundreds of Plasmodium parasites. The ability to sequence clinical Plasmodium samples, extracted directly from patient blood without a culture step, presents a unique opportunity to sample the diversity of “natural” parasite populations in high resolution clinical and epidemiological studies. A major challenge to sequencing clinical Plasmodium samples is the abundance of human DNA, which may substantially reduce the yield of Plasmodium sequence. We tested a range of human white blood cell (WBC) depletion methods on P. falciparum-infected patient samples in search of a method displaying an optimal balance of WBC-removal efficacy, cost, simplicity, and applicability to low resource settings. In the first of a two-part study, combinations of three different WBC depletion methods were tested on 43 patient blood samples in Mali. A two-step combination of Lymphoprep plus Plasmodipur best fitted our requirements, although moderate variability was observed in human DNA quantity. This approach was further assessed in a larger sample of 76 patients from Burkina Faso. WBC-removal efficacy remained high (<30% human DNA in >70% samples) and lower variation was observed in human DNA quantities. In order to assess the Plasmodium sequence yield at different human DNA proportions, 59 samples with up to 60% human DNA contamination were sequenced on the Illumina Genome Analyzer platform. An average ∼40-fold coverage of the genome was observed per lane for samples with ≤30% human DNA. Even in low resource settings, using a simple two-step combination of Lymphoprep plus Plasmodipur, over 70% of clinical sample preparations should exhibit sufficiently low human DNA quantities to enable ∼40-fold sequence coverage of the P. falciparum genome using a single lane on the Illumina Genome Analyzer platform. This approach should greatly facilitate large-scale clinical and epidemiologic studies of P. falciparum.
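
The link the abstract draws between human DNA proportion and parasite genome coverage can be illustrated with a back-of-envelope calculation. This is a simplified sketch, not the authors' analysis: it assumes every non-human base is parasite sequence that maps uniformly, and it assumes a P. falciparum genome size of about 23.3 Mb, a figure not given in the abstract. Lane yield is a free parameter because the abstract does not report one.

```python
# Assumed approximate P. falciparum nuclear genome size in base pairs (not from the abstract).
ASSUMED_PF_GENOME_BP = 23_300_000


def expected_parasite_coverage(lane_yield_bp, human_fraction, genome_bp=ASSUMED_PF_GENOME_BP):
    """Mean fold coverage of the parasite genome if only the non-human fraction is usable."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must lie in [0, 1]")
    return lane_yield_bp * (1.0 - human_fraction) / genome_bp


def lane_yield_needed(target_coverage, human_fraction, genome_bp=ASSUMED_PF_GENOME_BP):
    """Total sequenced bases required to reach a target parasite coverage."""
    if not 0.0 <= human_fraction < 1.0:
        raise ValueError("human_fraction must lie in [0, 1)")
    return target_coverage * genome_bp / (1.0 - human_fraction)


# Bases a lane would need to yield for 40-fold coverage at 30% human DNA,
# and the coverage that same lane would give at 60% human DNA.
yield_bp = lane_yield_needed(40, 0.30)
coverage_at_60 = expected_parasite_coverage(yield_bp, 0.60)
```

Under these assumptions coverage scales linearly with the parasite fraction, which is why the depletion step matters most for the samples with the heaviest contamination.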

    A forward genetic screen reveals a primary role for Plasmodium falciparum Reticulocyte Binding Protein Homologue 2a and 2b in determining alternative erythrocyte invasion pathways.

    Invasion of human erythrocytes is essential for Plasmodium falciparum parasite survival and pathogenesis, and is also a complex phenotype. While some later steps in invasion appear to be invariant and essential, the earlier steps of recognition are controlled by a series of redundant, and only partially understood, receptor-ligand interactions. Reverse genetic analysis of laboratory adapted strains has identified multiple genes that when deleted can alter invasion, but how the relative contributions of each gene translate to the phenotypes of clinical isolates is far from clear. We used a forward genetic approach to identify genes responsible for variable erythrocyte invasion by phenotyping the parents and progeny of previously generated experimental genetic crosses. Linkage analysis using whole genome sequencing data revealed a single major locus was responsible for the majority of phenotypic variation in two invasion pathways. This locus contained the PfRh2a and PfRh2b genes, members of one of the major invasion ligand gene families, but not widely thought to play such a prominent role in specifying invasion phenotypes. Variation in invasion pathways was linked to significant differences in PfRh2a and PfRh2b expression between parasite lines, and their role in specifying alternative invasion was confirmed by CRISPR-Cas9-mediated genome editing. Expansion of the analysis to a large set of clinical P. falciparum isolates revealed common deletions, suggesting that variation at this locus is a major cause of invasion phenotypic variation in the endemic setting. This work has implications for blood-stage vaccine development and will help inform the design and location of future large-scale studies of invasion in clinical isolates

    Characterization of Within-Host Plasmodium falciparum Diversity Using Next-Generation Sequence Data

    The composition of multi-clonal malarial infections and the epidemiological factors which shape their diversity remain poorly understood. Traditionally, within-host diversity has been defined in terms of the multiplicity of infection (MOI) derived by PCR-based genotyping. Massively parallel, single molecule sequencing technologies now enable individual read counts to be derived on genome-wide datasets, facilitating the development of new statistical approaches to describe within-host diversity. In this class of measures, the FWS metric characterizes within-host diversity and its relationship to population level diversity. Utilizing P. falciparum field isolates from patients in West Africa, we here explore the relationship between the traditional MOI and FWS approaches. FWS statistics were derived from read count data at 86,158 SNPs in 64 samples sequenced on the Illumina GA platform. MOI estimates were derived by PCR at the msp-1 and -2 loci. Significant correlations were observed between the two measures, particularly with the msp-1 locus (P = 5.92 × 10^-5). The FWS metric should be more robust than the PCR-based approach owing to reduced sensitivity to potential locus-specific artifacts. Furthermore, the FWS metric captures information on a range of parameters which influence out-crossing risk, including the number of clones (MOI), their relative proportions and genetic divergence. This approach should provide novel insights into the factors which correlate with, and shape, within-host diversity.
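
The abstract does not define FWS. A commonly used form is FWS = 1 - HW/HS, where HW is within-host heterozygosity estimated from read counts and HS is heterozygosity in the local population. The sketch below implements that ratio in its simplest form, summing 2p(1-p) over biallelic SNPs. It is an illustration under that assumed definition, not the estimation procedure used in the paper, and the function names and input layout are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with reference allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fws(sample_counts, population_freqs):
    """Simplified FWS = 1 - HW/HS, summed over SNPs.

    sample_counts: list of (ref_reads, alt_reads) for one sample, one pair per SNP.
    population_freqs: list of population reference allele frequencies, same order.
    SNPs with no reads in the sample are skipped.
    """
    h_within = 0.0
    h_population = 0.0
    for (ref_reads, alt_reads), p_pop in zip(sample_counts, population_freqs):
        depth = ref_reads + alt_reads
        if depth == 0:
            continue
        h_within += heterozygosity(ref_reads / depth)
        h_population += heterozygosity(p_pop)
    if h_population == 0.0:
        raise ValueError("no informative SNPs: population heterozygosity is zero")
    return 1.0 - h_within / h_population


# A sample with no mixed calls scores 1; a sample as mixed as the population scores 0.
clonal = fws([(30, 0), (0, 25)], [0.5, 0.5])
mixed = fws([(15, 15), (10, 10)], [0.5, 0.5])
```

Values near 1 indicate an infection dominated by a single clone, while lower values indicate within-host diversity approaching that of the surrounding population.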