2,256 research outputs found

    The Use of Bioinformatics for Studying HIV Evolutionary and Epidemiological History in South America

    The South American human immunodeficiency virus type 1 (HIV-1) epidemic is driven by several subtypes (B, C, and F1) and by circulating and unique recombinant forms derived from those subtypes. Those variants are heterogeneously distributed around the continent in a country-specific manner. Despite some inconsistencies, mainly derived from sampling biases and analytical constraints, most studies carried out in the area agree in pointing out specificities in the evolutionary dynamics of the circulating HIV-1 lineages. In this paper, we cover the theoretical basis and the application of bioinformatics methods to reconstruct HIV spatial-temporal dynamics, unveiling relevant information for understanding the origin, geographical dissemination and current molecular scenario of the HIV epidemic in the continent, particularly in the countries of the Southern Cone.

    The influence of HIV-1 genomic target region selection and sequence length on the accuracy of inferred phylogenies and clustering outcomes.

    Masters Degree. University of KwaZulu-Natal, Durban. To improve the methodology of HIV-1 cluster analysis, we addressed how analysis of HIV-1 clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering, tree certainty, subtype diversity ratio (SDR), subtype diversity variance (SDV) and Shimodaira-Hasegawa (SH)-like support values were compared between 2881 HIV-1 full-genome sequences and sub-genomic regions, of which 2567 were retrieved from the LANL HIV Database and 314 were sequenced from blood samples from a cohort in KwaZulu-Natal. Sliding window analysis was based on 99 windows of 1000 bp, 45 windows of 2000 bp and 27 windows of 3000 bp. Clusters were enumerated for each window sequence length, and the optimal sequence length for cluster identification was probed. Potential associations between the extent of HIV clustering and sequence length were also evaluated. The phylogeny based on the full-genome sequences showed the best tree accuracy; it ranked highest with regard to both tree certainty and SH-like support. Product 4, a region associated with env, had the best tree accuracy among the sub-genomic regions. Among the HIV-1 structural genes, env had the best tree certainty, SH-like support, SDR score and the best SDV score overall. The hierarchy of cluster phylotype enumeration mirrored the tree accuracy analysis, with the full-genome phylogeny showing the highest extent of clustering, and the product 4 region being second best. Among the structural genes, the highest number of phylotypes was enumerated from the pol phylogeny, followed by env. The extent of HIV-1 clustering was slightly higher for sliding windows of 3000 bp than for those of 2000 bp and 1000 bp; thus, 3000 bp was found to be the optimal length for phylogenetic cluster analysis. 
We found a moderate association between the length of sequences used and the proportion of HIV sequences in clusters; the influence of viral sequence length may have been diminished by the substantial number of taxa. Full-genome sequences could provide the most informative HIV cluster analysis. Selected sub-genomic regions with the best combination of high extent of HIV clustering and high tree accuracy, such as env, could also be considered as a second choice.
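
The sliding-window design described in this abstract can be sketched as follows. This is a minimal illustration and not the thesis's actual pipeline: the abstract gives window lengths and counts but not the step size or alignment length, so `step`, `genome_length` and the function names here are assumptions.

```python
def sliding_windows(genome_length, window_size, step):
    """Return half-open (start, end) coordinates of every full-length window.

    The step size is a hypothetical parameter; the abstract does not state it.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = genome_length - window_size
    return [(s, s + window_size) for s in range(0, last_start + 1, step)]


def slice_alignment(alignment, start, end):
    """Cut the same column range out of every aligned sequence.

    `alignment` is assumed to be a dict mapping sequence name to an
    aligned string, with all strings of equal length.
    """
    return {name: seq[start:end] for name, seq in alignment.items()}
```

Each sub-alignment returned by `slice_alignment` would then be passed to whatever tree-building and cluster-identification tools the analysis uses, so that cluster counts can be compared across window lengths.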

    Exploring the phylodynamics, genetic reassortment and RNA secondary structure formation patterns of orthomyxoviruses by comparative sequence analysis

    RNA viruses are among the most virulent microorganisms that threaten the health of humans and livestock. Among the most socio-economically important of the known RNA viruses are those found in the family Orthomyxoviridae. In this era of rapid low-cost genome sequencing and advancements in computational biology techniques, many previously difficult research questions relating to the molecular epidemiology and evolutionary dynamics of these viruses can now be answered with ease. Using sequence data together with associated meta-data, in chapter two of this dissertation I tested the hypothesis that the Influenza A/H1N1 2009 pandemic virus was introduced multiple times into Africa, and subsequently dispersed heterogeneously across the continent. I further tested to what degree factors such as road distances and air travel distances impacted the observed pattern of spread of this virus in Africa using a generalised linear model-based approach. The results suggested that there were multiple simultaneous introductions of 2009 pandemic A/H1N1 into Africa, and that geographical distance and human mobility through air travel played an important role in dissemination. In chapter three, I set out to test two hypotheses: (1) that there is no difference in the frequency of reassortments among the segments that constitute influenza virus genomes; and (2) that there is epochal temporal reassortment among influenza viruses and that all geographical regions are equally likely sources of epidemiologically important influenza virus reassortant lineages. The findings suggested that surface segments are more frequently exchanged than internal genes and that North America/Asia, Oceania, and Asia could be the most likely source locations for reassortant Influenza A, B and C virus lineages respectively. 
In chapter four of this thesis, I explored the formation of RNA secondary structures within the genomes of orthomyxoviruses belonging to five genera (Influenza A, B and C, Infectious Salmon Anaemia Virus and Thogotovirus), using in silico RNA folding predictions and additional molecular evolution and phylogenetic tests to show that structured regions may be biologically functional. The presence of some conserved structures across the five genera is likely a reflection of the biological importance of these structures, warranting further investigation regarding their role in the evolution and possible development of antiviral resistance. The studies herein demonstrate that pathogen genomics-based analytical approaches are useful both for understanding the mechanisms that drive the evolution and spread of rapidly evolving viral pathogens such as orthomyxoviruses, and for illuminating how these approaches could be leveraged to improve the management of these pathogens.

    Bioinformatics Methods For Studying Intra-Host and Inter-Host Evolution Of Highly Mutable Viruses

    Reproducibility and robustness of genomic tools are two important factors in assessing the reliability of bioinformatics analysis. Assessment based on these criteria requires repetition of experiments across lab facilities, which is usually costly and time-consuming. In this study we propose methods that are able to generate computational replicates, allowing the assessment of the reproducibility of genomic tools. We analyzed three different groups of genomic tools: DNA-seq read alignment tools, structural variant (SV) detection tools and RNA-seq gene expression quantification tools. We tested these tools with different technical replicate data. We observed that while some tools were impacted by the technical replicate data, some remained robust. We also observed the importance of the choice of read alignment tool for SV detection. On the other hand, we found that the RNA-seq quantification tools that we chose (Kallisto and Salmon) were not affected by the shuffled data but were affected by the reverse complement data. Using these findings, our proposed method may help biomedical communities to advise on the robustness and reproducibility of genomic tools and help them to choose the most appropriate tools for their needs. Furthermore, this study will give genomic tool developers an insight into the importance of a good balance between technical improvements and reliable results.
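
The two kinds of computational replicate named in this abstract, shuffled data and reverse complement data, could be generated along the following lines. This is only a sketch under assumptions: the abstract does not describe the authors' implementation, and the in-memory `(name, sequence, quality)` tuple representation and all function names are hypothetical.

```python
import random

# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def shuffled_replicate(reads, seed=0):
    """Return the same reads in a random order (read content unchanged)."""
    rng = random.Random(seed)  # fixed seed so the replicate is repeatable
    shuffled = list(reads)
    rng.shuffle(shuffled)
    return shuffled


def reverse_complement_replicate(reads):
    """Reverse-complement every read and reverse its quality string to match."""
    return [(name, reverse_complement(seq), qual[::-1]) for name, seq, qual in reads]
```

A tool that is robust in the sense discussed above would be expected to give the same result on the original reads and on either replicate, since neither transformation changes the underlying biological information.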

    Evolutionary history and molecular epidemiology of "Mycobacterium tuberculosis" in Tanzania and across Africa

    Humans have been affected by tuberculosis (TB) for millennia. Today, TB remains a global health problem and the leading cause of mortality due to a single infectious agent. TB in humans is primarily caused by seven human-adapted phylogenetic lineages of the Mycobacterium tuberculosis (Mtb) complex. Mtb lineages differ in their geographical distribution, partly reflecting human demographic histories. Importantly, variation in Mtb is known to impact TB infection and clinical disease. In recent years, advances in sequence-based molecular markers, i.e. single nucleotide polymorphisms (SNPs), and whole genome sequencing (WGS) technologies have enabled robust classification of Mtb strains, which has ultimately allowed researchers to address important questions regarding Mtb phenotypes, transmission patterns and the evolutionary history of TB. Remarkably, such investigations remain underexplored in high-endemic TB settings of sub-Saharan Africa. By applying phylogenetically robust methods such as SNP-based typing complemented with WGS, we can gradually disentangle the role of Mtb variation in the TB epidemic in high-burden clinical settings. On the other hand, with recent large-scale WGS, it is becoming clear that Mtb strains are heterogeneous at the lineage level. Several studies have explored the phylogenetic substructure of Lineage 2 and Lineage 4, the two most geographically widespread and most successful Mtb lineages. However, Lineage 1 and 3 are still important drivers of TB epidemics along the Indian Ocean rim, which includes parts of Africa. Yet to date, the phylogeographies of these two lineages have not been fully explored. By contrast, Lineage 2–Beijing seems to have emerged only recently in Africa. Among the seven Mtb lineages, Lineage 2–Beijing is highly virulent and associated with antibiotic resistance; this calls for investigation of its origin on the African continent. 
In this thesis, we aimed to gain countrywide insights into the genetic diversity of Mtb in Tanzania based on SNP-typing. Secondly, using a combination of SNP-typing and WGS techniques, we described the local diversity of Mtb and assessed clinical phenotypes in urban and rural settings of Tanzania. We then studied the global phylogeographies of Mtb Lineage 1 and 3 to infer their evolutionary histories and global spread. Finally, we analyzed the origin of Mtb Lineage 2–Beijing in Africa using WGS. This thesis contains 7 chapters. The first two chapters provide the background on TB, Mtb lineages, and the objectives of the thesis. The next four chapters cover the research conducted during this PhD thesis. In the final chapter, we summarize the key findings and limitations and discuss the general implications of our work. In Chapter 1, we highlight the global burden and control of TB, the outcome of TB infection and disease, an overview of Mtb genetic diversity, different molecular markers and genotyping techniques, and the consequences of Mtb diversity. In Chapter 2, we state the objectives of the thesis. In Chapter 3, we studied the countrywide population structure of Mtb in Tanzania based on SNP-typing and assessed relationships between Mtb lineages and patients’ clinical and sociodemographic characteristics. In Chapter 4, we zoomed into the local urban and rural settings of Temeke, Dar es Salaam and Ifakara, Morogoro in Tanzania, to identify clinically relevant Mtb phenotypes. In addition, we described the local diversity and performed an exploratory analysis of transmission patterns in the urban setting. In Chapter 5, we studied the phylogeography and the spread of Lineage 1 and 3 using globally representative genomes from places where strains of the two lineages are frequent. In Chapter 6, we used whole genome sequences of Mtb Lineage 2–Beijing to investigate the evolutionary history of this lineage in Africa. 
We reveal multiple introductions of Mtb Lineage 2–Beijing into Africa originating from Asia. We further show that these introductions occurred over the last 300 years, with most pre-dating the antibiotic era. In Chapter 7, we summarize the key findings from this PhD thesis, discuss the implications and highlight future directions.

    Evolution of Mycobacterium tuberculosis complex lineages and their role in an emerging threat of multidrug resistant tuberculosis in Bamako, Mali

    In recent years Bamako has been faced with an emerging threat from multidrug resistant TB (MDR-TB). Whole genome sequence analysis was performed on a subset of 76 isolates from a total of 208 isolates recovered from tuberculosis patients in Bamako, Mali between 2006 and 2012. Among the 76 patients, 61 (80.3%) new cases and 15 (19.7%) retreatment cases, 12 (16%) were infected by MDR-TB. The dominant lineage was the Euro-American lineage, Lineage 4. Within Lineage 4, the Cameroon genotype was the most prevalent genotype (n = 20, 26%), followed by the Ghana genotype (n = 16, 21%). A sub-clade of the Cameroon genotype, which emerged ~22 years ago, was likely to be involved in community transmission. A sub-clade of the Ghana genotype that arose approximately 30 years ago was an important cause of MDR-TB in Bamako. The Ghana genotype isolates appeared more likely to be MDR than other genotypes after controlling for treatment history. We identified a clade of four related Beijing isolates that included one MDR-TB isolate. It is a major concern to find the Cameroon and Ghana genotypes involved in community transmission and MDR-TB respectively. The presence of the Beijing genotype in Bamako remains worrying, given its high transmissibility and virulence.
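
The percentages quoted in this abstract all share the denominator of 76 sequenced isolates, which a short check makes explicit. The helper name `percent` is hypothetical; the counts are those reported above.

```python
def percent(count, total, digits=1):
    """Percentage of `count` out of `total`, rounded to `digits` decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, digits)


SEQUENCED = 76  # isolates with whole genome sequences, from the abstract

new_cases = percent(61, SEQUENCED)        # reported as 80.3%
retreatment = percent(15, SEQUENCED)      # reported as 19.7%
mdr = percent(12, SEQUENCED, digits=0)    # reported as 16%
cameroon = percent(20, SEQUENCED, digits=0)  # reported as 26%
ghana = percent(16, SEQUENCED, digits=0)     # reported as 21%
```

Note that 12 of 76 is 15.8% before rounding, so the reported 16% for MDR-TB is a whole-number rounding while the case-type percentages are given to one decimal place.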

    Exploring the integration of traditional and molecular epidemiological methods for infectious disease outbreaks

    BACKGROUND: Understanding the transmission dynamics of infectious pathogens is critical to developing effective public health strategies. Traditionally, time-consuming epidemiological methods were used, often limited by incomplete or inaccurate datasets. Novel phylogenetic techniques can determine transmission events, but have rarely been used in real-time outbreak settings to inform interventions and limit the impact of outbreaks. METHODS: I undertook a series of novel studies to explore the utility of combining phylogenetics with traditional epidemiological analysis to enhance the understanding of transmission dynamics. I investigated HIV in an endemic South African setting and Ebola in an acute outbreak in Sierra Leone. The strengths and limitations of this combined approach are explored, ethical issues investigated and recommendations made regarding the implications of this work for public health. RESULTS: Phylogenetics provides an exciting and synergistic complement to epidemiological analysis in outbreak investigation and control. These combined methods enable a more detailed understanding than is possible through either discipline alone. My key findings include: • Identification of infection source: Phylogenetics gives new insight into the role of external introductions (e.g. migrants) in driving and sustaining the high incidence of HIV. • Earlier identification of new emerging clusters: I identified a new cluster of HIV from around a mining community. This is one of the first examples of molecular methods detecting a previously unknown outbreak. • Identification of novel mechanisms of transmission: This work suggests that children may have been infected by playing in puddles contaminated with Ebola, a previously unrecognised route of transmission. CONCLUSION: The integration of these two methods facilitates sophisticated real-time techniques to maximise understanding of transmission dynamics, allowing faster and more effectively targeted interventions. 
Moving forwards, sequence data should be incorporated into standard outbreak investigation. This is critical at a time when infectious disease outbreaks have led to some of the most significant global health threats of the recent past.