2,256 research outputs found
The Use of Bioinformatics for Studying HIV Evolutionary and Epidemiological History in South America
The South American human immunodeficiency virus type 1 (HIV-1) epidemic is driven by several subtypes (B, C, and F1) and by circulating and unique recombinant forms derived from those subtypes. These variants are heterogeneously distributed around the continent in a country-specific manner. Despite some inconsistencies, mainly derived from sampling biases and analytical constraints, most studies carried out in the area agree in pointing out specificities in the evolutionary dynamics of the circulating HIV-1 lineages. In this paper, we cover the theoretical basis and the application of bioinformatics methods to reconstruct the spatiotemporal dynamics of HIV, unveiling information relevant to understanding the origin, geographical dissemination and current molecular scenario of the HIV epidemic in the continent, particularly in the countries of the Southern Cone.
The influence of HIV-1 genomic target region selection and sequence length on the accuracy of inferred phylogenies and clustering outcomes.
Masters Degree. University of KwaZulu-Natal, Durban. To improve the methodology of HIV-1 cluster analysis, we addressed how analysis of HIV-1
clustering is associated with parameters that can affect the outcome of viral clustering. The
extent of HIV clustering, tree certainty, subtype diversity ratio (SDR), subtype diversity
variance (SDV) and Shimodaira-Hasegawa (SH)-like support values were compared between
2881 HIV-1 full genome sequences and sub-genomic regions of which 2567 were retrieved
from the LANL HIV Database and 314 were sequenced from blood samples from a cohort in
KwaZulu-Natal. Sliding window analysis was based on 99 windows of 1000 bp, 45 windows of
2000 bp and 27 windows of 3000 bp. Clusters were enumerated for each window sequence
length, and the optimal sequence length for cluster identification was probed. Potential
associations between the extent of HIV clustering and sequence length were also evaluated. The
phylogeny based on the full-genome sequences showed the best tree accuracy; it ranked highest
with regard to both tree certainty and SH-like support. Product 4, a region associated with env,
had the best tree accuracy among the sub-genomic regions. Among the HIV-1 structural genes,
env had the best tree certainty, SH-like support, SDR score and the best SDV score overall. The
hierarchy of cluster phylotype enumeration mirrored the tree accuracy analysis, with the full
genome phylogeny showing the highest extent of clustering, and the product 4 region being
second best. Among the structural genes, the highest number of phylotypes was enumerated
from the pol phylogeny, followed by env. The extent of HIV-1 clustering was slightly higher for
sliding windows of 3000 bp than for those of 2000 bp and 1000 bp; thus 3000 bp was found to be the
optimal length for phylogenetic cluster analysis. We found a moderate association between the
length of sequences used and proportion of HIV sequences in clusters; the influence of viral
sequence length may have been diminished by the substantial number of taxa. Full-genome
sequences could provide the most informative HIV cluster analysis. Selected sub-genomic
regions with the best combination of high extent of HIV clustering and high tree accuracy, such
as env, could also be considered as a second choice
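The sliding window analysis described in this abstract can be illustrated with a minimal sketch. The abstract gives the window lengths and counts but not the step size or the alignment length, so the `step` parameter, the function names and the dictionary-based alignment below are all assumptions made for illustration, not the thesis's actual pipeline.

```python
def sliding_windows(seq_len, window, step):
    """Return half-open (start, end) coordinates of every full-length window.

    `step` is a hypothetical parameter: the abstract does not state it.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]


def slice_alignment(alignment, window, step):
    """Yield ((start, end), sub_alignment) for each window.

    `alignment` is assumed to be a dict mapping sequence names to aligned
    sequences of equal length (an illustrative stand-in for a real
    multiple sequence alignment object).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (aln_len,) = lengths
    for start, end in sliding_windows(aln_len, window, step):
        yield (start, end), {name: seq[start:end] for name, seq in alignment.items()}
```

Each sub-alignment would then be passed to a tree-building and cluster-identification step, so that tree support and the extent of clustering can be compared across window lengths.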
Exploring the phylodynamics, genetic reassortment and RNA secondary structure formation patterns of orthomyxoviruses by comparative sequence analysis
RNA viruses are among the most virulent microorganisms that threaten the health of humans and livestock. Among the most socio-economically important of the known RNA viruses are those found in the family Orthomyxoviridae. In this era of rapid low-cost genome sequencing and advancements in computational biology techniques, many previously difficult research questions relating to the molecular epidemiology and evolutionary dynamics of these viruses can now be answered with ease. Using sequence data together with associated metadata, in chapter two of this dissertation I tested the hypothesis that the Influenza A/H1N1 2009 pandemic virus was introduced multiple times into Africa and subsequently dispersed heterogeneously across the continent. I further tested to what degree factors such as road distances and air travel distances impacted the observed pattern of spread of this virus in Africa using a generalised linear model-based approach. The results suggested that there were multiple simultaneous introductions of 2009 pandemic A/H1N1 into Africa, and that geographical distance and human mobility through air travel played an important role in its dissemination. In chapter three, I set out to test two hypotheses: (1) that there is no difference in the frequency of reassortments among the segments that constitute influenza virus genomes; and (2) that there is epochal temporal reassortment among influenza viruses and that all geographical regions are equally likely sources of epidemiologically important influenza virus reassortant lineages. The findings suggested that surface segments are more frequently exchanged than internal genes and that North America/Asia, Oceania, and Asia could be the most likely source locations for reassortant Influenza A, B and C virus lineages respectively.
In chapter four of this thesis, I explored the formation of RNA secondary structures within the genomes of orthomyxoviruses belonging to five genera: Influenza A, B and C, Infectious Salmon Anaemia Virus and Thogotovirus using in silico RNA folding predictions and additional molecular evolution and phylogenetic tests to show that structured regions may be biologically functional. The presence of some conserved structures across the five genera is likely a reflection of the biological importance of these structures, warranting further investigation regarding their role in the evolution and possible development of antiviral resistance. The studies herein demonstrate that pathogen genomics-based analytical approaches are useful both for understanding the mechanisms that drive the evolution and spread of rapidly evolving viral pathogens such as orthomyxoviruses, and for illuminating how these approaches could be leveraged to improve the management of these pathogens
Bioinformatics Methods For Studying Intra-Host and Inter-Host Evolution Of Highly Mutable Viruses
Reproducibility and robustness of genomic tools are two important factors in assessing the reliability of bioinformatics analysis. Assessment based on these criteria requires repetition of experiments across lab facilities, which is usually costly and time consuming. In this study we propose methods that are able to generate computational replicates, allowing the assessment of the reproducibility of genomic tools. We analyzed three different groups of genomic tools: DNA-seq read alignment tools, structural variant (SV) detection tools and RNA-seq gene expression quantification tools. We tested these tools with different technical replicate data. We observed that while some tools were impacted by the technical replicate data, others remained robust. We also observed the importance of the choice of read alignment tool for SV detection. On the other hand, we found that the RNA-seq quantification tools we chose (Kallisto and Salmon) were not affected by the shuffled data but were affected by reverse complement data. Using these findings, our proposed method may help biomedical communities to advise on the robustness and reproducibility of genomic tools and help them to choose the most appropriate tools for their needs. Furthermore, this study will give genomic tool developers an insight into the importance of a good balance between technical improvements and reliable results
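As an illustration of the two kinds of computational replicate named in this abstract (shuffled data and reverse complement data), here is a minimal sketch. The abstract does not say how the replicates were actually generated, so the `(name, sequence, quality)` read representation, the function names and the seeded shuffle are assumptions, not the authors' method.

```python
import random

# Translation table for complementing DNA bases; N is left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shuffled_replicate(reads, seed):
    """Return the same reads in a different, reproducible order.

    `reads` is assumed to be a list of (name, sequence, quality) tuples.
    """
    rng = random.Random(seed)
    replicate = list(reads)  # copy, so the input order is left untouched
    rng.shuffle(replicate)
    return replicate


def reverse_complement_replicate(reads):
    """Reverse-complement every read and reverse its quality string to match."""
    return [(name, reverse_complement(seq), qual[::-1]) for name, seq, qual in reads]
```

A tool that is robust in the sense described above would give the same result on the original reads and on either replicate, since both carry the same biological information.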
Evolutionary history and molecular epidemiology of "Mycobacterium tuberculosis" in Tanzania and across Africa
Humans have been affected by tuberculosis (TB) for millennia. Today, TB remains a
global health problem and the leading cause of mortality due to a single infectious agent.
TB in humans is primarily caused by seven human-adapted phylogenetic lineages of Mycobacterium
tuberculosis (Mtb) complex. Mtb lineages differ in their geographical distribution,
partly reflecting human demographic histories. Importantly, variation in Mtb is
known to impact TB infection and clinical disease.
In recent years, advances in sequence-based molecular markers, i.e. single nucleotide polymorphisms
(SNPs), and whole genome sequencing (WGS) technologies have enabled robust
classification of Mtb strains, which has ultimately allowed researchers to address important
questions regarding Mtb phenotypes, transmission patterns and the evolutionary
history of TB. Remarkably, such investigations remain underexplored in high-endemic
TB settings of sub-Saharan Africa.
By applying phylogenetically robust methods such as SNP-based typing complemented
with WGS, we can gradually disentangle the role of Mtb variation in TB epidemics in high-burden
clinical settings. On the other hand, with recent large-scale WGS, it is becoming
clear that Mtb strains are heterogeneous at the lineage level. Several studies have explored
the phylogenetic substructure of Lineage 2 and Lineage 4, the two most geographically
widespread and most successful Mtb lineages. However, Lineage 1 and 3 are still important
drivers of TB epidemics along the Indian Ocean rim, which includes parts of Africa. Yet
to date, the phylogeographies of these two lineages have not been fully explored. By
contrast, Lineage 2–Beijing seems to have emerged only recently in Africa. Among the
seven Mtb lineages, Lineage 2–Beijing is highly virulent and associated with antibiotic
resistance; thus, this calls for investigation of its origin on the African continent.
In this thesis, we aimed to gain countrywide insights into the genetic diversity of Mtb in
Tanzania based on SNP-typing. Secondly, using a combination of SNP-typing and WGS
techniques, we described the local diversity of Mtb and assessed clinical phenotypes in
urban and rural settings of Tanzania. We then studied the global phylogeographies of Mtb
Lineage 1 and 3 to infer their evolutionary histories and global spread. Finally, we
analyzed the origin of Mtb Lineage 2–Beijing in Africa using WGS.
This thesis contains 7 chapters. The first two chapters provide the background on TB,
Mtb lineages, and the objectives of the thesis. The next four chapters cover the
research conducted during this PhD thesis. In the final chapter, we summarize
the key findings, limitations and discuss the general implications of our work.
In Chapter 1, we highlight the global burden and control of TB, the outcome of TB infection
and disease, an overview of Mtb genetic diversity, different molecular markers
and genotyping techniques, and the consequences of Mtb diversity.
In Chapter 2 we state the objectives of the thesis.
In Chapter 3, we studied a countrywide population structure of Mtb in Tanzania based on
SNP-typing and assessed relationships between Mtb lineages with patients’ clinical and
sociodemographic characteristics.
In Chapter 4, we zoomed into the local urban and rural settings of Temeke, Dar es
Salaam and Ifakara, Morogoro in Tanzania, to identify clinically relevant Mtb phenotypes.
In addition, we described the local diversity and performed an exploratory analysis on
transmission patterns in the urban setting.
In Chapter 5, we studied the phylogeography and the spread of Lineage 1 and 3 using
global representative genomes from places where strains of the two lineages are frequent.
In Chapter 6, we used whole genome sequences of Mtb Lineage 2–Beijing to investigate
the evolutionary history of this lineage in Africa. We reveal multiple introductions of Mtb
Lineage 2–Beijing into Africa originating from Asia. We further show that these introductions
occurred over the last 300 years, with most pre-dating the antibiotic era.
In Chapter 7, we summarize the key findings from this PhD thesis, discuss the implications
and highlight future directions
Evolution of Mycobacterium tuberculosis complex lineages and their role in an emerging threat of multidrug resistant tuberculosis in Bamako, Mali
In recent years Bamako has been faced with an emerging threat from multidrug resistant TB (MDR-TB). Whole genome sequence analysis was performed on a subset of 76 isolates from a total of 208 isolates recovered from tuberculosis patients in Bamako, Mali between 2006 and 2012. Among the 76 patients (61 [80.3%] new cases and 15 [19.7%] retreatment cases), 12 (16%) were infected with MDR-TB. The dominant lineage was the Euro-American lineage, Lineage 4. Within Lineage 4, the Cameroon genotype was the most prevalent genotype (n = 20, 26%), followed by the Ghana genotype (n = 16, 21%). A sub-clade of the Cameroon genotype, which emerged ~22 years ago, was likely to be involved in community transmission. A sub-clade of the Ghana genotype that arose approximately 30 years ago was an important cause of MDR-TB in Bamako. The Ghana genotype isolates appeared more likely to be MDR than other genotypes after controlling for treatment history. We identified a clade of four related Beijing isolates that included one MDR-TB isolate. It is a major concern to find the Cameroon and Ghana genotypes involved in community transmission and MDR-TB respectively. The presence of the Beijing genotype in Bamako remains worrying, given its high transmissibility and virulence
Exploring the integration of traditional and molecular epidemiological methods for infectious disease outbreaks
BACKGROUND: Understanding the transmission dynamics of infectious pathogens is critical to developing effective public health strategies. Traditionally, time-consuming epidemiological methods were used, often limited by incomplete or inaccurate datasets. Novel phylogenetic techniques can determine transmission events, but have rarely been used in real-time outbreak settings to inform interventions and limit the impact of outbreaks.
METHODS: I undertook a series of novel studies to explore the utility of combining phylogenetics with traditional epidemiological analysis to enhance the understanding of transmission dynamics. I investigated HIV in an endemic South African setting and Ebola in an acute outbreak in Sierra Leone. The strengths and limitations of this combined approach are explored, ethical issues investigated and recommendations made regarding the implications of this work for public health.
RESULTS: Phylogenetics provides an exciting tool that is synergistic with epidemiological analysis in outbreak investigation and control. These combined methods enable a more detailed understanding than is possible through either discipline alone. My key findings include:
• Identification of infection source: Phylogenetics gives new insight into the role of external introductions (e.g. migrants) in driving and sustaining the high incidence of HIV.
• Earlier identification of new emerging clusters: I identified a new cluster of HIV around a mining community. This is one of the first examples of molecular methods detecting a previously unknown outbreak.
• Identification of novel mechanisms of transmission: This work suggests that children may have been infected by playing in puddles contaminated with Ebola, a previously unrecognised route of transmission.
CONCLUSION: The integration of these two methods facilitates sophisticated real-time techniques to maximise understanding of transmission dynamics, allowing faster and more effectively targeted interventions. Moving forwards, sequence data should be incorporated into standard outbreak investigation. This is critical at a time when infectious disease outbreaks have led to some of the most significant global health threats of the recent past