1,036 research outputs found

    Within-host evolution of Staphylococcus aureus during asymptomatic carriage

    Background: Staphylococcus aureus is a major cause of healthcare-associated mortality, but like many important bacterial pathogens, it is a common constituent of the normal human body flora. Around a third of healthy adults are carriers. Recent evidence suggests that evolution of S. aureus during nasal carriage may be associated with progression to invasive disease. However, a more detailed understanding of within-host evolution under natural conditions is required to appreciate the evolutionary and mechanistic reasons why commensal bacteria such as S. aureus cause disease. Therefore, we examined in detail the evolutionary dynamics of normal, asymptomatic carriage. Sequencing a total of 131 genomes across 13 singly colonized hosts using the Illumina platform, we investigated diversity, selection, population dynamics and transmission during the short-term evolution of S. aureus. Principal Findings: We characterized the processes by which the raw material for evolution is generated: micro-mutation (point mutation and small insertions/deletions), macro-mutation (large insertions/deletions) and the loss or acquisition of mobile elements (plasmids and bacteriophages). Through an analysis of synonymous, non-synonymous and intergenic mutations, we discovered a fitness landscape dominated by purifying selection, with rare examples of adaptive change in genes encoding surface-anchored proteins and an enterotoxin. We found evidence for dramatic, hundred-fold fluctuations in the size of the within-host population over time, which we related to the cycle of colonization and clearance. Using a newly developed population genetics approach to detect recent transmission among hosts, we revealed evidence for recent transmission between some of our subjects, including a husband and wife both carrying populations of methicillin-resistant S. aureus (MRSA).
Significance: This investigation begins to paint a picture of the within-host evolution of an important bacterial pathogen during its prevailing natural state, asymptomatic carriage. These results also have wider significance as a benchmark for future systematic studies of evolution during invasive S. aureus disease.

    Exploiting natural selection to study adaptive behavior

    The research presented in this dissertation explores different computational and modeling techniques that, combined with predictions from evolution by natural selection, lead to the analysis of the adaptive behavior of populations under selective pressure. For this thesis, three computational methods were developed: EXPLoRA, EVORhA and SSA-ME. EXPLoRA finds genomic regions associated with a trait of interest (QTL) by explicitly modeling the expected linkage disequilibrium of a population of segregants under selection. Data from BSA experiments were analyzed to find genomic loci associated with ethanol tolerance. EVORhA explores the interplay between driving and hitchhiking mutations during evolution to reconstruct the subpopulation structure of clonal bacterial populations based on deep sequencing data. Data from mixed infections and evolution experiments of E. coli were used to reconstruct their population structure. SSA-ME uses mutual exclusivity in cancer to prioritize cancer driver genes. TCGA data from breast cancer tumor samples were analyzed.

    A phylogenetic method to perform genome-wide association studies in microbes

    Genome-Wide Association Studies (GWAS) are designed to perform an unbiased search of genetic sequence data with the intent of identifying statistically significant associations with a phenotype or trait of interest. The application of GWAS methods to microbial organisms promises to improve the way we understand, manage, and treat infectious diseases. Yet, while microbial pathogens continue to undermine human health, wealth, and longevity, microbial GWAS methods remain unable to fully capitalise on the growing wealth of bacterial and viral genetic sequence data. Clonal population structure and homologous recombination in microbial organisms make it difficult for existing GWAS methods to achieve both the precision needed to reject false positive findings and the statistical power required to detect genuine associations between microbial genotypic and phenotypic variants. In this thesis, we investigate potential solutions to the most substantial methodological challenges in microbial GWAS, and we introduce a new phylogenetic GWAS approach that has been specifically designed for use in bacterial samples. In presenting our approach, we describe the features that render it robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Our approach is applicable to organisms ranging from purely clonal to frequently recombining, to sequence data from both the core and accessory genome, and to binary, categorical, and continuous phenotypes. We also describe the efforts taken to make our method efficient, scalable, and accessible in its implementation within the open-source R package we have created, called treeWAS. Next, we apply our GWAS method to simulated datasets. We develop multiple frameworks for simulating genotypic and phenotypic data with control over relevant parameters. 
We then present the results of our simulation study, and we use thorough performance testing to demonstrate the power and specificity of our approach, as compared to the performance of alternative cluster-based and dimension-reduction methods. Our approach is then applied to three empirical datasets, from Neisseria gonorrhoeae and Neisseria meningitidis, where we identify core SNPs associated with binary drug resistance and continuous antibiotic minimum inhibitory concentration phenotypes, as well as both core SNP and accessory genome associations with invasive and commensal phenotypes. These applications illustrate the versatility and potential of our method, demonstrating in each case that our approach is capable of confirming known resistance- or virulence-associated loci and discovering novel associations. Our thesis concludes with a review of the previous chapters and an evaluation of the strengths and limitations displayed by the current implementation of our phylogenetic approach to association testing. We discuss key areas for further development, and we propose potential solutions to advance the development of microbial GWAS in future work.

    Sequence analysis of pooled bacterial samples enables identification of strain variation in group A streptococcus

    Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Pooled sequencing (Pool-seq) is a cost-effective approach for population-level genetic studies that require large numbers of samples, such as various strains of a microbe. To test the use of Pool-seq in identifying variation, we pooled DNA of 100 Streptococcus pyogenes strains of different emm types in two pools, each containing 50 strains. We used four variant calling tools (Freebayes, UnifiedGenotyper, SNVer, and SAMtools) and one emm1 strain, SF370, as a reference genome. In total, 63719 SNPs and 164 INDELs were identified in the two pools concordantly by at least two of the tools. The majority of the variants (93.4%) from six individually sequenced strains used in the pools could be identified from the two pools, and 72.3% and 97.4% of the variants in the pools could be mined from the analysis of the 44 complete S. pyogenes genomes and 3407 sequence runs deposited in the European Nucleotide Archive, respectively. We conclude that DNA sequencing of pooled samples of large numbers of bacterial strains is a robust, rapid and cost-efficient way to discover sequence variation.

    Estimation of allele frequency and association mapping using next-generation sequencing data

    Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates. Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
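The idea of integrating over genotype uncertainty instead of calling genotypes first can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a biallelic site, Hardy-Weinberg genotype proportions, and an EM update, none of which the abstract specifies. The function name `estimate_allele_frequency` and its input format (one tuple of three genotype likelihoods per individual) are hypothetical.

```python
def estimate_allele_frequency(genotype_likelihoods, tol=1e-8, max_iter=500):
    """EM estimate of an allele frequency without calling genotypes.

    genotype_likelihoods: list of (L0, L1, L2) tuples, where Lg is the
    probability of an individual's reads given g copies of the allele.
    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    n = len(genotype_likelihoods)
    if n == 0:
        raise ValueError("need at least one individual")
    f = 0.5  # starting value
    for _ in range(max_iter):
        expected_copies = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # Likelihood times Hardy-Weinberg prior for each genotype
            w0 = l0 * (1 - f) ** 2
            w1 = l1 * 2 * f * (1 - f)
            w2 = l2 * f ** 2
            total = w0 + w1 + w2
            if total <= 0:
                raise ValueError("genotype likelihoods are incompatible with f")
            # Posterior expected number of allele copies for this individual
            expected_copies += (w1 + 2 * w2) / total
        new_f = expected_copies / (2 * n)
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f
```

With fully informative likelihoods the estimate reduces to simple allele counting; with flat likelihoods such as `(1, 1, 1)` for every individual the data carry no information and the estimate stays at its starting value.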

    Detecting copy number status and uncovering subclonal markers in heterogeneous tumor biopsies

    Background: Genomic aberrations can be used to determine cancer diagnosis and prognosis. Clinically relevant novel aberrations can be discovered using high-throughput assays such as Single Nucleotide Polymorphism (SNP) arrays and next-generation sequencing, which typically provide aggregate signals of many cells at once. However, heterogeneity of tumor subclones dramatically complicates the task of detecting aberrations. Results: The aggregate signal of a population of subclones can be described as a linear system of equations. We employed a measure of allelic imbalance and total amount of DNA to characterize each locus by the copy number status (gain, loss or neither) of the strongest subclonal component. We designed simulated data to compare our measure to existing approaches and we analyzed SNP-arrays from 30 melanoma samples and transcriptome sequencing (RNA-Seq) from one melanoma sample. We showed that any system describing aggregate subclonal signals is underdetermined, leading to non-unique solutions for the exact copy number profile of subclones. For this reason, our illustrative measure was more robust than existing Hidden Markov Model (HMM) based tools in inferring the aberration status, as indicated by tests on simulated data. This higher robustness contributed to identifying numerous aberrations in several loci of melanoma samples. We validated the heterogeneity and aberration status within single biopsies by fluorescent in situ hybridization of four affected and transcriptionally up-regulated genes E2F8, ETV4, EZH2 and FAM84B in 11 melanoma cell lines. Heterogeneity was further demonstrated in the analysis of allelic imbalance changes along single exons from melanoma RNA-Seq. Conclusions: These studies demonstrate how subclonal heterogeneity, prevalent in tumor samples, is reflected in aggregate signals measured by high-throughput techniques.
Our proposed approach yields high robustness in detecting copy number alterations using high-throughput technologies and has the potential to identify specific subclonal markers from next-generation sequencing data.
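The non-uniqueness described in the abstract can be seen in a toy example. This is only an illustrative sketch, not the authors' model: it assumes the aggregate copy number at a locus is the fraction-weighted sum of subclone copy numbers, and the function name and the two mixtures are hypothetical.

```python
def aggregate_copy_number(fractions, copy_numbers):
    """Aggregate copy number at one locus: sum of fraction * copy number."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("subclone fractions must sum to 1")
    return sum(p * c for p, c in zip(fractions, copy_numbers))

# Two hypothetical mixtures with different subclonal structure
mixture_a = aggregate_copy_number([0.5, 0.5], [2, 4])
mixture_b = aggregate_copy_number([0.75, 0.25], [2, 6])

# Both give the same aggregate signal, so the signal alone cannot
# distinguish between them
print(mixture_a, mixture_b)
```

Because one observed value constrains several unknown fractions and copy numbers, the system has more unknowns than equations, which is why a measure of the dominant component's status (gain, loss or neither) can be more robust than attempting an exact profile.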