
    THE EFFECT OF STRUCTURE IN SHORT REGIONS OF DNA ON MEASUREMENTS ON SHORT OLIGONUCLEOTIDE MICROARRAY AND ION TORRENT PGM SEQUENCING PLATFORMS

    Single-stranded DNA in solution has been studied by biophysicists for many years, as complex structures, both stable and dynamic, form under normal experimental conditions. Stable intra-strand formations affect enzymatic technical processes such as PCR and biological processes such as gene regulation. In the research described here we examined the effect of such structures on two high-throughput genomic assay platforms and whether we could predict the influence of those effects to improve the interpretation of genomic sequencing results. Helical structures in DNA can be composed of interactions across strands or within a strand. Exclusion of the aqueous solvent provides an entropic advantage to more compact structures. Our first experiments tested whether internal helical regions in one of the two binding partners in a microarray experiment would influence the stability of the complex. Our results are novel and show, from molecular simulations and hybridization experiments, that stable secondary structures on the boundary, when not impinging on the ability of targets to access the probes, stabilize the probe-target hybridization. High-throughput sequencing (HTS) platforms use short single-stranded DNA fragments as templates. We tested the influence of template secondary structure on the fidelity of reads generated using the Ion Torrent PGM platform. For targets with long hairpin structures (~20 bp), a high level of mis-calling occurs, particularly of deletions, and some of these deletions are 20-30 bases long. These deletions are not associated with homopolymers, which are known to cause base mis-calls on the PGM, and the effect of structure on the sequencing reaction, rather than the PCR preparative steps, has not been previously published. As HTS technologies bring the cost of sequencing whole genomes down, a number of unexpected observations have arisen. 
An example that caught our attention is the prevalence of far more short deletions than had been detected using Sanger methods. The prevalence is particularly high in the Korean genome. Since we showed that helical structures could disrupt the fidelity of base calls on the Ion Torrent, we looked at the context of the apparent deletions to determine whether any sequence or structure pattern discriminated them. Starting with the genome provided by Kim et al. (1), we selected deletions > 2 bases long from chromosome I of a Korean genome. We created 70-nucleotide fragments centered on the deletion. We simulated the secondary structures using OMP software and then used the Random Forest algorithm in the WEKA modeling package to characterize the relations between the deletions and secondary structures in or around them. After training the model on chromosome I deletions we tested it using chromosome 20 deletions. We show that sequence information alone is not able to predict whether a deletion will occur, while the addition of structural information improves the prediction rates. Classification rates are not yet high: additional data and a more precise structural description are likely needed to train a robust model. We are unable to state which of the structures affect in vitro platforms and which occur in vivo. A comparative genomics approach using 38 genomes recently made available for the CAMDA 2013 competition should provide the necessary information to train separate models if the important features are different in the two cases.
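
The fragment-building step described in the abstract above (70-nucleotide windows centered on each deletion) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 0-based coordinates, and the crude perfect-stem search are hypothetical stand-ins, and the stem search is not a substitute for the OMP structure simulations the authors used.

```python
# Hypothetical helpers; names and coordinate conventions are assumptions,
# not taken from the thesis.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def centered_fragment(chrom_seq, del_start, del_len, width=70):
    """Cut a window of `width` bases centered on a deletion.

    `del_start` is the 0-based start of the deleted bases in `chrom_seq`.
    Near the chromosome ends the window is shifted so it stays in bounds.
    """
    center = del_start + del_len // 2
    start = max(0, center - width // 2)
    end = min(len(chrom_seq), start + width)
    start = max(0, end - width)
    return chrom_seq[start:end]


def longest_stem(fragment, min_loop=3):
    """Length of the longest perfect hairpin stem in `fragment`.

    A stem is a substring whose reverse complement occurs downstream,
    separated by at least `min_loop` unpaired bases. This is a crude
    structural feature, far simpler than a thermodynamic fold prediction.
    """
    n = len(fragment)
    best = 0
    for i in range(n):
        # Only try arms longer than the best stem found so far.
        for j in range(i + best + 1, n + 1):
            arm = fragment[i:j]
            if fragment.find(reverse_complement(arm), j + min_loop) == -1:
                break  # a longer arm starting at i cannot pair either
            best = len(arm)
    return best


if __name__ == "__main__":
    toy_chromosome = "ACGT" * 50
    window = centered_fragment(toy_chromosome, del_start=100, del_len=4)
    print(len(window), longest_stem("GGGGCCAAAAGGCCCC"))
```

Features such as the stem length could then be passed, alongside sequence features, to any Random Forest implementation; the thesis reports using WEKA for that step.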

    Genomics and spatial surveillance of Chagas disease and American visceral leishmaniasis

    The Trypanosomatidae are a family of parasitic protozoa that infect various animals and plants. Several species within the Trypanosoma and Leishmania genera also pose a major threat to human health. Among these are Trypanosoma cruzi and Leishmania infantum, aetiological agents of the highly debilitating and often deadly vector-borne zoonoses Chagas disease and American visceral leishmaniasis. Current treatment options are far from safe, only partially effective and rarely available in the impoverished regions of Latin America where these ‘neglected tropical diseases’ prevail. Wider-reaching, sustainable protection against T. cruzi and L. infantum might best be achieved by intercepting key routes of zoonotic transmission, but this prophylactic approach requires a better understanding of how these parasites disperse and evolve at various spatiotemporal scales. This dissertation addresses key questions around trypanosomatid parasite biology and spatial epidemiology based on high-resolution, geo-referenced DNA sequence datasets constructed from disease foci throughout Latin America: Which forms of genetic exchange occur in T. cruzi, and are exchange events frequent enough to significantly alter the distribution of important epidemiological traits? How do demographic histories, for example, the recent invasive expansion of L. infantum into the Americas, impact parasite population structure, and do structural changes pose a threat to public health? Can environmental variables predict parasite dispersal patterns at the landscape scale? Following the first chapter’s review of population genetic and genomic approaches in the study of trypanosomatid diseases in Latin America, Chapter 2 describes how reproductive polymorphism segregates T. cruzi populations in southern Ecuador. The study is the first to clearly demonstrate meiotic sex in this species, for decades thought to exchange genetic material only very rarely, and only by non-Mendelian means. T. 
cruzi subpopulations from the Ecuadorian study site exhibit all major hallmarks of sexual reproduction, including genome-wide Hardy-Weinberg allele frequencies, rapid decay of linkage disequilibrium with map distance and genealogies that fluctuate among chromosomes. The presence of sex promotes the transfer and transformation of genotypes underlying important epidemiological traits, posing great challenges to disease surveillance and the development of diagnostics and drugs. Chapter 3 demonstrates that mating events are also pivotal to L. infantum population structure in Brazil, where introduction bottlenecks have led to striking genetic discontinuities between sympatric strains. Genetic hybridization occurs genome-wide, including at a recently identified ‘miltefosine sensitivity locus’ that appears to be deleted from the majority of Brazilian L. infantum genomes. The study combines an array of genomic and phenotypic analyses to determine whether rapid population expansion or strong purifying selection has driven this prominent > 12 kb deletion to high abundance across Brazil. Results expose deletion size differences that covary with phylogenetic structure and suggest that deletion-carrying strains do not form a private monophyletic clade. These observations are inconsistent with the hypothesis that the deletion genotype rose to high prevalence simply as the result of a founder effect. Enzymatic assays show that loss of ecto-3’-nucleotidase gene function within the deleted locus is coupled to increased ecto-ATPase activity, raising the possibility that alternative metabolic strategies enhance L. infantum fitness in its introduced range. The study also uses demographic simulation modelling to determine whether L. infantum populations in the Americas have expanded from just one or multiple introduction events. Comparison of observed vs. 
simulated summary statistics using random forests suggests a single introduction from the Old World, but better spatial sampling coverage is required to rule out other demographic scenarios in a pattern-process modelling approach. Further sampling is also necessary to substantiate signs of convergent selection introduced above. Chapter 4 therefore develops a ‘genome-wide locus sequence typing’ (GLST) tool to summarize parasite genetic polymorphism at a fraction of genomic sequencing cost. Applied directly to the infection source (e.g., vector or host tissue), the method also avoids bias from cell purification and culturing steps typically involved prior to sequencing of trypanosomatid and other obligate parasite genomes. GLST scans genomic pilot data for hundreds of polymorphic sequence fragments whose thermodynamic properties permit simultaneous PCR amplification in a single reaction tube. For proof of principle, GLST is applied to metagenomic DNA extracts from various Chagas disease vector species collected in Colombia, Venezuela, and Ecuador. Epimastigote DNA from several T. cruzi reference clones is also analyzed. The method distinguishes 387 single-nucleotide polymorphisms (SNPs) in T. cruzi sub-lineage TcI and an additional 393 SNPs in non-TcI clones. Genetic distances calculated from these SNPs correlate with geographic distances among samples but also distinguish parasites from triatomines collected at common collection sites. The method thereby appears suitable for agent-based spatio-genetic (simulation) analyses left wanted by Chapter 3 – and further formulated in Chapter 5. The potential to survey parasite genetic diversity abundantly across landscapes compels deeper, more systematic exploration of how environmental variables influence the spread of disease. 
As environmental context is only marginally considered in the population genetic analyses of Chapters 2 – 4, Chapter 5 proposes a new, spatially explicit modelling framework to predict vector-borne parasite gene flow through heterogeneous environment. In this framework, remotely sensed environmental raster values are re-coded and merged into a composite ‘resistance surface’ that summarizes hypothesized effects of landscape features on parasite transmission among vectors and hosts. Parasite population genetic differentiation is then simulated on this surface and fitted to observed diversity patterns in order to evaluate original hypotheses on how environmental variables modulate parasite gene flow. The chapter thereby makes a maiden step from standard population genetic to ‘landscape genomic’ approaches in understanding the ecology and evolution of vector-borne disease. In summary, this dissertation first demonstrates the power of population genetics and genomics to understand fundamental biological properties of important protist parasites, then identifies areas where analytical tools are missing and creates new technical and conceptual frameworks to help fill these gaps. The general discussion (Chapter 6) also outlines several follow-up projects on the key finding of meiotic genetic signatures in T. cruzi. Exploiting recently developed T. cruzi genome-editing systems for the detection of meiotic gene expression and heterozygosis will help understand why and in which life cycle stage some parasite populations use sex and others do not. Long-read sequencing of parental and recombinant genomes will help understand the extent to which sex is diversifying T. cruzi phenotypes, especially virulence and drug resistance properties conferred by surface molecules with repetitive genetic bases intractable to short-read analysis. Chapter 6 also provides follow-up plans for all other research chapters. 
Emphasis is placed on advancing the complementarity, transferability and public health benefit of the many different methods and concepts employed in this work.
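
One of the hallmarks of sexual reproduction named in the abstract above is agreement of genotype counts with Hardy-Weinberg proportions. A minimal sketch of that check for a single biallelic locus is given below. The function name and the example counts are hypothetical, and the dissertation's genome-wide analysis is certainly more involved than a per-locus chi-square statistic.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns the frequency p of allele A, the expected counts
    (p^2 N, 2pq N, q^2 N), and the chi-square statistic.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip genotype classes with zero expectation to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi2


if __name__ == "__main__":
    # Hypothetical counts: the first locus fits the expectation,
    # the second has no heterozygotes at all.
    print(hardy_weinberg_chi2(25, 50, 25))
    print(hardy_weinberg_chi2(50, 0, 50))
```

A strong heterozygote deficit or excess, as in the second call, is the kind of departure expected under predominantly clonal propagation, whereas counts close to the expectation are consistent with random mating.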

    Studying Evolutionary Change: Transdisciplinary Advances in Understanding and Measuring Evolution

    Evolutionary processes can be found in almost any historical, i.e. evolving, system that erroneously copies from the past. Well-studied examples originate not only in evolutionary biology but also in historical linguistics. Yet an approach that would bind together studies of such evolving systems is still elusive. This thesis is an attempt to narrow this gap to some extent. An evolving system can be described using characters that identify its changing features. While the problem of a proper choice of characters is beyond the scope of this thesis and remains in the hands of experts, we concern ourselves with some theoretical as well as data-driven approaches. Having a well-chosen set of characters describing a system of different entities such as homologous genes, i.e. genes of the same origin in different species, we can build a phylogenetic tree. Consider the special case of gene clusters containing paralogous genes, i.e. genes of the same origin within a species, usually located close together, such as the well-known HOX cluster. These are formed by stepwise duplication of their members, often involving unequal crossing over that forms hybrid genes. Gene conversion and possibly other mechanisms of concerted evolution further obfuscate phylogenetic relationships. Hence, it is very difficult or even impossible to disentangle the detailed history of gene duplications in gene clusters. Expansion of gene clusters by unequal crossing over, as proposed by Walter Gehring, leads to distinctive patterns of genetic distances. We show that this special class of distances still helps in extracting phylogenetic information from the data. Disregarding genome rearrangements, we find that the shortest Hamiltonian path then coincides with the ordering of paralogous genes in a cluster. 
This observation can be used to detect ancient genomic rearrangements of gene clusters and to distinguish gene clusters whose evolution was dominated by unequal crossing over within genes from those that expanded through other mechanisms. While the evolution of DNA or protein sequences is well studied and can be formally described, we find that this does not hold for other systems such as language evolution. This is due to a lack of detectable mechanisms that drive the evolutionary processes in other fields. Hence, it is hard to quantify distances between entities, e.g. languages, and therefore the characters describing them. Starting out with distortions of distances, we first see that poor choices of the distance measure can lead to incorrect phylogenies. Given that phylogenetic inference requires additive metrics, we can infer the correct phylogeny from a distance matrix D if there is a monotonic, subadditive function ζ such that ζ^−1(D) is additive. We compute the metric-preserving transformation ζ as the solution of an optimization problem. This result shows that the problem of phylogeny reconstruction is well defined even if a detailed mechanistic model of the evolutionary process is missing. Yet, this does not hinder studies of language evolution using automated tools. As the number of large digital corpora available increased, so did the possibilities to study them automatically. The obvious parallels between historical linguistics and phylogenetics have led to many studies adapting bioinformatics tools to linguistic purposes. Here, we use jAlign to calculate bigram alignments, i.e. alignments computed by an algorithm that takes the adjacency of letters into account. Its performance is tested in different cognate recognition tasks. A major obstacle with pairwise alignments is the systematic errors they make, such as underestimating gaps and misplacing them. 
Applying multiple sequence alignments instead of a pairwise algorithm implicitly includes more evolutionary information and thus can overcome the problem of correct gap placement. They can be seen as a generalization of the string-to-string edit problem to more than two strings. With the steady increase in computational power, exact dynamic programming solutions have become feasible in practice also for 3- and 4-way alignments. For the pairwise (2-way) case, there is a clear distinction between local and global alignments. As more sequences are considered, this distinction, which can in fact be made independently for both ends of each sequence, gives rise to a rich set of partially local alignment problems. So far these have remained largely unexplored. Thus, a general formal framework that gives rise to a classification of partially local alignment problems is introduced. It leads to a generic scheme that guides the principled design of exact dynamic programming solutions for particular partially local alignment problems.
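
The statement in the abstract above that phylogenetic inference requires additive metrics, and that a distorted matrix D can still be used when ζ^−1(D) is additive, can be illustrated with the standard four-point condition. The sketch below is an assumption-laden toy: the four-leaf tree, the choice ζ = square root (monotonic and subadditive on non-negative reals), and the function names are all illustrative, and ζ is taken as known here rather than obtained from the optimization problem the thesis describes.

```python
import math
from itertools import combinations


def is_additive(D, tol=1e-9):
    """Four-point condition: for every quadruple of taxa, the two largest
    of the three pairwise distance sums must be equal."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


def transform(D, f):
    """Apply a function element-wise to a distance matrix."""
    return [[f(x) for x in row] for row in D]


# Path lengths on a toy tree ((A:1,B:2):1,(C:3,D:1)); order A, B, C, D.
TREE_DISTANCES = [
    [0, 3, 5, 3],
    [3, 0, 6, 4],
    [5, 6, 0, 4],
    [3, 4, 4, 0],
]

if __name__ == "__main__":
    distorted = transform(TREE_DISTANCES, math.sqrt)       # zeta(D)
    recovered = transform(distorted, lambda x: x * x)      # zeta^-1
    print(is_additive(TREE_DISTANCES))  # additive tree metric
    print(is_additive(distorted))       # distortion breaks additivity
    print(is_additive(recovered))       # inverse transform restores it
```

The distorted matrix is still a metric, but tree-building methods that assume additivity may be misled by it; undoing the distortion restores the four-point condition.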

    Characterization and Metabolic Engineering of Transcription Factors and Redox Dynamics in Candidate Consolidated Bioprocessing Biocatalysts

    This thesis studies the metabolic engineering of candidate consolidated bioprocessing biocatalyst microorganisms through targeting regulatory genes, with an emphasis on redox metabolism. Consolidated bioprocessing is the single-step hydrolysis and conversion of lignocellulosic material to biofuels. The biocatalysts considered are Clostridium thermocellum and Caldicellulosiruptor bescii, and the primary product of interest is ethanol. Both organisms are thermophilic anaerobic bacteria which encode and express genes that facilitate the deconstruction and solubilization of lignocellulose into fermentable carbohydrates. Furthermore, these organisms ferment these carbohydrates into ethanol, organic acids, and other fermentation products. We seek to improve redox metabolism and osmotolerance in these organisms toward the biorefining goal of engineering a biocatalyst capable of facilitating economically viable consolidated bioprocessing. Expression profiling, transcription factor regulon mapping, genetic engineering, and analytical fermentation were employed to assay and understand which specific traits can be beneficially altered. The targeted traits are characteristically complex, co-opting many cellular sub-processes to enable a molecular mechanism resulting in an observable trait. Such traits are notoriously difficult not only to understand, but to alter through classical metabolic engineering. Instead, the possibility of making system-wide changes through a minimal number of genetic alterations to methodically selected and/or screened regulatory genes was investigated. Active redox-dependent systems were characterized in both bacteria, many of which are controlled by the global redox-state sensing transcription factor Rex. Eliminating Rex control over gene expression in C. bescii resulted in a more reduced intracellular redox state, and ultimately drove increased ethanol synthesis. 
A method for quantifying important redox metabolites intracellularly is also adopted and validated for use with C. thermocellum. This approach was extended to less characterized gene targets and, arguably, even more complex traits. Screening of single-gene deletion mutants identified two strains of C. bescii showing phenotypic growth differences in elevated osmolarity conditions. One strain housed a deletion of the fapR gene, while the other housed a deletion of the fruR/cra gene. Characterizing these transcription factors and their regulons elucidates mechanisms that this organism uses to facilitate survival at elevated osmolarities. We are also able to construct genetic variants in C. bescii which are substantially more osmotolerant than native strains, highlighting the usefulness of these genes as targets and the applicability, and important considerations, of our metabolic engineering approach.

    Single-Cell Massively Parallel Reporter Assays

    Our understanding of gene regulation needs to be generalizable as well as specific. A generalizable understanding enables us to transfer our knowledge of one gene to another, and the specificity allows us to understand the precise regulation of gene expression through development. One framework that confers generalizability and specificity is the hierarchical and modular model of gene regulation. Testing whether gene regulation is hierarchical and modular requires methods to systematically analyze how different factors collectively control gene expression. This thesis describes the development of two functional genomics methods at single-cell resolution. These methods systematically examine how cellular context, chromatin environment, and local regulatory sequence collectively control expression mean and noise. First, scMPRA measures cell-type specific expression of a library of core promoters in K562 and HEK 293 cell lines. scMPRA can also be applied to a complex tissue to perform MPRA ex vivo. Both general principles of gene expression and cell-type specific variant effects are found in the newborn mouse retina. scMPRA also measured the cell substate effect on expression mean: I found that cell substate has a large and general effect on expression mean for core promoters. I also deconvolved the extrinsic and intrinsic portions of expression noise and found that developmental core promoters have larger extrinsic noise. Second, scTRIP measures the chromatin environment effect on expression noise. We found that expression noise can be partially explained by expression mean. We also found that the expression noise that is independent of expression mean is correlated with specific chromatin marks and transcription factor binding sites. Moreover, we identified the oscillation between cell substates as a major source of extrinsic noise regardless of the chromatin environment. Using all the information, we trained a logistic regression model with high accuracy. 
These observations and methods provide a framework to further explore the hierarchical and modular nature of gene regulation.
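
The abstract above refers to expression noise that is independent of expression mean. One common way to obtain such a quantity, shown here only as a hedged sketch and not necessarily the procedure used in the thesis, is to regress log noise (squared coefficient of variation) on log mean across reporters and keep the residuals. The function name and the toy numbers are hypothetical.

```python
import math


def mean_adjusted_noise(means, variances):
    """Residual of log(CV^2) after a least-squares fit against log(mean).

    Positive residuals mark reporters that are noisier than expected for
    their expression level; negative residuals mark quieter ones.
    """
    xs = [math.log(m) for m in means]
    ys = [math.log(v / (m * m)) for m, v in zip(means, variances)]  # log CV^2
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Hypothetical per-reporter statistics; the third reporter has extra
    # variance relative to the Poisson-like trend of the others.
    means = [1.0, 2.0, 4.0, 8.0]
    variances = [1.0, 2.0, 8.0, 8.0]
    print(mean_adjusted_noise(means, variances))
```

The residuals could then be related to chromatin marks or transcription factor binding sites, since by construction they are uncorrelated with log mean across the fitted reporters.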

    On the Origin of Phenotypic Variation: Novel Technologies to Dissect Molecular Determinants of Phenotype

    This thesis describes the conception, design, and development of novel computational tools, theoretical models, and experimental techniques applied to the dissection of molecular factors underlying phenotypic variation. The first part of my work is focused on finding rare genetic variants in pooled DNA samples, leading to the development of a novel set of algorithms, SNPseeker and SPLINTER, applied to next-generation sequencing data. The second part of my work describes the creation of a reporter system for DNA methylation for the purpose of dissecting the genetic contribution to tissue-specific patterns of DNA methylation across the genome. Finally, the last part of my work is focused on understanding the basis of stochastic variation in gene expression, with a focus on modeling and dissecting the relationship between single-cell protein variance and mean at a genome-wide scale.

    Analysis, Design, and Construction of Nucleic Acid Devices

    Nucleic acids present great promise as building blocks for nanoscale devices. To achieve this potential, methods for the analysis and design of DNA and RNA need to be improved. In this thesis, traditional algorithms for analyzing nucleic acids at equilibrium are extended to handle a class of pseudoknots, with examples provided relevant to biologists and bioengineers. With these analytical tools in hand, nucleic acid sequences are designed to maximize the equilibrium probability of a desired fold. Upon analysis, it is concluded that both affinity and specificity are important when choosing a sequence; this conclusion holds for a wide range of target structures and is robust to random perturbations to the energy model. Applying the intuition gained from these studies, a process called hybridization chain reaction (HCR) is invented, and sequences are chosen that experimentally verify this phenomenon. In HCR, a small number of DNA or RNA molecules trigger a system-wide configurational change, allowing the amplification and detection of specific nucleic acid sequences. As an extension, HCR is combined with a pre-existing aptamer domain to successfully construct an ATP sensor, and the groundwork is laid for the future development of sensors for other small molecules. In addition, recent studies on multi-stranded algorithms and improvements to HCR are included in the appendices. Not only will these advancements increase our understanding of biological RNAs, but they will also provide valuable tools for the future development of nucleic acid nanotechnologies.
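
The design objective mentioned in the abstract above, the equilibrium probability of a desired fold, follows from Boltzmann weighting of structure free energies. The sketch below assumes a tiny, explicitly enumerated set of structures with made-up free energies. Real analysis algorithms sum over the full structural ensemble with dynamic programming rather than enumeration, and the dot-bracket strings and energy values here are purely hypothetical.

```python
import math

# Gas constant in kcal/(mol K) times 310.15 K (37 degrees C).
RT_37C = 0.0019872 * 310.15


def equilibrium_probabilities(free_energies, rt=RT_37C):
    """Boltzmann probabilities p(s) = exp(-dG(s)/RT) / Z for each structure.

    `free_energies` maps a structure label to its free energy in kcal/mol.
    """
    weights = {s: math.exp(-dg / rt) for s, dg in free_energies.items()}
    z = sum(weights.values())  # partition function over the listed structures
    return {s: w / z for s, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical ensemble for one candidate sequence.
    ensemble = {
        "((((....))))": -4.0,  # desired hairpin
        ".(((....))).": -2.5,  # competing shorter hairpin
        "............": 0.0,   # open chain
    }
    for structure, prob in equilibrium_probabilities(ensemble).items():
        print(structure, round(prob, 3))
```

In this picture a low free energy for the target (affinity) is not sufficient on its own: competing structures with similar energies also lower the target probability, which is the specificity consideration the abstract refers to.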

    2016 Symposium Brochure
