272 research outputs found

    Higher-order interactions in fitness landscapes are sparse

    Biological fitness arises from interactions between molecules, genes, and organisms. To discover the causative mechanisms of this complexity, we must differentiate the significant interactions from a large number of possibilities. Epistasis is the standard way to identify interactions in fitness landscapes. However, this intuitive approach breaks down in higher dimensions, for example because the sign of epistasis takes on an arbitrary meaning and the false discovery rate becomes high. These limitations make it difficult to evaluate the role of epistasis in higher dimensions. Here we develop epistatic filtrations, a dimensionally normalized approach to define fitness landscape topography for higher-dimensional spaces. We apply the method to higher-dimensional datasets from genetics and the gut microbiome. This reveals a sparse higher-order structure that often arises from lower-order structure. Despite their sparsity, these higher-order effects have significant effects on biological fitness and are consequential for ecology and evolution. Comment: 71 pages, various figures

    Computational Methods for Assessment and Prediction of Viral Evolutionary and Epidemiological Dynamics

    The ability to comprehend the dynamics of viruses’ transmission and their evolution, even to a limited extent, can significantly enhance our capacity to predict and control the spread of infectious diseases. An example of such significance is COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In this dissertation, I propose computational models that offer more precise and comprehensive approaches to viral outbreak investigation and epidemiology, providing insights into the transmission dynamics and potential interventions of infectious diseases by facilitating the timely detection of viral variants. The first model is a mathematical framework based on population dynamics for calculating a numerical measure of the fitness of SARS-CoV-2 subtypes. The second model is a transmissibility estimation method that uses a Bayesian approach to calculate the most likely fitness landscape for SARS-CoV-2 using a generalized logistic sub-epidemic model. Using this model, I estimate the epistatic interaction networks of the spike protein in SARS-CoV-2. Based on the community structure of these epistatic networks, I propose a computational framework that predicts emerging haplotypes of SARS-CoV-2 with altered transmissibility. The last method proposed in this dissertation is a maximum likelihood framework that integrates phylogenetic and random graph models to accurately infer transmission networks without requiring case-specific data.

    High-throughput computational methods and software for quantitative trait locus (QTL) mapping

    In recent years many new technologies such as tiling arrays and high-throughput DNA sequencing have come to play an important role in systems genetics research. For researchers it is of the utmost importance to understand how this affects their research. This work describes possible solutions to this ‘Big Data’ avalanche which has hit systems genetics. This thesis describes the work carried out during the author’s 4-year PhD project at the Groningen Bioinformatics Centre to develop smarter and more optimized algorithms such as Pheno2Geno and MQM, and to use a collaborative approach such as xQTL workbench to store and analyse high-throughput systems genetics data.

    High-Order SNP Combinations Associated with Complex Diseases: Efficient Discovery, Statistical Power and Functional Interactions

    There has been increased interest in discovering combinations of single-nucleotide polymorphisms (SNPs) that are strongly associated with a phenotype even if each SNP has little individual effect. Efficient approaches have been proposed for searching two-locus combinations from genome-wide datasets. However, for high-order combinations, existing methods either adopt a brute-force search, which only handles a small number of SNPs (up to a few hundred), or use a heuristic search that may miss informative combinations. In addition, existing approaches lack statistical power because of the use of statistics with high degrees of freedom and the huge number of hypotheses tested during combinatorial search. Due to these challenges, functional interactions in high-order combinations have not been systematically explored. We leverage discriminative-pattern-mining algorithms from the data-mining community to search for high-order combinations in case-control datasets. The substantially improved efficiency and scalability demonstrated on synthetic and real datasets with several thousand SNPs allow the study of several important mathematical and statistical properties of SNP combinations with order as high as eleven. We further explore functional interactions in high-order combinations and reveal a general connection between the increase in discriminative power of a combination over its subsets and the functional coherence among the genes comprising the combination, supported by multiple datasets. Finally, we study in detail several significant high-order combinations discovered from a lung-cancer dataset and a kidney-transplant-rejection dataset to provide novel insights into these complex diseases. Interestingly, many of these associations involve combinations of common variations that occur in small fractions of the population. Thus, our approach is an alternative methodology for exploring the genetics of rare diseases, for which the current focus is on individually rare variations.

    Multilocus Detection of Wolf x Dog Hybridization in Italy, and Guidelines for Marker Selection

    Hybridization and introgression can impact the evolution of natural populations. Several wild canid species hybridize in nature, sometimes giving rise to new taxa. However, hybridization with free-ranging dogs is threatening the genetic integrity of grey wolf populations (Canis lupus), or even the survival of endangered species (e.g., the Ethiopian wolf C. simensis). Efficient molecular tools to assess hybridization rates are essential in wolf conservation strategies. We evaluated the power of biparental and uniparental markers (39 autosomal and 4 Y-linked microsatellites, a melanistic deletion at the β-defensin CBD103 gene, the hypervariable domain of the mtDNA control-region) to identify the multilocus admixture patterns in wolf x dog hybrids. We used empirical data from 2 hybrid groups with different histories: 30 presumptive natural hybrids from Italy and 73 Czechoslovakian wolfdogs of known hybrid origin, as well as simulated data. We assessed the efficiency of various marker combinations and reference samples in admixture analyses using 69 dogs of different breeds and 99 wolves from Italy, the Balkans and the Carpathian Mountains. Results confirmed the occurrence of hybrids in Italy, some of them showing anomalous phenotypic traits and exogenous mtDNA or Y-chromosome introgression. Hybridization was mostly attributable to village dogs and not strictly patrilineal. The melanistic β-defensin deletion was found only in Italian dogs and in putative hybrids. The 24 most divergent microsatellites (largest wolf-dog FST values) were equally or more informative than the entire panel of 39 loci. A smaller panel of 12 microsatellites increased the risk of falsely identifying individuals as admixed. The frequency of F1 and F2 hybrids was lower than that of backcrosses or introgressed individuals, suggesting that hybridization already occurred some generations in the past, during early phases of wolf expansion from their historical core areas. Empirical and simulated data indicated that the identification of past-generation backcrosses is always uncertain, and that a larger number of ancestry-informative markers is needed.

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor dimensionality reduction-based method built on associative classification is employed to identify higher-order SNP interactions and so improve understanding of the genetic architecture of complex diseases. Further, this thesis explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to identify interactions reliably in the presence of noise.