43 research outputs found

    Capturing the Spectrum of Interaction Effects in Genetic Association Studies by Simulated Evaporative Cooling Network Analysis

    Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. 
Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software
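
The abstract contrasts Relief-F with Random Forests on the grounds that Relief-style scores take the context of other variants into account. As an illustration only, the following is a minimal sketch of the basic Relief weighting idea (one nearest hit and one nearest miss per sample). It is not Relief-F and not the authors' Evaporative Cooling method. The function name `relief_scores`, the mismatch-count distance, and the toy XOR data are assumptions made for this example.

```python
import numpy as np

def relief_scores(X, y):
    """Basic Relief weights for categorical features (e.g. genotype codes).

    For every sample, features that differ from the nearest sample of the
    other class (nearest miss) gain weight, and features that differ from
    the nearest sample of the same class (nearest hit) lose weight.
    Assumes each class contains at least two samples.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape
    w = np.zeros(p)
    for i in range(n):
        diff = X != X[i]                       # n x p mismatch indicators
        dist = diff.sum(axis=1).astype(float)  # mismatch-count distance
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += (diff[miss].astype(float) - diff[hit].astype(float)) / n
    return w

# Toy data (hypothetical): the class is the XOR of features 0 and 1,
# so neither feature is informative on its own; features 2-5 are noise.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 2, size=(200, 6))
y_demo = X_demo[:, 0] ^ X_demo[:, 1]
print(relief_scores(X_demo, y_demo))
```

In this toy setting the two interacting features should receive the largest weights, although neither has a marginal effect on the class.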

    Detecting purely epistatic multi-locus interactions by an omnibus permutation test on ensembles of two-locus analyses

    Background: Purely epistatic multi-locus interactions cannot generally be detected via single-locus analysis in case-control studies of complex diseases. Recently, many two-locus and multi-locus analysis techniques have been shown to be promising for epistasis detection. However, exhaustive multi-locus analysis requires prohibitively large computational effort when problems involve large-scale or genome-wide data. Furthermore, there is no explicit proof that a combination of multiple two-locus analyses can lead to the correct identification of multi-locus interactions. Results: The proposed 2LOmb algorithm performs an omnibus permutation test on ensembles of two-locus analyses. The algorithm consists of four main steps: two-locus analysis, a permutation test, global p-value determination and a progressive search for the best ensemble. 2LOmb is benchmarked against an exhaustive two-locus analysis technique, a set association approach, a correlation-based feature selection (CFS) technique and a tuned ReliefF (TuRF) technique. The simulation results indicate that 2LOmb produces a low false-positive error. Moreover, 2LOmb has the best performance in terms of its ability to identify all causative single nucleotide polymorphisms (SNPs) and its low number of output SNPs in purely epistatic two-, three- and four-locus interaction problems. The interaction models constructed from the 2LOmb outputs via a multifactor dimensionality reduction (MDR) method are also included to confirm epistasis detection. 2LOmb is subsequently applied to a type 2 diabetes mellitus (T2D) data set, which was obtained as part of the UK genome-wide genetic epidemiology study by the Wellcome Trust Case Control Consortium (WTCCC). After an initial screening for SNPs that are located within or near 372 candidate genes and exhibit no marginal single-locus effects, the T2D data set is reduced to 7,065 SNPs from 370 genes.
The 2LOmb search in the reduced T2D data reveals that four intronic SNPs in PGM1 (phosphoglucomutase 1), two intronic SNPs in LMX1A (LIM homeobox transcription factor 1, alpha), two intronic SNPs in PARK2 (Parkinson disease (autosomal recessive, juvenile) 2, parkin) and three intronic SNPs in GYS2 (glycogen synthase 2 (liver)) are associated with the disease. The 2LOmb result suggests that there is no interaction between any pair of the identified genes that can be described by purely epistatic two-locus interaction models. Moreover, there are no interactions between these four genes that can be described by purely epistatic multi-locus interaction models with marginal two-locus effects. The findings provide an alternative explanation for the aetiology of T2D in a UK population. Conclusion: An omnibus permutation test on ensembles of two-locus analyses can detect purely epistatic multi-locus interactions with marginal two-locus effects. The study also reveals that SNPs from large-scale or genome-wide case-control data which are discarded after single-locus analysis detects no association can still be useful for genetic epidemiology studies.
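
The first two steps named in the abstract (a two-locus analysis followed by a permutation test) can be sketched as below. This is a hedged illustration of a generic permutation test on one SNP pair, not the 2LOmb implementation: the choice of a Pearson chi-square statistic, the 0/1/2 genotype coding, the 0/1 case-control coding and the function names are all assumptions, and the omnibus combination over ensembles and the progressive search are not shown.

```python
import numpy as np

def two_locus_chi2(g1, g2, status):
    """Pearson chi-square of case/control status (coded 0/1) against the
    joint genotype of two loci (each coded 0/1/2, so up to 9 combinations).
    Assumes both cases and controls are present."""
    combo = 3 * np.asarray(g1) + np.asarray(g2)   # joint genotype code 0..8
    table = np.zeros((9, 2))
    np.add.at(table, (combo, np.asarray(status)), 1)
    table = table[table.sum(axis=1) > 0]          # drop unobserved combinations
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0) / table.sum()
    return ((table - expected) ** 2 / expected).sum()

def permutation_p_value(g1, g2, status, n_perm=1000, seed=0):
    """Empirical p-value: how often a label-shuffled statistic reaches the
    observed one (with the usual +1 correction)."""
    rng = np.random.default_rng(seed)
    observed = two_locus_chi2(g1, g2, status)
    exceed = sum(
        two_locus_chi2(g1, g2, rng.permutation(status)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)

# Toy data (hypothetical): status depends on the parity of the genotype sum.
rng = np.random.default_rng(42)
g1_demo = rng.integers(0, 3, size=300)
g2_demo = rng.integers(0, 3, size=300)
status_demo = (g1_demo + g2_demo) % 2
print(permutation_p_value(g1_demo, g2_demo, status_demo, n_perm=200))
```

Shuffling the case-control labels keeps the genotype structure intact while breaking any association with the phenotype, which is why the shuffled statistics approximate the null distribution.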

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a method based on multifactor dimensionality reduction and built on associative classification is employed to identify higher-order SNP interactions and so improve understanding of the genetic architecture of complex diseases. The thesis also explores the application of deep learning techniques, which provide new clues for interaction analysis. The performance of the deep learning method is maximized by combining deep neural networks with a random forest to obtain reliable interactions in the presence of noise.

    Towards an Information Theoretic Framework for Evolutionary Learning

    The vital essence of evolutionary learning consists of information flows between the environment and the entities differentially surviving and reproducing therein. Gain or loss of information in individuals and populations due to evolutionary steps should be considered in evolutionary algorithm theory and practice. Information theory has rarely been applied to evolutionary computation - a lacuna that this dissertation addresses, with an emphasis on objectively and explicitly evaluating the ensemble models implicit in evolutionary learning. Information theoretic functionals can provide objective, justifiable, general, computable, commensurate measures of fitness and diversity. We identify information transmission channels implicit in evolutionary learning. We define information distance metrics and indices for ensembles. We extend Price's Theorem to non-random mating, give it an effective fitness interpretation and decompose it to show the key factors influencing heritability and evolvability. We argue that heritability and evolvability of our information theoretic indicators are high. We illustrate use of our indices for reproductive and survival selection. We develop algorithms to estimate information theoretic quantities on mixed continuous and discrete data via the empirical copula and information dimension. We extend statistical resampling. We present experimental and real world application results: chaotic time series prediction; parity; complex continuous functions; industrial process control; and small sample social science data. We formalize conjectures regarding evolutionary learning and information geometry.
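
For readers unfamiliar with Price's Theorem, the following minimal sketch numerically checks the standard Price equation, w̄·Δz̄ = Cov(w, z) + E(w·Δz), on toy data. It shows only the textbook form, not the dissertation's extension to non-random mating. The function name `price_terms` and the interpretation of the inputs (fitness `w`, parental trait `z`, mean offspring trait `z_offspring` per parent) are assumptions made for the example.

```python
import numpy as np

def price_terms(w, z, z_offspring):
    """Return (lhs, selection, transmission) of the standard Price equation.

    w           : fitness (offspring count) of each parent
    z           : trait value of each parent
    z_offspring : mean trait value of each parent's offspring
    """
    w, z, z_offspring = (np.asarray(a, dtype=float) for a in (w, z, z_offspring))
    selection = np.mean(w * z) - w.mean() * z.mean()    # Cov(w, z), population form
    transmission = np.mean(w * (z_offspring - z))        # E(w * delta z)
    # Offspring-generation mean trait is the fitness-weighted mean of z_offspring.
    lhs = w.mean() * (np.average(z_offspring, weights=w) - z.mean())
    return lhs, selection, transmission

# Toy population (hypothetical values).
w_demo = [1.0, 2.0, 0.5, 3.0]
z_demo = [0.2, 0.8, 0.1, 0.9]
z_off_demo = [0.25, 0.75, 0.15, 0.85]
lhs, sel, trans = price_terms(w_demo, z_demo, z_off_demo)
print(lhs, sel + trans)
```

The selection term captures the covariance between fitness and trait, and the transmission term captures how faithfully the trait passes from parent to offspring; the two printed values should agree up to floating-point error.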

    Complex and Adaptive Dynamical Systems: A Primer

    A thorough introduction is given at an introductory level to the field of quantitative complex system science, with special emphasis on emergence in dynamical systems based on network topologies. Subjects treated include graph theory and small-world networks, a generic introduction to the concepts of dynamical system theory, random Boolean networks, cellular automata and self-organized criticality, the statistical modeling of Darwinian evolution, synchronization phenomena and an introduction to the theory of cognitive systems. It includes chapters on Graph Theory and Small-World Networks, Chaos, Bifurcations and Diffusion, Complexity and Information Theory, Random Boolean Networks, Cellular Automata and Self-Organized Criticality, Darwinian evolution, Hypercycles and Game Theory, Synchronization Phenomena and Elements of Cognitive System Theory. Comment: unformatted version of the textbook; published in Springer, Complexity Series (2008, second edition 2010)

    Current Challenges in Modeling Cellular Metabolism

    Mathematical and computational models play an essential role in understanding cellular metabolism. They are used as platforms to integrate current knowledge of a biological system and to systematically test and predict the effect of manipulations of such systems. Recent advances in genome sequencing techniques have facilitated the reconstruction of genome-scale metabolic networks for a wide variety of organisms, from microbes to human cells. These models have been used successfully in multiple biotechnological applications. Despite these advances, modeling cellular metabolism still presents many challenges. The aim of this Research Topic is not only to expose and consolidate the state of the art in metabolic modeling approaches, but also to push this frontier beyond the current edge through the introduction of innovative solutions. The articles presented in this e-book address some of the main challenges in the field, including the integration of different modeling formalisms, the integration of heterogeneous data sources into metabolic models, explicit representation of other biological processes during phenotype simulation, and standardization efforts in the representation of metabolic models and simulation results.

    Methods to detect Evolutionary Constraints: Application to HIV

    The pandemic spread of HIV started almost 30 years ago and has claimed the lives of several million people since then. The virus attacks the immune system of its host. Untreated, HIV infection leads to severe opportunistic diseases, advancing to AIDS after a few years. At this stage, the immune system is severely damaged, which leads to the death of the infected person. Fortunately, the number of new infections per year is decreasing thanks to extensive HIV/AIDS awareness campaigns and potent treatment regimens. Although no vaccine or cure for HIV infection has been found to date, antiviral treatments exist that allow infected people to reach a close-to-normal life expectancy. Still, the pandemic cannot be eradicated, with new infections worldwide running into the millions each year, especially in developing countries. The reasons are, on the one hand, a lack of HIV education and hence neglect of prevention and, on the other hand, unavailable treatment options. In developed countries, too, the virus continues to spread despite pervasive awareness and treatment programs. The risk of further transmission is highest early after infection. This can be explained by the elevated viral load in the early (acute) phase after transmission, combined with the infected person's initial unawareness of the infection. Usually, medication lowers the risk of viral transmission. However, HIV is a rapidly mutating virus with a fast replication cycle, which allows it to adapt quickly to selective pressure. This leads to resistance mutations that reduce or abolish the effect of an antiviral drug. The drug combination then needs to be changed to prevent treatment failure. The situation becomes critical if multi-drug resistance occurs and hardly any drug combination works. Yet mutational pathways to resistance include bottlenecks, and resistance mutations can only be acquired in a certain genomic background. It remains an ongoing challenge to learn more about the constraints on the virus in developing drug resistance. The currently available antiviral drugs target different processes of the HIV life cycle by binding to crucial enzymes and other involved proteins, or by competing with nucleosides to stop the reverse transcription process. Recently, new therapeutic strategies have been advancing: besides encoding essential proteins, RNAs comprise a multitude of regulatory elements and functional structures controlling virtually all biological processes. Hence, regulatory motifs in RNAs are promising drug targets, and first achievements in this direction have been made. Nevertheless, the regulatory mechanisms in the HIV life cycle, as well as detailed structural information on the RNA, are still underexplored. In this work, we present methods for the qualitative and quantitative inference of evolutionary constraints, including the Mutational Interference Mapping Experiment (MIME) and direct coupling analysis (DCA). Based on the former, we provide the software MIMEAnTo to predict functional elements in RNA. Furthermore, results of the adaptation of the MIME framework to in-cell experiments are presented. We determined regulatory motifs in the 5' UTR of HIV-1 that are essential for viral RNA production in cells and for the packaging process of nascent virions, respectively. Lastly, we attempt to improve the prediction of functional regions with MIME by incorporating methods used in DCA. We set up a benchmark with different scenarios of mutational effects (disrupting function), including pairwise epistasis (evolutionary constraints), and did indeed see improvements in a number of cases. Yet these are preliminary results, which encourage us to address the approach in more detail.
    German abstract (translated): Since it began about 30 years ago, the worldwide HIV epidemic has still not been halted and has cost several million lives. The virus attacks the immune system of its host. Left untreated, an HIV infection leads to severe opportunistic diseases and ultimately to AIDS. At this stage the immune system is already severely damaged, which rapidly leads to the death of the infected person. Fortunately, the number of new infections per year is declining thanks to extensive awareness campaigns and effective treatment options. Although no effective vaccine or complete cure has been found to date, infected people can reach a near-average life expectancy with antiretroviral therapy. However, owing to its high mutation rate and fast replication cycle, HIV is a particularly adaptable virus, which enables it to develop drug resistance. The situation becomes especially critical when multi-drug resistance develops. New treatment strategies are therefore urgently needed. Besides gene regions that encode essential proteins, RNAs contain a multitude of regulatory elements and functional structures that influence most biological processes. Altering these regions by mutation may mean that the organism does not survive, because vital functions can no longer be carried out. Functionally relevant motifs and structures form so-called evolutionary constraints, that is, function- and structure-dependent restrictions on evolution. This makes them promising targets for new drugs. Exploiting them requires detailed knowledge of the regulatory mechanisms and essential structures involved in vital processes of the HIV life cycle. However, the exact workings and interrelations of these mechanisms have not been studied extensively. In this work we address techniques for determining evolutionary constraints in the context of genomic RNA in HIV. We describe different approaches for drawing qualitative and quantitative inferences about evolutionary constraints. We focus on two methods, direct coupling analysis (DCA) and the Mutational Interference Mapping Experiment (MIME). Based on the latter, we present software for predicting functional elements in RNA. We have also adapted the MIME framework to in cellulo experiments. In doing so, we detected regulatory motifs in the 5' untranslated region of the HIV-1 genome that are important both for the production of viral RNA in cells and for the incorporation of the viral genome into newly forming virions. Finally, we aimed to improve the prediction of functional regions in MIME by incorporating DCA-inspired methods. To this end, we run a benchmark with different scenarios. We generate data with varying numbers of mutational effects that impair RNA function, as well as pairwise epistasis (evolutionary constraints). We did indeed observe improvements in some cases. These results are preliminary, but they encourage us to pursue this approach further.

    A complex systems approach to education in Switzerland

    The insights gained from the study of complex systems in biological, social, and engineered systems enable us not only to observe and understand, but also to actively design systems that are capable of coping successfully with complex and dynamically changing situations. The methods and mindset required for this approach have been applied to educational systems with their diverse levels of scale and complexity. Based on the general case made by Yaneer Bar-Yam, this paper applies the complex systems approach to the educational system in Switzerland. It confirms that the complex systems approach is valid. Indeed, many recommendations made for the general case have already been implemented in the Swiss education system. To address existing problems and difficulties, further steps are recommended. This paper contributes to the further establishment of the complex systems approach by shedding light on an area which concerns us all, which is a frequent topic of discussion and dispute among politicians and the public, where billions of dollars have been spent without achieving the desired results, and where it is difficult to directly derive consequences from actions taken. The analysis of the education system's different levels, their complexity and scale will clarify how such a dynamic system should be approached, and how it can be guided towards the desired performance.

    Environmental modulation of microbial ecosystems

    Natural microbiota are essential to the health of living systems - from the human gut to coral reefs. Although advances in DNA sequencing have allowed us to catalogue many of the different organisms that make up these microbial communities, significant challenges remain in understanding the complex networks of interspecies metabolic interactions they exhibit. These interactions are crucial to community stability and function, and are highly context-dependent: the availability of different nutrients can determine whether a set of microbes will interact cooperatively or competitively, which can drastically change a community’s structure. Disentangling the environmental factors that determine these behaviors will not only fundamentally enhance our knowledge of their ecological properties, but will also bring us closer to the rational engineering of synthetic microbiomes with novel functions. Here, I integrate modeling and experimental approaches to quantify the dependence of microbial communities on environmental composition. I then show how this relationship can be leveraged to facilitate the design of synthetic consortia. The first chapter of this dissertation is a review article that introduces a framework for cataloguing interaction mechanisms, which enables quantitative comparisons and predictive models of these complex phenomena. The second chapter is a computational study that explores one such attribute – metabolic cost – in high detail. It demonstrates how a large variety of molecules can be secreted without imposing a fitness cost on microbial organisms, allowing for the emergence of beneficial interspecies interactions. The third chapter is an experimental study that determines how the number of unique environmental nutrients affects microbial community growth and taxonomic diversity. The integration of stoichiometric and consumer resource models enabled the discovery of basic ecological principles that govern this environment-phenotype relationship. 
The fourth chapter applies these principles to the design of engineered communities via a search algorithm that identifies environmental compositions that yield specific ecosystem properties. This dissertation then concludes with extensions of the modeling methods used throughout this work to additional model systems. Future work could further quantify how microbial community phenotypes depend on each of the individual factors explored in this thesis, while also leveraging emerging knowledge on interaction mechanisms to design synthetic consortia

    Genetic mapping of metabolic biomarkers of cardiometabolic diseases

    Cardiometabolic disorders (CMDs) are a major public health problem worldwide. The main goal of this thesis is to characterize the genetic architecture of CMD-related metabolites in a Lebanese cohort. In order to maximise the extraction of meaningful biological information from this dataset, an important part of this thesis focuses on the evaluation and subsequent improvement of the standard methods currently used for molecular epidemiology studies. First, I describe MetaboSignal, a novel network-based approach to explore the genetic regulation of the metabolome. Second, I comprehensively compare the recovery of metabolic information in the different 1H NMR strategies routinely used for metabolic profiling of plasma (standard 1D, spin-echo and JRES). Third, I describe a new method for dimensionality reduction of 1H NMR datasets prior to statistical modelling. Finally, I use all this methodological knowledge to search for molecular biomarkers of CMDs in a Lebanese population. Metabolome-wide association analyses identified a number of metabolites associated with CMDs, as well as several associations involving N-glycan units from acute-phase glycoproteins. Genetic mapping of these metabolites validated previously reported gene-metabolite associations, and revealed two novel loci associated with CMD-related metabolites. Collectively, this work contributes to the ongoing efforts to characterize the molecular mechanisms underlying complex human diseases. Open Access