53 research outputs found

    Artificial evolution with Binary Decision Diagrams: a study in evolvability in neutral spaces

    Get PDF
    This thesis develops a new approach to evolving Binary Decision Diagrams, and uses it to study evolvability issues. For reasons that are not yet fully understood, current approaches to artificial evolution fail to exhibit the evolvability so readily exhibited in nature. To be able to apply evolvability to artificial evolution the field must first understand and characterise it; this will then lead to systems which are much more capable than they are currently. An experimental approach is taken. Carefully crafted, controlled experiments elucidate the mechanisms and properties that facilitate evolvability, focusing on the roles and interplay between neutrality, modularity, gradualism, robustness and diversity. Evolvability is found to emerge under gradual evolution as a biased distribution of functionality within the genotype-phenotype map, which serves to direct phenotypic variation. Neutrality facilitates fitness-conserving exploration, completely alleviating local optima. Population diversity, in conjunction with neutrality, is shown to facilitate the evolution of evolvability. The search is robust, scalable, and insensitive to the absence of initial diversity. The thesis concludes that gradual evolution in a search space that is free of local optima by way of neutrality can be a viable alternative to problematic evolution on multi-modal landscapes

    A Representation for Many Player Generalized Divide the Dollar Games

    Get PDF
    Divide the dollar is a simplified version of a two player bargaining problem game devised by John Nash. The generalized divide the dollar game has n \u3e 2 players. Evolutionary algorithms can be used to evolve individual players for this generalized game but representation—i.e., a genome plus a move or search operator(s)—must be carefully chosen since it affects the search process. This paper proposes an entirely new representation called a demand matrix. Each individual in the evolving population now represents a collection of n players rather than just an individual player. Players use previous outcomes to decide their choices (bids) in the current round. The representation scales linearly with the number of players and the move operator is a variant of an evolution strategy. The results indicate that this proposed representation for the generalized divide the dollar game permits the efficient evolution of large player populations with high payoffs and fair demand sets

    Visual analytics for relationships in scientific data

    Get PDF
    Domain scientists hope to address grand scientific challenges by exploring the abundance of data generated and made available through modern high-throughput techniques. Typical scientific investigations can make use of novel visualization tools that enable dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These general tools should be applicable to many disciplines: allowing biologists to develop an intuitive understanding of the structure of coexpression networks and discover genes that reside in critical positions of biological pathways, intelligence analysts to decompose social networks, and climate scientists to model extrapolate future climate conditions. By using a graph as a universal data representation of correlation, our novel visualization tool employs several techniques that when used in an integrated manner provide innovative analytical capabilities. Our tool integrates techniques such as graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized B-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate our system using real-world workflows from several large-scale studies. Parallel coordinates has proven to be a scalable visualization and navigation framework for multivariate data. However, when data with thousands of variables are at hand, we do not have a comprehensive solution to select the right set of variables and order them to uncover important or potentially insightful patterns. We present algorithms to rank axes based upon the importance of bivariate relationships among the variables and showcase the efficacy of the proposed system by demonstrating autonomous detection of patterns in a modern large-scale dataset of time-varying climate simulation

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    Get PDF
    In this thesis, a multifactor dimensionality reduction based method on associative classification is employed to identify higher-order SNP interactions for enhancing the understanding of the genetic architecture of complex diseases. Further, this thesis explored the application of deep learning techniques by providing new clues into the interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest for achieving reliable interactions in the presence of noise

    High Dimensional Analysis of Genetic Data for the Classification of Type 2 Diabetes Using Advanced Machine Learning Algorithms

    Get PDF
    The prevalence of type 2 diabetes (T2D) has increased steadily over the last thirty years and has now reached epidemic proportions. The secondary complications associated with T2D have significant health and economic impacts worldwide and it is now regarded as the seventh leading cause of mortality. Therefore, understanding the underlying causes of T2D is high on government agendas. The condition is a multifactorial disorder with a complex aetiology. This means that T2D emerges from the convergence between genetics, the environment and diet, and lifestyle choices. The genetic determinants remain largely elusive, with only a handful of identified candidate genes. Genome-wide association studies (GWAS) have enhanced our understanding of genetic-based determinants in common complex human diseases. To date, 120 single nucleotide polymorphisms (SNPs) for T2D have been identified using GWAS. Standard statistical tests for single and multi-locus analysis, such as logistic regression, have demonstrated little effect in understanding the genetic architecture of complex human diseases. Logistic regression can capture linear interactions between SNPs and traits however it neglects the non-linear epistatic interactions that are often present within genetic data. Complex human diseases are caused by the contributions made by many interacting genetic variants. However, detecting epistatic interactions and understanding the underlying pathogenesis architecture of complex human disorders remains a significant challenge. This thesis presents a novel framework based on deep learning to reduce the high-dimensional space in GWAS and learn non-linear epistatic interactions in T2D genetic data for binary classification tasks. This framework includes traditional GWAS quality control, association analysis, deep learning stacked autoencoders, and a multilayer perceptron for classification. Quality control procedures are conducted to exclude genetic variants and individuals that do not meet a pre-specified criterion. Logistic association analysis under an additive genetic model adjusted for genomic control inflation factor is also conducted. SNPs generated with a p-value threshold of 10−2 are considered, resulting in 6609 SNPs (features), to remove statistically improbable SNPs and help minimise the computational requirements needed to process all SNPs. The 6609 SNPs are used for epistatic analysis through progressively smaller hidden layer units. Latent representations are extracted using stacked autoencoders to initialise a multilayer feedforward network for binary classification. The classifier is fine-tuned to discriminate between cases and controls using T2D genetic data. The performance of a deep learning stacked autoencoder model is evaluated and benchmarked against a multilayer perceptron and a random forest learning algorithm. The findings show that the best results were obtained using 2500 compressed hidden units (AUC=94.25%). However, the classification accuracy when using 300 compressed neurons remains reasonable with (AUC=80.78%). The results are promising. Using deep learning stacked autoencoders, it is possible to reduce high-dimensional features in T2D GWAS data and learn non-linear epistatic interactions between SNPs while enhancing overall model performance for binary classification purposes

    Composição de serviços para aplicações biomédicas

    Get PDF
    Doutoramento em Engenharia InformáticaA exigente inovação na área das aplicações biomédicas tem guiado a evolução das tecnologias de informação nas últimas décadas. Os desafios associados a uma gestão, integração, análise e interpretação eficientes dos dados provenientes das mais modernas tecnologias de hardware e software requerem um esforço concertado. Desde hardware para sequenciação de genes a registos electrónicos de paciente, passando por pesquisa de fármacos, a possibilidade de explorar com precisão os dados destes ambientes é vital para a compreensão da saúde humana. Esta tese engloba a discussão e o desenvolvimento de melhores estratégias informáticas para ultrapassar estes desafios, principalmente no contexto da composição de serviços, incluindo técnicas flexíveis de integração de dados, como warehousing ou federação, e técnicas avançadas de interoperabilidade, como serviços web ou LinkedData. A composição de serviços é apresentada como um ideal genérico, direcionado para a integração de dados e para a interoperabilidade de software. Relativamente a esta última, esta investigação debruçou-se sobre o campo da farmacovigilância, no contexto do projeto Europeu EU-ADR. As contribuições para este projeto, um novo standard de interoperabilidade e um motor de execução de workflows, sustentam a sucesso da EU-ADR Web Platform, uma plataforma para realizar estudos avançados de farmacovigilância. No contexto do projeto Europeu GEN2PHEN, esta investigação visou ultrapassar os desafios associados à integração de dados distribuídos e heterogéneos no campo do varíoma humano. Foi criada uma nova solução, WAVe - Web Analyses of the Variome, que fornece uma coleção rica de dados de variação genética através de uma interface Web inovadora e de uma API avançada. O desenvolvimento destas estratégias evidenciou duas oportunidades claras na área de software biomédico: melhorar o processo de implementação de software através do recurso a técnicas de desenvolvimento rápidas e aperfeiçoar a qualidade e disponibilidade dos dados através da adopção do paradigma de web semântica. A plataforma COEUS atravessa as fronteiras de integração e interoperabilidade, fornecendo metodologias para a aquisição e tradução flexíveis de dados, bem como uma camada de serviços interoperáveis para explorar semanticamente os dados agregados. Combinando as técnicas de desenvolvimento rápidas com a riqueza da perspectiva "Semantic Web in a box", a plataforma COEUS é uma aproximação pioneira, permitindo o desenvolvimento da próxima geração de aplicações biomédicas.The demand for innovation in the biomedical software domain has been an information technologies evolution driver over the last decades. The challenges associated with the effective management, integration, analyses and interpretation of the wealth of life sciences information stemming from modern hardware and software technologies require concerted efforts. From gene sequencing hardware to pharmacology research up to patient electronic health records, the ability to accurately explore data from these environments is vital to further improve our understanding of human health. This thesis encloses the discussion on building better informatics strategies to address these challenges, primarily in the context of service composition, including warehousing and federation strategies for resource integration, as well as web services or LinkedData for software interoperability. Service composition is introduced as a general principle, geared towards data integration and software interoperability. Concerning the latter, this research covers the service composition requirements within the pharmacovigilance field, namely on the European EU-ADR project. The contributions to this area, the definition of a new interoperability standard and the creation of a new workflow-wrapping engine, are behind the successful construction of the EUADR Web Platform, a workspace for delivering advanced pharmacovigilance studies. In the context of the European GEN2PHEN project, this research tackles the challenges associated with the integration of heterogeneous and distributed data in the human variome field. For this matter, a new lightweight solution was created: WAVe, Web Analysis of the Variome, provides a rich collection of genetic variation data through an innovative portal and an advanced API. The development of the strategies underlying these products highlighted clear opportunities in the biomedical software field: enhancing the software implementation process with rapid application development approaches and improving the quality and availability of data with the adoption of the Semantic Web paradigm. COEUS crosses the boundaries of integration and interoperability as it provides a framework for the flexible acquisition and translation of data into a semantic knowledge base, as well as a comprehensive set of interoperability services, from REST to LinkedData, to fully exploit gathered data semantically. By combining the lightness of rapid application development strategies with the richness of its "Semantic Web in a box" approach, COEUS is a pioneering framework to enhance the development of the next generation of biomedical applications

    Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression

    Get PDF
    Cite as: Perrin, Dimitri (2008) Multi-layered model of individual HIV infection progression and mechanisms of phenotypical expression. PhD thesis, Dublin City University

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Get PDF
    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.https://digitalcommons.chapman.edu/scs_books/1014/thumbnail.jp
    corecore