183 research outputs found

    ConSemblEX: A Consensus-Based Transcriptome Assembly Approach that Extends ConSemble and Improves Transcriptome Assembly

    Get PDF
    An accurate transcriptome is essential to understanding biological systems enabling omics analyses such as gene expression, gene discovery, and gene-regulatory network construction. However, assembling an accurate transcriptome is challenging, especially for organisms without adequate reference genomes or transcriptomes. While several methods for transcriptome assembly with different approaches exist, it is still difficult to establish the most accurate methods. This thesis explores the different transcriptome assembly methods and compares their performances using simulated benchmark transcriptomes with varying complexity. We also introduce ConSemblEX to improve a consensus-based ensemble transcriptome assembler, ConSemble, in three main areas: we provide the ability to use any number of assemblers, provide a variety of consensus assembly outputs, and provide information about the effect of each assembler in the final assembly. Using five assembly methods both in the de novo and genome-guided approaches, we showed how ConSemblEX can be used to explore various strategies for consensus assembly, such as ConSemblEX-4+, to find the optimum assembly. Compared to the original ConSemble, ConSemblEX improved the de novo assembly performance, increasing the precision by 14% and F1 by 5%, and significantly reducing the FP by 49%. In the genome-guided assembly, ConSemblEX had identical performance to the original ConSemble. We showed that ConSemblEX provides tools to explore how different methods perform and behave depending on the datasets. With the ConSemblEX-select assembly, we further demonstrated that we can improve consensus-based assembly more by choosing optimum overlap sets among different methods. Such information provides the foundation to develop machine learning algorithms in the future to further improve transcriptome assembly performance. Adviser: Jitender Deogu

    A Field Guide to Genetic Programming

    Get PDF
    xiv, 233 p. : il. ; 23 cm.Libro ElectrónicoA Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authorsIntroduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probalistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.Contents xi 1 Introduction 1.1 Genetic Programming in a Nutshell 1.2 Getting Started 1.3 Prerequisites 1.4 Overview of this Field Guide I Basics 2 Representation, Initialisation and GP 2.1 Representation 2.2 Initialising the Population 2.3 Selection 2.4 Recombination and Mutation Operators in Tree-based 3 Getting Ready to Run Genetic Programming 19 3.1 Step 1: Terminal Set 19 3.2 Step 2: Function Set 20 3.2.1 Closure 21 3.2.2 Sufficiency 23 3.2.3 Evolving Structures other than Programs 23 3.3 Step 3: Fitness Function 24 3.4 Step 4: GP Parameters 26 3.5 Step 5: Termination and solution designation 27 4 Example Genetic Programming Run 4.1 Preparatory Steps 29 4.2 Step-by-Step Sample Run 31 4.2.1 Initialisation 31 4.2.2 Fitness Evaluation Selection, Crossover and Mutation Termination and Solution Designation Advanced Genetic Programming 5 Alternative Initialisations and Operators in 5.1 Constructing the Initial Population 5.1.1 Uniform Initialisation 5.1.2 Initialisation may Affect Bloat 5.1.3 Seeding 5.2 GP Mutation 5.2.1 Is Mutation Necessary? 5.2.2 Mutation Cookbook 5.3 GP Crossover 5.4 Other Techniques 32 5.5 Tree-based GP 39 6 Modular, Grammatical and Developmental Tree-based GP 47 6.1 Evolving Modular and Hierarchical Structures 47 6.1.1 Automatically Defined Functions 48 6.1.2 Program Architecture and Architecture-Altering 50 6.2 Constraining Structures 51 6.2.1 Enforcing Particular Structures 52 6.2.2 Strongly Typed GP 52 6.2.3 Grammar-based Constraints 53 6.2.4 Constraints and Bias 55 6.3 Developmental Genetic Programming 57 6.4 Strongly Typed Autoconstructive GP with PushGP 59 7 Linear and Graph Genetic Programming 61 7.1 Linear Genetic Programming 61 7.1.1 Motivations 61 7.1.2 Linear GP Representations 62 7.1.3 Linear GP Operators 64 7.2 Graph-Based Genetic Programming 65 7.2.1 Parallel Distributed GP (PDGP) 65 7.2.2 PADO 67 7.2.3 Cartesian GP 67 7.2.4 Evolving Parallel Programs using Indirect Encodings 68 8 Probabilistic Genetic Programming 8.1 Estimation of Distribution Algorithms 69 8.2 Pure EDA GP 71 8.3 Mixing Grammars and Probabilities 74 9 Multi-objective Genetic Programming 75 9.1 Combining Multiple Objectives into a Scalar Fitness Function 75 9.2 Keeping the Objectives Separate 76 9.2.1 Multi-objective Bloat and Complexity Control 77 9.2.2 Other Objectives 78 9.2.3 Non-Pareto Criteria 80 9.3 Multiple Objectives via Dynamic and Staged Fitness Functions 80 9.4 Multi-objective Optimisation via Operator Bias 81 10 Fast and Distributed Genetic Programming 83 10.1 Reducing Fitness Evaluations/Increasing their Effectiveness 83 10.2 Reducing Cost of Fitness with Caches 86 10.3 Parallel and Distributed GP are Not Equivalent 88 10.4 Running GP on Parallel Hardware 89 10.4.1 Master–slave GP 89 10.4.2 GP Running on GPUs 90 10.4.3 GP on FPGAs 92 10.4.4 Sub-machine-code GP 93 10.5 Geographically Distributed GP 93 11 GP Theory and its Applications 97 11.1 Mathematical Models 98 11.2 Search Spaces 99 11.3 Bloat 101 11.3.1 Bloat in Theory 101 11.3.2 Bloat Control in Practice 104 III Practical Genetic Programming 12 Applications 12.1 Where GP has Done Well 12.2 Curve Fitting, Data Modelling and Symbolic Regression 12.3 Human Competitive Results – the Humies 12.4 Image and Signal Processing 12.5 Financial Trading, Time Series, and Economic Modelling 12.6 Industrial Process Control 12.7 Medicine, Biology and Bioinformatics 12.8 GP to Create Searchers and Solvers – Hyper-heuristics xiii 12.9 Entertainment and Computer Games 127 12.10The Arts 127 12.11Compression 128 13 Troubleshooting GP 13.1 Is there a Bug in the Code? 13.2 Can you Trust your Results? 13.3 There are No Silver Bullets 13.4 Small Changes can have Big Effects 13.5 Big Changes can have No Effect 13.6 Study your Populations 13.7 Encourage Diversity 13.8 Embrace Approximation 13.9 Control Bloat 13.10 Checkpoint Results 13.11 Report Well 13.12 Convince your Customers 14 Conclusions Tricks of the Trade A Resources A.1 Key Books A.2 Key Journals A.3 Key International Meetings A.4 GP Implementations A.5 On-Line Resources 145 B TinyGP 151 B.1 Overview of TinyGP 151 B.2 Input Data Files for TinyGP 153 B.3 Source Code 154 B.4 Compiling and Running TinyGP 162 Bibliography 167 Inde

    Robust optimization of algorithmic trading systems

    Get PDF
    GAs (Genetic Algorithms) and GP (Genetic Programming) are investigated for finding robust Technical Trading Strategies (TTSs). TTSs evolved with standard GA/GP techniques tend to suffer from over-fitting as the solutions evolved are very fragile to small disturbances in the data. The main objective of this thesis is to explore optimization techniques for GA/GP which produce robust TTSs that have a similar performance during both optimization and evaluation, and are also able to operate in all market conditions and withstand severe market shocks. In this thesis, two novel techniques that increase the robustness of TTSs and reduce over-fitting are described and compared to standard GA/GP optimization techniques and the traditional investment strategy Buy & Hold. The first technique employed is a robust multi-market optimization methodology using a GA. Robustness is incorporated via the environmental variables of the problem, i.e. variablity in the dataset is introduced by conducting the search for the optimum parameters over several market indices, in the hope of exposing the GA to differing market conditions. This technique shows an increase in the robustness of the solutions produced, with results also showing an improvement in terms of performance when compared to those offered by conducting the optimization over a single market. The second technique is a random sampling method we use to discover robust TTSs using GP. Variability is introduced in the dataset by randomly sampling segments and evaluating each individual on different random samples. This technique has shown promising results, substantially beating Buy & Hold. Overall, this thesis concludes that Evolutionary Computation techniques such as GA and GP combined with robust optimization methods are very suitable for developing trading systems, and that the systems developed using these techniques can be used to provide significant economic profits in all market conditions

    Ant Colony Optimization

    Get PDF
    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced a tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics are presented

    Multiobjective genetic programming can improve the explanatory capabilities of mechanism-based models of social systems

    Get PDF
    The generative approach to social science, in which agent-based simulations (or other complex systems models) are executed to reproduce a known social phenomenon, is an important tool for realist explanation. However, a generative model, when suitably calibrated and validated using empirical data, represents just one viable candidate set of entities and mechanisms. The model only partially addresses the needs of an abductive reasoning process - specifically it does not provide insight into other viable sets of entities or mechanisms, nor suggest which of these are fundamentally constitutive for the phenomenon to exist. In this paper, we propose a new model discovery framework that more fully captures the needs of realist explanation. The framework exploits the implicit ontology of an existing human-built generative model to propose and test a plurality of new candidate model structures. Genetic programming is used to automate this search process. A multi-objective approach is used, which enables multiple perspectives on the value of any particular generative model - such as goodness-of-fit, parsimony, and interpretability - to be represented simultaneously. We demonstrate this new framework using a complex systems modeling case study of change and stasis in societal alcohol use patterns in the US over the period 1980-2010. The framework is successful in identifying three competing explanations of these alcohol use patterns, using novel integrations of social role theory not previously considered by the human modeler. Practitioners in complex systems modeling should use model discovery to improve the explanatory utility of the generative approach to realist social science

    Formal methods paradigms for estimation and machine learning in dynamical systems

    Get PDF
    Formal methods are widely used in engineering to determine whether a system exhibits a certain property (verification) or to design controllers that are guaranteed to drive the system to achieve a certain property (synthesis). Most existing techniques require a large amount of accurate information about the system in order to be successful. The methods presented in this work can operate with significantly less prior information. In the domain of formal synthesis for robotics, the assumptions of perfect sensing and perfect knowledge of system dynamics are unrealistic. To address this issue, we present control algorithms that use active estimation and reinforcement learning to mitigate the effects of uncertainty. In the domain of cyber-physical system analysis, we relax the assumption that the system model is known and identify system properties automatically from execution data. First, we address the problem of planning the path of a robot under temporal logic constraints (e.g. "avoid obstacles and periodically visit a recharging station") while simultaneously minimizing the uncertainty about the state of an unknown feature of the environment (e.g. locations of fires after a natural disaster). We present synthesis algorithms and evaluate them via simulation and experiments with aerial robots. Second, we develop a new specification language for tasks that require gathering information about and interacting with a partially observable environment, e.g. "Maintain localization error below a certain level while also avoiding obstacles.'' Third, we consider learning temporal logic properties of a dynamical system from a finite set of system outputs. For example, given maritime surveillance data we wish to find the specification that corresponds only to those vessels that are deemed law-abiding. Algorithms for performing off-line supervised and unsupervised learning and on-line supervised learning are presented. Finally, we consider the case in which we want to steer a system with unknown dynamics to satisfy a given temporal logic specification. We present a novel reinforcement learning paradigm to solve this problem. Our procedure gives "partial credit'' for executions that almost satisfy the specification, which can lead to faster convergence rates and produce better solutions when the specification is not satisfiable

    A brief history of learning classifier systems: from CS-1 to XCS and its variants

    Get PDF
    © 2015, Springer-Verlag Berlin Heidelberg. The direction set by Wilson’s XCS is that modern Learning Classifier Systems can be characterized by their use of rule accuracy as the utility metric for the search algorithm(s) discovering useful rules. Such searching typically takes place within the restricted space of co-active rules for efficiency. This paper gives an overview of the evolution of Learning Classifier Systems up to XCS, and then of some of the subsequent developments of Wilson’s algorithm to different types of learning
    • …
    corecore