158 research outputs found

    Gin: Genetic Improvement Research Made Easy

    Genetic improvement (GI) is a young field of research on the cusp of transforming software development. GI uses search to improve existing software. Researchers have already shown that GI can improve human-written code, ranging from program repair to optimising run-time, from reducing energy consumption to the transplantation of new functionality. Much remains to be done. The cost of re-implementing GI to investigate new approaches is hindering progress. Therefore, we present Gin, an extensible and modifiable toolbox for GI experimentation, with a novel combination of features. Instantiated in Java and targeting the Java ecosystem, Gin automatically transforms, builds, and tests Java projects. Out of the box, Gin supports automated test generation and source code profiling. We show, through examples and a case study, how Gin facilitates experimentation and will speed innovation in GI.

    From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

    Since the discovery of the structure of DNA in the early 1950s and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. Beyond the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as RNA signalling and gene regulation processes. Despite the rapid development and declining cost of modern molecular methods, they typically can only describe ncRNAs' structural conformations in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have revealed that ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary level. In this thesis, we describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure. 
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (a secondary structure that often poses difficulties for bio-computational detection). The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches that of recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, which allows both local and global search and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. Therefore, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. The overall performance showed improved empirical results compared to existing tools through intensive benchmarks on both pseudoknotted and pseudoknot-free datasets. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications in studying evolutionary dynamics.
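The Zipf-distributed mutation step mentioned in this abstract can be illustrated with a minimal sketch. It is not aRNAque's implementation: the function names, the exponent value of 1.5, the truncation of the distribution at the sequence length, and the choice of mutating single unpaired positions uniformly at random are all assumptions made for illustration. The point it shows is the Lévy-flight flavour: most steps change one nucleotide (local search), while occasional steps change many (global jumps).

```python
import random

NUCLEOTIDES = "ACGU"


def zipf_mutation_count(seq_len, exponent=1.5, rng=random):
    """Draw a mutation count k in 1..seq_len with P(k) proportional to k**-exponent.

    The exponent and the truncation at seq_len are illustrative assumptions.
    """
    ks = list(range(1, seq_len + 1))
    weights = [k ** -exponent for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def mutate(sequence, exponent=1.5, rng=random):
    """Apply a Zipf-distributed number of point mutations at distinct positions."""
    n_mutations = zipf_mutation_count(len(sequence), exponent, rng)
    positions = rng.sample(range(len(sequence)), n_mutations)
    bases = list(sequence)
    for pos in positions:
        # Always pick a different nucleotide so every chosen position changes.
        bases[pos] = rng.choice([b for b in NUCLEOTIDES if b != bases[pos]])
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(0)
    parent = "GGGAAACCCUUUGGGAAACCC"
    for _ in range(5):
        print(mutate(parent, rng=rng))
```

A heavy-tailed count like this is one simple way to mix small and large moves in a single mutation operator; a real inverse-folding tool would also need to respect base-pairing constraints of the target structure, which this sketch ignores.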

    Faster Evolutionary Multi-Objective Optimization via GALE, the Geometric Active Learner

    Goal optimization has long been a topic of great interest in computer science. The literature contains many thousands of papers that discuss methods for the search of optimal solutions to complex problems. In the case of multi-objective optimization, such a search yields iteratively improved approximations to the Pareto frontier, i.e. the set of best solutions contained along a trade-off curve of competing objectives. To approximate the Pareto frontier, one method that is ubiquitous throughout the field of optimization is stochastic search. Stochastic search engines explore solution spaces by randomly mutating candidate guesses to generate new solutions. This mutation policy is employed by the most commonly used tools (e.g. NSGA-II and SPEA2), with the goals of a) avoiding local optima, and b) expanding diversity in the set of generated approximations. Such blind mutation policies explore many sub-optimal solutions that are discarded when better solutions are found. Hence, this approach has two problems. Firstly, stochastic search can be unnecessarily computationally expensive because it evaluates an overwhelming number of candidates. Secondly, the generated approximations to the Pareto frontier are usually very large, and can be difficult to understand. To solve these two problems, a more-directed, less-stochastic approach than standard search tools is necessary. This thesis presents GALE (Geometric Active Learning). GALE is an active learner that finds approximations to the Pareto frontier by spectrally clustering candidates using a near-linear time recursive descent algorithm that iteratively divides candidates into halves (called leaves at the bottom level). Active learning in GALE selects a minimal, most-informative subset of candidates by only evaluating the two most different candidates during each descending split; hence, GALE requires at most 2 log2(N) evaluations per generation. 
The candidates of each leaf are thereafter non-stochastically mutated in the most promising directions along each piece. These leaves are piecewise approximations to the Pareto frontier. The experiments of this thesis lead to the following conclusion: a near-linear time recursive binary division of the decision space of candidates in a multi-objective optimization algorithm can find useful directions to mutate instances and find quality solutions much faster than traditional randomization approaches. Specifically, in comparative studies with standard methods (NSGA-II and SPEA2) applied to a variety of models, GALE required orders of magnitude fewer evaluations to find solutions. As a result, GALE can perform dramatically faster than the other methods, especially for realistic models.
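The evaluation budget described in this abstract can be illustrated with a rough sketch of a recursive halving descent that evaluates only two candidates per split. This is not the GALE implementation: the way the two "most different" candidates are found (a random anchor, then two farthest-point hops), the cosine-rule projection used to order candidates, the single scalar objective standing in for multi-objective comparison, and the names `split`, `descend`, and `min_leaf` are all assumptions made for illustration.

```python
import math
import random


def farthest(point, candidates):
    """Return the candidate at the greatest Euclidean distance from point."""
    return max(candidates, key=lambda c: math.dist(point, c))


def split(candidates, rng):
    """Halve candidates along the line joining two distant poles.

    Assumed heuristic: the poles approximate the two most different candidates.
    """
    anchor = rng.choice(candidates)
    east = farthest(anchor, candidates)
    west = farthest(east, candidates)
    c = math.dist(east, west)

    def projection(x):
        # Cosine rule: distance of x from east along the east-west line.
        if c == 0:
            return 0.0
        a, b = math.dist(x, east), math.dist(x, west)
        return (a * a + c * c - b * b) / (2 * c)

    ordered = sorted(candidates, key=projection)
    mid = len(ordered) // 2
    return east, west, ordered[:mid], ordered[mid:]


def descend(candidates, objective, rng, min_leaf=4):
    """Keep the half nearer the better pole; evaluate only the two poles per split."""
    evaluations = 0
    while len(candidates) > min_leaf:
        east, west, east_half, west_half = split(candidates, rng)
        east_score, west_score = objective(east), objective(west)
        evaluations += 2
        # Minimisation is assumed; a multi-objective version would compare by domination.
        candidates = east_half if east_score <= west_score else west_half
    return candidates, evaluations


if __name__ == "__main__":
    rng = random.Random(1)
    population = [(rng.random(), rng.random()) for _ in range(64)]
    leaf, used = descend(population, lambda p: p[0] + p[1], rng)
    print(len(leaf), "candidates kept after", used, "evaluations")
```

Because each split halves the population and costs two evaluations, a population of N candidates is reduced to a leaf in at most about log2(N) splits, which is where a bound of the form 2 log2(N) comes from.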