Simplification of genetic programs: a literature survey
Genetic programming (GP), a widely used evolutionary computing technique, suffers from bloat: excessive growth in the size of individuals. As a result, its ability to explore complex search spaces efficiently is reduced, the resulting solutions are less robust and generalisable, and models that contain bloat are difficult to understand and explain. This phenomenon is well researched, primarily from the angle of controlling bloat; our focus in this paper is instead to review the literature from an explainability point of view, by looking at how simplification can make GP models more explainable by reducing their size. Simplification is a code-editing technique whose primary purpose is to make GP models more explainable, although it can offer bloat control as an additional benefit when implemented and applied with caution. Researchers have proposed several simplification techniques and adopted various strategies to implement them. We organise the literature along multiple axes to identify the relative strengths and weaknesses of simplification techniques and to identify emerging trends and areas for future exploration. We highlight design and integration challenges and propose several avenues for research. One is to treat simplification as a standalone operator, rather than an extension of the standard crossover or mutation operators; its role is then clearly complementary to other GP operators, and it can be integrated as an optional feature into an existing GP setup. Another is to explore the under-utilisation of complexity measures in simplification: so far, size is the most discussed measure, with only two pieces of prior work pointing out the benefits of using time as a measure when controlling bloat.
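The kind of simplification the survey discusses can be illustrated with a minimal rule-based rewriter over expression trees. This is a sketch under assumptions of my own (nested-tuple trees, a toy rule set limited to +, -, *), not any surveyed system's implementation:

```python
# Minimal sketch of rule-based algebraic simplification on GP expression trees.
# Trees are nested tuples ('op', left, right); variables are strings, constants
# are numbers. The encoding and rule set are illustrative assumptions.

def simplify(node):
    if not isinstance(node, tuple):
        return node                      # variable or constant: nothing to do
    op, a, b = node
    a, b = simplify(a), simplify(b)      # simplify children bottom-up
    # Constant folding: both children are numbers, so evaluate the operator.
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return {'+': a + b, '-': a - b, '*': a * b}[op]
    # Identity rules that shrink the tree without changing its semantics.
    if op == '+' and b == 0: return a
    if op == '+' and a == 0: return b
    if op == '*' and b == 1: return a
    if op == '*' and a == 1: return b
    if op == '*' and (a == 0 or b == 0): return 0
    return (op, a, b)

expr = ('+', ('*', 'x', 1), ('-', 3, 3))   # (x*1) + (3-3)
print(simplify(expr))                       # collapses to just 'x'
```

As the survey notes, such rewriting must be applied with caution: in a real GP system each rule must provably preserve semantics over the whole input domain, or the simplified individual's fitness can change.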
Towards identifying salient patterns in genetic programming individuals
This thesis addresses the problem of offline identification of salient patterns in genetic programming individuals. It discusses the main issues related to automatic pattern identification systems, namely that these (a) should help in understanding the final solutions of the evolutionary run, (b) should give insight into the course of evolution and (c) should be helpful in optimizing future runs. Moreover, it proposes an algorithm, the Extended Pattern Growing Algorithm ([E]PGA), to extract, filter and sort the identified patterns so that they fulfill as many as possible of the following criteria: (a) they are representative of the evolutionary run and/or search space, (b) they are human-friendly and (c) their numbers are within reasonable limits. The results are demonstrated on six problems from different domains.
Numerical Simplification and its Effect on Fragment Distributions in Genetic Programming
In tree-based genetic programming (GP) there is a tendency for program trees to increase in size from one generation to the next. If this increase in program size is not accompanied by an improvement in fitness, the unproductive increase is known as bloat. It is standard practice to place some form of control on program size: by limiting the number of nodes or the depth of the program trees, by adding a component to the fitness function that rewards smaller programs (parsimony pressure), or by simplifying individual programs using algebraic methods. This thesis proposes a novel program simplification method called numerical simplification that uses only the range of values the nodes take during fitness evaluation.
The effect of online program simplification, both algebraic and numerical, on program size and resource usage is examined. This thesis also examines the distribution of program fragments within a genetic programming population and how this distribution is changed by simplification.
It is shown that both simplification approaches reduce average program size, memory use and computation time, and that numerical simplification performs at least as well as algebraic simplification, in some cases outperforming it. This reduction in program size and in the resources required to process the GP run comes without any significant reduction in accuracy. It is also shown that although the two online simplification methods destroy some existing program fragments, they generate new fragments during evolution, which compensates for any negative effects from the disruption of existing fragments. Finally, after the first few generations, the rate at which new fragments are created, the rate at which fragments are lost from the population, and the number of distinct fragments in the population all remain within a very narrow range of values for the remainder of the run.
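The core idea of numerical simplification, replacing a subtree with a constant when the values it produces during fitness evaluation stay within a narrow range, can be sketched as follows. The tree encoding, function names, and tolerance policy here are my own illustrative assumptions, not the thesis's exact algorithm:

```python
# Sketch of numerical simplification: record the (min, max) of every subtree's
# value over the fitness cases; a subtree whose observed range is narrower than
# a tolerance is replaced by a constant. Names and the tolerance policy are
# illustrative assumptions.

def eval_and_record(node, env, ranges):
    """Evaluate a nested-tuple tree, recording the value range per subtree."""
    if isinstance(node, str):
        val = env[node]                  # variable lookup
    elif isinstance(node, (int, float)):
        val = node                       # constant
    else:
        op, a, b = node
        va = eval_and_record(a, env, ranges)
        vb = eval_and_record(b, env, ranges)
        val = {'+': va + vb, '-': va - vb, '*': va * vb}[op]
    lo, hi = ranges.get(id(node), (val, val))
    ranges[id(node)] = (min(lo, val), max(hi, val))
    return val

def numerically_simplify(node, ranges, tol=1e-9):
    if not isinstance(node, tuple):
        return node
    lo, hi = ranges[id(node)]
    if hi - lo <= tol:                   # subtree is effectively constant
        return (lo + hi) / 2
    op, a, b = node
    return (op, numerically_simplify(a, ranges, tol),
                numerically_simplify(b, ranges, tol))

tree = ('+', 'x', ('*', ('-', 'x', 'x'), 'y'))   # (x - x) * y is always 0
ranges = {}
for env in ({'x': 1.0, 'y': 5.0}, {'x': -2.0, 'y': 3.0}):
    eval_and_record(tree, env, ranges)
print(numerically_simplify(tree, ranges))         # ('+', 'x', 0.0)
```

Unlike algebraic rewriting, this replacement is only guaranteed over the fitness cases actually seen, which is why the thesis examines its effect on accuracy and on fragment disruption.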
Competent Program Evolution, Doctoral Dissertation, December 2006
Heuristic optimization methods are adaptive when they sample problem solutions based on knowledge of the search space gathered from past sampling. Recently, competent evolutionary optimization methods have been developed that adapt via probabilistic modeling of the search space. However, their effectiveness requires the existence of a compact problem decomposition in terms of prespecified solution parameters. How can we use these techniques to effectively and reliably solve program learning problems, given that program spaces will rarely have compact decompositions? One method is to manually build a problem-specific representation that is more tractable than the general space. But can this process be automated? My thesis is that the properties of programs and program spaces can be leveraged as inductive bias to reduce the burden of manual representation-building, leading to competent program evolution. The central contributions of this dissertation are a synthesis of the requirements for competent program evolution, and the design of a procedure, meta-optimizing semantic evolutionary search (MOSES), that meets these requirements. In support of my thesis, experimental results are provided to analyze and verify the effectiveness of MOSES, demonstrating scalability and real-world applicability.
Differentiable Genetic Programming for High-dimensional Symbolic Regression
Symbolic regression (SR) is the process of discovering hidden relationships in data as mathematical expressions, and is considered an effective route to interpretable machine learning (ML). Genetic programming (GP) has been the dominant approach to solving SR problems. However, as the scale of SR problems increases, GP often performs poorly and cannot effectively address real-world high-dimensional problems. This limitation is mainly caused by the stochastic evolutionary nature of traditional GP in constructing the trees. In this paper, we propose a differentiable approach named DGP to construct GP trees for high-dimensional SR for the first time. Specifically, a new data structure called the differentiable symbolic tree is proposed to relax the discrete structure to a continuous one, so that a gradient-based optimizer can be applied for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy caused by this relaxation and guarantee valid symbolic expressions. Furthermore, a diversification mechanism is introduced to help the optimizer escape local optima and reach globally better solutions. With these designs, the proposed DGP method can efficiently search for GP trees with higher performance, and is thus capable of dealing with high-dimensional SR. To demonstrate the effectiveness of DGP, we conducted experiments against state-of-the-art methods based on both GP and deep neural networks. The results reveal that DGP outperforms these peer competitors on high-dimensional regression benchmarks with dimensions varying from tens to thousands. On synthetic SR problems, DGP also achieves the best recovery rate across different noise levels. We believe this work can help establish SR as a powerful alternative for interpretable ML on a broader range of real-world problems.
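The relaxation at the heart of such a differentiable tree can be illustrated at the level of a single node: the discrete choice of operator is replaced by a softmax-weighted mixture of candidate operators, which makes the choice continuous and therefore optimizable by gradients. The node layout and the finite-difference training loop below are my own simplifications for illustration, not the paper's DGP architecture:

```python
# Sketch of relaxing a discrete operator choice into a softmax mixture, so a
# gradient-based optimizer can tune it. Node layout and training loop are
# illustrative assumptions, not the DGP implementation.
import numpy as np

OPS = [np.add, np.multiply, np.subtract]

def soft_node(theta, x, y):
    """Softmax-weighted mixture over candidate operators at one tree node."""
    w = np.exp(theta - theta.max())
    w /= w.sum()
    return sum(wi * op(x, y) for wi, op in zip(w, OPS))

# Fit the node to imitate multiplication, using finite-difference gradients
# in place of an autograd framework to keep the sketch dependency-free.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=50), rng.normal(size=50)
target = X * Y
theta = np.zeros(len(OPS))

def loss(th):
    return np.mean((soft_node(th, X, Y) - target) ** 2)

for _ in range(400):
    grad = np.array([(loss(theta + 1e-5 * e) - loss(theta)) / 1e-5
                     for e in np.eye(len(OPS))])
    theta -= 0.2 * grad

print(OPS[int(np.argmax(theta))].__name__)   # operator the node settled on
```

After training, discretizing by taking the argmax operator at each node recovers a valid symbolic expression; the paper's sampling method addresses exactly the discrepancy between the soft mixture and this discretized tree.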
Iterative Schedule Optimization for Parallelization in the Polyhedron Model
In high-performance computing, one primary objective is to exploit the performance that the given target hardware can deliver to the fullest. Compilers that have the ability to automatically optimize programs for a specific target hardware can be highly useful in this context. Iterative (or search-based) compilation requires little or no prior knowledge and can adapt more easily to concrete programs and target hardware than static cost models and heuristics. Thereby, iterative compilation helps in situations in which static heuristics do not reflect the combination of input program and target hardware well. Moreover, iterative compilation may enable the derivation of more accurate cost models and heuristics for optimizing compilers. In this context, the polyhedron model is of help as it provides not only a mathematical representation of programs but, more importantly, a uniform representation of complex sequences of program transformations by schedule functions. The latter facilitates the systematic exploration of the set of legal transformations of a given program.
Early approaches to purely iterative schedule optimization in the polyhedron model do not limit their search to schedules that preserve program semantics and thereby suffer from the need to explore large numbers of illegal schedules. More recent research ensures the legality of program transformations but presumes a sequential rather than a parallel execution of the transformed program. Other approaches do not perform a purely iterative optimization.
We propose an approach to iterative schedule optimization for parallelization and tiling in the polyhedron model. Our approach targets loop programs that profit from data locality optimization and coarse-grained loop parallelization. The schedule search space can be explored either randomly or by means of a genetic algorithm.
To determine a schedule's profitability, we rely primarily on measuring the transformed code's execution time. While benchmarking is accurate, it increases the time and resource consumption of program optimization tremendously and can even make it impractical. We address this limitation by proposing to learn surrogate models from schedules generated and evaluated in previous runs of the iterative optimization and to replace benchmarking by performance prediction to the extent possible.
Our evaluation on the PolyBench 4.1 benchmark set reveals that, in a given setting, iterative schedule optimization yields significantly higher speedups in the execution of the program to be optimized. Surrogate performance models learned from training data that was generated during previous iterative optimizations can reduce the benchmarking effort without strongly impairing the optimization result. A prerequisite for this approach is a sufficient similarity between the training programs and the program to be optimized.
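The surrogate idea, fitting a regressor on features of previously benchmarked schedules and then ranking new candidates by predicted runtime instead of executing them, can be sketched as follows. The linear least-squares model, the feature vectors, and the synthetic data are stand-in assumptions for whatever model family and schedule encoding the thesis actually uses:

```python
# Sketch of replacing benchmarking with a learned surrogate: fit a model on
# (schedule features, measured runtime) pairs from earlier optimization runs,
# then rank fresh candidates by predicted runtime. Linear least squares and
# the synthetic data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training set: schedule feature vectors (e.g. tile sizes, parallel
# loop depth) with measured execution times from previous iterative runs.
features = rng.uniform(0, 1, size=(40, 3))
true_w = np.array([2.0, -1.0, 0.5])
runtimes = features @ true_w + 3.0 + rng.normal(0, 0.01, size=40)

# Fit the surrogate: least squares with a bias column appended.
A = np.hstack([features, np.ones((40, 1))])
coef, *_ = np.linalg.lstsq(A, runtimes, rcond=None)

def predict(f):
    """Predicted runtime for one candidate schedule's feature vector."""
    return np.append(f, 1.0) @ coef

# Rank candidate schedules by predicted runtime instead of benchmarking them.
candidates = rng.uniform(0, 1, size=(5, 3))
best = min(candidates, key=predict)
print('predicted-best schedule features:', best)
```

This mirrors the trade-off stated above: predictions are only trustworthy when the candidate schedules resemble the training distribution, so occasional real benchmarking remains necessary.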
Two-dimensional placement compaction using an evolutionary approach: a study
The placement of two-dimensional objects over planar surfaces while optimizing given utility functions is a combinatorial optimization problem. Our main aim is to survey genetic algorithms and hybrid metaheuristics in terms of the final positioning-area compactness of the solution. Furthermore, a new hybrid evolutionary approach, combining a genetic algorithm with a non-linear compaction method, is introduced and compared with heuristics from the literature using both randomly generated instances and benchmark problems. A wide variety of experiments is presented, together with the respective results and discussion. Finally, conclusions are drawn and future research directions are defined.
Bioinformatics
After reading this chapter, you should know the answers to these questions:
Why is sequence, structure, and biological pathway information relevant to medicine?
Where on the Internet should you look for a DNA sequence, a protein sequence, or a protein structure?
What are two problems encountered in analyzing biological sequence, structure, and function?
How has the age of genomics changed the landscape of bioinformatics?
What two changes should we anticipate in the medical record as a result of these new information sources?
What are two computational challenges in bioinformatics for the future?