Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method uses linear regression to estimate constants, and interval arithmetic is applied to ensure the consistency of a model. To prevent over-fitting, we score a model not only on its predictions in the data points but also on its complexity, for which we introduce a new measure. We compare our new method with the Kriging meta-model and with a Symbolic Regression meta-model based on Genetic Programming. We conclude that Pareto Simulated Annealing based Symbolic Regression is very competitive with the other meta-model approaches.
Keywords: approximation; meta-modeling; Pareto simulated annealing; symbolic regression
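The acceptance step of such a search can be sketched as follows: a minimal illustration of a Pareto Simulated Annealing acceptance rule over the two objectives named in the abstract (prediction error and model complexity). The function name, the weighted scalarization, and all constants are assumptions for illustration, not the paper's implementation.

```python
import math
import random

def pareto_sa_accept(current, candidate, weights, temperature):
    """Accept/reject rule for a Pareto Simulated Annealing step (sketch).

    `current` and `candidate` are (error, complexity) objective tuples;
    `weights` scalarizes the two objectives and would be varied over the
    run so that different trade-offs on the Pareto front are explored;
    `temperature` is the usual SA temperature.
    """
    delta = sum(w * (cand - cur)
                for w, cand, cur in zip(weights, candidate, current))
    if delta <= 0:          # candidate is no worse on the weighted sum
        return True
    # Otherwise accept with Boltzmann probability, as in standard SA.
    return random.random() < math.exp(-delta / temperature)

random.seed(0)
current = (1.0, 8.0)        # (prediction error, model complexity)
candidate = (0.8, 12.0)     # lower error, but more complex
print(pareto_sa_accept(current, candidate, weights=(0.7, 0.3), temperature=1.0))
```

In a full implementation the weight vector would cycle through several settings per temperature level, which is what distinguishes Pareto SA from single-objective SA.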
How Noisy Data Affects Geometric Semantic Genetic Programming
Noise is a consequence of acquiring and pre-processing data from the
environment, and shows fluctuations from different sources---e.g., from
sensors, signal processing technology or even human error. As a machine
learning technique, Genetic Programming (GP) is not immune to this problem,
which the field has frequently addressed. Recently, Geometric Semantic Genetic
Programming (GSGP), a semantic-aware branch of GP, has shown robustness and
high generalization capability. Researchers believe these characteristics may
be associated with a lower sensitivity to noisy data. However, there is no
systematic study on this matter. This paper presents a deep analysis of GSGP
performance in the presence of noise. Using 15 synthetic datasets where
noise can be controlled, we added different ratios of noise to the data and
compared the results obtained with those of a canonical GP. The results show
that, as we increase the percentage of noisy instances, the generalization
performance degradation is more pronounced in GSGP than GP. However, in
general, GSGP is more robust to noise than GP in the presence of up to 10% of
noise, and presents no statistical difference for values higher than that in
the test bed.
Comment: 8 pages, in proceedings of the Genetic and Evolutionary Computation
Conference (GECCO 2017), Berlin, Germany
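One common way to realize the "different ratios of noise" described above on a synthetic dataset is to perturb a controlled fraction of the target values. This is a minimal sketch; the function name, Gaussian noise model, and parameters are assumptions, and the paper's exact protocol may differ.

```python
import random

def add_target_noise(y, ratio, sigma, rng):
    """Perturb a `ratio` fraction of target values with Gaussian noise.

    Sketch of controlled noise injection for synthetic SR benchmarks:
    a random subset of instances of the requested size is selected and
    each selected target gets additive N(0, sigma^2) noise.
    """
    y = list(y)
    n_noisy = int(round(ratio * len(y)))
    for i in rng.sample(range(len(y)), n_noisy):
        y[i] += rng.gauss(0.0, sigma)
    return y

rng = random.Random(42)
clean = [float(x * x) for x in range(10)]          # toy synthetic targets
noisy = add_target_noise(clean, ratio=0.2, sigma=1.0, rng=rng)
changed = sum(a != b for a, b in zip(clean, noisy))
print(changed)   # 20% of 10 instances -> 2 perturbed targets
```

Sweeping `ratio` (e.g. 0%, 10%, 20%, ...) then yields the noise levels against which GSGP and canonical GP can be compared.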
Genetic Programming is Naturally Suited to Evolve Bagging Ensembles
Learning ensembles by bagging can substantially improve the generalization
performance of low-bias, high-variance estimators, including those evolved by
Genetic Programming (GP). To be efficient, modern GP algorithms for evolving
(bagging) ensembles typically rely on several (often inter-connected)
mechanisms and respective hyper-parameters, ultimately compromising ease of
use. In this paper, we provide experimental evidence that such complexity might
not be warranted. We show that minor changes to fitness evaluation and
selection are sufficient to make a simple and otherwise-traditional GP
algorithm evolve ensembles efficiently. The key to our proposal is to exploit
the way bagging works to compute, for each individual in the population,
multiple fitness values (instead of one) at a cost that is only marginally
higher than the one of a normal fitness evaluation. Experimental comparisons on
classification and regression tasks taken and reproduced from prior studies
show that our algorithm fares very well against state-of-the-art ensemble and
non-ensemble GP algorithms. We further provide insights into the proposed
approach by (i) scaling the ensemble size, (ii) ablating the changes to
selection, (iii) observing the evolvability induced by traditional subtree
variation. Code: https://github.com/marcovirgolin/2SEGP
Comment: Added interquartile range in Tables 1, 2, and 3; improved Fig. 3 and
its analysis; improved the experiment design of Section 7.
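The core idea above, computing multiple per-bag fitness values from a single evaluation, can be sketched as follows. Everything here (function names, storing bags as index counts, MSE as the fitness measure) is an illustrative assumption rather than the 2SEGP code.

```python
import random

def make_bags(n_samples, n_bags, rng):
    """Draw bootstrap bags, stored as per-sample multiplicity counts."""
    bags = []
    for _ in range(n_bags):
        counts = [0] * n_samples
        for _ in range(n_samples):          # sample with replacement
            counts[rng.randrange(n_samples)] += 1
        bags.append(counts)
    return bags

def bag_fitnesses(preds, targets, bags):
    """One fitness per bootstrap bag from a single evaluation.

    `preds` are an individual's predictions on the full training set,
    computed once; each per-bag MSE is then just a weighted aggregation
    of the cached squared errors, so the extra cost over one normal
    fitness evaluation is marginal.
    """
    sq_err = [(p - t) ** 2 for p, t in zip(preds, targets)]
    fits = []
    for counts in bags:          # counts[i] = times sample i appears in bag
        total = sum(counts)      # always n_samples for a standard bootstrap
        fits.append(sum(c * e for c, e in zip(counts, sq_err)) / total)
    return fits

rng = random.Random(1)
targets = [1.0, 2.0, 3.0, 4.0]
preds = [1.1, 1.9, 3.2, 3.8]     # hypothetical individual's outputs
bags = make_bags(len(targets), 3, rng)
print(bag_fitnesses(preds, targets, bags))   # three MSEs, one per bag
```

Selection can then consult the vector of per-bag fitnesses instead of a single scalar, which is the kind of minor change to fitness evaluation and selection the abstract refers to.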
A Probabilistic Linear Genetic Programming with Stochastic Context-Free Grammar for solving Symbolic Regression problems
Traditional Linear Genetic Programming (LGP) algorithms are based only on the
selection mechanism to guide the search. Genetic operators combine or mutate
random portions of the individuals, without knowing if the result will lead to
a fitter individual. Probabilistic Model Building Genetic Programming (PMB-GP)
methods were proposed to overcome this issue through a probability model that
captures the structure of fit individuals and uses it to sample new
individuals. This work proposes the use of LGP with a Stochastic Context-Free
Grammar (SCFG), that has a probability distribution that is updated according
to selected individuals. We propose a method for adapting the grammar to the
linear representation of LGP. Tests performed with the proposed probabilistic
method, and with two hybrid approaches, on several symbolic regression
benchmark problems show that the results are statistically better than those
obtained by traditional LGP.
Comment: Genetic and Evolutionary Computation Conference (GECCO) 2017, Berlin,
Germany
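A stochastic context-free grammar whose production probabilities are updated toward selected individuals can be sketched as follows. The toy grammar, learning rate, and update rule are illustrative assumptions; the paper's mapping of the grammar onto LGP's linear representation is not reproduced here.

```python
import random

# Toy SCFG for arithmetic expressions: each nonterminal maps to a list of
# (production, probability) pairs. Probabilities over each nonterminal sum to 1.
grammar = {
    "EXPR": [(["EXPR", "+", "EXPR"], 0.3),
             (["EXPR", "*", "EXPR"], 0.3),
             (["x"], 0.2),
             (["1"], 0.2)],
}

def sample(symbol, rng, used, depth=0):
    """Sample a derivation, recording which productions were used."""
    if symbol not in grammar:
        return symbol                    # terminal symbol
    if depth >= 4:                       # depth cap guarantees termination
        return "x"
    rules, probs = zip(*grammar[symbol])
    i = rng.choices(range(len(rules)), weights=probs)[0]
    used.append((symbol, i))
    return "".join(sample(s, rng, used, depth + 1) for s in rules[i])

def update(used, lr=0.1):
    """Shift probability mass toward productions used by selected individuals.

    Each update keeps the distribution normalized: all probabilities are
    scaled by (1 - lr) and lr is added to the production that was used.
    """
    for symbol, i in used:
        rules = grammar[symbol]
        rules[:] = [(r, (1 - lr) * p + (lr if j == i else 0.0))
                    for j, (r, p) in enumerate(rules)]

rng = random.Random(3)
used = []
expr = sample("EXPR", rng, used)   # e.g. one sampled individual
update(used)                       # reinforce its productions after selection
print(expr)
```

In the full algorithm, only productions used by *selected* (fit) individuals would be reinforced, so the grammar gradually biases sampling toward promising structures.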
Taylor Genetic Programming for Symbolic Regression
Genetic programming (GP) is a commonly used approach to solve symbolic regression (SR) problems. Compared with machine learning or deep learning methods that depend on a pre-defined model and a training dataset to solve SR problems, GP focuses on finding a solution within a search space. Although GP performs well on large-scale benchmarks, it transforms individuals randomly during the search, without taking advantage of the characteristics of the dataset. As a result, the search process of GP is usually slow, and the final results can be unstable. To guide GP by these characteristics, we propose a new method for SR, called Taylor genetic programming (TaylorGP). TaylorGP leverages a Taylor polynomial to approximate the symbolic equation that fits the dataset. It also utilizes the Taylor polynomial to extract features of the symbolic equation: low-order polynomial discrimination, variable separability, boundary, monotonicity, and parity. GP is enhanced by these Taylor polynomial techniques. Experiments are conducted on three kinds of benchmarks: classical SR, machine learning, and physics. The experimental results show that TaylorGP not only achieves higher accuracy than the nine baseline methods, but also finds stable results faster.
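One of the listed features, parity, can be checked numerically when the dataset is sampled on symmetric points. This toy sketch (the function name and tolerance are assumptions) illustrates the kind of property TaylorGP extracts; the paper derives such features from its Taylor approximation rather than from raw samples.

```python
def parity(xs, ys, tol=1e-8):
    """Classify a sampled 1-D function as 'even', 'odd', or 'none'.

    Compares y(x) against y(-x) on sample points whose negation is also
    in the sample. An even function satisfies y(x) = y(-x); an odd one
    satisfies y(x) = -y(-x). Knowing the parity lets a search restrict
    itself to symmetric candidate expressions.
    """
    table = dict(zip(xs, ys))
    pairs = [x for x in xs if -x in table]
    even = all(abs(table[x] - table[-x]) < tol for x in pairs)
    odd = all(abs(table[x] + table[-x]) < tol for x in pairs)
    if even:
        return "even"
    if odd:
        return "odd"
    return "none"

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(parity(xs, [x ** 2 for x in xs]))   # x^2 is even
print(parity(xs, [x ** 3 for x in xs]))   # x^3 is odd
```

Analogous numeric probes can test monotonicity or variable separability, each pruning the space of symbolic equations the GP search must explore.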