Sequential Symbolic Regression with Genetic Programming
This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. The SSR method is inspired by the sequential covering strategy from machine learning, but instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after generating a suboptimal function f via symbolic regression, the output errors can be approximated by another function in a subsequent iteration. The method was tested on eight polynomial functions and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). Results showed that SSR significantly outperforms SGP and shows no statistically significant difference from GP. More importantly, they show the potential of the proposed strategy: an effective way of applying geometric semantic operators to combine different (partial) solutions while avoiding the exponential growth problem that arises from the use of these operators.
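The rationale stated above (fit a suboptimal function, then approximate its output errors in a later iteration) can be illustrated with a minimal sketch. This is not the SSR method itself: the GP search is replaced by a hypothetical fixed set of candidate terms (`CANDIDATES`), and the geometric semantic operator is replaced by plain additive residual fitting. All names are assumptions made for illustration.

```python
# Hypothetical candidate terms standing in for a GP search (an assumption).
CANDIDATES = {
    "1": lambda x: 1.0,
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
}


def fit_best_term(xs, targets):
    """Return (name, coeff, sse) for the scaled candidate with the lowest squared error."""
    best = None
    for name, g in CANDIDATES.items():
        gs = [g(x) for x in xs]
        denom = sum(v * v for v in gs)
        if denom == 0.0:
            continue
        # Least-squares scaling of a single term against the current targets.
        coeff = sum(v * t for v, t in zip(gs, targets)) / denom
        sse = sum((t - coeff * v) ** 2 for v, t in zip(gs, targets))
        if best is None or sse < best[2]:
            best = (name, coeff, sse)
    return best


def sequential_fit(xs, ys, iterations=4):
    """Fit one term per iteration; each iteration targets the remaining errors."""
    residuals = list(ys)
    terms = []
    for _ in range(iterations):
        name, coeff, _ = fit_best_term(xs, residuals)
        terms.append((name, coeff))
        g = CANDIDATES[name]
        # The next iteration sees only what the current partial solution missed.
        residuals = [r - coeff * g(x) for x, r in zip(xs, residuals)]
    return terms, residuals


def predict(terms, x):
    """Evaluate the sum of all partial solutions at x."""
    return sum(coeff * CANDIDATES[name](x) for name, coeff in terms)
```

Calling `sequential_fit(xs, ys)` returns the list of partial solutions and the final residuals; the combined model grows by one term per iteration rather than by nesting expressions.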
Temporal Feature Selection with Symbolic Regression
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and on a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find that symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic data set we find that it outperforms standard symbolic regression and fails to beat Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic.
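One way to picture a terminal that aggregates a variable over time is sketched below. The abstract does not specify how the Range Terminal is parameterised, so the class `RangeTerminal`, its fields, and the choice of the mean as the default aggregate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class RangeTerminal:
    """Hypothetical leaf node: aggregate one variable over a window of time steps."""

    variable: str
    start: int  # first time step, inclusive
    stop: int   # last time step, exclusive
    aggregate: Callable[[Sequence[float]], float] = mean

    def evaluate(self, series: Dict[str, Sequence[float]]) -> float:
        # Slice the chosen variable's time series and reduce it to one value.
        window = series[self.variable][self.start:self.stop]
        if not window:
            raise ValueError("empty time window")
        return self.aggregate(window)
```

Under these assumptions, a search procedure could vary `variable`, `start`, `stop`, and `aggregate` in the same way it varies any other terminal, for example `RangeTerminal("temperature", 1, 3).evaluate(series)`.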
Symbolic regression of generative network models
Networks are a powerful abstraction with applicability to a variety of
scientific fields. Models explaining their morphology and growth processes
permit a wide range of phenomena to be more systematically analysed and
understood. At the same time, creating such models is often challenging and
requires insights that may be counter-intuitive. Yet there currently exists no
general method to arrive at better models. We have developed an approach to
automatically detect realistic decentralised network growth models from
empirical data, employing a machine learning technique inspired by natural
selection and defining a unified formalism to describe such models as computer
programs. As the proposed method is completely general and does not assume any
pre-existing models, it can be applied "out of the box" to any given network.
To validate our approach empirically, we systematically rediscover pre-defined
growth laws underlying several canonical network generation models and credible
laws for diverse real-world networks. We were able to find programs that are
simple enough to lead to an actual understanding of the mechanisms proposed,
namely for a simple brain and a social network.
Elite Bases Regression: A Real-time Algorithm for Symbolic Regression
Symbolic regression is an important but challenging research topic in data
mining. It can detect the underlying mathematical models. Genetic programming
(GP) is one of the most popular methods for symbolic regression. However, its
convergence speed might be too slow for large scale problems with a large
number of variables. This drawback has become a bottleneck in practical
applications. In this paper, a new non-evolutionary real-time algorithm for
symbolic regression, Elite Bases Regression (EBR), is proposed. EBR generates a
set of candidate basis functions coded with parse-matrix in specific mapping
rules. Meanwhile, a certain number of elite bases are preserved and updated
iteratively according to the correlation coefficients with respect to the
target model. The regression model is then spanned by the elite bases. A
comparative study between EBR and a recently proposed machine learning method for
symbolic regression, Fast Function eXtraction (FFX), is conducted. Numerical
results indicate that EBR can solve symbolic regression problems more
effectively.
Comment: The 2017 13th International Conference on Natural Computation, Fuzzy
Systems and Knowledge Discovery (ICNC-FSKD 2017)
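The elite-preservation step described in the abstract can be sketched as follows. This is a simplified illustration, not the EBR algorithm: the parse-matrix encoding and the final regression over the elite bases are omitted, and the function names `pearson` and `update_elites` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def update_elites(elites, candidates, xs, target, k):
    """Merge new candidate bases into the elite set and keep the k best.

    Bases are ranked by the absolute correlation of their outputs with the target.
    """
    pool = dict(elites)
    pool.update(candidates)
    ranked = sorted(
        pool.items(),
        key=lambda item: abs(pearson([item[1](x) for x in xs], target)),
        reverse=True,
    )
    return dict(ranked[:k])
```

Calling `update_elites` once per batch of generated candidates keeps the elite set at a fixed size, so the cost of the later regression step does not grow with the number of candidates examined.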
Exhaustive Symbolic Regression
Symbolic Regression (SR) algorithms learn analytic expressions which both
accurately fit data and, unlike traditional machine-learning approaches, are
highly interpretable. Conventional SR suffers from two fundamental issues which
we address in this work. First, since the number of possible equations grows
exponentially with complexity, typical SR methods search the space
stochastically and hence do not necessarily find the best function. In many
cases, the target problems of SR are sufficiently simple that a brute-force
approach is not only feasible, but desirable. Second, the criteria used to
select the equation which optimally balances accuracy with simplicity have been
variable and poorly motivated. To address these issues we introduce a new
method for SR -- Exhaustive Symbolic Regression (ESR) -- which systematically
and efficiently considers all possible equations and is therefore guaranteed to
find not only the true optimum but also a complete function ranking. Utilising
the minimum description length principle, we introduce a principled method for
combining these preferences into a single objective statistic. To illustrate
the power of ESR we apply it to a catalogue of cosmic chronometers and the
Pantheon+ sample of supernovae to learn the Hubble rate as a function of
redshift, finding 40 functions (out of 5.2 million considered) that fit
the data more economically than the Friedmann equation. These low-redshift data
therefore do not necessarily prefer a ΛCDM expansion history, and
traditional SR algorithms that return only the Pareto-front, even if they found
this successfully, would not locate ΛCDM. We make our code and full
equation sets publicly available.
Comment: 14 pages, 6 figures, 2 tables. Submitted to IEEE Transactions on
Pattern Analysis and Machine Intelligence
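The idea of scoring every equation up to a given complexity and returning a complete ranking can be shown on a toy scale. The sketch below is not the ESR code: the operator set is a tiny assumed one, expressions have no fitted parameters, and the score (squared error plus a linear penalty on node count) is only a crude stand-in for the description-length statistic mentioned in the abstract.

```python
from itertools import product

# Assumed toy building blocks; a real operator set would be far larger.
TERMINALS = {"x": lambda x: x, "1": lambda x: 1.0}
BINARY = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def enumerate_expressions(max_nodes):
    """Return {n: [(text, fn), ...]} covering every expression tree with exactly n nodes."""
    by_size = {1: list(TERMINALS.items())}
    for n in range(2, max_nodes + 1):
        level = []
        # The operator takes one node; the other n - 1 are split between its children.
        for left_size in range(1, n - 1):
            right_size = n - 1 - left_size
            pairs = product(by_size[left_size], by_size[right_size])
            for (lt, lf), (rt, rf) in pairs:
                for sym, op in BINARY.items():
                    fn = lambda x, op=op, lf=lf, rf=rf: op(lf(x), rf(x))
                    level.append((f"({lt} {sym} {rt})", fn))
        by_size[n] = level
    return by_size


def rank(xs, ys, max_nodes=5, penalty=0.1):
    """Score every enumerated expression and return the full ranking, best first."""
    scored = []
    for n, expressions in enumerate_expressions(max_nodes).items():
        for text, fn in expressions:
            sse = sum((fn(x) - y) ** 2 for x, y in zip(xs, ys))
            scored.append((sse + penalty * n, n, text))
    scored.sort()
    return scored
```

Because nothing is sampled, the ranking contains every expression in the search space, including near-optimal ones that a Pareto-front-only search would discard.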