Temporal Feature Selection with Symbolic Regression
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find that symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic data set we find that it outperforms standard symbolic regression and fails to beat Lasso regression, but finds useful features describing the interaction between land surface temperature, snow, and seasonal vegetative growth in the Arctic.
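To make the idea of "functions of the aggregate of variables over time" concrete, here is a minimal sketch of what a range-style terminal could look like. The abstract does not define the Range Terminal, so the function name `range_terminal`, its window parameters, the choice of aggregators, and the temperature values are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean
from typing import Callable, Sequence


def range_terminal(
    series: Sequence[float],
    start: int,
    end: int,
    agg: Callable[[Sequence[float]], float] = mean,
) -> float:
    """Aggregate one variable over the time window [start, end).

    Hypothetical helper: in a symbolic regression tree, a leaf like this
    would expose a windowed summary of a time series instead of a single
    raw value, and the search could evolve start, end, and agg.
    """
    if not 0 <= start < end <= len(series):
        raise ValueError("window must satisfy 0 <= start < end <= len(series)")
    return agg(series[start:end])


# Made-up weekly land surface temperatures, used only to exercise the helper.
weekly_temperature = [-12.0, -8.0, -3.0, 1.0, 4.0, 7.0]

early_season_mean = range_terminal(weekly_temperature, 0, 3)
late_season_max = range_terminal(weekly_temperature, 3, 6, agg=max)
```

Under this assumption, the evolutionary search would choose the window bounds and the aggregator, so a discovered feature such as "mean temperature over the first three weeks" stays directly readable.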
Where are we now? A large benchmark study of recent symbolic regression methods
In this paper we provide a broad benchmarking of recent genetic programming
approaches to symbolic regression in the context of state of the art machine
learning approaches. We use a set of nearly 100 regression benchmark problems
culled from open source repositories across the web. We conduct a rigorous
benchmarking of four recent symbolic regression approaches as well as nine
machine learning approaches from scikit-learn. The results suggest that
symbolic regression performs strongly compared to state-of-the-art gradient
boosting algorithms, although in terms of running time it is among the slowest of
the available methodologies. We discuss the results in detail and point to
future research directions that may allow symbolic regression to gain wider
adoption in the machine learning community. Comment: 8 pages, 4 figures. GECCO 201
Global solar irradiation prediction using a multi-gene genetic programming approach
This is the author accepted manuscript. The final version is available from AIP Publishing via the DOI in this record. In this paper, a nonlinear symbolic regression technique using an evolutionary algorithm known as multi-gene genetic programming (MGGP) is applied for data-driven modelling of the relationship between the dependent and the independent variables. The technique is applied to modelling measured global solar irradiation and is validated through numerical simulations. The proposed modelling technique shows improved results over the fuzzy logic and artificial neural network (ANN) based approaches attempted by contemporary researchers. The method proposed here yields nonlinear analytical expressions, unlike neural networks, which are essentially a black-box modelling approach. This additional flexibility is an advantage from the modelling perspective and helps to discern the important variables which affect the prediction. Due to the evolutionary nature of the algorithm, it is able to get out of local minima and converge to a global optimum, unlike the back-propagation (BP) algorithm used for training neural networks. This results in a better percentage fit than those obtained using neural networks by contemporary researchers. A hold-out cross-validation is also performed on the obtained genetic programming (GP) results, which shows that the results generalize well to new data and do not over-fit the training samples. The multi-gene GP results are compared with those obtained using its single-gene version, and also with four classical regression models, in order to show the effectiveness of the adopted approach.
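As a rough illustration of the multi-gene idea and the hold-out check mentioned in the abstract, the sketch below assumes the common multi-gene formulation in which the model output is a bias plus a weighted sum of several gene expressions, with the weights fitted by ordinary least squares. The two genes, the synthetic data, and the 150/50 split are hypothetical. The evolutionary search that would discover the genes is omitted, and nothing here reproduces the paper's irradiation model.

```python
import numpy as np


# Two hand-written "genes" standing in for evolved expression trees.
def gene_1(x: np.ndarray) -> np.ndarray:
    return x[:, 0] * x[:, 1]


def gene_2(x: np.ndarray) -> np.ndarray:
    return np.sin(x[:, 0])


def design_matrix(genes, x: np.ndarray) -> np.ndarray:
    # A column of ones for the bias term, then one column per gene output.
    return np.column_stack([np.ones(len(x))] + [g(x) for g in genes])


def fit_gene_weights(genes, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Least-squares fit of the bias and the per-gene weights.
    weights, *_ = np.linalg.lstsq(design_matrix(genes, x), y, rcond=None)
    return weights


def predict(genes, weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    return design_matrix(genes, x) @ weights


# Synthetic, noise-free data generated from known weights (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 1.0 + 2.0 * gene_1(x) + 3.0 * gene_2(x)

# Hold-out validation: fit on the first 150 rows, score on the remaining 50.
genes = [gene_1, gene_2]
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

weights = fit_gene_weights(genes, x_train, y_train)
holdout_rmse = float(np.sqrt(np.mean((predict(genes, weights, x_test) - y_test) ** 2)))
```

A single-gene model is the special case with one gene in the list, which is one way to read the abstract's single-gene versus multi-gene comparison.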