A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data
Symbolic regression (SR) is a powerful technique for discovering the
underlying mathematical expressions from observed data. Inspired by the success
of deep learning, recent efforts have focused on two categories of SR methods.
One uses a neural network or genetic programming to search the expression
tree directly. Although this has shown promising results, the large search
space poses difficulties in learning constant factors and processing
high-dimensional problems. Another approach leverages a transformer-based
model trained on synthetic data, which offers advantages in inference speed.
However, this method is limited to a fixed, small number of dimensions and may
encounter inference problems when the given data is out-of-distribution compared to
the synthetic data. In this work, we propose DySymNet, a novel neural-guided
Dynamic Symbolic Network for SR. Instead of searching for expressions within a
large search space, we explore DySymNet with various structures and optimize
them to identify expressions that better fit the data. With a topology
structure like neural networks, DySymNet not only tackles the challenge of
high-dimensional problems but also proves effective in optimizing constants.
Based on extensive numerical experiments using low-dimensional public standard
benchmarks and the well-known SRBench with more variables, our method achieves
state-of-the-art performance in terms of fitting accuracy and robustness to
noise.
A Multi-Gene Genetic Programming Application for Predicting Students Failure at School
Despite several efforts, accurately predicting the student failure rate (SFR)
at school remains a core problem faced by many in the educational sector. The
procedures for forecasting SFR are rigid and often require data scaling or
conversion into binary form, as is the case with the logistic model, which may
lead to loss of information and effect size attenuation. Also, the high number
of factors, incomplete and unbalanced datasets, and black-box issues, as in
Artificial Neural Networks and fuzzy logic systems, expose the need for more
efficient tools. Currently, the application of Genetic Programming (GP) holds
great promise and has produced tremendous positive results in
different sectors. In this regard, this study developed GPSFARPS, a software
application to provide a robust solution to the prediction of SFR using an
evolutionary algorithm known as multi-gene genetic programming. The approach is
validated by feeding a testing data set to the evolved GP models. Results
obtained from GPSFARPS simulations show its unique ability to evolve a suitable
failure rate expression with fast convergence at 30 generations from a
maximum specified generation of 500. The multi-gene system was also able to
minimize the evolved model expression and accurately predict student failure
rate using a subset of the original expression.
Comment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap
with arXiv:1403.0623 by other author
Temporal Feature Selection with Symbolic Regression
Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which Symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find Symbolic regression with the Range Terminal outperforms standard Symbolic regression and Lasso regression. On the Arctic data set we find it outperforms standard Symbolic regression, fails to beat the Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic.
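The abstract does not say how the Range Terminal is implemented. As a rough illustration only, a terminal that aggregates one variable over a time window could look like the Python sketch below. The function name `range_terminal`, its parameters, and the set of aggregates are all assumptions made for this example, not the authors' design.

```python
from statistics import mean

# Hypothetical set of aggregates a search procedure could choose from.
AGGREGATES = {"mean": mean, "sum": sum, "min": min, "max": max}


def range_terminal(series, start, end, agg="mean"):
    """Collapse series[start:end] (a time window) into one scalar feature.

    In a symbolic regression run, start, end and agg would be parameters
    the search is free to vary; here they are plain arguments.
    """
    window = series[start:end]
    if not window:
        raise ValueError("empty time window")
    return AGGREGATES[agg](window)


# Example: mean of a (made-up) temperature series over time steps 1 and 2.
temperature = [1.0, 2.0, 3.0, 4.0]
feature = range_terminal(temperature, 1, 3, "mean")
```

The scalar returned by such a terminal can then be used like any other leaf in an expression tree.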
Differentiable Genetic Programming
We introduce the use of high order automatic differentiation, implemented via
the algebra of truncated Taylor polynomials, in genetic programming. Using the
Cartesian Genetic Programming encoding we obtain a high-order Taylor
representation of the program output that is then used to back-propagate errors
during learning. The resulting machine learning framework is called
differentiable Cartesian Genetic Programming (dCGP). In the context of symbolic
regression, dCGP offers a new approach to the long unsolved problem of constant
representation in GP expressions. On several problems of increasing complexity
we find that dCGP is able to find the exact form of the symbolic expression as
well as the constant values. We also demonstrate the use of dCGP to solve a
large class of differential equations and to find prime integrals of dynamical
systems, presenting, in both cases, results that confirm the efficacy of our
approach.
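To make the idea of learning constants by differentiating through an expression concrete, here is a minimal Python sketch. It is not the dCGP library and uses no Cartesian Genetic Programming encoding. It assumes a fixed, hypothetical expression `c * x**2 + x` and uses first-order dual numbers, which are Taylor polynomials truncated at order one, whereas the abstract describes high-order truncation. The names `Dual`, `expression`, and `fit_constant` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """First-order truncated Taylor polynomial: val + dot * eps, eps**2 = 0."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule; the eps**2 term is truncated away.
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def expression(x, c):
    # Hypothetical fixed expression structure: c * x^2 + x.
    return c * x * x + x


def fit_constant(xs, ys, c0=0.0, lr=0.01, steps=500):
    """Gradient descent on the mean squared error with respect to c."""
    c = c0
    for _ in range(steps):
        c_dual = Dual(c, 1.0)  # seed derivative d c / d c = 1
        loss = Dual(0.0)
        for x, y in zip(xs, ys):
            residual = expression(x, c_dual) - y
            loss = loss + residual * residual
        loss = loss * (1.0 / len(xs))
        c -= lr * loss.dot  # loss.dot holds d loss / d c
    return c


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [expression(x, 3.0) for x in xs]  # data generated with c = 3
c_fit = fit_constant(xs, ys)
```

With this data the fitted value converges towards the generating constant 3. The learning rate and step count were chosen for this toy problem and would need tuning elsewhere.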
Deriving Models for Software Project Effort Estimation By Means of Genetic Programming
Keywords: software engineering, effort estimation, genetic programming, symbolic regression.
This paper presents the application of a computational intelligence methodology in effort estimation for software projects. Namely, we apply a genetic programming model for symbolic regression, aiming to produce mathematical expressions that (1) are highly accurate and (2) can be used for estimating the development effort by revealing relationships between the project’s features and the required work. We chose to investigate the effectiveness of this methodology in two software engineering domains. The system proved able to generate models in the form of handy mathematical expressions that are more accurate than those found in the literature.