3,316 research outputs found

    Sequential Symbolic Regression with Genetic Programming

    Get PDF
    This chapter describes the Sequential Symbolic Regression (SSR) method, a new strategy for function approximation in symbolic regression. The SSR method is inspired by the sequential covering strategy from machine learning, but instead of sequentially reducing the size of the problem being solved, it sequentially transforms the original problem into potentially simpler problems. This transformation is performed according to the semantic distances between the desired and obtained outputs and a geometric semantic operator. The rationale behind SSR is that, after generating a suboptimal function f via symbolic regression, the output errors can be approximated by another function in a subsequent iteration. The method was tested in eight polynomial functions, and compared with canonical genetic programming (GP) and geometric semantic genetic programming (SGP). Results showed that SSR significantly outperforms SGP and presents no statistical difference to GP. More importantly, they show the potential of the proposed strategy: an effective way of applying geometric semantic operators to combine different (partial) solutions, avoiding the exponential growth problem arising from the use of these operators

    Temporal Feature Selection with Symbolic Regression

    Get PDF
    Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which Symbolic regression is endowed with a ``Range Terminal\u27\u27 that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and a real world data in which we predict seasonal greenness using satellite derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find Symbolic regression with the Range Terminal outperforms standard Symbolic regression and Lasso regression. On the Arctic data set we find it outperforms standard Symbolic regression, fails to beat the Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic

    Symbolic regression of generative network models

    Full text link
    Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world networks. We were able to find programs that are simple enough to lead to an actual understanding of the mechanisms proposed, namely for a simple brain and a social network

    Elite Bases Regression: A Real-time Algorithm for Symbolic Regression

    Full text link
    Symbolic regression is an important but challenging research topic in data mining. It can detect the underlying mathematical models. Genetic programming (GP) is one of the most popular methods for symbolic regression. However, its convergence speed might be too slow for large scale problems with a large number of variables. This drawback has become a bottleneck in practical applications. In this paper, a new non-evolutionary real-time algorithm for symbolic regression, Elite Bases Regression (EBR), is proposed. EBR generates a set of candidate basis functions coded with parse-matrix in specific mapping rules. Meanwhile, a certain number of elite bases are preserved and updated iteratively according to the correlation coefficients with respect to the target model. The regression model is then spanned by the elite bases. A comparative study between EBR and a recent proposed machine learning method for symbolic regression, Fast Function eXtraction (FFX), are conducted. Numerical results indicate that EBR can solve symbolic regression problems more effectively.Comment: The 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2017

    Exhaustive Symbolic Regression

    Full text link
    Symbolic Regression (SR) algorithms learn analytic expressions which both accurately fit data and, unlike traditional machine-learning approaches, are highly interpretable. Conventional SR suffers from two fundamental issues which we address in this work. First, since the number of possible equations grows exponentially with complexity, typical SR methods search the space stochastically and hence do not necessarily find the best function. In many cases, the target problems of SR are sufficiently simple that a brute-force approach is not only feasible, but desirable. Second, the criteria used to select the equation which optimally balances accuracy with simplicity have been variable and poorly motivated. To address these issues we introduce a new method for SR -- Exhaustive Symbolic Regression (ESR) -- which systematically and efficiently considers all possible equations and is therefore guaranteed to find not only the true optimum but also a complete function ranking. Utilising the minimum description length principle, we introduce a principled method for combining these preferences into a single objective statistic. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding \sim40 functions (out of 5.2 million considered) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not necessarily prefer a Λ\LambdaCDM expansion history, and traditional SR algorithms that return only the Pareto-front, even if they found this successfully, would not locate Λ\LambdaCDM. We make our code and full equation sets publicly available.Comment: 14 pages, 6 figures, 2 tables. Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligenc
    corecore