Elite Bases Regression: A Real-time Algorithm for Symbolic Regression
Symbolic regression is an important but challenging research topic in data
mining. It can detect the underlying mathematical models. Genetic programming
(GP) is one of the most popular methods for symbolic regression. However, its
convergence speed might be too slow for large-scale problems with a large
number of variables. This drawback has become a bottleneck in practical
applications. In this paper, a new non-evolutionary real-time algorithm for
symbolic regression, Elite Bases Regression (EBR), is proposed. EBR generates a
set of candidate basis functions encoded by a parse-matrix under specific mapping
rules. Meanwhile, a certain number of elite bases are preserved and updated
iteratively according to the correlation coefficients with respect to the
target model. The regression model is then spanned by the elite bases. A
comparative study between EBR and a recently proposed machine learning method for
symbolic regression, Fast Function eXtraction (FFX), is conducted. Numerical
results indicate that EBR can solve symbolic regression problems more
effectively.
Comment: The 2017 13th International Conference on Natural Computation, Fuzzy
Systems and Knowledge Discovery (ICNC-FSKD 2017)
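The elite-bases idea can be illustrated with a small sketch: candidate bases are drawn at random, only the k bases with the highest absolute Pearson correlation to the target are kept from one iteration to the next, and the final model is a least-squares fit on those elites. This is not the authors' implementation. The parse-matrix encoding is replaced by hypothetical random products f(x_i)*g(x_j) over a small set of unary functions, and the elite size k, the batch sizes and the tiny ridge term are all assumptions.

```python
import math
import random

# Hypothetical unary building blocks. EBR encodes bases with a parse-matrix;
# this sketch substitutes a simple random product f(x_i) * g(x_j).
UNARY = {"id": lambda v: v, "sq": lambda v: v * v, "sin": math.sin, "exp": math.exp}

def random_basis(n_vars, rng):
    """Draw a random basis f(x_i) * g(x_j); returns (canonical name, function)."""
    a = (rng.randrange(n_vars), rng.choice(sorted(UNARY)))
    b = (rng.randrange(n_vars), rng.choice(sorted(UNARY)))
    (i, f), (j, g) = sorted([a, b])  # canonical order drops commuted duplicates
    return f"{f}(x{i})*{g}(x{j})", lambda row: UNARY[f](row[i]) * UNARY[g](row[j])

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def least_squares(cols, y, ridge=1e-8):
    """Fit y ~ c0 + sum_i c_i * cols[i]; a tiny ridge keeps collinear bases solvable."""
    X = [[1.0] + [col[t] for col in cols] for t in range(len(y))]
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    rhs = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(p)]
    return solve(A, rhs)

def elite_bases_regression(X, y, k=6, iterations=30, batch=20, seed=0):
    rng = random.Random(seed)
    elites = {}  # name -> (|correlation with target|, column of basis values)
    for _ in range(iterations):
        for _ in range(batch):
            name, fn = random_basis(len(X[0]), rng)
            if name not in elites:
                col = [fn(row) for row in X]
                elites[name] = (abs(pearson(col, y)), col)
        # Preserve only the k bases most correlated with the target.
        elites = dict(sorted(elites.items(), key=lambda kv: kv[1][0], reverse=True)[:k])
    names = list(elites)
    coefs = least_squares([elites[n][1] for n in names], y)
    fitted = [coefs[0] + sum(c * elites[n][1][t] for c, n in zip(coefs[1:], names))
              for t in range(len(y))]
    return names, coefs, fitted

# Usage on synthetic data whose true model is 1 + 3 * x0 * x1.
rng = random.Random(1)
X = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(50)]
y = [1.0 + 3.0 * a * b for a, b in X]
names, coefs, fitted = elite_bases_regression(X, y)
print(names[0], coefs)
```

Ranking bases one at a time by correlation is cheap, but it can keep several mutually correlated bases; the joint least-squares fit over the elites then decides how much weight each one receives.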
Global solar irradiation prediction using a multi-gene genetic programming approach
This is the author accepted manuscript. The final version is available from AIP Publishing via the DOI in this record. In this paper, a nonlinear symbolic regression technique using an evolutionary algorithm known as multi-gene genetic programming (MGGP) is applied to data-driven modelling of the relationship between the dependent and independent variables. The technique is applied to model measured global solar irradiation and is validated through numerical simulations. The proposed modelling technique shows improved results over the fuzzy logic and artificial neural network (ANN) based approaches attempted by contemporary researchers. The method proposed here yields nonlinear analytical expressions, unlike neural networks, which are essentially a black-box modelling approach. This additional flexibility is an advantage from the modelling perspective and helps to discern the important variables that affect the prediction. Owing to the evolutionary nature of the algorithm, it is able to escape local minima and converge to a global optimum, unlike the back-propagation (BP) algorithm used for training neural networks. This results in a better percentage fit than those obtained with neural networks by contemporary researchers. A hold-out cross-validation of the obtained genetic programming (GP) results also shows that they generalize well to new data and do not over-fit the training samples. The multi-gene GP results are also compared with those obtained using its single-gene version and with four classical regression models in order to show the effectiveness of the adopted approach.
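In multi-gene GP the model output is commonly a weighted sum of several GP trees (genes) plus a bias term, with the weights obtained by least squares. The sketch below shows only that weighting step and a hold-out check. The two entries in `GENES` are hypothetical fixed expressions standing in for evolved trees, the 70/30 split is an assumption, and the data are synthetic rather than irradiation measurements.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_gene_weights(gene_columns, y):
    """Least-squares weights [b0, b1, ..., bn] for y ~ b0 + sum_i b_i * gene_i(x)."""
    rows = [[1.0] + list(vals) for vals in zip(*gene_columns)]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(p)]
    return solve(A, rhs)

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Two hypothetical genes standing in for trees that MGGP would evolve.
GENES = [lambda x: x[0] * x[1], lambda x: math.sin(x[2])]

def predict(weights, X):
    return [weights[0] + sum(w * g(x) for w, g in zip(weights[1:], GENES)) for x in X]

# Synthetic data only: an exact combination of the genes plus small noise.
rng = random.Random(0)
X = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [0.5 + 2.0 * x[0] * x[1] - math.sin(x[2]) + rng.gauss(0.0, 0.01) for x in X]

# Hold-out validation: fit the gene weights on 70% of the samples, test on the rest.
split = int(0.7 * len(X))
weights = fit_gene_weights([[g(x) for x in X[:split]] for g in GENES], y[:split])
train_rmse = rmse(predict(weights, X[:split]), y[:split])
test_rmse = rmse(predict(weights, X[split:]), y[split:])
print(weights, train_rmse, test_rmse)
```

Because the weights enter linearly, the evolutionary search only has to handle the tree structures, while the coefficients come from a fast linear solve; similar training and hold-out errors are the kind of evidence the abstract cites against over-fitting.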
Symbolic regression-based genetic approximations of the Colebrook equation for flow friction
Widely used in hydraulics, the Colebrook equation for flow friction implicitly relates the input parameters, the Reynolds number Re and the relative roughness of the inner pipe surface ε/D, to the unknown output parameter, the flow friction factor λ: λ = f(λ, Re, ε/D). In this paper, a few explicit approximations to the Colebrook equation, λ ≈ f(Re, ε/D), are generated using the ability of artificial intelligence to find inner patterns that connect input and output parameters explicitly, without knowing their nature or the physical law that connects them, only the raw numbers {Re, ε/D} → {λ}. Because the genetic programming tool used does not know the structure of the Colebrook equation, which is based on a computationally expensive logarithmic law, it can produce approximations with a better structure: less demanding to calculate, yet accurate enough. All generated approximations have a low computational cost because they contain a limited number of logarithmic forms, used for normalization of input parameters or for acceleration, while remaining sufficiently accurate. In the best case, the relative error in the friction factor λ is up to 0.13% with only two logarithmic forms used. As the second logarithm can be accurately approximated by a Padé approximation, practically the same error is also obtained using only one logarithm.
Web of Science 109 art. no. 117
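To see what "explicit approximation" and "relative error" mean here, the implicit Colebrook equation, 1/√λ = -2 log10(ε/(3.7 D) + 2.51/(Re √λ)), can be solved by fixed-point iteration on x = 1/√λ and compared with an explicit formula. The sketch uses the classic Swamee-Jain approximation from the literature purely as a stand-in; it is not one of the approximations generated in the paper, and the function names and test points are illustrative.

```python
import math

def colebrook(reynolds, rel_rough, tol=1e-12, max_iter=100):
    """Solve the implicit Colebrook equation for the Darcy friction factor.

    Iterates x = -2 log10(rel_rough / 3.7 + 2.51 x / reynolds), where x = 1 / sqrt(lambda).
    """
    x = 7.0  # initial guess, roughly lambda = 0.02
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_rough / 3.7 + 2.51 * x / reynolds)
        if abs(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return 1.0 / (x * x)

def swamee_jain(reynolds, rel_rough):
    """Classic explicit approximation (Swamee and Jain), not one from the paper."""
    return 0.25 / math.log10(rel_rough / 3.7 + 5.74 / reynolds ** 0.9) ** 2

def colebrook_residual(lam, reynolds, rel_rough):
    """Left side minus right side of the Colebrook equation; near 0 at the solution."""
    return 1.0 / math.sqrt(lam) + 2.0 * math.log10(
        rel_rough / 3.7 + 2.51 / (reynolds * math.sqrt(lam)))

for reynolds, rr in [(1e4, 1e-3), (1e5, 1e-4), (1e6, 1e-5)]:
    exact = colebrook(reynolds, rr)
    approx = swamee_jain(reynolds, rr)
    rel_err = 100.0 * (approx - exact) / exact
    print(f"Re={reynolds:.0e} eps/D={rr:.0e} lambda={exact:.5f} rel. error={rel_err:+.2f}%")
```

The fixed-point map is a contraction because the right-hand side changes only logarithmically with x, so a handful of iterations suffices; each iteration still costs one logarithm, which is why explicit formulas with few logarithmic forms are attractive.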
RANS Turbulence Model Development using CFD-Driven Machine Learning
This paper presents a novel CFD-driven machine learning framework to develop
Reynolds-averaged Navier-Stokes (RANS) models. The CFD-driven training is an
extension of the gene expression programming method (Weatheritt and Sandberg,
2016), but crucially the fitness of candidate models is now evaluated by
running RANS calculations in an integrated way, rather than using an algebraic
function. Unlike other data-driven methods that fit the Reynolds stresses of
trained models to high-fidelity data, the cost function for the CFD-driven
training can be defined based on any flow feature from the CFD results. This
extends the applicability of the method, especially when training data are
limited. Furthermore, the resulting model, which is the one providing the most
accurate CFD results at the end of the training, inherently shows good
performance in RANS calculations. To demonstrate the potential of this new
method, the CFD-driven machine learning approach is applied to model
development for wake mixing in turbomachines. A new model is trained based on a
high-pressure turbine case and then tested for three additional cases, all
representative of modern turbine nozzles. Despite the geometric configurations
and operating conditions being different among the cases, the predicted wake
mixing profiles are significantly improved in all of these a posteriori tests.
Moreover, the model equation is given explicitly and is available for analysis,
so it can be deduced that the enhanced wake prediction is predominantly due
to the extra diffusion introduced by the CFD-driven model.
Comment: Accepted by Journal of Computational Physics
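The central point of CFD-driven training is that each candidate's fitness comes from running the solver with that candidate, not from fitting an algebraic target. A toy sketch of that loop, under heavy simplifying assumptions: an explicit 1D diffusion of a Gaussian wake deficit stands in for the RANS solver, a candidate "model" is a single extra-diffusion coefficient, the reference profile is synthetic, and a simple elitist evolution strategy replaces gene expression programming. All names and numbers are illustrative.

```python
import math
import random

# Toy grid on y in [-1, 1]; all values are illustrative assumptions.
N, DY, DT, STEPS, NU = 21, 0.1, 0.005, 100, 0.01

def run_wake_solver(nu_extra):
    """Toy stand-in for a RANS run: explicit 1D diffusion of a wake deficit.

    nu_extra is the extra diffusion a candidate model adds. It is clamped to
    [0, 0.4] so the explicit scheme stays stable (nu_total * DT / DY**2 <= 0.5).
    """
    nu_total = NU + min(max(nu_extra, 0.0), 0.4)
    r = nu_total * DT / DY ** 2
    u = [math.exp(-((-1.0 + i * DY) / 0.2) ** 2) for i in range(N)]
    for _ in range(STEPS):
        u = ([0.0]
             + [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]) for i in range(1, N - 1)]
             + [0.0])
    return u

# Synthetic target profile standing in for high-fidelity reference data.
REFERENCE = run_wake_solver(0.05)

def cost(nu_extra):
    """Fitness is computed from the solver output, not from an algebraic fit."""
    u = run_wake_solver(nu_extra)
    return sum((a - b) ** 2 for a, b in zip(u, REFERENCE)) / N

def evolve(generations=30, parents=4, children=8, seed=0):
    """Elitist (mu + lambda) evolution of one model coefficient."""
    rng = random.Random(seed)
    pop = sorted((rng.uniform(0.0, 0.4) for _ in range(parents + children)), key=cost)[:parents]
    sigma = 0.1
    history = [cost(pop[0])]
    for _ in range(generations):
        kids = [min(max(rng.choice(pop) + rng.gauss(0.0, sigma), 0.0), 0.4)
                for _ in range(children)]
        pop = sorted(pop + kids, key=cost)[:parents]  # parents compete with offspring
        sigma *= 0.85
        history.append(cost(pop[0]))
    return pop[0], history

best, history = evolve()
print(f"learned extra diffusion {best:.4f}, final cost {history[-1]:.2e}")
```

Every fitness evaluation is a full solver run, which is what makes this style of training expensive; it also means the cost can be defined on any output of the run, such as a wake profile, rather than on Reynolds stresses.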
Data-driven PDE discovery with evolutionary approach
Data-driven models allow one to define the model structure in cases where
a priori information is not sufficient to build other types of models. One
possible way to obtain a physical interpretation is to use data-driven differential
equation discovery techniques. Existing methods of PDE (partial differential
equation) discovery are tied to sparse regression. However, sparse
regression restricts the form of the resulting model, since the candidate PDE terms are
defined before the regression. The evolutionary approach described in this article
is instead based on symbolic regression and therefore places fewer
restrictions on the PDE form. The evolutionary method of PDE discovery (EPDE)
is described and tested on several canonical PDEs. Robustness
is examined on an example with noisy data.
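A small sketch of PDE discovery driven by evolutionary search: derivative "tokens" are estimated by finite differences, candidate right-hand sides for u_t are sets of token products evolved by mutation, and each candidate is scored by a least-squares residual plus a per-term complexity penalty. This is a simplified illustration under assumed settings (synthetic two-mode heat-equation data, penalty 1e-4, at most three terms of at most two tokens each), not the EPDE algorithm itself.

```python
import math
import random

NU = 0.1  # diffusivity used only to synthesize the data
NX, NT = 64, 22
DX, DT = 2 * math.pi / NX, 0.01

def u_exact(x, t):
    """Two-mode solution of u_t = NU * u_xx (synthetic data, not from the paper)."""
    return math.exp(-NU * t) * math.sin(x) + math.exp(-4 * NU * t) * math.sin(2 * x)

U = [[u_exact(i * DX, n * DT) for i in range(NX)] for n in range(NT)]
SAMPLES = [(n, i) for n in range(1, NT - 1) for i in range(1, NX - 1, 2)]

# Derivative "tokens" estimated by central finite differences at interior points.
TOKENS = {
    "u": [U[n][i] for n, i in SAMPLES],
    "u_x": [(U[n][i + 1] - U[n][i - 1]) / (2 * DX) for n, i in SAMPLES],
    "u_xx": [(U[n][i + 1] - 2 * U[n][i] + U[n][i - 1]) / DX ** 2 for n, i in SAMPLES],
}
U_T = [(U[n + 1][i] - U[n - 1][i]) / (2 * DT) for n, i in SAMPLES]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def random_term(rng):
    """A term is a product of one or two tokens, e.g. ('u', 'u_x')."""
    return tuple(sorted(rng.choice(sorted(TOKENS)) for _ in range(rng.choice([1, 2]))))

def column(term):
    col = [1.0] * len(SAMPLES)
    for tok in term:
        col = [c * v for c, v in zip(col, TOKENS[tok])]
    return col

def fitness(terms, penalty=1e-4):
    """Least-squares fit of u_t on the terms, plus a penalty per term."""
    cols = [column(t) for t in terms]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    coefs = solve(A, [sum(a * b for a, b in zip(ci, U_T)) for ci in cols])
    err = sum((ut - sum(c * col[k] for c, col in zip(coefs, cols))) ** 2
              for k, ut in enumerate(U_T)) / len(U_T)
    return err + penalty * len(terms), coefs

def mutate(terms, rng):
    """Replace, remove or add one term (structures hold 1 to 3 terms)."""
    terms = set(terms)
    ops = ["replace"] + (["remove"] if len(terms) > 1 else []) + (["add"] if len(terms) < 3 else [])
    op = rng.choice(ops)
    if op != "add":
        terms.discard(rng.choice(sorted(terms)))
    if op != "remove":
        terms.add(random_term(rng))
    return tuple(sorted(terms))

def discover(generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    cache = {}
    def score(ind):
        if ind not in cache:
            cache[ind] = fitness(ind)
        return (cache[ind][0], ind)  # tie-break on structure for determinism
    pop = [tuple(sorted({random_term(rng) for _ in range(rng.randint(1, 3))}))
           for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(rng.choice(pop), rng) for _ in range(pop_size)]
        pop = sorted(set(pop + children), key=score)[:pop_size]
    return pop[0], cache[pop[0]][1]

best, coefs = discover()
print(best, coefs)
```

On this synthetic data the penalty makes a single u_xx term the best-scoring structure, with a coefficient close to the diffusivity used to generate the data; the terms are assembled from tokens during the search instead of being fixed in one library up front.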
Tracking economic growth by evolving expectations via genetic programming: A two-step approach
The main objective of this study is to present a two-step approach to generate estimates of economic growth based on agents’ expectations from tendency surveys. First, we design a genetic programming experiment to derive mathematical functional forms that approximate the target variable by combining survey data on expectations about different economic variables. We use evolutionary algorithms to estimate a symbolic regression that links survey-based expectations to a quantitative variable used as a yardstick (economic growth). In the second step, this set of empirically generated proxies of economic growth is linearly combined to track the evolution of GDP. To evaluate the forecasting performance of the generated estimates of GDP, we use them to assess the impact of the 2008 financial crisis on the accuracy of agents’ expectations about the evolution of economic activity in 28 OECD countries. While in most economies we find an improvement in the capacity of agents to anticipate the evolution of GDP after the crisis, predictive accuracy worsens in relation to the period prior to the crisis. The most accurate GDP forecasts are obtained for Sweden, Austria and Finland.
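The second step, linearly combining several GP-derived proxies to track GDP, and the comparison of accuracy across sub-periods can be sketched as follows. Ordinary least squares for the combination weights, the synthetic series and proxies, and the 2008 break date used purely for splitting the sample are assumptions; the study's own weighting scheme and survey data are not reproduced.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def combine_proxies(proxies, target):
    """OLS weights [w0, w1, ..., wK] for target ~ w0 + sum_k w_k * proxy_k."""
    rows = [[1.0] + list(vals) for vals in zip(*proxies)]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    return solve(A, rhs)

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Synthetic quarterly growth series and three noisy proxies standing in for
# GP-derived estimates; none of these numbers come from the study.
rng = random.Random(42)
quarters = [2000 + q / 4 for q in range(64)]  # 2000Q1 to 2015Q4
gdp = [2.0 + math.sin(q / 3.0) for q in range(len(quarters))]
proxies = [[g + rng.gauss(0.0, s) for g in gdp] for s in (0.3, 0.5, 0.8)]

weights = combine_proxies(proxies, gdp)
tracked = [weights[0] + sum(w * p[t] for w, p in zip(weights[1:], proxies))
           for t in range(len(gdp))]

# Compare tracking accuracy before and after a break date (2008 here).
pre = [t for t, year in enumerate(quarters) if year < 2008]
post = [t for t, year in enumerate(quarters) if year >= 2008]
for label, idx in (("pre-2008", pre), ("post-2008", post)):
    print(label, rmse([tracked[t] for t in idx], [gdp[t] for t in idx]))
```

Because OLS weights minimize in-sample squared error, the combination can never track worse in-sample than any single proxy; a genuine forecasting evaluation would fit the weights on an earlier window and score them on later quarters.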
Evolutionary computation for macroeconomic forecasting
The final publication is available at Springer via http://dx.doi.org/10.1007/s10614-017-9767-4
The main objective of this study is twofold. First, we propose an empirical modelling approach based on genetic programming to forecast economic growth by means of survey data on expectations. We use evolutionary algorithms to estimate a symbolic regression that links survey-based expectations to a quantitative variable used as a yardstick, deriving mathematical functional forms that approximate the target variable. The set of empirically generated proxies of economic growth is used as building blocks to forecast the evolution of GDP. Second, we use these estimates of GDP to assess the impact of the 2008 financial crisis on the accuracy of agents’ expectations about the evolution of economic activity in four Scandinavian economies. While we find an improvement in the capacity of agents to anticipate economic growth after the crisis, predictive accuracy worsens in relation to the period prior to the crisis. The most accurate GDP forecasts are obtained for Sweden.
Peer Reviewed. Postprint (author's final draft)
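The phrase "use evolutionary algorithms to estimate a symbolic regression" refers to evolving expression trees against data. A minimal loop of that kind is sketched here: a population of random expression trees is evolved by tournament selection, subtree crossover and subtree mutation, with mean squared error as fitness and the best tree carried over unchanged each generation. The operator set, population size, depth limit and the synthetic target are illustrative choices, not the settings or data used in the study.

```python
import random

# Minimal tree-based GP for symbolic regression (illustrative settings only).
OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b, "mul": lambda a, b: a * b}
SYMBOL = {"add": "+", "sub": "-", "mul": "*"}

def random_tree(rng, n_vars, depth):
    """Grow a random expression: tuples are (op, left, right), 'x0' is a variable."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return f"x{rng.randrange(n_vars)}"
        return round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, n_vars, depth - 1), random_tree(rng, n_vars, depth - 1))

def evaluate(tree, row):
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))
    if isinstance(tree, str):
        return row[int(tree[1:])]
    return tree

def show(tree):
    if isinstance(tree, tuple):
        return f"({show(tree[1])} {SYMBOL[tree[0]]} {show(tree[2])})"
    return str(tree)

def mse(tree, X, y):
    return sum((evaluate(tree, r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path lists child indices from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        for k in (1, 2):
            yield from subtrees(tree[k], path + (k,))

def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def offspring(a, b, rng, n_vars):
    """Subtree crossover (70%) or subtree mutation (30%)."""
    path, _ = rng.choice(list(subtrees(a)))
    if rng.random() < 0.7:
        _, donor = rng.choice(list(subtrees(b)))
    else:
        donor = random_tree(rng, n_vars, 2)
    return replace_at(a, path, donor)

def gp_symbolic_regression(X, y, pop_size=60, generations=30, max_depth=6, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(rng, n_vars, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations + 1):
        errors = [mse(t, X, y) for t in pop]
        best = min(range(pop_size), key=errors.__getitem__)
        history.append(errors[best])
        def pick():  # tournament selection of size 3
            return pop[min(rng.sample(range(pop_size), 3), key=errors.__getitem__)]
        children = [pop[best]]  # elitism keeps the best tree
        while len(children) < pop_size:
            child = offspring(pick(), pick(), rng, n_vars)
            if depth(child) <= max_depth:  # limit bloat
                children.append(child)
        pop = children
    return pop[0], history

# Usage on a synthetic target, not survey data.
rng = random.Random(3)
X = [[rng.uniform(-1.0, 1.0)] for _ in range(20)]
y = [x[0] * x[0] + x[0] for x in X]
best, history = gp_symbolic_regression(X, y)
print(show(best), history[-1])
```

The returned tree is an explicit formula that can be inspected, which is the property these studies rely on when they use GP-derived functional forms as proxies.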