7,152 research outputs found

    Elite Bases Regression: A Real-time Algorithm for Symbolic Regression

    Symbolic regression is an important but challenging research topic in data mining. It can uncover the mathematical model underlying a data set. Genetic programming (GP) is one of the most popular methods for symbolic regression. However, its convergence speed might be too slow for large-scale problems with a large number of variables. This drawback has become a bottleneck in practical applications. In this paper, a new non-evolutionary real-time algorithm for symbolic regression, Elite Bases Regression (EBR), is proposed. EBR generates a set of candidate basis functions encoded as parse matrices under specific mapping rules. Meanwhile, a certain number of elite bases are preserved and updated iteratively according to their correlation coefficients with respect to the target model. The regression model is then spanned by the elite bases. A comparative study between EBR and a recently proposed machine learning method for symbolic regression, Fast Function eXtraction (FFX), is conducted. Numerical results indicate that EBR can solve symbolic regression problems more effectively. Comment: The 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2017)
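The elite-selection step described in this abstract can be illustrated with a minimal sketch. It only ranks a handful of candidate basis functions by the absolute Pearson correlation with the target and keeps the top `k`. It does not reproduce the paper's parse-matrix encoding or its iterative update, and the names `pearson`, `select_elite_bases`, and the candidate set are hypothetical choices made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant sequence carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_elite_bases(candidates, inputs, target, k):
    """Return the names of the k candidates most correlated with the target.

    `candidates` maps a label to a one-argument function (hypothetical
    stand-ins for the generated basis functions).
    """
    scored = []
    for name, fn in candidates.items():
        values = [fn(x) for x in inputs]
        scored.append((abs(pearson(values, target)), name))
    scored.sort(reverse=True)  # highest absolute correlation first
    return [name for _, name in scored[:k]]


# Illustrative data: the target is dominated by a quadratic term.
candidates = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
}
inputs = [i / 10 for i in range(-20, 21)]
target = [3 * x * x + 0.5 * math.sin(x) for x in inputs]
elite = select_elite_bases(candidates, inputs, target, k=2)
```

In a fuller version, the retained bases would then be combined by a linear least-squares fit to form the regression model, which is the "spanned by the elite bases" step in the abstract.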

    Global solar irradiation prediction using a multi-gene genetic programming approach

    This is the author accepted manuscript. The final version is available from AIP Publishing via the DOI in this record. In this paper, a nonlinear symbolic regression technique using an evolutionary algorithm known as multi-gene genetic programming (MGGP) is applied to data-driven modelling of the relationship between the dependent and the independent variables. The technique is applied to model the measured global solar irradiation and is validated through numerical simulations. The proposed modelling technique shows improved results over the fuzzy logic and artificial neural network (ANN) based approaches attempted by contemporary researchers. The method proposed here yields nonlinear analytical expressions, unlike neural networks, which are essentially a black-box modelling approach. This additional flexibility is an advantage from the modelling perspective and helps to discern the important variables which affect the prediction. Due to the evolutionary nature of the algorithm, it is able to get out of local minima and converge to a global optimum, unlike the back-propagation (BP) algorithm used for training neural networks. This results in a better percentage fit than those obtained using neural networks by contemporary researchers. In addition, a hold-out cross-validation is performed on the obtained genetic programming (GP) results, which shows that the results generalize well to new data and do not over-fit the training samples. The multi-gene GP results are compared with those obtained using its single-gene version and with four classical regression models in order to show the effectiveness of the adopted approach.

    Symbolic regression-based genetic approximations of the Colebrook equation for flow friction

    Widely used in hydraulics, the Colebrook equation for flow friction implicitly relates the input parameters, the Reynolds number Re and the relative roughness of the inner pipe surface ε/D, to an unknown output parameter, the flow friction factor λ: λ = f(λ, Re, ε/D). In this paper, a few explicit approximations to the Colebrook equation, λ ≈ f(Re, ε/D), are generated using the ability of artificial intelligence to find inner patterns that connect input and output parameters in an explicit way, not knowing their nature or the physical law that connects them, but only knowing raw numbers, {Re, ε/D} → {λ}. The fact that the genetic programming tool used does not know the structure of the Colebrook equation, which is based on a computationally expensive logarithmic law, is exploited to obtain a better structure for the approximations, one that is less demanding to calculate but still sufficiently accurate. All generated approximations have low computational cost because they contain a limited number of logarithmic forms, used for normalization of input parameters or for acceleration. The relative error in the friction factor λ is, in the best case, up to 0.13% with only two logarithmic forms used. As the second logarithm can be accurately approximated by the Padé approximation, practically the same error is also obtained using only one logarithm. Web of Science109art. no. 117
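To make the implicit character of the Colebrook equation concrete, the following minimal sketch solves its standard form, 1/sqrt(lam) = -2 log10(rel_roughness/3.7 + 2.51/(Re sqrt(lam))), by fixed-point iteration, where `lam` is the friction factor. This is the iterative reference solution that explicit approximations aim to avoid, not one of the approximations generated in the paper. The function name, starting guess, and tolerance are assumptions chosen for illustration.

```python
import math


def colebrook_friction_factor(reynolds, rel_roughness, tol=1e-12, max_iter=100):
    """Solve the implicit Colebrook equation for the friction factor.

    Iterates on x = 1/sqrt(lam) using
        x = -2 * log10(rel_roughness / 3.7 + 2.51 * x / reynolds),
    which is the Colebrook equation rewritten in terms of x.
    """
    x = 7.0  # assumed starting guess, corresponding to lam of about 0.02
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / (x * x)


# Example inputs (illustrative): turbulent flow in a slightly rough pipe.
lam = colebrook_friction_factor(reynolds=1.0e5, rel_roughness=1.0e-4)
```

Each iteration evaluates one logarithm, so the cost of this reference solution grows with the number of iterations. That is the motivation for explicit forms that use only one or two logarithms.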

    RANS Turbulence Model Development using CFD-Driven Machine Learning

    This paper presents a novel CFD-driven machine learning framework to develop Reynolds-averaged Navier-Stokes (RANS) models. The CFD-driven training is an extension of the gene expression programming method (Weatheritt and Sandberg, 2016), but crucially the fitness of candidate models is now evaluated by running RANS calculations in an integrated way, rather than using an algebraic function. Unlike other data-driven methods that fit the Reynolds stresses of trained models to high-fidelity data, the cost function for the CFD-driven training can be defined based on any flow feature from the CFD results. This extends the applicability of the method, especially when the training data is limited. Furthermore, the resulting model, which is the one providing the most accurate CFD results at the end of the training, inherently shows good performance in RANS calculations. To demonstrate the potential of this new method, the CFD-driven machine learning approach is applied to model development for wake mixing in turbomachines. A new model is trained based on a high-pressure turbine case and then tested on three additional cases, all representative of modern turbine nozzles. Despite the geometric configurations and operating conditions differing among the cases, the predicted wake mixing profiles are significantly improved in all of these a posteriori tests. Moreover, the model equation is explicitly given and available for analysis, so it could be deduced that the enhanced wake prediction is predominantly due to the extra diffusion introduced by the CFD-driven model. Comment: Accepted by Journal of Computational Physics

    Data-driven PDE discovery with evolutionary approach

    Data-driven models allow one to define the model structure in cases where a priori information is not sufficient to build other types of models. One possible way to obtain a physical interpretation is to use data-driven differential equation discovery techniques. The existing methods of PDE (partial differential equation) discovery rely on sparse regression. However, sparse regression restricts the form of the resulting model, since the candidate terms of the PDE are defined before the regression. The evolutionary approach described in the article is instead based on symbolic regression and thus places fewer restrictions on the PDE form. The evolutionary method of PDE discovery (EPDE) is described and tested on several canonical PDEs. The question of robustness is examined on a noisy data example.

    Tracking economic growth by evolving expectations via genetic programming: A two-step approach

    The main objective of this study is to present a two-step approach to generate estimates of economic growth based on agents’ expectations from tendency surveys. First, we design a genetic programming experiment to derive mathematical functional forms that approximate the target variable by combining survey data on expectations about different economic variables. We use evolutionary algorithms to estimate a symbolic regression that links survey-based expectations to a quantitative variable used as a yardstick (economic growth). In a second step, this set of empirically generated proxies of economic growth is linearly combined to track the evolution of GDP. To evaluate the forecasting performance of the generated estimates of GDP, we use them to assess the impact of the 2008 financial crisis on the accuracy of agents’ expectations about the evolution of economic activity in 28 OECD countries. While in most economies we find an improvement in the capacity of agents to anticipate the evolution of GDP after the crisis, predictive accuracy worsens in relation to the period prior to the crisis. The most accurate GDP forecasts are obtained for Sweden, Austria and Finland.

    Evolutionary computation for macroeconomic forecasting

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10614-017-9767-4. The main objective of this study is twofold. First, we propose an empirical modelling approach based on genetic programming to forecast economic growth by means of survey data on expectations. We use evolutionary algorithms to estimate a symbolic regression that links survey-based expectations to a quantitative variable used as a yardstick, deriving mathematical functional forms that approximate the target variable. The set of empirically generated proxies of economic growth is used as building blocks to forecast the evolution of GDP. Second, we use these estimates of GDP to assess the impact of the 2008 financial crisis on the accuracy of agents’ expectations about the evolution of economic activity in four Scandinavian economies. While we find an improvement in the capacity of agents to anticipate economic growth after the crisis, predictive accuracy worsens in relation to the period prior to the crisis. The most accurate GDP forecasts are obtained for Sweden. Peer Reviewed. Postprint (author's final draft)