
    A typology of different development and testing options for symbolic regression modelling of measured and calculated datasets

    Data-driven modelling is used to develop two alternative types of predictive environmental model: a simulator, which models a real-world process and is developed from a conceptual understanding of physical relations and/or from measured records; and an emulator, which imitates some other model and is developed on the predicted outputs calculated by that source model. A simple four-way typology, the Emulation Simulation Typology (EST), is proposed that distinguishes between (i) model type and (ii) different uses of model development period and model test period datasets. To address the extent to which simulator and emulator solutions might be considered interchangeable, i.e. provide similar levels of output accuracy when tested on data different from that used in their development, a pair of counterpart pan evaporation models was created using symbolic regression. Each model type delivered levels of predictive skill similar to those of other published solutions. Input–output sensitivity analysis of the two model types likewise confirmed two very similar underlying response functions. This study demonstrates that the type and quality of data on which a model is tested have a greater influence on the assessment of model accuracy than the type and quality of data on which a model is developed, provided that the development record is sufficiently representative of the conceptual underpinnings of the system being examined. Thus, previously reported substantial disparities in goodness-of-fit statistics for pan evaporation models are most likely explained by the use of either measured or calculated data to test particular models, and lower scores do not necessarily indicate major deficiencies in the solution itself.
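The input–output sensitivity analysis described above can be illustrated with a simple one-at-a-time sweep: hold every input at a baseline value, vary one input across a range, and compare how the simulator and the emulator respond. This is a minimal sketch under stated assumptions: `toy_simulator` and `toy_emulator` are hypothetical stand-in expressions (not the paper's evolved models), the baseline values are invented, and the study may have used a different sensitivity method.

```python
from typing import Callable, Dict, List, Sequence

Inputs = Dict[str, float]
Model = Callable[[Inputs], float]


def toy_simulator(x: Inputs) -> float:
    # Hypothetical expression standing in for a model developed on measured pan evaporation.
    return 0.2 * x["temp"] + 0.5 * x["wind"] - 0.03 * x["rh"] + 2.0


def toy_emulator(x: Inputs) -> float:
    # Hypothetical expression standing in for a model developed on calculated (source-model) outputs.
    return 0.21 * x["temp"] + 0.45 * x["wind"] - 0.028 * x["rh"] + 1.9


def one_at_a_time(model: Model, baseline: Inputs, name: str, values: Sequence[float]) -> List[float]:
    """Vary input `name` over `values`, hold the others at baseline, return the model responses."""
    responses = []
    for v in values:
        x = dict(baseline)  # copy so the baseline itself is never modified
        x[name] = v
        responses.append(model(x))
    return responses


# Hypothetical baseline conditions (temperature in degC, wind in m/s, relative humidity in %).
baseline = {"temp": 20.0, "wind": 2.0, "rh": 60.0}
temps = [10.0, 15.0, 20.0, 25.0, 30.0]

sim_curve = one_at_a_time(toy_simulator, baseline, "temp", temps)
emu_curve = one_at_a_time(toy_emulator, baseline, "temp", temps)

# Largest absolute difference between the two response functions over the sweep.
max_gap = max(abs(a - b) for a, b in zip(sim_curve, emu_curve))
print(f"max response gap over the temperature sweep: {max_gap:.3f}")
```

Repeating the sweep for each input in turn gives one response curve per input and model; a small gap across all of them would be consistent with the two model types sharing similar response functions.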

    Water temperature prediction in a subtropical subalpine lake using soft computing techniques

    Lake water temperature is one of the key parameters determining ecological conditions within a lake, as it influences both chemical and biological processes. Accurate prediction of water temperature is therefore crucial for lake management. This paper evaluates the performance of soft computing techniques, namely gene expression programming (GEP), a variant of genetic programming (GP), the adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs), in predicting hourly water temperature at various measured depths at a buoy station in Yuan-Yang Lake (YYL), north-central Taiwan. Three statistical indicators were used to evaluate performance: the root mean squared error (RMSE), the mean absolute error (MAE) and the correlation coefficient (R). GEP performed best among the studied methods in predicting hourly water temperature at depths of 0, 2 and 3 m below the water surface, but a different trend appeared at 1 m depth, where the ANN was more accurate than GEP and ANFIS. Although the ANN's error (RMSE) is smaller than that of GEP at this depth, the ANN scatter plot shows an upper bound at which predictions level off at a constant value, which is unsuitable for predictive purposes. In conclusion, the results of the current study demonstrate that GEP provided moderately reasonable trends for predicting hourly water temperature at different depths.
    Resumen (translated from Spanish): Water temperature is one of the basic parameters for determining the ecological conditions of a lake, as it influences chemical and biological processes. Moreover, accurate prediction of water temperature is essential for lake management. This article evaluates the performance of soft computing techniques, namely gene expression programming (GEP), a variant of genetic programming (GP), the adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs), in predicting water temperature at different depths at a buoy station in Yuan-Yang Lake (YYL), north-central Taiwan. Three statistical indicators, the root mean squared error (RMSE), the mean absolute error (MAE) and the correlation coefficient (R), were used to evaluate the performance of the computing techniques. The results show that GEP is the most accurate in predicting water temperature at depths of 0, 2 and 3 m. However, a different trend is evident at 1 m depth, where ANNs are more accurate than GEP and ANFIS. The results of this study clearly demonstrated the usability of GEP and ANNs for predicting water temperature at different depths.
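The three indicators used in this study can be computed as below. This is a minimal plain-Python sketch with invented observation and prediction values; `share_at_ceiling` is a hypothetical diagnostic (not from the paper) for the kind of upper bound noted in the ANN scatter plot, where many predictions sit at the same maximum value even though RMSE stays low.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def share_at_ceiling(pred: Sequence[float], tol: float = 1e-6) -> float:
    """Hypothetical diagnostic: fraction of predictions within `tol` of their maximum."""
    top = max(pred)
    return sum(1 for p in pred if top - p <= tol) / len(pred)


obs = [10.0, 12.0, 14.0, 16.0]   # hypothetical water temperatures, degC
pred = [11.0, 12.0, 13.0, 16.0]  # hypothetical model predictions, degC
print(rmse(obs, pred), mae(obs, pred), pearson_r(obs, pred))

capped = [11.0, 12.0, 13.0, 13.0]  # predictions that saturate at 13.0
print(share_at_ceiling(capped))
```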

    Modeling pan evaporation using Gaussian Process Regression, K-Nearest Neighbors, Random Forest, and Support Vector Machines: Comparative analysis

    Evaporation is a very important process and one of the most critical factors in agricultural, hydrological and meteorological studies. Because multiple climatic factors interact, evaporation is considered a complex, nonlinear phenomenon to model, and machine learning methods have therefore gained popularity in this area. In the present study, four machine learning methods, Gaussian Process Regression (GPR), K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Regression (SVR), were used to predict pan evaporation (PE). Meteorological data, including PE, temperature (T), relative humidity (RH), wind speed (W) and sunshine hours (S), collected from 2011 through 2017 were used. The accuracy of the methods was assessed using the root mean squared error (RMSE), correlation coefficient (R) and mean absolute error (MAE), and Taylor diagrams were also used to evaluate the models. At the Gonbad-e Kavus, Gorgan and Bandar Torkman stations, respectively, the models performed most appropriately with RMSEs of 1.521, 1.244 and 1.254 mm/day for GPR; 1.991, 1.775 and 1.577 mm/day for KNN; 1.614, 1.337 and 1.316 mm/day for RF; and 1.55, 1.262 and 1.275 mm/day for SVR. GPR with inputs T, W and S at Gonbad-e Kavus, and GPR with inputs T, RH, W and S at Gorgan and Bandar Torkman, gave the most accurate predictions and were proposed for precise estimation of PE. The findings indicate that PE may be accurately estimated from a few easily measured meteorological parameters.
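A Taylor diagram summarizes each model against the observations with three linked statistics: the standard deviations of the observed and modelled series, their correlation R, and the centred RMS difference E', which satisfy E'^2 = sigma_m^2 + sigma_r^2 - 2 sigma_m sigma_r R. The sketch below computes these quantities for hypothetical PE values; it assumes population (divide-by-n) statistics, which the abstract does not specify.

```python
import math
from typing import Sequence, Tuple


def taylor_stats(ref: Sequence[float], model: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (sigma_ref, sigma_model, correlation, centred RMS difference).

    Population (divide-by-n) statistics are used throughout.
    """
    n = len(ref)
    mr, mm = sum(ref) / n, sum(model) / n
    sigma_r = math.sqrt(sum((r - mr) ** 2 for r in ref) / n)
    sigma_m = math.sqrt(sum((m - mm) ** 2 for m in model) / n)
    cov = sum((r - mr) * (m - mm) for r, m in zip(ref, model)) / n
    corr = cov / (sigma_r * sigma_m)
    # Centred RMS difference: RMS of the difference between anomalies (mean bias removed).
    crmsd = math.sqrt(sum(((m - mm) - (r - mr)) ** 2 for r, m in zip(ref, model)) / n)
    return sigma_r, sigma_m, corr, crmsd


observed_pe = [3.0, 5.0, 7.0, 9.0]  # hypothetical daily PE, mm/day
modelled_pe = [3.5, 4.5, 7.5, 8.0]  # hypothetical model output, mm/day
sr, sm, r, e = taylor_stats(observed_pe, modelled_pe)
# Law-of-cosines identity behind the diagram: E'^2 = sm^2 + sr^2 - 2*sm*sr*R
print(f"sigma_ref={sr:.3f} sigma_model={sm:.3f} R={r:.3f} E'={e:.3f}")
```

Because of this identity, each model can be placed as a single point whose radial distance is its standard deviation and whose angle encodes R; the distance from that point to the observation point is then E'.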

    Prediction of daily water level using new hybridized GS-GMDH and ANFIS-FCM models

    Accurate prediction of water level (WL) is essential for the optimal management of water resource projects, yet developing a reliable WL prediction model remains a challenging task in water resources management. In this study, novel hybrid models, namely the Generalized Structure-Group Method of Data Handling (GS-GMDH) and the Adaptive Neuro-Fuzzy Inference System with Fuzzy C-Means (ANFIS-FCM), were proposed to predict daily WL at the Telom and Bertam stations in the Cameron Highlands of Malaysia. Three training/testing data splits were adopted: 50%–50% (scenario-1), 60%–40% (scenario-2) and 70%–30% (scenario-3). To show the efficiency of the proposed hybrid models, their results were compared with those of the standalone Gene Expression Programming (GEP) and Group Method of Data Handling (GMDH) models. The hybrid GS-GMDH and ANFIS-FCM models outperformed the standalone GEP and GMDH models for daily WL prediction at both study sites, and the best performance was obtained in scenario-3 (70%–30%). In summary, the results highlight the superior suitability of the proposed hybrid GS-GMDH and ANFIS-FCM models for daily WL prediction, and these models can serve as robust and reliable predictive tools for the study region.
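The three data-division scenarios can be reproduced with a simple split. The abstract does not say how records were assigned to training and testing; this sketch assumes a chronological split (earliest records for training), a common choice for time series because it avoids training on data from after the test period. `chronological_split` and the placeholder water-level record are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(series: Sequence[T], train_fraction: float) -> Tuple[List[T], List[T]]:
    """Split a time-ordered series into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


# The three data-division scenarios (training fraction of the record).
SCENARIOS = {"scenario-1": 0.5, "scenario-2": 0.6, "scenario-3": 0.7}

# Hypothetical placeholder for a 100-day water-level record, oldest value first.
daily_wl = [100.0 + 0.01 * day for day in range(100)]

splits = {name: chronological_split(daily_wl, frac) for name, frac in SCENARIOS.items()}
for name, (train, test) in splits.items():
    print(name, len(train), len(test))
```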

    Global solar irradiation prediction using a multi-gene genetic programming approach

    This is the author accepted manuscript. The final version is available from AIP Publishing via the DOI in this record. In this paper, a nonlinear symbolic regression technique using an evolutionary algorithm known as multi-gene genetic programming (MGGP) is applied to data-driven modelling of the relationship between dependent and independent variables. The technique is applied to modelling measured global solar irradiation and is validated through numerical simulations. The proposed technique shows improved results over the fuzzy logic and artificial neural network (ANN) based approaches attempted by contemporary researchers. Unlike neural networks, which are essentially black-box models, the proposed method yields nonlinear analytical expressions. This is an advantage from the modelling perspective and helps to identify the important variables affecting the prediction. Owing to its evolutionary nature, the algorithm is able to escape local minima and converge to a global optimum, unlike the back-propagation (BP) algorithm used to train neural networks. This yields a better percentage fit than those obtained with neural networks by contemporary researchers. Hold-out cross-validation of the genetic programming (GP) results shows that they generalize well to new data and do not over-fit the training samples. The multi-gene GP results are compared with those obtained using its single-gene version and with four classical regression models to show the effectiveness of the adopted approach.
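In the usual multi-gene formulation, the model output is a bias term plus a weighted sum of several evolved expressions ("genes"), with the weights fitted by ordinary least squares. The sketch below shows only that combination step plus a hold-out check; the two genes are hand-written hypothetical expressions standing in for evolved trees, the data are synthetic, and no evolutionary search is performed. The paper's exact implementation may differ.

```python
import math
from typing import Callable, List, Sequence

Gene = Callable[[Sequence[float]], float]

# Hypothetical genes. In MGGP these expression trees are evolved by genetic
# programming; they are hand-written here only to show how genes are combined.
GENES: List[Gene] = [
    lambda x: x[0] * x[1],
    lambda x: math.sin(x[0]),
]


def design_matrix(X: Sequence[Sequence[float]], genes: Sequence[Gene]) -> List[List[float]]:
    # A column of ones for the bias term, then one column per gene output.
    return [[1.0] + [g(x) for g in genes] for x in X]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def fit_gene_weights(X, y, genes) -> List[float]:
    """Least-squares weights [b0, b1, ...] from the normal equations (D^T D) w = D^T y."""
    D = design_matrix(X, genes)
    k = len(D[0])
    DtD = [[sum(row[i] * row[j] for row in D) for j in range(k)] for i in range(k)]
    Dty = [sum(row[i] * yi for row, yi in zip(D, y)) for i in range(k)]
    return solve(DtD, Dty)


def predict(X, genes, w) -> List[float]:
    return [sum(wi * di for wi, di in zip(w, row)) for row in design_matrix(X, genes)]


# Synthetic data generated from known weights so the fit can be checked.
X = [(0.3 * i, 0.7 * (i % 5)) for i in range(20)]
y = [0.5 + 2.0 * a * b - 1.5 * math.sin(a) for a, b in X]

# Hold-out validation: fit on the first 70 % of samples, evaluate on the remaining 30 %.
cut = 14
w = fit_gene_weights(X[:cut], y[:cut], GENES)
holdout_pred = predict(X[cut:], GENES, w)
holdout_rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(holdout_pred, y[cut:])) / len(holdout_pred))
print("weights:", [round(v, 4) for v in w], "hold-out RMSE:", holdout_rmse)
```

Because the combined model is linear in the gene outputs, the final expression stays readable as an explicit formula; a single-gene version corresponds to a `GENES` list with one entry.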

    Estimation of Reference Evapotranspiration using Climatic Data.

    M.S. thesis, University of Hawaiʻi at Mānoa, 2017.

    Comparison of predictions of daily evapotranspiration based on climate variables using different data mining and empirical methods in various climates of Iran

    To manage water resources accurately, a precise prediction of reference evapotranspiration (ETref) is necessary. The best empirical equations for determining ETref are usually the temperature-based Baier and Robertson (BARO), the radiation-based Jensen and Haise (JEHA) and the mass transfer-based Penman (PENM) equations. Two machine learning (ML) models were used: least squares support vector regression (LSSVR) and ANFIS optimized with the particle swarm optimization algorithm (ANFPSO). These models were applied to estimate daily ETref at 100 synoptic stations across different climates of Iran. The performance of the studied models was evaluated using the correlation coefficient (R), coefficient of determination (R2), mean absolute error (MAE), root mean square error (RMSE), scatter index (SI) and Nash-Sutcliffe efficiency (NSE). The combination-based ML models (LSSVR4 and ANFPSO4) had the lowest error (RMSE = 0.34–2.85 mm d−1) and the best correlation (R = 0.66–0.99). The temperature-based empirical relationships were more precise than the radiation- and mass transfer-based empirical equations.
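Two of the indicators listed above, the Nash-Sutcliffe efficiency (NSE) and the scatter index (SI), are less familiar than RMSE and R. The sketch below uses common definitions, NSE = 1 - sum((o - p)^2) / sum((o - mean(o))^2) and SI = RMSE divided by the mean observed value; the paper may normalize SI differently, and the ETref values are invented.

```python
import math
from typing import Sequence


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from their mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def scatter_index(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Scatter index: RMSE normalised by the mean of the observations."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / (sum(obs) / len(obs))


et_obs = [2.0, 4.0, 6.0, 8.0]   # hypothetical daily ETref, mm/d
et_pred = [2.5, 3.5, 6.5, 7.5]  # hypothetical model estimates, mm/d
print(nse(et_obs, et_pred), scatter_index(et_obs, et_pred))
```

An NSE of 1 indicates a perfect fit, 0 means the model is no better than predicting the observed mean, and negative values are worse than that; SI expresses RMSE as a fraction of the mean, which makes errors comparable across stations with different ETref magnitudes.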

    Comparison of DEEP-LSTM and MLP Models in Estimation of Evaporation Pan for Arid Regions

    The importance of evaporation estimation in water resources and agricultural studies is undeniable. Evaporation pans (EP) are used worldwide as an indicator of evaporation from lakes and reservoirs because their data are easy to interpret. The purpose of this study is to evaluate the efficiency of the Long Short-Term Memory (LSTM) model for estimating pan evaporation and to compare it with the Multilayer Perceptron (MLP) model in Semnan and Garmsar. For this purpose, daily meteorological data recorded between 2000 and 2018 (19 consecutive years) at the Semnan and Garmsar synoptic stations were used. Minimum and maximum air temperature (Tmin, Tmax), wind speed (WS), sunshine hours (SH), air pressure (PA) and relative humidity (RH) were selected as inputs, and pan evaporation (EP) was used as the output. Four different input scenarios were used when modelling both networks. The two models were evaluated using the coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE). Among the studied scenarios, the fourth (using all input parameters) had the highest R2 and the lowest RMSE and MAE. In general, both models performed well in predicting the rate of evaporation, and at both stations the LSTM model had higher R2 and lower RMSE and MAE than the MLP model. The R2, RMSE and MAE values for the best DEEP-LSTM model (LSTM4) were 0.9451, 1.8345 and 0.5437 for Semnan, and 0.9204, 1.8323 and 1.3531 for Garmsar, respectively.
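Unlike an MLP, which maps one input vector to one output, an LSTM consumes a sequence of time steps. A common way to prepare daily records for it is a sliding window, sketched below. The abstract does not state the window length or whether lagged days were used; `make_windows`, `lookback=3` and all data values are hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]], target: Sequence[float], lookback: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair each day's target with the feature rows of that day and the preceding lookback-1 days."""
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, y = [], []
    for t in range(lookback - 1, len(target)):
        X.append([list(row) for row in features[t - lookback + 1 : t + 1]])
        y.append(target[t])
    return X, y


# Hypothetical daily rows of [Tmin, Tmax, WS, SH, PA, RH] (all inputs, as in scenario 4).
days = [
    [5.0, 18.0, 2.1, 9.0, 880.0, 40.0],
    [6.0, 20.0, 2.5, 10.0, 879.0, 35.0],
    [7.5, 22.0, 3.0, 10.5, 878.0, 30.0],
    [8.0, 23.5, 1.8, 11.0, 881.0, 28.0],
    [9.0, 25.0, 2.2, 11.5, 880.5, 25.0],
]
ep = [3.1, 3.8, 4.6, 4.9, 5.5]  # hypothetical pan evaporation, mm/day

X, y = make_windows(days, ep, lookback=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
```

The nested list has the (samples, timesteps, features) layout commonly expected by recurrent layers such as the Keras `LSTM` layer; an MLP could instead be given the same window flattened into a single vector, or only the day-t row.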