84 research outputs found

    Evaluating the Strategy of Ensemble Empirical and Tree-Based Methods in Estimating Reference Evapotranspiration

    This study used three data-driven models (M5P, REP tree, and random forest) to estimate daily reference evapotranspiration, evaluating them both individually and in combination. Daily meteorological data from five synoptic stations in Kerman province for the period 2000 to 2020 were used. For each model, the inputs were a combination of meteorological variables chosen by sensitivity analysis against the reference evapotranspiration values obtained from FAO-Penman-Monteith. The accuracy of these models and of empirical methods in estimating reference evapotranspiration was then compared using statistical indicators, and the superior model was selected. On the validation data, the individual M5P model (RMSE = 0.083 and NS = 0.998 at Bam station) and the weighted-averaging ensemble (RMSE = 0.155 and NS = 0.994 at Bam and Sirjan stations) gave better evapotranspiration estimates than the other methods at all stations. In general, tree models, especially M5P, estimated daily evapotranspiration better than empirical models.

    Evaluating Capabilities of Gradient Boosted Tree and Optimized Random Forest Models in Estimating Daily Dew Point Temperature

    Dew point temperature is important in many fields, including meteorology, where it is used in weather forecasting. Suitable models are therefore needed to predict this variable accurately, both for practical use by agricultural engineers and for nearby stations where it cannot be measured. In the present study, we investigated the ability of four data-driven models (gradient boosted tree, M5P model tree, random forest, and random forest optimized with a genetic algorithm) to estimate daily dew point temperature. Daily meteorological data from two stations, Ardabil and Parsabad, for the period 2014 to 2019 were used. The meteorological parameters were minimum, maximum, and average temperature, relative humidity, sunshine hours, and wind speed, supplied as input variables to each model in 10 different combinations. Comparing the results for both stations, the M5P-8 model, with a root mean square error of 0.54°C and a Willmott coefficient of 0.998, was the best model at Ardabil station, and the M5P-6 model, with a root mean square error of 0.29°C and a Willmott coefficient of 1.00, was the best model at Parsabad station.

    Estimating longitudinal dispersion coefficient in natural streams using empirical models and machine learning algorithms

    The longitudinal dispersion coefficient (LDC) plays an important role in modeling the transport of pollutants and sediment in natural rivers, where transport processes change the pollutant concentration along the river. Various studies have sought simple equations for estimating LDC. In this study, four machine learning methods (support vector regression, Gaussian process regression, M5 model tree (M5P) and random forest) and multiple linear regression were examined for predicting the LDC in natural streams. Data sets from 60 rivers around the world with different hydraulic and geometric features were gathered to develop the models. Statistical criteria, including correlation coefficient (CC), root mean squared error (RMSE) and mean absolute error (MAE), were used to assess the models, and the estimated LDC values were compared with the corresponding results of common empirical models. A Taylor diagram was also used to evaluate the models. Among the machine learning models, M5P performed best, with CC of 0.823, RMSE of 454.9 and MAE of 380.9. Among the empirical models, that of Sahay and Dutta gave the most precise results, with CC of 0.795, RMSE of 460.7 and MAE of 306.1. The main advantage of M5P models is their ability to provide practical formulae. In conclusion, the results indicated that the developed M5P model, with its simple formulations, was superior to the other machine learning models and the empirical models, and can therefore be used as a suitable tool for estimating the LDC in rivers.

    Evaluation of Random Forest-Genetic Algorithm Hybrid Model in Estimating Daily Solar Radiation

    Solar energy is the most important source of renewable energy and the main source of energy on Earth, so estimating solar radiation accurately is very important. The present study used daily meteorological data from three stations in Ardabil province (Meshginshahr, Germi, and Nir) over a 2-year period (2017-2018). Daily solar radiation at each station was estimated using random forest (RF) and a hybrid random forest-genetic algorithm (GA-RF) method. The meteorological variables were minimum, maximum and average temperature, relative humidity, and wind speed, supplied to the models in eight different input combinations. The results were compared using statistical parameters and the best models were selected. Modeling accuracy was highest at Nir station, followed by Meshginshahr and then Germi. The GA-RF-V model at Nir station, with a root mean square error of 0.346 MJ/m²/d and a Kling-Gupta efficiency of 0.687, had the lowest error and was identified as the best model in this study. The results also showed that the genetic algorithm increased the accuracy of all the models used.

    Comparative analysis of models for forecasting ozone concentrations using gene expression programming and multiple linear regression

    Ground-level ozone (O3) has been a serious air pollution problem for several decades in many metropolitan areas because of its adverse impact on the human respiratory system. To reduce the risk of O3-related damage, short-term ozone forecasting models need to be developed, maintained and improved. This paper presents the results of two prognostic models, gene expression programming (GEP), which is a variant of genetic programming (GP), and multiple linear regression (MLR), used to forecast ozone levels in real time up to 6 hours ahead at four stations in Bilbao, Spain. The inputs to the GEP were meteorological conditions (wind speed and direction, temperature, relative humidity, pressure, solar radiation and thermal gradient), hourly ozone levels and traffic parameters (number of vehicles, occupation percentage and velocity), measured in 1993–94. The performance of the developed models was compared with observed values and evaluated using the performance measures for air quality models established in the Model Validation Kit and recommended by the US Environmental Protection Agency. The GEP gave superior predictions in most cases. On the basis of these results, gene expression programming appears to be a promising technique for predicting pollutant concentrations.

    Modeling pan evaporation using Gaussian Process Regression, K-Nearest Neighbors, Random Forest, and Support Vector Machines: Comparative analysis

    Evaporation is one of the most critical factors in agricultural, hydrological, and meteorological studies. Because multiple climatic factors interact, evaporation is a complex and nonlinear phenomenon to model, and machine learning methods have gained popularity for this purpose. In the present study, four machine learning methods, Gaussian Process Regression (GPR), K-Nearest Neighbors (KNN), Random Forest (RF) and Support Vector Regression (SVR), were used to predict pan evaporation (PE). Meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W), and sunny hours (S) were collected from 2011 through 2017. The accuracy of the methods was determined using Root Mean Squared Error (RMSE), correlation coefficient (R) and Mean Absolute Error (MAE), and Taylor diagrams were also used to evaluate the models. At Gonbad-e Kavus, Gorgan and Bandar Torkman stations, respectively, the RMSE was 1.521, 1.244, and 1.254 mm/day for GPR; 1.991, 1.775, and 1.577 mm/day for KNN; 1.614, 1.337, and 1.316 mm/day for RF; and 1.55, 1.262, and 1.275 mm/day for SVR. GPR gave the most accurate predictions, with input parameters T, W and S at Gonbad-e Kavus station and T, RH, W and S at Gorgan and Bandar Torkman stations, and was proposed for precise estimation of PE. The findings indicate that PE values may be accurately estimated from a few easily measured meteorological parameters.

    Estimation of Daily Reference Evapotranspiration in Humid Climates Using Data-Driven Methods of Gaussian Process Regression, Support Vector Regression and Random Forest

    Accurate estimation of reference evapotranspiration is of great importance in irrigation scheduling. Because lysimetric data are rarely available, researchers have turned to indirect methods, including data-driven approaches. The present study investigated the ability of three data-driven methods, Gaussian process regression (GPR), support vector regression (SVR) and random forest (RF), to estimate reference evapotranspiration. Meteorological data on average temperature, wind speed, relative humidity and sunny hours for the period 2013-18 were collected at nine stations in northern Iran: Astara, Bandar Anzali, Rasht, Ramsar, Nowshahr, Sari, Turkmen port, Gorgan, and Gonbad Kavous. Evapotranspiration calculated using the FAO Penman-Monteith method was taken as the target output, and four combined scenarios of meteorological parameters were used to calibrate and validate the methods. The accuracy of the methods was compared using the correlation coefficient, the scatter index, and Willmott's coefficient. The most accurate estimates of reference evapotranspiration came from the GPR4 model, with a scatter index of 0.132 to 0.179, at Astara, Bandar Anzali, Rasht, Ramsar, Nowshahr and Sari stations; the SVR4 model, with a scatter index of 0.116 to 0.120, at Turkmen port and Gonbad Kavous stations; and the Hargreaves-Samani method, with a scatter index of 0.509, at Gorgan station.

    Water temperature prediction in a subtropical subalpine lake using soft computing techniques

    Lake water temperature is one of the key parameters in determining the ecological conditions within a lake, as it influences both chemical and biological processes. Accurate prediction of water temperature is therefore crucial for lake management. This paper evaluates the performance of three soft computing techniques, gene expression programming (GEP), which is a variant of genetic programming (GP), adaptive neuro fuzzy inference system (ANFIS) and artificial neural networks (ANNs), in predicting hourly water temperature at various measured depths at a buoy station in Yuan-Yang Lake (YYL) in north-central Taiwan. Three statistical indicators were used to evaluate performance: the root mean squared error (RMSE), the mean absolute error (MAE), and the coefficient of correlation (R). The GEP performed best among the studied methods in predicting hourly water temperature at depths of 0, 2 and 3 meters below the water surface, but the trend differed at a depth of 1 meter, where the ANN was more accurate than the GEP and ANFIS. Although the error (RMSE) is smaller for the ANN than for the GEP at this depth, the ANN scatter plot shows an upper bound that imposes a constant value, which is not suitable for predictive purposes. In conclusion, the results demonstrated that GEP provided moderately reasonable trends for the prediction of hourly water temperature at different depths.