260 research outputs found

    Imputation of Rainfall Data Using the Sine Cosine Function Fitting Neural Network

    Missing rainfall data reduce the quality of hydrological data analysis because rainfall is an essential input for hydrological modeling. Much research has focused on rainfall data imputation. However, the compatibility of precipitation (rainfall) and non-precipitation (meteorology) as input data has received less attention. First, we propose a novel pre-processing mechanism for non-precipitation data by using principal component analysis (PCA). Before the imputation, PCA is used to extract the most relevant features from the meteorological data. The final output of the PCA is combined with the rainfall data from the nearest neighbor gauging stations and then used as the input to the neural network for missing data imputation. Second, a sine cosine algorithm is presented to optimize the neural network for infilling the missing rainfall data. The proposed sine cosine function fitting neural network (SC-FITNET) was compared with the sine cosine feedforward neural network (SC-FFNN), feedforward neural network (FFNN) and long short-term memory (LSTM) approaches. The results showed that the proposed SC-FITNET outperformed LSTM, SC-FFNN and FFNN imputation in terms of mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (R), with an average accuracy of 90.9%. This study revealed that as the percentage of missingness increased, the precision of the four imputation methods decreased. In addition, this study also revealed that PCA has potential in pre-processing meteorological data into an understandable format for missing data imputation.
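The three scores named in the abstract above (MAE, RMSE and correlation coefficient R) can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not the authors' evaluation code; the function names and the sample values are hypothetical.

```python
import math

def mae(observed, imputed):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - p) for o, p in zip(observed, imputed)) / len(observed)

def rmse(observed, imputed):
    """Root mean square error between observed and imputed values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, imputed))
    return math.sqrt(squared / len(observed))

def pearson_r(observed, imputed):
    """Pearson correlation coefficient between the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(imputed) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, imputed))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in imputed))
    return cov / (spread_o * spread_p)

# Hypothetical rainfall values (mm): the true readings and an imputer's output.
observed = [0.0, 2.0, 4.0, 6.0]
imputed = [1.0, 2.0, 3.0, 6.0]

mae_score = mae(observed, imputed)
rmse_score = rmse(observed, imputed)
r_score = pearson_r(observed, imputed)
```

Lower MAE and RMSE and a higher R indicate a closer match between imputed and observed rainfall.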

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Metaheuristic Algorithms to Enhance the Performance of a Feedforward Neural Network in Addressing Missing Hourly Precipitation

    This study investigates three metaheuristic algorithms, namely the Grey wolf optimizer (GWO), Multi-verse optimizer (MVO), and Moth-flame optimisation (MFO), coupled with a feedforward neural network (FNN) to address missing hourly rainfall observations while overcoming a limitation of conventional artificial neural network training algorithms, which often become trapped in local optima. The proposed GWOFNN, MVOFNN, and MFOFNN were compared against the conventional Levenberg Marquardt Feedforward Neural Network (LMFNN) in addressing the artificially introduced missing hourly rainfall records of Kuching Third Mile Station. The findings show that the proposed approaches are superior to LMFNN in predicting the 20% hourly rainfall observations in terms of mean absolute error (MAE) and coefficient of correlation (r). The best-performing ANN model is GWOFNN, followed by MVOFNN, MFOFNN and lastly LMFNN.
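The evaluation setup described above, in which missing records are introduced artificially and an imputer is scored on them, can be sketched as follows. This is a rough illustration under stated assumptions: the rainfall values are made up, the masking is uniformly random with a fixed seed, and a simple mean fill stands in for the neural network models, which are not reproduced here.

```python
import random

def mask_fraction(series, fraction, seed=0):
    """Hide a fraction of the values (set to None) and return the hidden indices."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    n_missing = round(len(series) * fraction)
    missing_idx = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in missing_idx:
        masked[i] = None
    return masked, missing_idx

def fill_with_observed_mean(masked):
    """Stand-in imputer (NOT a neural network): fill gaps with the observed mean."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]

# Hypothetical hourly rainfall readings (mm).
hourly_rain = [0.0, 0.2, 1.4, 3.0, 0.8, 0.0, 0.0, 0.6, 2.2, 0.4]

masked, missing_idx = mask_fraction(hourly_rain, 0.20)
filled = fill_with_observed_mean(masked)

# Score the imputer only on the values that were artificially removed.
mae_on_masked = sum(abs(hourly_rain[i] - filled[i]) for i in missing_idx) / len(missing_idx)
```

Because the true values at the masked positions are known, any imputer can be swapped in for `fill_with_observed_mean` and compared on the same hidden records.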

    A framework for cloud cover prediction using machine learning with data imputation

    The climatic conditions of a region are affected by multiple factors, including dew point temperature, humidity, wind speed, and wind direction. These factors are closely related to each other. In this paper, the correlation between these factors is studied and an approach is proposed for data imputation. The idea is to use all of these features to predict the total cloud cover of a region instead of discarding records with missing values. Total cloud cover prediction is significant because it affects the agriculture, aviation, and energy sectors. A machine learning-based model is then built on the imputed data produced by the proposed method. The model is based on the bi-directional long short-term memory (LSTM) architecture. It is trained for 8 stations using two different approaches. In the first approach, 80% of the data is used for training and 20% for testing. In the second approach, 90% of the data is used for training and 10% for testing. The first approach is observed to give a lower prediction error.
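The two train/test configurations mentioned above can be illustrated with a small splitting helper. This is only a sketch: the abstract does not say how the records were ordered or partitioned, so the chronological (unshuffled) split, the function name, and the placeholder records are all assumptions.

```python
def chronological_split(records, train_fraction):
    """Split time-ordered records into a leading training part and a trailing test part."""
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]

# Placeholder time-ordered records; real inputs would be imputed weather observations.
records = list(range(20))

train_80, test_20 = chronological_split(records, 0.80)  # first approach: 80% / 20%
train_90, test_10 = chronological_split(records, 0.90)  # second approach: 90% / 10%
```

Keeping the test block at the end of the series avoids training on observations that come after the ones being predicted, which is one common choice for time series data.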

    PERBANDINGAN IMPUTASI DAN PARAMETER SUPPORT VECTOR REGRESSION UNTUK PERAMALAN CUACA (Comparison of Imputation and Support Vector Regression Parameters for Weather Forecasting)

    Rainfall is important information in transportation, agriculture, industry, and other fields. With accurate rainfall information, appropriate action can be taken in these fields, so that no losses occur because of errors in that information. This paper aims to find a suitable method for rainfall forecasting with respect to the imputation pre-processing method and the parameter values of Support Vector Regression (SVR). The experiments identify the best imputation pre-processing method for use with SVR according to Mean Squared Error (MSE) and Mean Absolute Error (MAE). By MSE, k-nearest neighbor is the best imputation pre-processing method; the pre-processed data were used in experiments with a polynomial SVR with C 1000, tolerance 0.001, epsilon 0.01 and unlimited iterations. By MAE, on the other hand, Artificial Neural Network (ANN) is the best imputation pre-processing method; the ANN-imputed data were tested on an SVR with a radial basis function (RBF) kernel, gamma 0.001, C 1000, tolerance 0.001 and unlimited iterations.