Search CORE

1,627 research outputs found

Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain

Author: Bellido-Jiménez Juan Antonio
Estévez Gualda Javier
García-Marín A.P.
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

Missing data are a common problem in hydrometeorological datasets, usually caused by sensor malfunction, deficiencies in record storage and transmission, or other issues in data recovery procedures. These missing values are the primary source of problems when analyzing and modeling the spatial and temporal variability of the data. Accurate gap-filling techniques for rainfall time series are therefore necessary to obtain complete datasets, which are crucial for studying the evolution of climate change. In this work, several machine learning models are assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the results obtained, the use of neighbor data from stations located within a 50 km radius far outperformed the other approaches assessed, with RMSE (root mean squared error) values up to 1.246 mm/day, MBE (mean bias error) values up to −0.001 mm/day, and R² values up to 0.898. In addition, results for inland areas outperformed those for coastal areas in most locations, which points to an effect of distance to the sea on model efficiency (an improvement of up to 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the MLP (multilayer perceptron), notably outperformed simple linear regression estimations at the coastal sites, whereas at inland locations the improvements were not as significant
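
The three scores quoted in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, MBE is assumed to be the mean of (estimated − observed), and R² is assumed to be the coefficient of determination, since the abstract does not state which definitions were used.

```python
import math

def rmse(observed, estimated):
    """Root mean squared error, in the units of the data (e.g. mm/day)."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)

def mbe(observed, estimated):
    """Mean bias error; assumed sign convention: estimated minus observed."""
    return sum(e - o for o, e in zip(observed, estimated)) / len(observed)

def r_squared(observed, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy daily rainfall values (mm/day), invented for illustration only.
observed = [0.0, 2.0, 4.0]
estimated = [1.0, 3.0, 5.0]
scores = (rmse(observed, estimated), mbe(observed, estimated), r_squared(observed, estimated))
```

A gap-filling model that overestimates every day by 1 mm, as in the toy data, shows up as a positive MBE even when RMSE is small, which is why the two are usually reported together.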

Multidisciplinary Digital Publishing Institute

Repositorio Institucional de la Universidad de Córdoba

Directory of Open Access Journals

A novel clustering algorithm based on mathematical morphology for wind power generation prediction

Author: Dong Lei
Hao Ying
Liang Jun
Liao Xiaozhong
Wang Bo
Wang Lijie
Publication venue: 'Elsevier BV'
Publication date: 01/06/2019

Wind power exhibits daily similarity, and days with similar wind power variation trends reflect similar meteorological phenomena. Prediction accuracy can therefore be improved, and the computational cost of model simulation reduced, by choosing as training samples the historical days whose numerical weather prediction information is similar to that of the predicted day. This paper proposes a new prediction model for wind power generation based on a novel dilation and erosion (DE) clustering algorithm. In the proposed model, the days with numerical weather prediction (NWP) information similar to that of the predicted day are selected via the proposed DE clustering algorithm, which is based on the basic operations of mathematical morphology. The proposed DE clustering algorithm can also cluster automatically, without supervision. A case study conducted using data from the Yilan wind farm in northeast China indicates that the new generalized regression neural network (GRNN) prediction model based on the proposed DE clustering algorithm (DE clustering-GRNN) performs better than the DPK-medoids clustering-GRNN, the K-means clustering-GRNN, and the AM-GRNN in day-ahead wind power prediction. Further, the proposed DE clustering-GRNN model is adaptive

Online Research @ Cardiff

Hydrological modeling based on the KNN algorithm: An application for the forecast of daily flows of the Ramis river, Peru

Author: Huamani Juan Carlos
Lujano Laura Efrain
Lujano Apolinario
Lujano Rene
Publication venue: 'Instituto Mexicano de Tecnologia del Agua'
Publication date: 01/01/2023

Forecasting river stream flows is of significant importance for the development of early warning systems. Artificial intelligence algorithms have proven to be an effective tool in data-driven hydrological modeling, since they allow relationships to be established between the input and output data of a watershed and thus support data-driven decisions. This article investigates the applicability of the k-nearest neighbor (KNN) algorithm for forecasting the mean daily flows of the Ramis river at the Ramis hydrometric station. As input to the KNN machine learning algorithm, we used a data set of mean basin precipitation and mean daily flow from hydrometeorological stations with various lags. The performance of the KNN algorithm was quantitatively evaluated with hydrological ability metrics such as mean absolute percentage error (MAPE), anomaly correlation coefficient (ACC), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE') and the spectral angle (SA). The forecasts of the Ramis river flows obtained with the KNN algorithm reached high levels of reliability with flow lags of one and two days and a precipitation lag of three days. The algorithm is simple but robust for short-term flow forecasts and can be integrated as an alternative to strengthen the daily hydrological forecast of the Ramis river
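
The lagged-input KNN idea in this abstract can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names, the Euclidean distance, the unweighted average of neighbors, and the value of k are all assumptions, and the default lags are only one reading of the abstract.

```python
import math

def make_lagged_samples(flow, precip, flow_lags=(1, 2), precip_lags=(3,)):
    """Build (features, target) pairs: the target is flow[t], the features
    are the flow and precipitation values observed the given days earlier."""
    max_lag = max(max(flow_lags), max(precip_lags))
    samples = []
    for t in range(max_lag, len(flow)):
        features = [flow[t - k] for k in flow_lags] + [precip[t - k] for k in precip_lags]
        samples.append((features, flow[t]))
    return samples

def knn_predict(samples, query, k=3):
    """Average the targets of the k samples closest to the query (Euclidean distance)."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Toy series invented for illustration: mean daily flow and basin precipitation.
flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
precip = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
samples = make_lagged_samples(flow, precip)
forecast = knn_predict(samples, [3.0, 2.0, 0.0], k=1)
```

In practice the features would normally be scaled before computing distances, since flow and precipitation are measured in different units.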

Repositorio Institucional - Servicio Nacional de Meteorología e Hidrología del Perú

Machine Learning Techniques for Short-Range Weather Forecasting

Author: 문승현
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2020

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 2. 문병로.
Machine learning is the branch of artificial intelligence that automatically generates programs from data. It is distinguished from conventional programming, which requires writing a series of specific instructions directly to perform a specific task. Machine learning is preferred when it is difficult to develop an effective algorithm for a given task, as in natural language processing or computer vision. Traditionally, numerical weather prediction (NWP) has been the prevailing method of forecasting weather. NWP predicts future weather through simulations using mathematical models based on current weather conditions. However, NWP has some problems: errors in the current observations are amplified as the simulation proceeds; spatial and temporal resolutions are limited; and there is a spin-up problem, in which initial forecasts are unreliable while the model attempts to stabilize. An alternative approach is needed to complement NWP on small spatial and temporal scales. We therefore propose short-range weather forecast models that employ machine learning techniques appropriate for a given forecasting problem. First, we introduce dimensionality reduction techniques for constructing effective forecasting models from high-dimensional input data. As the dimension of the input data increases, the time and memory required by machine learning techniques can increase significantly. This phenomenon is referred to as the curse of dimensionality, and it can be alleviated by dimensionality reduction techniques. Dimensionality reduction techniques include feature selection and feature extraction. Feature selection selects a subset of the input variables, while feature extraction projects high-dimensional features onto a lower-dimensional space. We provide details of correlation-based feature selection and of principal component analysis (PCA), a representative feature extraction technique.
We then propose a scheme for precipitation type forecast as an example of meteorological forecasting using dimensionality reduction techniques. This scheme takes 93 meteorological variables as input and uses feature selection to assemble an effective subset of input variables. Multinomial logistic regression is used to classify precipitation as rain, snow, or sleet. The scheme achieved predictions that are 13% more accurate than the original forecasts, and feature selection improved the accuracy to a statistically significant level. Second, we present sampling techniques that help predict rare meteorological events. Machine learning algorithms tend to sacrifice performance on rare instances in favor of overall performance, which is referred to as the class imbalance problem. To resolve this problem, undersampling reduces the number of common instances. As an example of meteorological forecasting using undersampling, we propose a scheme for lightning forecast. Meteorological variables from the European Centre for Medium-Range Weather Forecasts provide the input to our scheme, in which undersampling is used to alleviate the class imbalance problem and SVMs are used to forecast lightning activity within a particular location and time interval. When the scheme was trained with the original input data, it could not predict any lightning. After undersampling, however, the scheme successfully detected about 38% of the lightning strikes. Finally, we propose a selective discretization technique that automatically selects the variables suitable for discretization and discretizes them. Discretization is a preprocessing technique that converts continuous variables into categorical ones. Conventional discretization techniques apply discretization to all variables, which may lead to significant information loss. Selective discretization minimizes information loss by discretizing only the variables that have a nonlinear relationship with the dependent variable.
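
The undersampling step described above can be illustrated with a short sketch. The abstract does not say which undersampling method the thesis uses, so this assumes plain random undersampling of the common class; the function name, the `ratio` parameter, and the toy labels are hypothetical.

```python
import random

def undersample(features, labels, majority_label=0, ratio=1.0, seed=0):
    """Randomly drop majority-class rows until there are at most
    ratio * (number of minority rows) of them; minority rows are all kept."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, keep))
    return [features[i] for i in kept], [labels[i] for i in kept]

# Toy data invented for illustration: 95 "no lightning" rows (0), 5 "lightning" rows (1).
labels = [0] * 95 + [1] * 5
features = list(range(100))
balanced_x, balanced_y = undersample(features, labels)
```

Only the training set should be resampled this way; evaluating on the original, imbalanced distribution keeps the reported detection rate honest.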
We suggest a scheme for heavy rainfall forecast as an example of meteorological forecasting using selective discretization. This scheme takes input from automatic weather stations and predicts whether or not the heavy rain criterion will be met within the next three hours. The input variables are preprocessed through selective discretization and PCA to obtain a compressed yet efficient representation. Logistic regression uses the preprocessed data to predict whether or not the heavy rain condition will be satisfied. Selective discretization discretized only some continuous variables, such as date and temperature, and contributed to a statistically significant improvement in predictive performance. We present effective machine learning techniques for short-range weather forecast and provide case studies that apply machine learning to precipitation type forecast, lightning forecast, and heavy rainfall forecast. We combine appropriate techniques to solve each forecasting problem effectively, and the resulting prediction models were good enough to be used in an operational forecasting system.

Contents:
1 Introduction
1.1 Machine Learning
1.1.1 Data Preprocessing
1.1.2 Classification
1.2 Meteorological Forecasts
1.2.1 Precipitation Types
1.2.2 Lightning
1.2.3 Heavy Rainfall
1.3 Main Contributions
1.4 Organization
2 Dimensional Reduction Techniques
2.1 Correlation-based Feature Selection
2.2 Principal Component Analysis
2.3 Case Study: Precipitation Type Forecast
2.3.1 Introduction
2.3.2 Forecast Model
2.3.3 Experiments
2.3.4 Discussions
3 Sampling Techniques
3.1 Undersampling
3.2 Oversampling
3.3 Case Study: Lightning Forecast
3.3.1 Introduction
3.3.2 Forecast Model
3.3.3 Experiments
3.3.4 Discussions
4 Discretization Techniques
4.1 Selective Discretization
4.2 Minimum Description Length Discretization
4.3 Case Study: Heavy Rainfall Forecast
4.3.1 Introduction
4.3.2 Early Warning System
4.3.3 Experiments
4.3.4 Discussions
5 Conclusions

SNU Open Repository and Archive