
    Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain

    Missing data are a common problem in hydrometeorological datasets, usually caused by sensor malfunction, deficiencies in record storage and transmission, or issues with other recovery procedures. These missing values are the primary source of problems when analyzing and modeling the spatial and temporal variability of the data. Accurate gap-filling techniques for rainfall time series are therefore needed to obtain complete datasets, which are crucial for studying the evolution of climate change. In this work, several machine learning models were assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (southern Spain). The approach using data from neighboring stations within a 50 km radius clearly outperformed the other approaches assessed, reaching, in the best cases, RMSE (root mean squared error) values of 1.246 mm/day, MBE (mean bias error) values of −0.001 mm/day, and R² values of 0.898. In addition, results at inland sites outperformed those at coastal sites in most locations, revealing an effect of distance to the sea on performance (an improvement of up to 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the MLP (multilayer perceptron), notably outperformed simple linear regression estimates at the coastal sites, whereas at inland locations the improvements were less significant.
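A minimal sketch of the neighbor-based approach, assuming daily series stored as NumPy arrays. It selects stations within the 50 km radius by great-circle (haversine) distance, fills gaps with an ordinary least-squares regression on those neighbors, and scores the filled values with RMSE, MBE and R². The regression is the simple linear baseline the abstract compares against, not the paper's MLP. The station coordinates and rainfall values are synthetic and the helper names are hypothetical. Two further assumptions: MBE is taken as estimate minus observation, and R² as the coefficient of determination.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def neighbors_within(target, stations, radius_km=50.0):
    """Indices of stations (rows of [lat, lon]) within radius_km of target."""
    d = haversine_km(target[0], target[1], stations[:, 0], stations[:, 1])
    return np.flatnonzero(d <= radius_km)

def fill_gaps_linear(target_series, neighbor_series):
    """Fill NaNs in target_series (n_days,) by least-squares regression on
    neighbor_series (n_days, n_neighbors). Only days where the target and all
    neighbors are observed are used to fit the coefficients."""
    X = np.column_stack([np.ones(len(target_series)), neighbor_series])
    neighbors_ok = ~np.isnan(X).any(axis=1)
    train = ~np.isnan(target_series) & neighbors_ok
    coef, *_ = np.linalg.lstsq(X[train], target_series[train], rcond=None)
    filled = target_series.copy()
    gaps = np.isnan(target_series) & neighbors_ok
    # Rainfall cannot be negative (assumption: clip estimates at 0 mm/day).
    filled[gaps] = np.clip(X[gaps] @ coef, 0.0, None)
    return filled

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mbe(pred, obs):
    # Positive values mean the estimates are too high on average.
    return float(np.mean(pred - obs))

def r2(pred, obs):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    return float(1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Hypothetical stations: two near the target, one well over 100 km away.
rng = np.random.default_rng(0)
stations = np.array([[37.20, -3.60], [37.40, -3.80], [38.50, -3.00]])
target = np.array([37.18, -3.60])
idx = neighbors_within(target, stations)
neighbors = rng.gamma(0.5, 4.0, size=(365, len(idx)))   # synthetic mm/day
truth = 0.6 * neighbors[:, 0] + 0.4 * neighbors[:, 1]   # noise-free target
observed = truth.copy()
gap_days = rng.choice(365, size=30, replace=False)
observed[gap_days] = np.nan
filled = fill_gaps_linear(observed, neighbors)
print(rmse(filled[gap_days], truth[gap_days]),
      mbe(filled[gap_days], truth[gap_days]),
      r2(filled[gap_days], truth[gap_days]))
```

Because the synthetic target is an exact linear mix of its neighbors, the regression should recover it almost perfectly. Real rainfall would leave a visible residual.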

    A novel clustering algorithm based on mathematical morphology for wind power generation prediction

    Wind power exhibits daily similarity. Furthermore, days with similar wind power variation trends reflect similar meteorological phenomena. Therefore, choosing as training samples the historical days whose numerical weather prediction (NWP) information is similar to that of the predicted day can improve wind power prediction accuracy and reduce computational complexity during model simulation. This paper proposes a new wind power generation prediction model based on a novel dilation and erosion (DE) clustering algorithm. In the proposed model, the days whose NWP information is similar to that of the predicted day are selected by the DE clustering algorithm. This algorithm is built on the basic operations of mathematical morphology and clusters automatically, without supervision. A case study using data from the Yilan wind farm in northeast China evaluated day-ahead wind power prediction. It indicates that the generalized regression neural network (GRNN) model based on the proposed DE clustering algorithm (DE clustering-GRNN) performs better than the DPK-medoids clustering-GRNN, the K-means clustering-GRNN, and the AM-GRNN. Further, the proposed DE clustering-GRNN model is adaptive.
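A rough sketch of the prediction stage, assuming NumPy arrays of daily NWP feature vectors and 24-hour power profiles. The DE clustering algorithm itself is not reproduced here. In its place, a plain nearest-days selection by Euclidean NWP distance picks the similar historical days. Those days then train a GRNN in its standard form, a Gaussian-kernel weighted average of the training targets. The function names, the spread parameter `sigma` and all data are hypothetical.

```python
import numpy as np

def select_similar_days(nwp_history, nwp_target, k):
    """Indices of the k historical days whose NWP feature vectors are closest
    (Euclidean distance) to the predicted day's NWP vector.

    A simple nearest-days stand-in for the paper's DE clustering step; it does
    not use dilation or erosion."""
    d = np.linalg.norm(nwp_history - nwp_target, axis=1)
    return np.argsort(d)[:k]

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """GRNN prediction: Gaussian-kernel weighted average of training targets.

    X_train: (n, f), y_train: (n,) or (n, h), X_query: (q, f).
    sigma is the smoothing (spread) parameter."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Shift by each query's minimum distance so its largest weight is exp(0) = 1;
    # this keeps the weights from all underflowing to zero for small sigma.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    return w @ y_train

# Hypothetical usage: 300 past days with 4 NWP features and a 24-hour power
# profile each; the 60 days most similar to tomorrow become training samples.
rng = np.random.default_rng(1)
nwp_hist = rng.normal(size=(300, 4))
power_hist = rng.uniform(0, 50, size=(300, 24))   # synthetic MW values
nwp_tomorrow = rng.normal(size=4)
idx = select_similar_days(nwp_hist, nwp_tomorrow, k=60)
forecast = grnn_predict(nwp_hist[idx], power_hist[idx], nwp_tomorrow[None, :], sigma=0.5)
```

Each forecast is a convex combination of the selected days' profiles. A very small `sigma` therefore approaches a nearest-day copy, and a very large one approaches the plain average of the selected days.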

    Hydrological modeling based on the KNN algorithm: An application for the forecast of daily flows of the Ramis river, Peru

    Forecasting river streamflow is of significant importance for the development of early warning systems. Artificial intelligence algorithms have proven to be an effective tool in data-driven hydrological modeling, since they establish relationships between the input and output data of a watershed and thus support data-driven decisions. This article investigates the applicability of the k-nearest neighbor (KNN) algorithm for forecasting the mean daily flows of the Ramis river at the Ramis hydrometric station. As input to the KNN machine learning algorithm, we used a dataset of mean basin precipitation and mean daily flow from hydrometeorological stations with various lags. The performance of the KNN algorithm was evaluated quantitatively with hydrological skill metrics such as the mean absolute percentage error (MAPE), anomaly correlation coefficient (ACC), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE'), and spectral angle (SA). The KNN forecasts of the Ramis river flows reached high levels of reliability with flow lags of one and two days and precipitation over three days. The algorithm is simple yet robust for short-term flow forecasting and can be integrated as an alternative to strengthen the daily hydrological forecast of the Ramis river.
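A minimal sketch of a lagged-input KNN flow forecast and two of the listed metrics (NSE and KGE'), assuming daily flow and precipitation as NumPy arrays. Several choices here are assumptions:

- "precipitation over three days" is read as precipitation lags of one to three days;
- features are z-scored and k = 5;
- the synthetic linear-reservoir data are invented for illustration.

KGE' uses the modified Kling-Gupta form, in which variability is expressed as a ratio of coefficients of variation.

```python
import numpy as np

def lagged_features(flow, precip, flow_lags=(1, 2), precip_lags=(1, 2, 3)):
    """Rows X[t] = [Q(t-l) for flow lags] + [P(t-l) for precip lags]; target Q(t)."""
    max_lag = max(tuple(flow_lags) + tuple(precip_lags))
    n = len(flow)
    cols = [flow[max_lag - l:n - l] for l in flow_lags]
    cols += [precip[max_lag - l:n - l] for l in precip_lags]
    return np.column_stack(cols), flow[max_lag:]

def knn_forecast(X_train, y_train, X_query, k=5):
    """Mean target of the k nearest training rows, after z-scoring each feature
    with training statistics so that flow and precipitation are comparable."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)
    A, B = (X_train - mu) / sd, (X_query - mu) / sd
    d = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 is no better than mean(obs)."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge_prime(sim, obs):
    """Modified Kling-Gupta efficiency KGE' (r, bias ratio, CV ratio)."""
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

rng = np.random.default_rng(0)
precip = rng.gamma(0.6, 5.0, size=1000)   # synthetic basin precipitation, mm/day
flow = np.zeros(1000)
for t in range(1, 1000):                  # synthetic linear-reservoir response
    flow[t] = 0.9 * flow[t - 1] + 0.5 * precip[t - 1]
X, y = lagged_features(flow, precip)
split = int(0.8 * len(y))                 # chronological split: train on the past only
pred = knn_forecast(X[:split], y[:split], X[split:], k=5)
print(f"NSE={nse(pred, y[split:]):.3f}  KGE'={kge_prime(pred, y[split:]):.3f}")
```

The split is chronological rather than random so that no "future" day leaks into the neighbor set of an earlier forecast.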

    Machine Learning Techniques for Short-Range Weather Forecasting

    Thesis (Ph.D.)--Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Byung-Ro Moon.

    Machine learning is the field of artificial intelligence that automatically generates programs from data. It differs from conventional programming, which requires writing a series of specific instructions directly to perform a task. Machine learning is preferred when it is difficult to develop an effective algorithm for a given task, as in natural language processing or computer vision. Traditionally, numerical weather prediction (NWP) has been the prevailing method of weather forecasting. NWP predicts future weather through simulations with mathematical models initialized from current weather conditions. However, NWP has several problems:

    - errors in the current observations are amplified as the simulation proceeds;
    - its spatial and temporal resolutions are limited;
    - it suffers from a spin-up problem, in which initial forecasts are unreliable while the model stabilizes.

    An alternative approach is therefore needed to complement NWP at small spatial and temporal scales. We propose short-range weather forecast models that employ machine learning techniques appropriate to each forecasting problem. First, we introduce dimensionality reduction techniques for building effective forecasting models from high-dimensional input data. As the dimension of the input data increases, the time or memory required by machine learning techniques can increase significantly. This phenomenon, known as the curse of dimensionality, can be alleviated by dimensionality reduction techniques, which include feature selection and feature extraction. Feature selection selects a subset of the input variables, whereas feature extraction projects high-dimensional features into a lower-dimensional space.
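The two families of dimensionality reduction can be illustrated with a short sketch, assuming a numeric feature matrix `X` and target `y`. Feature selection uses the CFS merit heuristic with absolute Pearson correlations and a greedy forward search. Feature extraction uses PCA computed via SVD. These are generic textbook versions with hypothetical function names, not the thesis's exact implementation, and the data are synthetic.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset using absolute Pearson correlations:
    k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        r_ff = 0.0
    else:
        C = np.abs(np.corrcoef(X[:, subset], rowvar=False))
        r_ff = (C.sum() - k) / (k * (k - 1))  # mean of the off-diagonal entries
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(X, y):
    """Greedy forward search: add the feature that most raises the merit and
    stop when no remaining feature improves it."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected

def pca(X, n_components):
    """Project centered X onto its top principal components via SVD.
    Returns (scores, components, explained_variance_ratio)."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)
    components = vt[:n_components]
    return Xc @ components.T, components, ratio[:n_components]

rng = np.random.default_rng(0)
y = rng.normal(size=500)
X = np.column_stack([
    y + 0.1 * rng.normal(size=500),   # informative feature
    rng.normal(size=500),             # irrelevant noise feature
])
selected = cfs_forward_select(X, y)          # feature selection: keeps columns
scores, comps, ratio = pca(X, n_components=2)  # feature extraction: new axes
```

The contrast matches the abstract's definitions. `cfs_forward_select` returns indices of original columns. `pca` returns new coordinates that mix all columns.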
The thesis details correlation-based feature selection and principal component analysis (PCA), a representative feature extraction technique. We then propose a precipitation type forecasting scheme as an example of meteorological forecasting with dimensionality reduction. The scheme takes as input 93 meteorological variables from short-range weather forecasts and uses feature selection to assemble an effective subset of them. Multinomial logistic regression then classifies winter precipitation as rain, snow, or sleet. The scheme's predictions were 13% more accurate than the original forecasts, and feature selection improved accuracy to a statistically significant level. Second, we present sampling techniques that help predict rare meteorological events. Machine learning algorithms tend to sacrifice performance on rare instances in favor of overall performance, which is known as the class imbalance problem. Undersampling addresses this problem by reducing the number of common instances. As an example of meteorological forecasting with undersampling, we propose a lightning forecasting scheme. Its input consists of meteorological variables from the European Centre for Medium-Range Weather Forecasts (ECMWF). Undersampling alleviates the class imbalance problem, and support vector machines (SVMs) forecast lightning activity within a particular location and time interval. When trained on the original input data, the scheme could not predict any lightning; after undersampling, it successfully detected about 38% of the lightning strikes. Finally, we propose a selective discretization technique that automatically selects the variables suitable for discretization and discretizes them. Discretization is a preprocessing technique that converts continuous variables into categorical ones. Conventional discretization techniques apply discretization to all variables, which may lead to significant information loss.
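The undersampling step of the lightning case study can be sketched as random undersampling of the majority ("no lightning") class. The 1:1 ratio, the synthetic data and the helper names are assumptions, and the SVM training that follows in the thesis is not shown. The always-"no" baseline illustrates why plain accuracy hides the class imbalance problem: it scores about 99% accuracy yet has zero recall on lightning events.

```python
import numpy as np

def random_undersample(X, y, minority_label=1, ratio=1.0, seed=0):
    """Keep every minority-class row plus a random subset of majority rows
    (drawn without replacement) so that n_majority_kept ~= ratio * n_minority."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = rng.choice(majority, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([minority, kept]))  # preserve original order
    return X[idx], y[idx]

def recall(pred, y, positive=1):
    """Fraction of actual positives (e.g. lightning events) that were predicted."""
    actual = y == positive
    return float(np.mean(pred[actual] == positive))

# Synthetic example: roughly 1% of samples are lightning events (label 1).
rng = np.random.default_rng(0)
y = (rng.random(10_000) < 0.01).astype(int)
X = rng.normal(size=(10_000, 5))
always_no = np.zeros_like(y)   # what a classifier trained on imbalanced data may collapse to
X_bal, y_bal = random_undersample(X, y, ratio=1.0)
```

The balanced pair `X_bal`, `y_bal` would then be used to train the classifier. Evaluation should still be done on data with the original class distribution.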
Selective discretization minimizes information loss by discretizing only the variables that have a nonlinear relationship with the dependent variable. As an example of meteorological forecasting with selective discretization, we propose a heavy rainfall forecasting scheme. The scheme takes input from automatic weather stations and predicts whether the heavy rain advisory criterion will be met within the next three hours. The input variables are preprocessed into a compressed yet efficient representation through selective discretization and PCA. Logistic regression then uses the preprocessed data to predict whether the heavy rain condition will be satisfied. Selective discretization discretized continuous variables such as date and temperature, improving predictive performance to a statistically significant level. We present effective machine learning techniques for short-range weather forecasting and provide case studies that apply machine learning to precipitation type, lightning, and heavy rainfall forecasting. In each case we combine techniques appropriate to the forecasting problem. The resulting prediction models performed well enough to be used in an operational forecasting system.
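A hypothetical sketch of selective discretization, assuming a numeric feature matrix `X` and target `y`. The abstract does not give the nonlinearity test. This version therefore compares two quantities for each column:

- how much of the variance of `y` it explains after equal-frequency binning (the correlation ratio η²);
- how much a straight line explains (r²).

Only the columns where binning wins by a margin are one-hot encoded. The bin count, the margin, the criterion and the synthetic continuous target are all assumptions; a 0/1 heavy-rain target could be passed the same way. The thesis's minimum description length discretization is not reproduced.

```python
import numpy as np

def equal_frequency_bins(x, n_bins):
    """Bin index 0..n_bins-1 for each value, using sample quantiles as edges."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def correlation_ratio(bins, y):
    """eta^2: share of the variance of y explained by the per-bin means of y."""
    grand = y.mean()
    between = sum((bins == b).sum() * (y[bins == b].mean() - grand) ** 2
                  for b in np.unique(bins))
    return between / np.sum((y - grand) ** 2)

def selective_discretize(X, y, n_bins=5, margin=0.05):
    """One-hot bin a column only if binning explains clearly more of y than a
    straight line does (eta^2 - r^2 > margin); keep other columns continuous.
    Returns (design_matrix, indices_of_discretized_columns)."""
    blocks, discretized = [], []
    for j in range(X.shape[1]):
        x = X[:, j]
        bins = equal_frequency_bins(x, n_bins)
        r2 = np.corrcoef(x, y)[0, 1] ** 2
        if correlation_ratio(bins, y) - r2 > margin:
            blocks.append(np.eye(n_bins)[bins])   # one-hot encoding of the bin index
            discretized.append(j)
        else:
            blocks.append(x[:, None])
    return np.hstack(blocks), discretized

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
# Synthetic target: column 0 acts through a U shape, column 1 linearly.
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=2000)
Z, discretized = selective_discretize(X, y)
```

Column 0 has almost no linear correlation with `y` but a strong binned relationship, so it gets discretized. Column 1 is already well described by a line and stays continuous, which avoids the information loss of binning it.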
๋ณธ ๋ชจ๋ธ์€ ์ž๋™ ๊ธฐ์ƒ ๊ด€์ธก ์‹œ์Šคํ…œ์œผ๋กœ๋ถ€ํ„ฐ ์ž…๋ ฅ์„ ๋ฐ›์•„ ์„ธ ์‹œ๊ฐ„ ์ด๋‚ด์— ํ˜ธ์šฐ ์ฃผ์˜๋ณด ์กฐ๊ฑด์ด ์ถฉ์กฑ๋  ๊ฒƒ์ธ์ง€๋ฅผ ์˜ˆ์ธกํ•œ๋‹ค. ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋Š” ์„ ํƒ์  ์ด์‚ฐํ™” ๊ธฐ๋ฒ•๊ณผ ์ฃผ์„ฑ๋ถ„ ๋ถ„์„์„ ํ†ตํ•ด ์‘์ถ•๋œ ์–‘์งˆ์˜ ์ •๋ณด๋ฅผ ๋‹ด๋„๋ก ์ „์ฒ˜๋ฆฌ๋˜๊ณ , ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€๋Š” ์ „์ฒ˜๋ฆฌ๋œ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉํ•˜ ์—ฌ ํ˜ธ์šฐ ์ฃผ์˜๋ณด ์กฐ๊ฑด์ด ๋งŒ์กฑ๋  ๊ฒƒ์ธ์ง€ ์˜ˆ์ธกํ•œ๋‹ค. ์„ ํƒ์  ์ด์‚ฐํ™” ๊ธฐ๋ฒ•์€ ์ผ์ž๋‚˜ ๊ธฐ์˜จ๊ณผ ๊ฐ™์€ ์ธ์ž๋“ค์„ ์„ ํƒ์ ์œผ๋กœ ์ด์‚ฐํ™”ํ•˜์—ฌ ํ†ต๊ณ„์ ์œผ๋กœ ์œ ์˜ํ•œ ์ˆ˜์ค€์œผ๋กœ ์˜ˆ์ธก ์„ฑ๋Šฅํ–ฅ์ƒ์— ๊ธฐ์—ฌํ–ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ๋‹จ๊ธฐ ๊ธฐ์ƒ ์˜ˆ๋ณด๋ฅผ ์œ„ํ•œ ํšจ๊ณผ์ ์ธ ๊ธฐ๊ณ„ ํ•™์Šต ๊ธฐ๋ฒ•๋“ค์„ ์ œ์‹œํ•˜๊ณ , ๊ฐ•์ˆ˜ ์œ ํ˜•, ๋‡Œ์ „, ๊ทธ๋ฆฌ๊ณ  ์ง‘์ค‘ ํ˜ธ์šฐ ์˜ˆ์ธก์— ๊ธฐ๊ณ„ ํ•™์Šต์„ ํšจ๊ณผ์ ์œผ๋กœ ์ ์šฉํ•œ ์‚ฌ๋ก€๋“ค์„ ์ œ๊ณตํ•œ๋‹ค. ๊ฐ ์‚ฌ๋ก€์—์„œ๋Š” ํ•ด๋‹น ์˜ˆ์ธก ๋ฌธ์ œ๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ํ’€ ์ˆ˜ ์žˆ๋Š” ๊ธฐ๋ฒ•๋“ค์„ ์กฐํ•ฉํ–ˆ์œผ๋ฉฐ, ์šฐ๋ฆฌ๊ฐ€ ๋งŒ๋“  ์˜ˆ์ธก ๋ชจ๋ธ๋“ค์€ ์‹ค์ œ ์šด์šฉ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์„ ์ •๋„์˜ ์„ฑ๊ณต์ ์ธ ์˜ˆ์ธก ํ’ˆ์งˆ์„ ๋ณด์—ฌ์ฃผ์—ˆ๋‹ค.1 Introduction 1 1.1 Machine Learning 1 1.1.1 Data Preprocessing 2 1.1.2 Classification 3 1.2 Meteorological Forecasts 4 1.2.1 Precipitation Types 5 1.2.2 Lightning 5 1.2.3 Heavy Rainfall 6 1.3 Main Contributions . 
6 1.4 Organization 8 2 Dimensional Reduction Techniques 9 2.1 Correlation-based Feature Selection 10 2.2 Principal Component Analysis 12 2.3 Case Study: Precipitation Type Forecast 14 2.3.1 Introduction 14 2.3.2 Forecast Model 16 2.3.3 Experiments 26 2.3.4 Discussions 37 3 Sampling Techniques 40 3.1 Undersampling 40 3.2 Oversampling 42 3.3 Case Study: Lightning Forecast 43 3.3.1 Introduction 44 3.3.2 Forecast Model 45 3.3.3 Experiments 54 3.3.4 Discussions 62 4 Discretization Techniques 65 4.1 Selective Discretization 66 4.2 Minimum Description Length Discretization 68 4.3 Case Study: Heavy Rainfall Forecast 70 4.3.1 Introduction 71 4.3.2 Early Warning System 73 4.3.3 Experiments 80 4.3.4 Discussions 92 5 Conclusions 95Docto