367 research outputs found

    Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

    Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under current climate change and global warming. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working with different methodologies and computational tools. Machine Learning (ML) methods have emerged in recent years as powerful techniques for tackling many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. It provides a summary of the most widely used ML techniques in this area and a comprehensive critical review of the literature on ML in EEs. A number of examples are discussed, and perspectives and outlooks on the field are drawn. Comment: 93 pages, 18 figures, under revie

    Fuzzy rough and evolutionary approaches to instance selection

    Advanced techniques for classification of polarimetric synthetic aperture radar data

    Among the remote sensing technologies that aid Earth observation, radar-based imaging has gained major interest owing to advances in its imaging techniques in the form of synthetic aperture radar (SAR) and polarimetry. The majority of radar applications focus on monitoring, detecting, and classifying local or global areas of interest to support humans in their efforts of decision-making, analysis, and interpretation of Earth's environment. This thesis focuses on improving the classification performance and process, particularly for land use and land cover applications over polarimetric SAR (PolSAR) data. To achieve this, three contributions are studied, related to superior feature description and advanced machine-learning techniques including classifiers, principles, and data exploitation. First, this thesis investigates the application of color features within PolSAR image classification to provide additional discrimination on top of the conventional scattering information and texture features. The color features are extracted from the visual presentation of fully and partially polarimetric SAR data by generating pseudo-color images. In the experiments, adding the considered color features yielded classification performance that exceeded the results obtained with common PolSAR features alone and achieved higher classification accuracies than the traditional combination of PolSAR and texture features. Second, to address the large-scale learning challenge in PolSAR image classification with the utmost efficiency, this thesis introduces the application of an adaptive and data-driven supervised classification topology called Collective Network of Binary Classifiers (CNBC).
This topology incorporates active learning to support human users in the analysis and interpretation of PolSAR data, focusing on collections of images in which changes or updates to the existing classifier might be required frequently due to surface, terrain, and object changes as well as variations in capture time and position. Evaluations demonstrated the capabilities of CNBC over an extensive set of experimental results regarding the adaptive and data-driven classification of single PolSAR images as well as collections of them. The experimental results verified that the evolutionary classification topology, CNBC, provides an efficient solution to the problems of scalability and dynamic adaptability, allowing both the feature space dimensions and the number of terrain classes in PolSAR image collections to vary dynamically. Third, most PolSAR classification problems are tackled with supervised machine learning, which requires manually labeled ground truth data. To reduce the manual labeling effort, supervised and unsupervised learning approaches are combined into semi-supervised learning to utilize the huge amount of unlabeled data. The application of semi-supervised learning in this thesis is motivated by ill-posed classification tasks related to the small training size problem. Therefore, this thesis investigates how much ground truth is actually necessary for certain classification problems to achieve satisfactory results in supervised and semi-supervised learning scenarios. To address this, two semi-supervised approaches are proposed, based on unsupervised extension of the training data and on ensemble-based self-training. The evaluations showed that significant speed-ups and improvements in classification performance are achieved. In particular, for a remote sensing application such as PolSAR image classification, it is advantageous to exploit the location-based information from the labeled training data.
Each of the developed techniques provides a stand-alone contribution from a different viewpoint to improve land use and land cover classification. The introduction of a new feature for better discrimination is independent of the underlying classification algorithms used. The CNBC topology is applicable to various classification problems no matter how the underlying data have been acquired, for example in the case of remote sensing data. Moreover, the semi-supervised learning approach tackles the challenge of utilizing the unlabeled data. By combining these techniques for superior feature description and advanced machine-learning techniques exploiting classifier topologies and data, further contributions to polarimetric SAR image classification are made. According to the performance evaluations conducted, including visual and numerical assessments, the proposed and investigated techniques showed valuable improvements and are able to aid the analysis and interpretation of PolSAR image data. Due to the generic nature of the developed techniques, their application to other remote sensing data will require only minor adjustments.
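
The self-training idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's ensemble-based method: it assumes a single nearest-centroid classifier on one-dimensional features, and the names `self_train`, `margin`, and `max_rounds`, as well as the toy class labels, are hypothetical.

```python
def centroids(points, labels):
    """Return the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0, max_rounds=10):
    """Plain self-training: pseudo-label confident samples, then refit.

    Assumes at least two classes. A sample counts as confident when it is
    at least `margin` closer to its nearest centroid than to the runner-up.
    """
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centers = centroids(train_x, train_y)
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(centers, key=lambda y: abs(x - centers[y]))
            best, runner_up = ranked[0], ranked[1]
            gap = abs(x - centers[runner_up]) - abs(x - centers[best])
            if gap >= margin:
                confident.append((x, best))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        for x, y in confident:
            train_x.append(x)
            train_y.append(y)
        pool = remaining
    return centroids(train_x, train_y), pool


# Toy data: two labeled samples and five unlabeled ones.
final_centers, leftover = self_train(
    [0.0, 10.0], ["water", "urban"], [1.0, 2.0, 9.0, 8.0, 5.0]
)
print(final_centers, leftover)
```

The ambiguous sample at 5.0 is equidistant from both centroids, so it is never pseudo-labeled and stays in the unlabeled pool.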

    Machine Learning Techniques for Short-Range Weather Forecasting

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 2. Byung-Ro Moon. Machine learning is a field of artificial intelligence that automatically generates programs from data. It is distinguished from conventional programming, which requires writing a series of specific instructions directly to perform a specific task. Machine learning is preferred when it is difficult to develop an effective algorithm for a given task, such as natural language processing or computer vision. Traditionally, numerical weather prediction (NWP) has been the prevailing method for forecasting weather. NWP predicts future weather through simulations using mathematical models based on current weather conditions. However, NWP has some problems: errors in the current observations are amplified as the simulation proceeds; spatial and temporal resolutions are limited; and there is a spin-up problem, in which initial forecasts are unreliable while the model attempts to stabilize. An alternative approach is needed to complement NWP on small spatial and temporal scales. Therefore, we propose short-range weather forecast models that employ machine learning techniques appropriate for a given forecasting problem. First, we introduce dimensionality reduction techniques to construct effective forecasting models with high-dimensional input data. As the dimension of input data increases, the amount of time or memory required by machine learning techniques can increase significantly. This phenomenon is referred to as the curse of dimensionality, which can be alleviated by dimensionality reduction techniques. Dimensionality reduction techniques include feature selection and feature extraction. Feature selection selects a subset of input variables, while feature extraction projects high-dimensional features to a lower-dimensional space.
The details of correlation-based feature selection and of principal component analysis (PCA), a representative feature extraction technique, are provided. We then propose a scheme for precipitation type forecast as an example of meteorological forecasting using dimensionality reduction techniques. This scheme takes 93 meteorological variables as input and uses feature selection to assemble an effective subset of input variables. Multinomial logistic regression is used to classify precipitation as rain, snow, or sleet. This scheme achieved predictions that are 13% more accurate than the original forecasts, and feature selection improved the accuracy to a statistically significant level. Second, we present sampling techniques that help predict rare meteorological events. Machine learning algorithms tend to sacrifice performance on rare instances in favor of overall performance, which is referred to as the class imbalance problem. To resolve this problem, undersampling reduces the number of common instances. As an example of meteorological forecasting using undersampling, we propose a scheme for lightning forecast. Meteorological variables from the European Centre for Medium-Range Weather Forecasts provide the input to our scheme, in which undersampling is used to alleviate the class imbalance problem, and SVMs are used to forecast lightning activity within a particular location and time interval. When the scheme was trained with the original input data, it could not predict any lightning. After undersampling, however, the scheme successfully detected about 38% of the lightning strikes. Finally, we propose a selective discretization technique that automatically selects the variables suitable for discretization and discretizes them. Discretization is a preprocessing technique that converts continuous variables into categorical ones. Conventional discretization techniques apply discretization to all variables, which may lead to significant information loss.
The selective discretization minimizes information loss by discretizing only variables that have a nonlinear relationship with the dependent variable. We suggest a scheme for heavy rainfall forecast as an example of meteorological forecasting using the selective discretization. This scheme takes input from automatic weather stations and predicts whether or not the heavy rain criterion will be met within the next three hours. The input variables are preprocessed to have a compressed yet efficient representation through the selective discretization and PCA. Logistic regression uses the preprocessed data to predict whether or not the heavy rain condition will be satisfied. The selective discretization selectively discretized continuous variables such as date and temperature, contributing to the improvement of predictive performance to a statistically significant level. We present effective machine learning techniques for short-range weather forecast, and provide case studies that apply machine learning to precipitation type forecast, lightning forecast, and heavy rainfall forecast. We combine appropriate techniques to solve each forecasting problem effectively, and the resulting prediction models were good enough to be used in an operational forecasting system.
    Contents:
    1 Introduction: 1.1 Machine Learning (1.1.1 Data Preprocessing, 1.1.2 Classification); 1.2 Meteorological Forecasts (1.2.1 Precipitation Types, 1.2.2 Lightning, 1.2.3 Heavy Rainfall); 1.3 Main Contributions; 1.4 Organization
    2 Dimensional Reduction Techniques: 2.1 Correlation-based Feature Selection; 2.2 Principal Component Analysis; 2.3 Case Study: Precipitation Type Forecast (2.3.1 Introduction, 2.3.2 Forecast Model, 2.3.3 Experiments, 2.3.4 Discussions)
    3 Sampling Techniques: 3.1 Undersampling; 3.2 Oversampling; 3.3 Case Study: Lightning Forecast (3.3.1 Introduction, 3.3.2 Forecast Model, 3.3.3 Experiments, 3.3.4 Discussions)
    4 Discretization Techniques: 4.1 Selective Discretization; 4.2 Minimum Description Length Discretization; 4.3 Case Study: Heavy Rainfall Forecast (4.3.1 Introduction, 4.3.2 Early Warning System, 4.3.3 Experiments, 4.3.4 Discussions)
    5 Conclusions
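
The undersampling step described in the abstract above (reducing the number of common instances so that rare events are not ignored) can be sketched as follows. This is a generic random undersampler, not the thesis's exact procedure; the function name `random_undersample`, the `ratio` and `seed` parameters, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(samples, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class samples.

    Each non-minority class keeps at most ratio * (minority class size)
    samples. `ratio` and `seed` are illustrative parameters.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    keep_limit = max(1, int(ratio * counts[minority_label]))

    kept_indices = []
    for label in counts:
        indices = [i for i, y in enumerate(labels) if y == label]
        if label != minority_label and len(indices) > keep_limit:
            indices = rng.sample(indices, keep_limit)
        kept_indices.extend(indices)

    kept_indices.sort()  # preserve the original sample order
    return ([samples[i] for i in kept_indices],
            [labels[i] for i in kept_indices])


# Toy data: 95 "no lightning" cases (0) and 5 "lightning" cases (1).
features = [[float(i)] for i in range(100)]
targets = [1 if i % 20 == 0 else 0 for i in range(100)]
balanced_x, balanced_y = random_undersample(features, targets)
print(Counter(balanced_y))
```

With `ratio=1.0` the two classes end up the same size, which trades away information from the common class in exchange for a classifier that no longer benefits from predicting the common class everywhere.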

    Cooperative Profit Random Forests With Application in Ocean Front Recognition.

    Random Forests are powerful classification and regression tools that are commonly applied in machine learning and image processing. In the majority of random classification forest algorithms, the Gini index and the information gain ratio are commonly used for node splitting. However, these two node-splitting methods may pay less attention to the intrinsic structure of the attribute variables and fail to find attributes that have strong discriminative ability as a group yet are weak as individuals. In this paper, we propose an innovative method for splitting the tree nodes based on cooperative game theory, from which attributes with good discriminative ability as a group can be learned. This new random forests algorithm is called Cooperative Profit Random Forests (CPRF). Experimental comparisons with several other existing random classification forest algorithms are carried out on several real-world data sets, including remote sensing images. The results show that CPRF outperforms other existing Random Forests algorithms in most cases. In particular, CPRF achieves promising results in ocean front recognition.
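
The limitation of single-attribute Gini splitting noted in the abstract above can be seen on a small XOR-style example. The sketch below is not the CPRF algorithm; it only computes the standard Gini impurity, and the helper names `gini_gain` and `joint_gini_gain` and the toy data are assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_impurity(groups, total):
    """Size-weighted average impurity over the child nodes."""
    return sum(len(g) / total * gini_impurity(g) for g in groups)


def gini_gain(rows, labels, attribute, threshold):
    """Impurity reduction from a binary split on one attribute."""
    left = [y for x, y in zip(rows, labels) if x[attribute] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[attribute] > threshold]
    return gini_impurity(labels) - weighted_impurity([left, right], len(labels))


def joint_gini_gain(rows, labels, attributes):
    """Impurity reduction when partitioning on several attributes at once."""
    groups = {}
    for x, y in zip(rows, labels):
        groups.setdefault(tuple(x[a] for a in attributes), []).append(y)
    return gini_impurity(labels) - weighted_impurity(groups.values(), len(labels))


# XOR-style data: the class depends on both attributes together.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

gain_first = gini_gain(rows, labels, attribute=0, threshold=0.5)
gain_second = gini_gain(rows, labels, attribute=1, threshold=0.5)
gain_joint = joint_gini_gain(rows, labels, attributes=(0, 1))
print(gain_first, gain_second, gain_joint)
```

Each attribute alone gives no impurity reduction, while the pair separates the classes completely, which is the kind of group-level discriminative ability a per-attribute criterion cannot detect.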

    Classification of Polarimetric SAR Images Using Compact Convolutional Neural Networks

    Classification of polarimetric synthetic aperture radar (PolSAR) images is an active research area with a major role in environmental applications. The traditional Machine Learning (ML) methods proposed in this domain generally focus on utilizing highly discriminative features to improve the classification performance, but this task is complicated by the well-known "curse of dimensionality" phenomenon. Other approaches based on deep Convolutional Neural Networks (CNNs) have certain limitations and drawbacks, such as high computational complexity, the need for an infeasibly large training set with ground-truth labels, and special hardware requirements. In this work, to address the limitations of traditional ML and deep CNN based methods, a novel and systematic classification framework is proposed for the classification of PolSAR images, based on a compact and adaptive implementation of CNNs using a sliding-window classification approach. The proposed approach has three advantages. First, there is no requirement for an extensive feature extraction process. Second, it is computationally efficient owing to the compact configurations used. In particular, the proposed compact and adaptive CNN model is designed to achieve the maximum classification accuracy with minimum training and computational complexity. This is of considerable importance given the high cost of labelling in PolSAR classification. Finally, the proposed approach can perform classification using smaller window sizes than deep CNNs. Experimental evaluations have been performed over the four most commonly used benchmark PolSAR images: AIRSAR L-Band and RADARSAT-2 C-Band data of the San Francisco Bay and Flevoland areas. The best overall accuracies obtained range from 92.33% to 99.39% for these benchmark study sites.
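
The sliding-window classification approach mentioned in the abstract above can be sketched as follows. The classifier here is a hypothetical stand-in (a mean-intensity threshold) where the paper uses a compact CNN, and the function names, the single-channel toy image, and the window size of 3 are all assumptions for illustration.

```python
def sliding_windows(image, window_size):
    """Yield (row, col, patch) for each odd-sized square window that fits.

    (row, col) is the center pixel of the patch.
    """
    half = window_size // 2
    height, width = len(image), len(image[0])
    for r in range(half, height - half):
        for c in range(half, width - half):
            patch = [line[c - half:c + half + 1]
                     for line in image[r - half:r + half + 1]]
            yield r, c, patch


def placeholder_classifier(patch):
    """Stand-in for a trained model: label 1 if the mean intensity > 0.5."""
    values = [v for line in patch for v in line]
    return 1 if sum(values) / len(values) > 0.5 else 0


def classify_image(image, window_size=3):
    """Assign a label to the center pixel of each window."""
    return {(r, c): placeholder_classifier(patch)
            for r, c, patch in sliding_windows(image, window_size)}


# Toy single-channel image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]
label_map = classify_image(image)
print(label_map)
```

Border pixels whose window would extend past the image are skipped in this sketch; padding the image is a common alternative when a label is needed for every pixel.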