
367 research outputs found

Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

Authors: Ascenso Guido; Barriopedro David; Casillas-Pérez David; Castelletti Andrea; Del Ser Javier; Fister Dusan; García-Herrera Ricardo; Giuliani Mateo; Kadow Christopher; Pérez-Aracil Jorge; Restelli Marcello; Salcedo-Sanz Sancho
Publication date: 03/06/2022

Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under current climate change and global warming. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have emerged in recent years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most used ML techniques in this area, and a comprehensive critical review of literature related to ML in EEs, are provided. A number of examples are discussed, and perspectives and outlooks on the field are drawn. Comment: 93 pages, 18 figures, under revie

arXiv.org e-Print Archive

Fuzzy rough and evolutionary approaches to instance selection

Author: Verbiest Nele
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2014

Ghent University Academic Bibliography

Advanced techniques for classification of polarimetric synthetic aperture radar data

Author: Uhlmann Stefan Gerd
Publication venue: Tampere University of Technology
Publication date: 01/01/2014

Among the various remote sensing technologies that aid Earth Observation, radar-based imaging is gaining major interest due to advances in its imaging techniques in the form of synthetic aperture radar (SAR) and polarimetry. The majority of radar applications focus on monitoring, detecting, and classifying local or global areas of interest to support humans in their efforts of decision-making, analysis, and interpretation of Earth’s environment. This thesis focuses on improving the classification performance and process, particularly for land use and land cover classification over polarimetric SAR (PolSAR) data. To achieve this, three contributions are studied, related to superior feature description and advanced machine-learning techniques including classifiers, principles, and data exploitation. First, this thesis investigates the application of color features within PolSAR image classification to provide additional discrimination on top of the conventional scattering information and texture features. The color features are extracted over the visual presentation of fully and partially polarimetric SAR data by generation of pseudo color images. In the experiments, the results demonstrated that with the addition of the considered color features, the classification performance exceeded that of common PolSAR features alone and achieved higher classification accuracies than the traditional combination of PolSAR and texture features. Second, to address the large-scale learning challenge in PolSAR image classification with the utmost efficiency, this thesis introduces the application of an adaptive and data-driven supervised classification topology called Collective Network of Binary Classifiers (CNBC).
This topology incorporates active learning to support human users with the analysis and interpretation of PolSAR data, focusing on collections of images where changes or updates to the existing classifier might be required frequently due to surface, terrain, and object changes as well as certain variations in capturing time and position. Evaluations demonstrated the capabilities of CNBC over an extensive set of experimental results regarding the adaptation and data-driven classification of single PolSAR images as well as collections of them. The experimental results verified that the evolutionary classification topology, CNBC, did provide an efficient solution for the problems of scalability and dynamic adaptability, allowing both feature space dimensions and the number of terrain classes in PolSAR image collections to vary dynamically. Third, most PolSAR classification problems are undertaken by supervised machine learning, which requires manually labeled ground truth data. To reduce the manual labeling effort, supervised and unsupervised learning approaches are combined into semi-supervised learning to utilize the huge amount of unlabeled data. The application of semi-supervised learning in this thesis is motivated by ill-posed classification tasks related to the small training size problem. Therefore, this thesis investigates how much ground truth is actually necessary for certain classification problems to achieve satisfactory results in a supervised and semi-supervised learning scenario. To address this, two semi-supervised approaches are proposed, based on unsupervised extension of the training data and ensemble-based self-training. The evaluations showed that significant speed-ups and improvements in classification performance are achieved. In particular, for a remote sensing application such as PolSAR image classification, it is advantageous to exploit the location-based information from the labeled training data.
Each of the developed techniques provides its stand-alone contribution from different viewpoints to improve land use and land cover classification. The introduction of a new feature for better discrimination is independent of the underlying classification algorithms used. The CNBC topology is applicable to various classification problems no matter how the underlying data have been acquired, for example in the case of remote sensing data. Moreover, the semi-supervised learning approach tackles the challenge of utilizing the unlabeled data. By combining these techniques for superior feature description and advanced machine-learning techniques exploiting classifier topologies and data, further contributions to polarimetric SAR image classification are made. According to the performance evaluations conducted, including visual and numerical assessments, the proposed and investigated techniques showed valuable improvements and are able to aid the analysis and interpretation of PolSAR image data. Due to the generic nature of the developed techniques, their application to other remote sensing data will require only minor adjustments.

Trepo - Institutional Repository of Tampere University

단기 기상 예측을 위한 기계 학습 기법 (Machine Learning Techniques for Short-Range Weather Forecasting)

Author: 문승현
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/02/2020

Doctoral thesis, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. 문병로. Machine learning is the study of artificial intelligence that automatically generates programs from data. It is distinguished from conventional programming, in which a series of specific instructions must be written directly to perform a specific task. Machine learning is preferred when it is difficult to develop an effective algorithm for a given task, such as natural language processing or computer vision. Traditionally, numerical weather prediction (NWP) has been the prevailing method to forecast weather. NWP predicts future weather through simulations using mathematical models based on current weather conditions. However, NWP has some problems: errors in the current observations are amplified as the simulation proceeds; spatial and temporal resolutions are limited; and there is a spin-up problem, in which initial forecasts are unreliable while the model attempts to stabilize. An alternative approach is needed to complement NWP on small spatial and temporal scales. Therefore, we propose short-range weather forecast models that employ machine learning techniques appropriate for a given forecasting problem. First, we introduce dimensionality reduction techniques to construct effective forecasting models with high-dimensional input data. As the dimension of input data increases, the amount of time or memory required by machine learning techniques can increase significantly. This phenomenon is referred to as the curse of dimensionality, which can be alleviated by dimensionality reduction techniques. Dimensionality reduction techniques include feature selection and feature extraction. Feature selection selects a subset of input variables, while feature extraction projects high-dimensional features to a lower dimensional space. The details of correlation-based feature selection and of principal component analysis (PCA), a representative feature extraction technique, are provided.
We then propose a scheme for precipitation type forecast as an example of meteorological forecasting using dimensionality reduction techniques. This scheme takes 93 meteorological variables as input, and uses feature selection to assemble an effective subset of input variables. Multinomial logistic regression is used to classify precipitation as rain, snow, or sleet. This scheme achieved predictions which are 13% more accurate than the original forecasts, and feature selection improved the accuracy to a statistically significant level. Second, we present sampling techniques that help predict rare meteorological events. Machine learning algorithms tend to sacrifice performance on rare instances in favor of overall performance, which is referred to as the class imbalance problem. To resolve this problem, undersampling reduces the number of common instances. As an example of meteorological forecasting using undersampling, we propose a scheme for lightning forecast. Meteorological variables from the European Centre for Medium-Range Weather Forecasts provide the input to our scheme, in which undersampling is used to alleviate the class imbalance problem, and SVMs are used to forecast lightning activity within a particular location and time interval. When the scheme was trained with the original input data, it could not predict any lightning. After undersampling, however, the scheme successfully detected about 38% of the lightning strikes. Finally, we propose a selective discretization technique that automatically selects and discretizes the variables suitable for discretization. Discretization is a preprocessing technique that converts continuous variables into categorical ones. Conventional discretization techniques apply discretization to all variables, which may lead to significant information loss. Selective discretization minimizes information loss by discretizing only variables that have a nonlinear relationship with the dependent variable.
We suggest a scheme for heavy rainfall forecast as an example of meteorological forecasting using selective discretization. This scheme takes input from automatic weather stations, and predicts whether or not the heavy rain criterion will be met within the next three hours. The input variables are preprocessed to have a compressed yet efficient representation through selective discretization and PCA. Logistic regression uses the preprocessed data to predict whether or not the heavy rain condition will be satisfied. Selective discretization discretized only continuous variables such as date and temperature, contributing to the improvement of predictive performance to a statistically significant level. We present effective machine learning techniques for short-range weather forecast, and provide case studies that apply machine learning to precipitation type forecast, lightning forecast, and heavy rainfall forecast. We combine appropriate techniques to solve each forecasting problem effectively, and the resulting prediction models were good enough to be used in an operational forecasting system.
1 Introduction
1.1 Machine Learning
1.1.1 Data Preprocessing
1.1.2 Classification
1.2 Meteorological Forecasts
1.2.1 Precipitation Types
1.2.2 Lightning
1.2.3 Heavy Rainfall
1.3 Main Contributions
1.4 Organization
2 Dimensional Reduction Techniques
2.1 Correlation-based Feature Selection
2.2 Principal Component Analysis
2.3 Case Study: Precipitation Type Forecast
2.3.1 Introduction
2.3.2 Forecast Model
2.3.3 Experiments
2.3.4 Discussions
3 Sampling Techniques
3.1 Undersampling
3.2 Oversampling
3.3 Case Study: Lightning Forecast
3.3.1 Introduction
3.3.2 Forecast Model
3.3.3 Experiments
3.3.4 Discussions
4 Discretization Techniques
4.1 Selective Discretization
4.2 Minimum Description Length Discretization
4.3 Case Study: Heavy Rainfall Forecast
4.3.1 Introduction
4.3.2 Early Warning System
4.3.3 Experiments
4.3.4 Discussions
5 Conclusions
Docto
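
The undersampling idea mentioned in this abstract (reducing the number of common instances so that rare events are not ignored) can be illustrated with a minimal sketch. It assumes plain random undersampling; the abstract does not say which variant the thesis uses, and the function name `random_undersample` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many
    instances as the rarest class.

    Assumption: plain random undersampling, chosen for illustration;
    this is not necessarily the variant used in the thesis.
    """
    rng = random.Random(seed)
    # Group samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    # Size of the rarest class sets the target size for every class.
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data: 95 "no lightning" (0) instances and 5 "lightning" (1) instances.
X = list(range(100))
y = [0] * 95 + [1] * 5
X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))
```

After the call, both classes contain five instances each, so a classifier trained on `X_bal` can no longer reach high accuracy by always predicting the common class.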

SNU Open Repository and Archive

Cooperative Profit Random Forests With Application in Ocean Front Recognition.

Authors: Dong J.; Saeeda H.; Sun Jianyuan; Zhang Q.; Zhong G.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/01/2017

Random Forests are powerful classification and regression tools that are commonly applied in machine learning and image processing. In the majority of random classification forest algorithms, the Gini index and the information gain ratio are used for node splitting. However, these two node-split methods may pay less attention to the intrinsic structure of the attribute variables and fail to find attributes that have strong discriminative ability as a group yet are weak as individuals. In this paper, we propose an innovative method for splitting the tree nodes based on cooperative game theory, from which attributes with good discriminative ability as a group can be learned. This new random forests algorithm is called Cooperative Profit Random Forests (CPRF). Experimental comparisons with several other existing random classification forest algorithms are carried out on several real-world data sets, including remote sensing images. The results show that CPRF outperforms other existing Random Forests algorithms in most cases. In particular, CPRF achieves promising results in ocean front recognition.
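
For context, the conventional Gini-based split criterion that this abstract contrasts with CPRF can be sketched as follows. This is a minimal illustration of the baseline only, not of the cooperative-game method proposed in the paper; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Impurity reduction obtained by splitting `parent` into two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Toy node with two balanced classes, split perfectly by some attribute.
parent = ["front"] * 4 + ["background"] * 4
left = ["front"] * 4
right = ["background"] * 4
print(gini_impurity(parent))           # 0.5
print(gini_gain(parent, left, right))  # 0.5
```

A criterion of this kind scores one attribute at a time, which is the limitation the abstract points to: attributes that are only informative in combination receive low individual scores.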

Bournemouth University Research Online

Machine learning application in modelling marine and coastal phenomena: a critical review

Authors: Brocchini Maurizio; Jalali Mahdi; Pourzangbar Ali
Publication venue: Frontiers Media SA
Publication date: 31/10/2023

KITopen

Classification of Polarimetric SAR Images Using Compact Convolutional Neural Networks

Authors: Ahishali Mete; Gabbouj Moncef; Ince Turker; Kiranyaz Serkan
Publication date: 10/11/2020

Classification of polarimetric synthetic aperture radar (PolSAR) images is an active research area with a major role in environmental applications. The traditional Machine Learning (ML) methods proposed in this domain generally focus on utilizing highly discriminative features to improve the classification performance, but this task is complicated by the well-known "curse of dimensionality" phenomenon. Other approaches based on deep Convolutional Neural Networks (CNNs) have certain limitations and drawbacks, such as high computational complexity, the need for an unfeasibly large training set with ground-truth labels, and special hardware requirements. In this work, to address the limitations of traditional ML and deep CNN based methods, a novel and systematic classification framework is proposed for the classification of PolSAR images, based on a compact and adaptive implementation of CNNs using a sliding-window classification approach. The proposed approach has three advantages. First, there is no requirement for an extensive feature extraction process. Second, it is computationally efficient due to the compact configurations used. In particular, the proposed compact and adaptive CNN model is designed to achieve the maximum classification accuracy with minimum training and computational complexity. This is of considerable importance considering the high costs involved in labelling in PolSAR classification. Finally, the proposed approach can perform classification using smaller window sizes than deep CNNs. Experimental evaluations have been performed over the four most commonly used benchmark PolSAR images: AIRSAR L-Band and RADARSAT-2 C-Band data of the San Francisco Bay and Flevoland areas. The best overall accuracies obtained range between 92.33% and 99.39% for these benchmark study sites.
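
The sliding-window approach named in this abstract classifies each pixel from a small window centered on it. The sketch below shows only the window extraction step, under stated assumptions: a single-channel image stored as a list of lists, an odd window size, and border pixels skipped rather than padded. None of these choices come from the paper, and `sliding_windows` is a hypothetical helper; the classifier that would consume each patch is omitted.

```python
def sliding_windows(image, size):
    """Yield ((row, col), patch) for every pixel whose size x size
    window fits entirely inside the image.

    Assumptions for illustration: `size` is odd, the image is a list of
    equal-length rows, and border pixels are skipped (no padding).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(image), len(image[0])
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            # Rows r-half..r+half, columns c-half..c+half around the center.
            patch = [row[c - half:c + half + 1]
                     for row in image[r - half:r + half + 1]]
            yield (r, c), patch


# Toy 4 x 5 single-channel image with distinct pixel values.
image = [[r * 5 + c for c in range(5)] for r in range(4)]
patches = list(sliding_windows(image, 3))
print(len(patches))  # 6 window positions: 2 rows x 3 columns
print(patches[0])    # ((1, 1), [[0, 1, 2], [5, 6, 7], [10, 11, 12]])
```

In a full pipeline each patch would be passed to the trained network and the predicted label written back to the center pixel, so smaller windows leave fewer unclassified border pixels.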

arXiv.org e-Print Archive

Directory of Open Access Journals

Trepo - Institutional Repository of Tampere University