
    Machine learning for predicting soil classes in three semi-arid landscapes

    Mapping the spatial distribution of soil taxonomic classes is important for informing soil use and management decisions. Digital soil mapping (DSM) can quantitatively predict the spatial distribution of soil taxonomic classes. Key components of DSM are the method and the set of environmental covariates used to predict soil classes. Machine learning is a general term for a broad set of statistical modeling techniques. Many different machine learning models have been applied in the literature, and there are different approaches for selecting covariates for DSM. However, there is little guidance as to which, if any, machine learning model and covariate set might be optimal for predicting soil classes across different landscapes. Our objective was to compare multiple machine learning models and covariate sets for predicting soil taxonomic classes at three geographically distinct areas in the semi-arid western United States of America (southern New Mexico, southwestern Utah, and northeastern Wyoming). All three areas were the focus of digital soil mapping studies. Sampling sites at each study area were selected using conditioned Latin hypercube sampling (cLHS). We compared models that had been used in other DSM studies, including clustering algorithms, discriminant analysis, multinomial logistic regression, neural networks, tree-based methods, and support vector machine classifiers. Tested machine learning models were divided into three groups based on model complexity: simple, moderate, and complex. We also compared environmental covariates derived from digital elevation models and Landsat imagery that were divided into three different sets: 1) covariates selected a priori by soil scientists familiar with each area and used as input into cLHS, 2) the covariates in set 1 plus 113 additional covariates, and 3) covariates selected using recursive feature elimination. Overall, complex models were consistently more accurate than simple or moderately complex models. Random forests (RF) using covariates selected via recursive feature elimination was consistently the most accurate, or among the most accurate, classifiers within each study area. We recommend that complex models and covariates selected by recursive feature elimination be used for soil taxonomic class prediction. Overall classification accuracy in each study area was largely dependent upon the number of soil taxonomic classes and the frequency distribution of pedon observations between taxonomic classes. Individual subgroup class accuracy was generally dependent upon the number of soil pedon observations in each taxonomic class. The number of soil classes is related to the inherent variability of a given area. The imbalance of soil pedon observations between classes is likely related to cLHS. Imbalanced frequency distributions of soil pedon observations between classes must be addressed to improve model accuracy. Solutions include increasing the number of soil pedon observations in classes with few observations or decreasing the number of classes. Spatial predictions using the most accurate models generally agree with expected soil-landscape relationships. Spatial prediction uncertainty was lowest in areas of relatively low relief within each study area.
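    A minimal sketch of the recommended pairing (recursive feature elimination wrapped around a random forest) is given below. It is not the authors' code: it uses scikit-learn with synthetic stand-ins for the covariate matrix and the taxonomic class labels.

        # Sketch only: RFE + random forest for soil class prediction (synthetic data).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import RFECV
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 50))      # 300 pedon sites, 50 DEM/Landsat covariates (stand-in)
        y = rng.integers(0, 6, size=300)    # 6 hypothetical soil taxonomic classes

        rf = RandomForestClassifier(n_estimators=500, random_state=0)
        selector = RFECV(rf, step=5, cv=5)  # recursive feature elimination with cross-validation
        selector.fit(X, y)

        X_sel = selector.transform(X)       # keep only the retained covariates
        acc = cross_val_score(RandomForestClassifier(n_estimators=500, random_state=0),
                              X_sel, y, cv=5).mean()
        print(f"{selector.n_features_} covariates retained, CV accuracy = {acc:.2f}")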

    Delineation of high resolution climate regions over the Korean Peninsula using machine learning approaches

    In this research, climate classification maps over the Korean Peninsula at 1 km resolution were generated from the satellite-based climatic variables of monthly temperature and precipitation using machine learning approaches. Random forest (RF), artificial neural networks (ANN), k-nearest neighbor (KNN), logistic regression (LR), and support vector machines (SVM) were used to develop models. Training and validation of these models were conducted using in-situ observations from the Korea Meteorological Administration (KMA) from 2001 to 2016. The rules of the traditional Köppen-Geiger (K-G) climate classification were used to classify climate regions. The input variables were land surface temperature (LST) from the Moderate Resolution Imaging Spectroradiometer (MODIS), monthly precipitation data from the Tropical Rainfall Measuring Mission (TRMM) 3B43 product, and the digital elevation model (DEM) from the Shuttle Radar Topography Mission (SRTM). The overall accuracy (OA) based on validation data from 2001 to 2016 was high, exceeding 95% for all models. DEM and minimum winter temperature were the two variables with particularly high relative importance over the study area. ANN produced a more realistic spatial distribution of the classified climates despite having a slightly lower OA than the other models. The accuracy of the models was also assessed using high-altitude in-situ data from the Mountain Meteorology Observation System (MMOS). Although the MMOS record was relatively short (2013 to 2017), it confirmed that the snowy, dry and cold winter and cool summer class (Dwc) is widely distributed along the eastern coastal region of South Korea. Temporal shifts in climate were examined through a comparison of climate maps produced by period: from 1950 to 2000, from 1983 to 2000, and from 2001 to 2013. A shrinking trend of snow classes (D) over the Korean Peninsula was clearly observed in the ANN-based climate classification results. Shifting trends of climate, with a decrease in snow (D) classes and an increase in temperate (C) classes, were clearly shown in the maps produced using the proposed approaches, consistent with the results from the reanalysis data of the Climatic Research Unit (CRU) and the Global Precipitation Climatology Centre (GPCC).
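    As an illustration only (not the study's implementation), the sketch below trains a random forest to predict climate classes from monthly temperature, monthly precipitation, and elevation, mirroring the workflow described above; the arrays are synthetic stand-ins for the MODIS LST, TRMM 3B43, and SRTM DEM inputs.

        # Sketch only: random forest climate classification from synthetic climatic variables.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(1)
        n = 1000
        monthly_temp = rng.normal(10, 8, size=(n, 12))     # 12 monthly temperatures (deg C)
        monthly_prec = rng.gamma(2.0, 40.0, size=(n, 12))  # 12 monthly precipitation totals (mm)
        elevation = rng.uniform(0, 1500, size=(n, 1))      # DEM elevation (m)
        X = np.hstack([monthly_temp, monthly_prec, elevation])
        y = rng.integers(0, 4, size=n)                     # placeholder K-G class codes

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
        clf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)
        print("overall accuracy:", accuracy_score(y_te, clf.predict(X_te)))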

    Regional Forest Volume Estimation by Expanding LiDAR Samples Using Multi-Sensor Satellite Data

    Accurate information regarding forest volume plays an important role in estimating afforestation, timber harvesting, and forest ecological services. Traditionally, estimating forest growing stock volume from field measurements has been labor-intensive and time-consuming. Recently, remote sensing technology has emerged as a time- and cost-efficient method for forest inventory. In the present study, we adopted a three-step procedure: sample expansion, feature selection, and results generation and evaluation. Expanding the samples derived from Light Detection and Ranging (LiDAR) scanning is the most important step, as it satisfies the sample-size requirement of nonparametric methods and improves accuracy. The mean decrease Gini (MDG) measure embedded in the Random Forest (RF) algorithm served as the feature selector; RF and K-Nearest Neighbor (KNN) were then adopted for forest volume prediction. The results show that the retrieved forest volume over the entire area was in the range of 50–360 m3/ha, and the two models were more consistent when using the sample combination expanded with the optimal threshold value (2 × 10−4), with RF performing best (R2 = 0.618, root mean square error, RMSE = 43.641 m3/ha, mean absolute error, MAE = 33.016 m3/ha), followed by KNN (R2 = 0.617, RMSE = 43.693 m3/ha, MAE = 32.534 m3/ha). The detailed analysis discussed in the present paper clearly shows that expanding image-derived LiDAR samples helps refine the prediction of regional forest volume using satellite data and nonparametric models.
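    The sketch below is an assumption of how the described workflow might look in scikit-learn, not the authors' implementation: impurity-based importances from a random forest (the regression analogue of mean decrease Gini) are used to select features, after which RF and KNN regressors predict volume on synthetic data.

        # Sketch only: RF importance-based feature selection, then RF and KNN volume regression.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import r2_score, mean_squared_error

        rng = np.random.default_rng(2)
        X = rng.normal(size=(800, 30))                        # 30 satellite-derived predictors (stand-in)
        volume = 150 + 40 * X[:, 0] + rng.normal(0, 30, 800)  # synthetic growing stock volume (m3/ha)

        X_tr, X_te, y_tr, y_te = train_test_split(X, volume, test_size=0.3, random_state=2)
        rf_full = RandomForestRegressor(n_estimators=500, random_state=2).fit(X_tr, y_tr)
        top = np.argsort(rf_full.feature_importances_)[::-1][:10]  # keep the 10 most important features

        for name, model in [("RF", RandomForestRegressor(n_estimators=500, random_state=2)),
                            ("KNN", KNeighborsRegressor(n_neighbors=5))]:
            model.fit(X_tr[:, top], y_tr)
            pred = model.predict(X_te[:, top])
            rmse = mean_squared_error(y_te, pred) ** 0.5
            print(f"{name}: R2 = {r2_score(y_te, pred):.3f}, RMSE = {rmse:.1f} m3/ha")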

    Deep Learning Methods for Remote Sensing

    Remote sensing is a field in which important physical characteristics of an area are extracted from radiation generally captured by satellite cameras, sensors onboard aerial vehicles, etc. Captured data help researchers develop solutions to sense and detect various characteristics such as forest fires, flooding, changes in urban areas, crop diseases, soil moisture, etc. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches and led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing.

    A review of machine learning applications in wildfire science and management

    Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has rapidly progressed alongside the wide adoption of machine learning (ML) in the environmental sciences. Here, we present a scoping review of ML in wildfire science and management. Our objective is to improve awareness of ML among wildfire scientists and managers, as well as to illustrate the challenging range of problems in wildfire science available to data scientists. We first present an overview of popular ML approaches used in wildfire science to date, and then review their use within six problem domains: 1) fuels characterization, fire detection, and mapping; 2) fire weather and climate change; 3) fire occurrence, susceptibility, and risk; 4) fire behavior prediction; 5) fire effects; and 6) fire management. We also discuss the advantages and limitations of various ML approaches and identify opportunities for future advances in wildfire science and management within a data science context. We identified 298 relevant publications, in which the most frequently used ML methods included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. There exist opportunities to apply more current ML methods (e.g., deep learning and agent-based learning) in wildfire science. However, despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods requires sophisticated knowledge for their application. Finally, we stress that the wildfire research and management community plays an active role in providing relevant, high-quality data for use by practitioners of ML methods.

    Remote Sensing of Natural Hazards

    Each year, natural hazards such as earthquakes, cyclones, flooding, landslides, wildfires, avalanches, volcanic eruptions, extreme temperatures, storm surges, and drought result in widespread loss of life, livelihood, and critical infrastructure globally. With the unprecedented growth of the human population, large-scale development activities, and changes to the natural environment, the frequency and intensity of extreme natural events and their consequent impacts are expected to increase in the future. Technological interventions provide essential provisions for the prevention and mitigation of natural hazards. The data obtained through remote sensing systems with varied spatial, spectral, and temporal resolutions particularly provide prospects for furthering knowledge on the spatiotemporal patterns and forecasting of natural hazards. The collection of data using earth observation systems has been valuable for alleviating the adverse effects of natural hazards, especially with their near real-time capabilities for tracking extreme natural events. Remote sensing systems from different platforms also serve as an important decision-support tool for devising response strategies, coordinating rescue operations, and making damage and loss estimations. With this in mind, this book presents original contributions on the advanced applications of remote sensing and geographic information systems (GIS) techniques for understanding various dimensions of natural hazards through new theory, data products, and robust approaches.

    An uncertainty prediction approach for active learning - application to earth observation

    Mapping land cover and land use dynamics is crucial in remote sensing, since farmers are encouraged to either intensify or extend crop use due to the ongoing rise in the world’s population. A major issue in this area is interpreting and classifying a scene captured in high-resolution satellite imagery. Several methods have been put forth, including neural networks, which generate data-dependent models (i.e. the model is biased toward the data), and static rule-based approaches with thresholds, which are limited in terms of diversity (i.e. the model lacks diversity in terms of rules). However, the problem of having a machine learning model that, given a large amount of training data, can classify multiple classes over different geographic Sentinel-2 imagery and outperform existing approaches remains open. On the other hand, supervised machine learning has evolved into an essential part of many areas due to the increasing number of labeled datasets. Examples include creating classifiers for applications that recognize images and voices, anticipate traffic, propose products, act as virtual personal assistants, and detect online fraud, among many more. Since these classifiers are highly dependent on the training datasets, without human interaction or accurate labels, the performance of the generated classifiers on unseen observations is uncertain. Thus, researchers have attempted to evaluate a number of independent models using a statistical distance. However, the problem of identifying a prediction error using the relation between the train and test sets, given a train-test split and classifiers modeled over the train set, remains open. Moreover, while some training data is essential for supervised machine learning, what happens if there is insufficient labeled data? After all, assigning labels to unlabeled datasets is a time-consuming process that may require significant expert human involvement. When there are not enough expert manual labels available for the vast amount of openly available data, active learning becomes crucial. However, given large training and unlabeled datasets, having an active learning model that can reduce the training cost of the classifier and at the same time assist in labeling new data points remains an open problem. From the experimental approaches and findings, the main research contributions, which concentrate on the issue of optical satellite image scene classification, include: building labeled Sentinel-2 datasets with surface reflectance values; the proposal of machine learning models for pixel-based image scene classification; the proposal of a statistical-distance-based Evidence Function Model (EFM) to detect ML model misclassification; and the proposal of a generalised sampling approach for active learning that, together with the EFM, enables a way of determining the most informative examples. Firstly, using a manually annotated Sentinel-2 dataset, Machine Learning (ML) models for scene classification were developed and their performance was compared to Sen2Cor, the reference package from the European Space Agency: a micro-F1 value of 84% was attained by the ML model, a significant improvement over the corresponding Sen2Cor performance of 59%. Secondly, to quantify the misclassification of the ML models, the Mahalanobis distance-based EFM was devised. This model achieved, for the labeled Sentinel-2 dataset, a micro-F1 of 67.89% for misclassification detection.
Lastly, the EFM was engineered as a sampling strategy for active learning, leading to an approach that attains the same level of accuracy with only 0.02% of the total training samples when compared to a classifier trained with the full training set. With the help of the above-mentioned research contributions, we were able to provide an open-source Sentinel-2 image scene classification package consisting of ready-to-use Python scripts and an ML model that classifies Sentinel-2 L1C images, generating a 20 m resolution RGB image with the six studied classes (Cloud, Cirrus, Shadow, Snow, Water, and Other), giving academics a straightforward method for rapidly and effectively classifying Sentinel-2 scene images. Additionally, an active learning approach that uses the prediction uncertainty observed through the EFM as its sampling strategy allows labeling only the most informative points to be used as input for building classifiers.
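    The EFM itself is not reproduced in the abstract, so the sketch below only illustrates the general idea it builds on: a Mahalanobis distance between unlabeled samples and the training distribution, used to rank points for active-learning queries. All names, data, and the query size are hypothetical.

        # Sketch only: Mahalanobis-distance scoring as an active-learning sampling strategy.
        import numpy as np

        def mahalanobis_scores(X_train, X_query):
            """Distance of each query point from the training distribution."""
            mean = X_train.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))  # pseudo-inverse for stability
            diff = X_query - mean
            return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

        rng = np.random.default_rng(3)
        X_train = rng.normal(0.0, 1.0, size=(500, 10))      # labeled pixel features (stand-in)
        X_unlabeled = rng.normal(0.5, 1.5, size=(200, 10))  # unlabeled pool (stand-in)

        scores = mahalanobis_scores(X_train, X_unlabeled)
        query_idx = np.argsort(scores)[::-1][:20]           # most "surprising" points are labeled first
        print("indices proposed for manual labeling:", query_idx[:5])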

    Mapping drought stress in commercial eucalyptus forest plantations using remotely sensed techniques in Southern Africa.

    Masters Degree. University of KwaZulu-Natal, Pietermaritzburg. Drought is one of the least understood and most hazardous natural disasters, leaving many parts of the world devastated. To improve understanding and detection of drought onset, remote sensing technology is required to map drought-affected areas, as it covers large geographical areas. The study aimed to evaluate the utility of cost-effective Landsat 8 imagery in mapping the spatial extent of drought-prone Eucalyptus dunnii plantations. The first objective was to compare the utility of Landsat spectra with a combination of vegetation indices to detect drought-affected plantations using the stochastic gradient boosting algorithm. The test datasets showed that using Landsat 8 spectra only produced an overall accuracy of 74.70% and a kappa value of 0.59. The integration of Landsat 8 spectra with vegetation indices produced an overall accuracy of 83.13% and a kappa of 0.76. The second objective of this study was to perform a trend analysis of vegetation health during drought. The normalized difference vegetation index (NDVI) fluctuated over the years, with 2013 having the highest value of 0.68 and 2015 the lowest of 0.55; the normalized difference water index (NDWI) also had its lowest value in 2015. Most indices showed a similar trend, with 2013 having the highest index value and 2015 the lowest. The third objective was to perform a trend analysis of rainfall and temperature during drought. The rainfall trend analysis from 2013 to 2017 indicated that February 2017 received the highest rainfall, of 154 mm. In addition, July 2016 received the highest July rainfall, compared with 2013, 2014, 2015 and 2017, which received 6.4 mm, 0.6 mm, 28 mm, and 1 mm, respectively. The temperature trend analysis from 2013 to 2017 indicated that December 2015 had the highest temperature, of 28°C, compared to December of 2013, 2014, 2016 and 2017, with temperatures of 24°C, 25°C, 27°C, and 24°C, respectively. Furthermore, it was also noted that June 2017 had the highest June temperature, of 23°C, while June 2015 had the lowest, at 20°C. The fourth objective of this study was to compare the utility of topographical variables with a combination of Landsat vegetation indices to detect drought-affected plantations using the one-class support vector machine algorithm. The multiclass support vector machine using Landsat vegetation indices and topographical variables produced an overall accuracy of 73.86% and a kappa value of 0.71, with user’s and producer’s accuracies ranging from 61% to 69% for drought-damaged trees and from 84% to 90% for healthy trees. The one-class support vector machine using Landsat vegetation indices and topographical variables produced an overall accuracy of 82.35% and a kappa value of 0.73. The one-class support vector machine produced the highest overall accuracy compared to the multiclass SVM and the stochastic gradient boosting algorithm. The use of topographical variables further improved the accuracies compared to the combination of Landsat spectra with vegetation indices.
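    As a small illustrative sketch (not the thesis code), NDVI and an NDWI can be computed from Landsat 8 surface reflectance bands (band 3 = green, band 4 = red, band 5 = near infrared); the NDWI shown is the common green/NIR formulation, which may differ from the one used in the study, and the reflectance arrays are synthetic stand-ins.

        # Sketch only: NDVI and NDWI from (synthetic) Landsat 8 surface reflectance rasters.
        import numpy as np

        def ndvi(nir, red):
            return (nir - red) / (nir + red + 1e-10)    # epsilon avoids division by zero

        def ndwi(green, nir):
            return (green - nir) / (green + nir + 1e-10)

        rng = np.random.default_rng(4)
        red = rng.uniform(0.02, 0.25, size=(100, 100))
        green = rng.uniform(0.02, 0.25, size=(100, 100))
        nir = rng.uniform(0.20, 0.60, size=(100, 100))

        print("mean NDVI:", float(ndvi(nir, red).mean()))
        print("mean NDWI:", float(ndwi(green, nir).mean()))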

    Land Degradation Assessment with Earth Observation

    This Special Issue (SI) on “Land Degradation Assessment with Earth Observation” comprises 17 original research papers with a focus on land degradation in arid, semiarid and dry-subhumid areas (i.e., desertification), in addition to temperate rangelands, grasslands, woodlands and the humid tropics. The studies cover different spatial, spectral and temporal scales and employ a wealth of different optical and radar sensors. Some studies incorporate time-series analysis techniques that assess the general trend of vegetation or the timing and duration of the reduction in biological productivity caused by land degradation. As anticipated from the latest trend in Earth Observation (EO) literature, some studies utilize the cloud-computing infrastructure of Google Earth Engine to cope with the unprecedented volume of data involved in current methodological approaches. This SI clearly demonstrates the ever-increasing relevance of EO technologies when it comes to assessing and monitoring land degradation. With the recently published IPCC Reports informing us of the severe impacts and risks to terrestrial and freshwater ecosystems and the ecosystem services they provide, the EO scientific community has a clear obligation to increase its efforts to address any remaining gaps, some of which have been identified in this SI, and to produce highly accurate and relevant land-degradation assessment and monitoring tools.