    Machine Learning Approaches for Natural Resource Data

    Real-life applications involving efficient management of natural resources depend on accurate geographical information. This information is usually obtained by manual on-site data collection, by automatic remote sensing methods, or by a combination of the two. Besides accurate data collection, natural resource management also requires detailed analysis of the data, which in the era of data flood can be a cumbersome process. With rising computational power and storage capacity, together with falling hardware prices, data-driven decision analysis plays an ever greater role. In this thesis, we examine the predictability of terrain trafficability conditions and forest attributes by using a machine learning approach with geographic information system data. Quantitative measures of the prediction performance of terrain conditions using natural resource data sets are given for five distinct research areas located around Finland. Furthermore, the estimation capability of key forest attributes is inspected with a multitude of modeling and feature selection techniques. The research results provide empirical evidence on whether the natural resource data used is sufficiently accurate for practical applications, or whether further refinement of the data is needed. The results are especially important to the forest industry, since even slight improvements to the natural resource data sets utilized in practice can result in large savings in operation time and costs. Model evaluation is also addressed in this thesis by proposing a novel method for estimating the prediction performance of spatial models. Classical goodness-of-fit measures usually rely on the assumption of independent and identically distributed data samples, an assumption that normally does not hold for spatial data sets.
Spatio-temporal data sets contain an intrinsic property called spatial autocorrelation, which is partly responsible for breaking these assumptions. The proposed cross-validation-based evaluation method provides a model performance estimate in which the optimistic bias due to spatial autocorrelation is decreased by partitioning the data sets in a suitable way. Keywords: open natural resource data, machine learning, model evaluation
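The abstract does not say how the data sets are partitioned, so the following is only a minimal sketch of one common way to reduce optimistic bias from spatial autocorrelation: a k-fold split in which training points lying within a "dead zone" radius of any test point are dropped. The function name `spatial_kfold`, the `dead_zone` parameter, and the toy grid coordinates are assumptions made for illustration, not the thesis's method.

```python
import math
import random


def spatial_kfold(coords, k, dead_zone, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds.

    Training points closer than or equal to `dead_zone` to any test
    point are removed, so that near neighbours of the test points do
    not leak information into the training set.
    """
    idx = list(range(len(coords)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    for test in folds:
        test_set = set(test)
        train = [
            i
            for i in idx
            if i not in test_set
            and all(math.dist(coords[i], coords[j]) > dead_zone for j in test)
        ]
        yield sorted(train), sorted(test)


# Hypothetical sample locations on a 5 x 5 unit grid.
coords = [(float(x), float(y)) for x in range(5) for y in range(5)]
splits = list(spatial_kfold(coords, k=5, dead_zone=1.0))
```

Setting `dead_zone` to zero recovers ordinary k-fold cross-validation for distinct locations, which makes it easy to compare the two performance estimates on the same data.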

    Semi-Automatization of Support Vector Machines to Map Lithium (Li) Bearing Pegmatites

    Machine learning (ML) algorithms have shown great performance in geological remote sensing applications. The study area of this work was the Fregeneda–Almendra region (Spain–Portugal), where the support vector machine (SVM) was employed. Lithium (Li)-pegmatite exploration using satellite data presents some challenges since pegmatites are, by nature, small, narrow bodies. Consequently, the following objectives were defined: (i) train several SVMs on Sentinel-2 images with different parameters to find the optimal model; (ii) assess the impact of imbalanced data; (iii) develop a successful methodological approach to delineate target areas for Li-exploration. Parameter optimization and model evaluation were accomplished by a two-stage grid search with cross-validation. Several new methodological advances were proposed, including a region of interest (ROI)-based splitting strategy to create the training and test subsets, a semi-automatization of the classification process, and the application of a more adequate metric score to choose the best model. The proposed methodology obtained good results, identifying known Li-pegmatite occurrences as well as other target areas for Li-exploration. The results also showed that class imbalance had a negative impact on SVM performance, since known Li-pegmatite occurrences were not identified. The potential and limitations of the proposed methodology are highlighted and its applicability to other case studies is discussed.
    The authors would like to thank the financial support provided by FCT—Fundação para a Ciência e a Tecnologia, I.P., with the ERA-MIN/0001/2017—LIGHTS project. The work was also supported by National Funds through the FCT project UIDB/04683/2020—ICT (Institute of Earth Sciences). Joana Cardoso-Fernandes is financially supported within the compass of a Ph.D. Thesis, ref. SFRH/BD/136108/2018, by national funds from MCTES through FCT, and co-financed by the European Social Fund (ESF) through POCH—Programa Operacional Capital Humano. The Spanish Ministerio de Ciencia, Innovacion y Universidades (Project RTI2018-094097-B-100, with ERDF funds) and the University of the Basque Country (UPV/EHU) (grant GIU18/084) also contributed economically.
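The ROI-based splitting strategy mentioned in the abstract can be illustrated roughly as follows. This is a sketch under the assumption that every labelled pixel carries the identifier of the region of interest it was drawn from and that whole ROIs are assigned to either the training or the test subset; the function `roi_split`, its parameters, and the ROI labels are hypothetical and do not come from the paper.

```python
import random


def roi_split(roi_ids, test_fraction=0.3, seed=0):
    """Split sample indices so that no ROI contributes to both subsets."""
    rois = sorted(set(roi_ids))
    random.Random(seed).shuffle(rois)
    # Hold out at least one whole ROI for testing.
    n_test = max(1, round(len(rois) * test_fraction))
    test_rois = set(rois[:n_test])
    train = [i for i, r in enumerate(roi_ids) if r not in test_rois]
    test = [i for i, r in enumerate(roi_ids) if r in test_rois]
    return train, test


# Hypothetical ROI label for each of ten labelled pixels.
roi_ids = ["A", "A", "A", "B", "B", "C", "C", "C", "D", "D"]
train_idx, test_idx = roi_split(roi_ids)
```

Compared with a random pixel-level split, keeping neighbouring pixels of one ROI together avoids testing on pixels that are nearly identical to training pixels.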

    FUCOM-MOORA and FUCOM-MOOSRA: new MCDM-based knowledge-driven procedures for mineral potential mapping in greenfields

    In this study, we present the application of two novel hybrid multiple-criteria decision-making (MCDM) techniques to mineral potential mapping (MPM), namely FUCOM-MOORA and FUCOM-MOOSRA, as robust computational frameworks for MPM. The multi-objective optimization on the basis of ratio analysis (MOORA) and the multi-objective optimization on the basis of simple ratio analysis (MOOSRA) approaches are used to prioritize and rank individual cells. What makes MOORA and MOOSRA more reliable than many other methods is the fact that an optimization procedure is applied to calculate the prospectivity score of individual unit cells. This reduces the uncertainty stemming from erroneous mathematical calculations. The full consistency method (FUCOM), on the other hand, is useful for assigning weights to the spatial proxies. For $n$ criteria, FUCOM requires only $n - 1$ pairwise comparisons, compared with $n(n-1)/2$ for the popular analytic hierarchy process (AHP) and $2n - 3$ for the best–worst method (BWM), which leads to a less time-consuming and more consistent performance than AHP and BWM. These methods were applied to a set of exploration targeting criteria of skarn iron deposits from Central Iran. Two potential maps were retrieved from the procedures applied, and their comparison using correct classification rates and field checks revealed the superiority of FUCOM-MOOSRA over FUCOM-MOORA.
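To make the two ranking rules concrete, here is a minimal sketch of the commonly used textbook forms of MOORA (weighted benefit sum minus weighted cost sum) and MOOSRA (ratio of the two sums) after vector normalization. The criteria values, the weights (which in the study would come from FUCOM), and the presence of a cost-type criterion are all hypothetical; the paper's exact formulation may differ.

```python
import math


def normalize(matrix):
    """Vector-normalize each criterion (column) of the decision matrix."""
    n_crit = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    return [[row[j] / norms[j] for j in range(n_crit)] for row in matrix]


def weighted_sums(row, weights, is_benefit):
    """Return (weighted benefit sum, weighted cost sum) for one cell."""
    benefit = sum(w * x for x, w, b in zip(row, weights, is_benefit) if b)
    cost = sum(w * x for x, w, b in zip(row, weights, is_benefit) if not b)
    return benefit, cost


def moora(matrix, weights, is_benefit):
    """MOORA score: benefit sum minus cost sum."""
    pairs = [weighted_sums(r, weights, is_benefit) for r in normalize(matrix)]
    return [b - c for b, c in pairs]


def moosra(matrix, weights, is_benefit):
    """MOOSRA score: benefit sum divided by cost sum."""
    pairs = [weighted_sums(r, weights, is_benefit) for r in normalize(matrix)]
    return [b / c for b, c in pairs]


# Three hypothetical unit cells scored on two benefit criteria and one cost criterion.
cells = [[0.9, 0.8, 0.2], [0.4, 0.5, 0.6], [0.7, 0.3, 0.4]]
weights = [0.5, 0.3, 0.2]  # assumed weights, standing in for FUCOM output
is_benefit = [True, True, False]

moora_scores = moora(cells, weights, is_benefit)
moosra_scores = moosra(cells, weights, is_benefit)
rank_moora = sorted(range(len(cells)), key=lambda i: -moora_scores[i])
rank_moosra = sorted(range(len(cells)), key=lambda i: -moosra_scores[i])
```

In this toy case both rules produce the same ordering; on real data the difference and the ratio can rank cells differently, which is why the two resulting maps are compared in the study.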

    Application of Artificial Neural Networks to geological classification: porphyry prospectivity in British Columbia and oil reservoir properties in Iran

    Seismic facies analysis aims to classify oil and gas reservoirs into geologically and petrophysically meaningful rock groups, or classes. An artificial neural network (ANN) is a versatile and efficient tool for classifying data or estimating subsurface properties from large geophysical datasets. This tool can provide critical information for oilfield development and reservoir characterization. This study applies artificial neural networks to two different datasets: 1) geophysical characterization of an oil reservoir in Iran and 2) geological prospectivity for porphyry in British Columbia, Canada. In the first case study, I utilize seismic attributes, well-log data, and core data analysis and use supervised machine learning techniques to efficiently estimate the acoustic impedance and porosity of the reservoir and to classify it into four lithological classes. Seismic attributes used as inputs capture the lithological patterns or structural characteristics in the seismic amplitude, phase, frequency, and other complex seismic properties that cannot be directly seen in the original seismic images. Selection of an optimal set of input features from the vast number of possible mathematical transformations of seismic data is a critical task for reservoir property prediction and classification. This selection is performed by standard as well as innovative procedures employing properties of the target classes. Three different supervised approaches to non-linear classification are used: 1) the so-called probabilistic neural network (PNN), 2) a conventional ANN, and 3) an ANN with the new approach of optimal attribute selection. For each of these approaches, images of classification confidence levels and confidence-filtered class images are produced. The robustness and accuracy of seismic facies classification are assessed for each of these algorithms. The ANN classifiers are validated using validation and test data subsets.
The proposed algorithm shows a higher performance, particularly in comparison with the PNN algorithm. Several visualization techniques are used to examine and illustrate the power of the ANN-based approaches to classify the seismic facies with high accuracy. However, the three approaches still provide significantly different levels of lateral continuity, frequency content, and classification accuracy. Therefore, some level of expert assessment is still required when using machine learning for reservoir interpretation. In the second case study, I use an ANN to explore the prospectivity for porphyry within the Quesnel Terrane, BC, Canada. A purely data-driven approach based on geophysical, structural, and volcanic-age data results in a predictive prospectivity map which correlates well with known mineral occurrences and suggests new areas for potential exploration.

    Combination of Machine Learning Algorithms with Concentration-Area Fractal Method for Soil Geochemical Anomaly Detection in Sediment-Hosted Irankuh Pb-Zn Deposit, Central Iran

    Prediction of geochemical concentration values is essential in mineral exploration, as it plays a principal role in the economic evaluation. In this paper, four regression machine learning (ML) algorithms, namely the K-nearest neighbors regressor (KNN), support vector regressor (SVR), gradient boosting regressor (GBR), and random forest regressor (RFR), have been trained to build our proposed hybrid ML (HML) model. Three metrics, the correlation coefficient, mean absolute error (MAE), and mean squared error (MSE), have been selected to measure model prediction performance. The final prediction of Pb and Zn grades is achieved using the HML model, as it outperformed the other algorithms by inheriting the advantages of the individual regression models. Although the introduced regression algorithms can solve problems as single, non-complex, and robust regression models, the hybrid technique can be used for ore grade estimation with better performance. The required data are gathered from in situ soil samples. The objective of the present study is to use the ML model's predictions to classify Pb and Zn anomalies by concentration-area fractal modeling in the study area. Based on the fractal model results, there are five geochemical populations for both elements. The main anomalous regions of these elements were correlated with mining activities and core drilling data. The results indicate that our method is promising for predicting the ore elemental distribution.
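As a rough illustration of the concentration-area step, the sketch below computes the log-log curve used in concentration-area fractal modeling: for each concentration threshold, the area enclosed by values at or above that threshold. Population boundaries are then usually read from the breakpoints of this curve, a step that is not automated here. The function name, the uniform cell area, and the Pb values are hypothetical and are not data from the paper.

```python
import math


def concentration_area(values, cell_area=1.0):
    """Return (log10 threshold, log10 area) pairs for the C-A plot.

    `values` are positive concentrations, one per grid cell of equal area.
    """
    pairs = []
    for threshold in sorted(set(values)):
        # Area occupied by cells whose concentration is >= the threshold.
        area = cell_area * sum(1 for v in values if v >= threshold)
        pairs.append((math.log10(threshold), math.log10(area)))
    return pairs


# Hypothetical predicted Pb concentrations (ppm) for eight grid cells.
pb_ppm = [12.0, 15.0, 15.0, 22.0, 40.0, 95.0, 310.0, 1200.0]
ca_curve = concentration_area(pb_ppm)
```

Fitting straight-line segments to `ca_curve` and taking their intersections as thresholds would be the usual next step for separating background from anomalous populations.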

    Deep Belief Network and Auto-Encoder for Face Classification

    Deep learning models have drawn ever-increasing research interest owing to their intrinsic capability of overcoming the drawbacks of traditional algorithms. Hence, we adopt two representative deep learning methods, the Deep Belief Network (DBN) and the Stacked Auto-Encoder (SAE), to initialize deep supervised neural networks (NN), alongside a Back Propagation Neural Network (BPNN), and apply them to the face classification task. Our contribution is to extract hierarchical representations of face images with the DBN, SAE, and BPNN models. The feature vectors extracted by each model are then used as input to an NN classifier. To test the approach and evaluate its performance, a series of experiments was performed on two facial databases: BOSS and MIT. The proposed (DBN, NN) approach significantly improves the classification error rate compared with (SAE, NN) and BPNN, reaching error rates of 1.14% and 1.96% on BOSS and MIT, respectively.

    Computational intelligent impact force modeling and monitoring in HISLO conditions for maximizing surface mining efficiency, safety, and health

    Shovel-truck systems are the most widely employed excavation and material handling systems for surface mining operations. During this process, a high-impact shovel loading operation (HISLO) produces large forces that cause extreme whole body vibrations (WBV), which can severely affect the safety and health of haul truck operators. Previously developed solutions have failed to produce satisfactory results, as the vibrations at the truck operator seat still exceed the “Extremely Uncomfortable Limits”. This study was a novel effort to develop a deep learning-based solution to the HISLO problem. The research developed a rigorous mathematical model and a 3D virtual simulation model to capture the dynamic impact force for a multi-pass shovel loading operation. It further involved the application of artificial intelligence and machine learning to implement impact force detection in real time. Experimental results showed impact force magnitudes of 571 kN and 422 kN for the first and second shovel pass, respectively, through an accurate representation of HISLO with continuous flow modelling using a coupled FEA-DEM methodology. The novel ‘DeepImpact’ model showed exceptional performance, giving R², RMSE, and MAE values of 0.9948, 10.750, and 6.33, respectively, during model validation. This research was a pioneering effort in advancing knowledge and frontiers in addressing the WBV challenges of deploying heavy mining machinery in safe and healthy large surface mining environments. The intelligent real-time monitoring system from this study, along with process optimization, minimizes the impact force on the truck surface, which in turn reduces the level of vibration on the operator, thus leading to a safer and healthier mining work environment.
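For readers unfamiliar with the three validation metrics quoted above, the following sketch shows their standard definitions. The force values are made-up numbers chosen so the arithmetic is easy to follow; they are not results from the study, and the function name is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae


# Hypothetical impact forces (kN): simulated reference vs. model prediction.
force_true = [100.0, 200.0, 300.0, 400.0]
force_pred = [110.0, 190.0, 310.0, 390.0]
r2, rmse, mae = regression_metrics(force_true, force_pred)
```

RMSE and MAE are expressed in the units of the target, so they are only comparable across studies that predict the same quantity on the same scale, whereas R² is unitless.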

    Advances in Computational Intelligence Applications in the Mining Industry

    This book captures advancements in the applications of computational intelligence (artificial intelligence, machine learning, etc.) to problems in the mineral and mining industries. The papers present the state of the art in four broad categories: mine operations, mine planning, mine safety, and advances in the sciences, primarily in image processing applications. Authors in the book include both researchers and industry practitioners.

    A Comprehensive Survey on Rare Event Prediction

    Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to the imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires using specialized methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrences of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers.
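A small worked example may help show why imbalanced data calls for specialized evaluation protocols. The sketch below, with hypothetical labels and a hypothetical helper `rare_event_report`, compares plain accuracy with precision, recall, and F1 for the rare class; it is an illustration of the general issue rather than a protocol taken from the survey.

```python
def rare_event_report(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the rare (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 5 rare events among 100 samples.
y_true = [1] * 5 + [0] * 95

# A classifier that never predicts the rare event still looks accurate.
always_common = rare_event_report(y_true, [0] * 100)

# A classifier that finds 3 of the 5 events at the price of 2 false alarms.
detector_pred = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93
detector = rare_event_report(y_true, detector_pred)
```

Here the trivial classifier reaches 95% accuracy with zero recall, which is the kind of misleading result that class-aware metrics are meant to expose.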