
    Enhance the Accuracy of k-Nearest Neighbor (k-NN) for Unbalanced Class Data Using Synthetic Minority Oversampling Technique (SMOTE) and Gain Ratio (GR)

    k-Nearest Neighbor (k-NN) achieves very good accuracy on data whose classes are roughly equally distributed, but on data with an unequal class distribution its accuracy is generally lower. In addition, k-NN does not weight the data by class, so every class has an equal influence in determining the class of new data; it is therefore important to select the features that are generally relevant to the data before the classification process. To overcome this problem, we propose a framework that uses the Synthetic Minority Oversampling Technique (SMOTE) to address the class distribution problem and Gain Ratio (GR) to perform attribute selection, generating a new dataset with a balanced class distribution and informative attributes. The E-Coli and Glass Identification datasets were among those used in this study. To obtain objective results, 10-fold cross-validation is used for evaluation, with k values from 1 to 10. The results show that SMOTE and GR can increase the accuracy of k-NN: the largest increase, 18.5%, occurred on the Glass Identification dataset, and the smallest, 11.4%, on the E-Coli dataset. Overall, the proposed method performed better, although its precision, recall, and F1-score were not better than those of the original k-NN on the E-Coli dataset. Across all datasets, precision improved by 41.0%, recall by 43.4%, and F1-score by 41.5%.
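
The two preprocessing steps above can be sketched with the Python standard library. SMOTE creates synthetic minority samples by interpolating toward nearby minority neighbours. Gain ratio scores an attribute by its information gain divided by its split information. This sketch rests on several assumptions. Features are numeric vectors for SMOTE, and attributes are already discretized for GR. The E-Coli and Glass features are continuous, so some binning step, not specified in the abstract, would come first. The function names and defaults (k=5, the seed) are hypothetical and are not the authors' implementation.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples. Each one lies on the segment
    between a random minority sample and one of its k nearest minority
    neighbours (requires at least two minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: euclidean(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute: information gain divided by
    split information (the entropy of the attribute's own values)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return (entropy(labels) - conditional) / split_info
```

In a pipeline like the one described, one would oversample each minority class until the classes are roughly balanced. The next step is to rank the attributes by `gain_ratio` and keep the top-scoring ones. Finally, run k-NN under 10-fold cross-validation for k = 1..10.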

    Geochemical Anomaly Detection in the Irankuh District Using Hybrid Machine Learning Technique and Fractal Modeling

    Prediction of elemental concentrations is essential in mineral exploration because it plays a vital role in detailed exploration. New machine learning (ML) methods, such as hybrid models, are robust approaches that have been used less frequently than other methods in this field and therefore have not been examined thoroughly. In this study, a hybrid machine learning (HML) method combining K-Nearest Neighbor Regression (KNNR) and Random Forest Regression (RFR) was proposed to predict Pb and Zn grades in the Irankuh district, Sanandaj-Sirjan Zone. The aim of the study is to employ the hybrid model as a new method for predicting grade distribution. The hybrid (KNNR-RFR) method produced more accurate predictions, judged by correlation coefficient, than the single regression models, with correlation coefficients of 0.66 and 0.54 for Pb and Zn, respectively. The KNNR-RFR results were used to classify Pb and Zn anomalies in the study area, and the concentration-area fractal model separated the main anomalous areas for these elements. The main Pb and Zn anomalies were correlated with mining activities and core drilling data. The study demonstrates that the hybrid model has substantial potential for predicting ore element distribution; it gives promising results and could be used to predict ore grades in similar investigations.
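
The concentration-area (C-A) fractal step can be sketched as follows. It assumes gridded concentration values on cells of equal area. The C-A model plots log area A(≥ υ) against log threshold υ, and breaks in slope on that plot are read as boundaries between geochemical populations. This sketch finds only one breakpoint, using a brute-force two-segment least-squares fit. That search rule and the function names are assumptions for illustration; in practice several segments are usually fitted.

```python
import math


def concentration_area(values, cell_area, thresholds):
    """Return (log10 threshold, log10 area) pairs, where area is the total
    area of cells whose concentration is >= the threshold. Thresholds should
    be positive and sorted in ascending order."""
    points = []
    for t in thresholds:
        n_cells = sum(1 for v in values if v >= t)
        if t > 0 and n_cells > 0:  # log10 is undefined for zero
            points.append((math.log10(t), math.log10(n_cells * cell_area)))
    return points


def fit_line(points):
    """Least-squares line through (x, y) points: slope, intercept, SSE."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return slope, intercept, sse


def single_breakpoint(points, min_points=2):
    """Split the log-log curve into two straight segments at the index that
    minimises the summed SSE; return the concentration where the upper
    (anomalous) segment starts. Needs at least 2 * min_points points."""
    best_sse, best_i = math.inf, None
    for i in range(min_points, len(points) - min_points + 1):
        sse = fit_line(points[:i])[2] + fit_line(points[i:])[2]
        if sse < best_sse:
            best_sse, best_i = sse, i
    if best_i is None:
        raise ValueError("not enough points for two segments")
    return 10 ** points[best_i][0]
```

Cells with predicted grades at or above the returned threshold would be mapped as the anomalous population.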

    A Bonferroni Mean Based Fuzzy K Nearest Centroid Neighbor Classifier

    K-nearest neighbor (KNN) is an effective nonparametric classifier that determines the neighbors of a point based only on distance proximity. The classification performance of KNN suffers from outliers in small datasets, and it deteriorates on datasets with class imbalance. We propose a local Bonferroni Mean based Fuzzy K-Nearest Centroid Neighbor (BM-FKNCN) classifier. It assigns the class label of a query sample based on the nearest local centroid mean vector, to better represent the underlying statistics of the dataset. The proposed classifier is robust to outliers because the Nearest Centroid Neighborhood (NCN) concept also considers the spatial distribution and symmetrical placement of the neighbors. It can also overcome class domination among neighbors in imbalanced datasets, because it averages all the centroid vectors from each class to adequately represent the distribution of the classes. The BM-FKNCN classifier is tested on datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository and benchmarked against the KNN, Fuzzy-KNN (FKNN), BM-FKNN and FKNCN classifiers. The experimental results show that BM-FKNCN achieves the highest overall average classification accuracy of the five classifiers, 89.86%.
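
Below is a simplified, non-fuzzy sketch of the nearest-centroid-neighbour plus Bonferroni-mean idea. For each class, it greedily selects k NCN neighbours of the query from that class. It then summarises them with a componentwise Bonferroni mean, BM^{p,q}(a) = ((1/(n(n-1))) Σ_{i≠j} a_i^p a_j^q)^{1/(p+q)}. Finally, it assigns the class whose local vector lies nearest the query. The sketch makes several assumptions. It omits the fuzzy membership weighting of BM-FKNCN. The default p = q = 1 is an assumed choice. Features are assumed non-negative so that the fractional power is defined. The function names are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]


def ncn(query, samples, k):
    """Nearest centroid neighbours: greedily add the sample that keeps the
    centroid of the selected set closest to the query."""
    remaining = list(samples)
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = min(remaining, key=lambda s: dist(query, centroid(chosen + [s])))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def bonferroni_mean(values, p=1.0, q=1.0):
    """Scalar Bonferroni mean of non-negative values."""
    n = len(values)
    if n == 1:
        return values[0]
    total = sum(values[i] ** p * values[j] ** q
                for i in range(n) for j in range(n) if i != j)
    return (total / (n * (n - 1))) ** (1.0 / (p + q))


def bm_vector(points, p=1.0, q=1.0):
    """Apply the Bonferroni mean to each feature column separately."""
    return [bonferroni_mean(list(col), p, q) for col in zip(*points)]


def classify(query, data, k=3, p=1.0, q=1.0):
    """data maps class label -> list of non-negative feature vectors.
    Each class gets one local BM vector, so a large class cannot outvote
    a small one by sheer neighbour count."""
    best_label, best_d = None, math.inf
    for label, samples in data.items():
        local = bm_vector(ncn(query, samples, k), p, q)
        d = dist(query, local)
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```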

    A primitive machine learning tool for the mechanical property prediction of multiple principal element alloys

    Multi-principal element alloys (MPEAs) are produced by combining metallic elements in a diverse range of proportions. MPEAs reported to date have shown promising performance owing to their exceptional mechanical properties. Training a machine learning (ML) model on known performance data is a reasonable way to rationalise the complexity of the composition-dependent mechanical properties of MPEAs. This study utilises a specifically curated dataset that contains information on six mechanical properties of MPEAs. A parser tool was introduced to convert the chemical composition of alloys into the input format of the ML models, and a number of ML models were applied. Finally, Gradio was used to visualise the ML model predictions and to create a user-interactive interface. The ML model presented is an initial, primitive model, as it does not factor in aspects such as MPEA production and processing route. However, it serves as an initial user tool, whilst also providing a workflow for other researchers.
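
Here is a small sketch of what such a composition parser might do. It assumes formulas are written as element symbols followed by optional molar amounts (e.g. "Al0.5CoCrFeNi"). It also assumes a fixed, hypothetical element list that defines the feature order. The abstract does not describe the actual tool's input or output format.

```python
import re

# One element symbol plus an optional amount such as "2", "0.5" or ".5".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.\d+|\d+)?")
FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:\d*\.\d+|\d+)?)+")


def parse_composition(formula):
    """Return {element: atomic fraction}; a missing amount counts as 1."""
    if not FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse {formula!r}")
    amounts = {}
    for element, number in TOKEN.findall(formula):
        amounts[element] = amounts.get(element, 0.0) + (float(number) if number else 1.0)
    total = sum(amounts.values())
    return {el: amt / total for el, amt in amounts.items()}


def to_feature_vector(formula, elements):
    """Fixed-order vector of atomic fractions, suitable as ML model input."""
    fractions = parse_composition(formula)
    unknown = set(fractions) - set(elements)
    if unknown:
        raise ValueError(f"elements not in feature list: {sorted(unknown)}")
    return [fractions.get(el, 0.0) for el in elements]
```

For example, `to_feature_vector("Al0.5CoCrFeNi", ["Al", "Co", "Cr", "Cu", "Fe", "Ni"])` gives Al a fraction of 0.5/4.5 and Cu a fraction of zero.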

    Performance analysis of various machine learning algorithms for CO2 leak prediction and characterization in geo-sequestration injection wells

    The effective detection and prevention of CO2 leakage in active injection wells are paramount for safe carbon capture and storage (CCS) initiatives. This study assesses five fundamental machine learning algorithms for developing a robust data-driven model to predict potential CO2 leakage incidents in injection wells. The algorithms are Support Vector Regression (SVR), K-Nearest Neighbor Regression (KNNR), Decision Tree Regression (DTR), Random Forest Regression (RFR), and Artificial Neural Network (ANN). Using wellhead and bottom-hole pressure and temperature data, the models aim to predict the location and size of leaks simultaneously. A representative dataset simulating various leak scenarios in a saline aquifer reservoir was used. The findings reveal crucial insights into the relationships between the variables considered and leakage characteristics. Because of its positive linear correlation with leak depth, wellhead pressure could be a pivotal indicator of leak location. Bottom-hole pressure, through its negative linear relationship, showed the strongest association with leak size. Among the predictive models examined, the KNNR model achieved the highest prediction accuracy for both leak localization and sizing. This model was highly sensitive to leak size and was able to identify leaks as small as 0.0158% of the total main flow with relatively high accuracy. Nonetheless, the study showed that accurate leak sizing was a greater challenge for the models than leak localization. Overall, the findings can provide valuable insights for developing efficient data-driven wellbore leak detection systems.
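
A sketch of multi-output KNN regression of the kind compared above follows. It assumes four input features (wellhead pressure and temperature, bottom-hole pressure and temperature) and two targets (leak depth, leak size). Standardizing the features first is an assumed preprocessing step, because pressures and temperatures sit on very different scales. The value of k and the unweighted average are illustrative choices, not the study's settings.

```python
import math


def standardize(train, rows):
    """Scale each feature to zero mean and unit variance using the training
    statistics; constant features are left unscaled (std treated as 1)."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def knn_regress(train_x, train_y, query, k=3):
    """Average the target vectors (e.g. [leak depth, leak size]) of the k
    training rows nearest to the query in standardized feature space."""
    scaled = standardize(train_x, train_x + [query])
    xs, q = scaled[:-1], scaled[-1]
    nearest = sorted(range(len(xs)), key=lambda i: math.dist(xs[i], q))[:k]
    return [sum(train_y[i][j] for i in nearest) / len(nearest)
            for j in range(len(train_y[0]))]
```

Predicting both targets from the same neighbour set keeps location and size consistent with a single set of simulated scenarios.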

    Predicting Field Value with Interpretable Models

    This thesis explores predicting current prices of individual agricultural fields in Finland from historical data. The task is to predict field prices accurately with the available data while keeping model predictions interpretable and explainable. The research question is which of several candidate models is best suited to the task. The motivation is the growing agricultural land market and the lack of publicly available field valuation services that could help market participants identify reasonable asking prices. Previous studies on the topic have used standard statistics to establish the factors that affect field prices. Rather than creating a model whose predictions can be used on their own in every case, the primary purpose of previous work has been to identify information that should be considered in manual field valuation. We, on the other hand, focus on the predictive ability of models that require no manual labor. Our modelling approaches focus mainly, but not exclusively, on algorithms based on Markov chain Monte Carlo (MCMC). We create a nearest neighbors model and four hierarchical linear models of varying complexity. Performance comparisons lead us to recommend a nearest-neighbor-type model for this task.
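
A nearest-neighbour model fits an interpretability requirement partly because each prediction can be explained by the past sales it averages. The minimal sketch below assumes hypothetical numeric features per sale, optional per-feature weights, and an unweighted mean of price per hectare. The thesis's actual features, distance measure and model form are not given in the abstract.

```python
import math
from dataclasses import dataclass


@dataclass
class Sale:
    field_id: str
    features: tuple  # hypothetical numeric features, e.g. (x_km, y_km, sale_year)
    price_per_ha: float


def predict_with_comparables(sales, query, k=3, weights=None):
    """Predict a price per hectare as the mean of the k most similar past
    sales, and return those sales (id, distance, price) as the explanation."""
    weights = weights or [1.0] * len(query)

    def distance(sale):
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, sale.features, query)))

    comparables = sorted(sales, key=distance)[:k]
    estimate = sum(s.price_per_ha for s in comparables) / len(comparables)
    return estimate, [(s.field_id, round(distance(s), 3), s.price_per_ha)
                      for s in comparables]
```

The returned comparables play the role of the "similar recent sales" a human valuer would cite.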

    Combination of Machine Learning Algorithms with Concentration-Area Fractal Method for Soil Geochemical Anomaly Detection in Sediment-Hosted Irankuh Pb-Zn Deposit, Central Iran

    Prediction of geochemical concentration values is essential in mineral exploration because it plays a principal role in economic assessment. In this paper, four regression machine learning (ML) algorithms were trained to build the proposed hybrid ML (HML) model. They are the K-nearest neighbors regressor (KNN), support vector regressor (SVR), gradient boosting regressor (GBR), and random forest regressor (RFR). Three metrics were selected to assess prediction performance: the correlation coefficient, mean absolute error (MAE), and mean squared error (MSE). The final prediction of Pb and Zn grades is made with the HML model, as it outperformed the other algorithms by inheriting the advantages of the individual regression models. Each of the introduced algorithms can solve the problem as a single, simple, and robust regression model; however, the hybrid technique estimates ore grades with better performance. The required data were gathered from in situ soil. The objective of this study is to use the ML model’s predictions to classify Pb and Zn anomalies in the study area by concentration-area fractal modeling. The fractal model results show five geochemical populations for both elements. The main anomalous regions of these elements were correlated with mining activities and core drilling data. The results indicate that the method is promising for predicting ore element distribution.
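
The sketch below combines base-regressor predictions and scores them with the three metrics named above. Simple averaging is an assumption, since the abstract does not say how the HML model merges the KNN, SVR, GBR and RFR outputs. The prediction lists are made-up stand-ins for trained models.

```python
import math


def average_ensemble(predictions):
    """Combine base-model predictions (one list per model) by simple averaging."""
    return [sum(vals) / len(vals) for vals in zip(*predictions)]


def pearson_r(y, p):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(p) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in p))
    return cov / (sy * sp)


def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


if __name__ == "__main__":
    y = [10, 20, 30, 40]            # made-up observed grades
    knn_pred = [12, 18, 33, 37]     # made-up base-model predictions
    svr_pred = [8, 22, 29, 43]
    hybrid = average_ensemble([knn_pred, svr_pred])
    for name, p in [("knn", knn_pred), ("svr", svr_pred), ("hybrid", hybrid)]:
        print(name, round(pearson_r(y, p), 3), mae(y, p), mse(y, p))
```

Averaging helps when the base models' errors partly cancel, as they do in this toy example. It cannot help when all the models err in the same direction.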