
    COMPARISON OF RANDOM FOREST AND NAÏVE BAYES METHODS FOR CLASSIFYING AND FORECASTING SOIL TEXTURE IN THE AREA AROUND DAS KALIKONTO, EAST JAVA

    Soil texture determines airflow, heat, instability, water-holding capacity, and the shape and structure of the soil. Because it is an important attribute that determines the direction of soil management, soil texture must be modeled accurately. However, it is quite difficult to model: it is a compositional data set that describes the particle size of the soil mineral fraction (sand, silt, and clay). The machine learning algorithms used here to classify and predict soil texture are Random Forest (RF) and Naïve Bayes (NB). The purpose of this study was to classify the distribution of soil texture using the Random Forest and Naïve Bayes methods to obtain the most accurate grouping results. This research was conducted in the area around the Kalikonto River Basin, East Java Province. The performance-based tests show that the RF algorithm provides higher accuracy in predicting soil texture based on the Digital Elevation Model (DEM). RF achieved an accuracy of 92.55% on the training data and 87.5% on the testing data. Classification using the Naïve Bayes method produced an accuracy of 89.98% on the testing data and 80.65% on the training data.

    A balanced approach to the multi-class imbalance problem

    The multi-class class-imbalance problem is a subset of supervised machine learning tasks where the classification variable of interest consists of three or more categories with unequal sample sizes. In the fields of manufacturing and business, common machine learning classification tasks such as failure mode, fraud, and threat detection often exhibit class imbalance due to the infrequent occurrence of one or more event states. Though machine learning as a discipline is well established, the study of class imbalance with respect to multi-class learning does not yet have the same deep, rich history. In its current state, the class imbalance literature relies on biased sampling and increasing model complexity to improve predictive performance, and while some approaches have made advances, there is still no standard model evaluation criterion against which to compare their performance. In the presence of substantial multi-class distributional skew, many of the model evaluation criteria that can scale beyond the binary case become invalid due to their over-emphasis on the majority class observations. Going a step further, the evaluation criteria utilized in practice vary significantly across the class imbalance literature, and so far no single measure has been able to galvanize consensus, due not only to implementation complexity but also to the existence of undesirable properties. Therefore, the focus of this research is to introduce a new performance measure, Class Balance Accuracy (CBA), designed specifically for model validation in the presence of multi-class imbalance. This paper begins with the definition of Class Balance Accuracy and provides an intuitive proof for its interpretation as a simultaneous lower bound for the average per-class recall and average per-class precision.
Results from comparison studies show that models chosen by maximizing the training class balance accuracy consistently yield both high overall accuracy and high per-class recall on the test sets compared to models chosen by other criteria. Simulation studies were then conducted to highlight specific scenarios where the use of class balance accuracy outperforms model selection based on regular accuracy. The measure is then invoked in two novel applications, one as the maximization criterion in the instance selection biased sampling technique and the other as a model selection tool in a multiple classifier system prediction algorithm. In the case of instance selection, the use of class balance accuracy shows improvement over traditional accuracy in scenarios of multi-class class-imbalance data sets with low separability between the majority and minority classes. Likewise, the use of CBA in the multiple classifier system resulted in improved predictions over state-of-the-art methods such as AdaBoost for some of the UCI Machine Learning Repository test data sets. The paper then concludes with a discussion of the climbR package, a repository of functions designed to aid in the model evaluation and prediction of class imbalance machine learning problems.