14,831 research outputs found

    The Inverse Bagging Algorithm: Anomaly Detection by Inverse Bootstrap Aggregating

    For data sets populated by a very well modeled process and by another process of unknown probability density function (PDF), a desirable feature when manipulating the fraction of the unknown process (either to enhance or to suppress it) is to avoid modifying the kinematic distributions of the well modeled one. A bootstrap technique is used to identify sub-samples rich in the well modeled process and to classify each event according to how frequently it is part of such sub-samples. Comparisons with general MVA algorithms will be shown, as well as a study of the asymptotic properties of the method, making use of a public domain data set that models a typical search for new physics as performed at hadronic colliders such as the Large Hadron Collider (LHC). Comment: 8 pages, 5 figures. Proceedings of the XIIth Quark Confinement and Hadron Spectrum conference, 28/8-2/9 2016, Thessaloniki, Greece

    Development of a Stope Stability Prediction Model Using Ensemble Learning Techniques - A Case Study

    The consequences of collapsed stopes can be dire in the mining industry. In most jurisdictions a collapse can lead to the revocation of a mining license, especially when it costs lives. It is therefore imperative for mine planning and technical services engineers to estimate the stability status of stopes. This study attempts to produce a stope stability prediction model, adopted from the stability graph, using ensemble learning techniques. The study used 472 case histories from 120 stopes of AngloGold Ashanti Ghana, Obuasi Mine. Random Forest, Gradient Boosting, Bootstrap Aggregating and Adaptive Boosting classification algorithms were used to produce the models. A comparative analysis using six classification performance metrics, namely Accuracy, Precision, Sensitivity, F1-score, Specificity and Matthews Correlation Coefficient (MCC), determined which ensemble learning technique performed best in predicting the stability of a stope. The Bootstrap Aggregating model obtained the highest MCC score, 96.84%, while the Adaptive Boosting model obtained the lowest. The Specificity scores in decreasing order were 98.95%, 97.89%, 96.32% and 95.26% for Bootstrap Aggregating, Gradient Boosting, Random Forest and Adaptive Boosting respectively. The Bootstrap Aggregating model scored 97.89% on each of Accuracy, Precision, F1-score and Sensitivity; the same four metrics were likewise equal to one another for Adaptive Boosting, Gradient Boosting and Random Forest, at 90.53%, 92.63% and 95.79% respectively. At a 95% confidence level, using the Wilson score interval, the Bootstrap Aggregating model produced the smallest error and was therefore selected as the alternative stope design tool for predicting the stability status of stopes. Keywords: Stope Stability, Ensemble Learning Techniques, Stability Graph, Machine Learning
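The two less familiar quantities named in this abstract, the Matthews Correlation Coefficient and the Wilson score interval, can be computed from a binary confusion matrix as in the minimal sketch below. The function names and the confusion-matrix counts are illustrative assumptions; the counts are not taken from the study, whose test-set composition is not given here.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC from binary confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only (NOT the study's data).
tp, fp, fn, tn = 45, 3, 2, 50
n = tp + fp + fn + tn
mcc = matthews_corrcoef(tp, fp, fn, tn)
accuracy = (tp + tn) / n
low, high = wilson_interval(tp + tn, n)
print(f"MCC={mcc:.4f} accuracy={accuracy:.4f} 95% Wilson CI=({low:.4f}, {high:.4f})")
```

Applying `wilson_interval` to the number of correct predictions gives an interval for accuracy; the complementary interval for the error rate follows by subtracting both bounds from 1.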

    Classification Accuracy of Contraceptive Method Choice in Semarang City Using Bootstrap Aggregating Multinomial Logistic Regression

    Classification is a statistical method for grouping data systematically. A classification problem arises when a number of observations belong to one or several categories that cannot be identified directly but must be inferred from a measure. A classification method commonly used in studies is logistic regression analysis. However, this method yields unstable parameter estimates. To obtain stable parameters for the multinomial logistic regression model, a bootstrap approach is used, namely bootstrap aggregating (bagging). The purpose of this study was to compare the classification accuracy of the multinomial logistic regression model and the bootstrap aggregating model using family planning data from Semarang. Bagging multinomial logistic regression reached its highest classification accuracy, 51%, at 50 bootstrap replications; this model reduces the classification error by up to 2 percentage points compared with the multinomial logistic regression model, whose classification accuracy is 49%. Keywords: logistic regression, bootstrap aggregating, accuracy of classification
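The bagging procedure described here (fit the same classifier on many bootstrap replicates, then combine predictions by majority vote) can be sketched as follows. To keep the example short and dependency-free, a nearest-centroid classifier stands in for multinomial logistic regression; that substitution, the function names, and the toy data are all assumptions made for illustration. Only the default of 50 replicates echoes the abstract.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in base learner: one mean vector per class (not logistic regression)."""
    sums, counts = {}, Counter()
    for row, label in zip(X, y):
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], row)))


def bagging_fit(X, y, n_replicates=50, seed=0):
    """Fit one base learner per bootstrap replicate (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the replicate predictions by majority vote."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Made-up three-class toy data, for illustration only.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.3],
     [0.0, 5.0], [0.3, 5.2], [0.1, 4.7]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = bagging_fit(X, y)
print(bagging_predict(models, [5.0, 5.0]))
```

Swapping `fit_centroids` and `predict_centroids` for any other fit/predict pair leaves the bagging loop unchanged, which is why the same scheme applies to logistic regression or classification trees.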

    Classification of Breast Cancer Detection Test Results Based on Clinical Symptoms Using the Classification Tree Method at RSUD Nur Hidayah Bantul

    Breast cancer is a malignant tumor that starts when cells in the breast grow and multiply out of control, so that it can spread to the tissues and organs around the breast or to other parts of the body. Breast cancer occurs almost entirely in women. The high number of cases makes it especially important for women to remain watchful for its occurrence. The purpose of this research is to classify the results of breast cancer detection tests based on clinical symptoms using the Classification Tree and Bootstrap Aggregating (Bagging) methods. Classification performance is evaluated by accuracy, sensitivity, specificity and AUC. The results show that the optimal classification tree model has 4 terminal nodes and a depth of 2, while the best model formed using the Bootstrap Aggregating method uses 20 bootstrap replications. In this research Bootstrap Aggregating did not outperform the Classification Tree method: the classification tree achieved the higher classification accuracy. Although Bootstrap Aggregating does not always improve on the accuracy of the Classification Tree method, it can produce a consistent model.

    Specimens at the Center: An Informatics Workflow and Toolkit for Specimen-level analysis of Public DNA database data

    Major public DNA databases — NCBI GenBank, the DNA DataBank of Japan (DDBJ), and the European Molecular Biology Laboratory (EMBL) — are invaluable biodiversity libraries. Systematists and other biodiversity scientists commonly mine these databases for sequence data to use in phylogenetic studies, but such studies generally use only the taxonomic identity of the sequenced tissue, not the specimen identity. Thus studies that use DNA supermatrices to construct phylogenetic trees with species at the tips typically do not take advantage of the fact that for many individuals in the public DNA databases, several DNA regions have been sampled; and for many species, two or more individuals have been sampled. Thus these studies typically do not make full use of the multigene datasets in public DNA databases to test species coherence and select optimal sequences to represent a species. In this study, we introduce a set of tools developed in the R programming language to construct individual-based trees from NCBI GenBank data and present a set of trees for the genus Carex (Cyperaceae) constructed using these methods. For the more than 770 species for which we found sequence data, our approach recovered an average of 1.85 gene regions per specimen, up to seven for some specimens, and more than 450 species represented by two or more specimens. Depending on the subset of genes analyzed, we found up to 42% of species monophyletic. We introduce a simple tree statistic—the Taxonomic Disparity Index (TDI)—to assist in curating specimen-level datasets and provide code for selecting maximally informative (or, conversely, minimally misleading) sequences as species exemplars. While tailored to the Carex dataset, the approach and code presented in this paper can readily be generalized to constructing individual-level trees from large amounts of data for any species group

    Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability

    We study a statistical method to estimate the optimal value, and the optimality gap of a given solution for stochastic optimization as an assessment of the solution quality. Our approach is based on bootstrap aggregating, or bagging, resampled sample average approximation (SAA). We show how this approach leads to valid statistical confidence bounds for non-smooth optimization. We also demonstrate its statistical efficiency and stability that are especially desirable in limited-data situations, and compare these properties with some existing methods. We present our theory that views SAA as a kernel in an infinite-order symmetric statistic, which can be approximated via bagging. We substantiate our theoretical findings with numerical results
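The core idea of bagging resampled SAA can be illustrated with a toy problem. The sketch below is an assumption-laden illustration, not the paper's procedure: the cost `|x - xi|`, the with-replacement resampling of size `k`, the number of resamples, and all function names are chosen here for convenience. It averages the SAA optimal values over many resamples; because the optimal value of a minimization SAA is biased low on average, that average serves as a point estimate of a lower bound on the true optimal value, and subtracting it from a candidate solution's estimated objective gives a point estimate of the optimality gap. The confidence bounds the abstract refers to are not reproduced.

```python
import random
import statistics


def cost(x, xi):
    """Non-smooth cost h(x, xi) = |x - xi| (illustrative choice)."""
    return abs(x - xi)


def saa_optimal_value(sample):
    """Solve min_x (1/k) * sum_i |x - xi_i|; a sample median attains the minimum."""
    x_star = statistics.median(sample)
    return sum(cost(x_star, xi) for xi in sample) / len(sample)


def bagged_saa_value(data, k, n_resamples=200, seed=0):
    """Average the SAA optimal value over resamples of size k drawn with replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in range(k)]
        values.append(saa_optimal_value(resample))
    return sum(values) / len(values), values


def gap_point_estimate(data, x_hat, k, n_resamples=200, seed=0):
    """Estimated objective at x_hat minus the bagged SAA optimal value."""
    objective = sum(cost(x_hat, xi) for xi in data) / len(data)
    bagged, _ = bagged_saa_value(data, k, n_resamples, seed)
    return objective - bagged


# Synthetic data for illustration only.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]

bagged, values = bagged_saa_value(data, k=10)
gap = gap_point_estimate(data, x_hat=2.0, k=10)
print(f"bagged SAA optimal value: {bagged:.4f}, gap estimate at x_hat=2.0: {gap:.4f}")
```

The resample size `k` trades bias against variability: smaller resamples give more distinct draws from limited data but a more pessimistic (lower) bound.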