The Inverse Bagging Algorithm: Anomaly Detection by Inverse Bootstrap Aggregating
For data sets populated by a very well modeled process and by another process
of unknown probability density function (PDF), a desirable feature when
manipulating the fraction of the unknown process (either to enhance it or
to suppress it) is to avoid modifying the kinematic distributions of
the well modeled one. A bootstrap technique is used to identify sub-samples
rich in the well modeled process, and classify each event according to the
frequency of it being part of such sub-samples. Comparisons with general MVA
algorithms will be shown, as well as a study of the asymptotic properties of
the method, making use of a public domain data set that models a typical search
for new physics as performed at hadronic colliders such as the Large Hadron
Collider (LHC).
Comment: 8 pages, 5 figures. Proceedings of the XIIth Quark Confinement and
Hadron Spectrum conference, 28/8-2/9 2016, Thessaloniki, Greece
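The abstract describes the idea only in outline: draw bootstrap sub-samples, find those rich in the well modeled process, and score each event by how often it belongs to such sub-samples. A minimal sketch of that outline follows. The test statistic (absolute deviation of the sub-sample mean from a known reference mean), the median cut, the one-dimensional toy data and all function and parameter names are assumptions made for illustration, not the authors' choices.

```python
import random
from statistics import mean, median


def inverse_bagging_scores(values, reference_mean, n_subsamples=2000,
                           subsample_size=20, seed=0):
    """Score each event by how often it sits in reference-like sub-samples.

    Hypothetical choices: the statistic |sub-sample mean - reference_mean|
    and the median cut are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(values)
    draws = []
    for _ in range(n_subsamples):
        # Bootstrap sub-sample: indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(subsample_size)]
        stat = abs(mean(values[i] for i in idx) - reference_mean)
        draws.append((set(idx), stat))

    # Sub-samples at or below the median statistic count as reference-like.
    cut = median(stat for _, stat in draws)

    appearances = [0] * n
    reference_like = [0] * n
    for members, stat in draws:
        for i in members:
            appearances[i] += 1
            if stat <= cut:
                reference_like[i] += 1

    # Events never drawn get a neutral score of 0.5.
    return [reference_like[i] / appearances[i] if appearances[i] else 0.5
            for i in range(n)]


# Toy data: 200 events from the modeled process, 20 from an unknown one.
_rng = random.Random(1)
background = [_rng.gauss(0.0, 1.0) for _ in range(200)]
anomalies = [_rng.gauss(4.0, 0.5) for _ in range(20)]
scores = inverse_bagging_scores(background + anomalies, reference_mean=0.0)
```

Under these assumptions, a low score marks an event that rarely appears in reference-like sub-samples, so the last 20 entries of `scores` should be lower on average than the first 200.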
Development of a Stope Stability Prediction Model Using Ensemble Learning Techniques - A Case Study
The consequences of collapsed stopes can be dire in the mining industry. In most jurisdictions a collapse can lead to the revocation of a mining license, especially when it costs lives. It is therefore imperative for a mine planning and technical services engineer to estimate the stability status of stopes. This study attempts to produce a stope stability prediction model, based on the stability graph, using ensemble learning techniques. The study used 472 case histories from 120 stopes of AngloGold Ashanti Ghana, Obuasi Mine. Random Forest, Gradient Boosting, Bootstrap Aggregating and Adaptive Boosting classification algorithms were used to produce the models. A comparative analysis was carried out using six classification performance metrics, namely Accuracy, Precision, Sensitivity, F1-score, Specificity and Matthews Correlation Coefficient (MCC), to determine which ensemble learning technique performed best in predicting the stability of a stope. The Bootstrap Aggregating model obtained the highest MCC score, 96.84%, while the Adaptive Boosting model obtained the lowest. The Specificity scores, in decreasing order of performance, were 98.95%, 97.89%, 96.32% and 95.26% for Bootstrap Aggregating, Gradient Boosting, Random Forest and Adaptive Boosting respectively. Accuracy, Precision, F1-score and Sensitivity were all equal to 97.89% for the Bootstrap Aggregating model; the four metrics likewise coincided for Adaptive Boosting, Gradient Boosting and Random Forest, at 90.53%, 92.63% and 95.79% respectively. At the 95% confidence level, using the Wilson score interval, the Bootstrap Aggregating model produced the smallest error and was therefore selected as the alternative stope design tool for predicting the stability status of stopes.
Keywords: Stope Stability, Ensemble Learning Techniques, Stability Graph, Machine Learning
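The six metrics and the Wilson score interval named in the abstract have standard definitions, sketched below for the binary case only. The abstract does not say how the metrics were averaged across stability classes. The confusion-matrix counts are made up, and the 93-out-of-95 count is an assumption: the abstract gives no test-set size, and 93/95 merely rounds to the reported 97.89%.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Six metrics from a 2x2 confusion matrix (binary case only).

    No guard against empty rows or columns: every denominator is
    assumed to be non-zero.
    """
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)

# Assumed count: 93 correct out of 95 (not stated in the abstract).
low, high = wilson_interval(93, 95)
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] even for proportions close to 1, which makes it a natural choice for accuracies in the high nineties.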
Classification Accuracy of Contraceptive Method Choice in Semarang City Using Bootstrap Aggregating Multinomial Logistic Regression
Classification is a statistical method for grouping data systematically. A classification problem arises when observations fall into one of several categories that cannot be identified directly and must instead be determined from a measure. Logistic regression is a classification method commonly used to analyze such problems. However, it can yield unstable parameter estimates. To obtain stable parameters, the multinomial logistic regression model is combined with a bootstrap approach, namely bootstrap aggregating (bagging). The purpose of this study was to compare the classification accuracy of the multinomial logistic regression model and the bootstrap aggregating model using family planning data from Semarang. Bagging multinomial logistic regression reached its highest classification accuracy, 51%, at 50 bootstrap replications; this model reduces the classification error by up to 2% compared to the multinomial logistic regression model, whose classification accuracy is 49%.
Keywords: logistic regression, bootstrap aggregating, accuracy of classification
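The bagging procedure the abstract relies on (fit the same model on many bootstrap replicates, then combine predictions by majority vote) can be sketched in a few lines. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the multinomial logistic regression model; that swap, the toy three-class data and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in base learner (NOT logistic regression): one mean per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(model, row):
    """Predict the class whose centroid is closest in squared distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda c: sqdist(model[c]))


def bagging_fit(X, y, n_replicates=50, seed=0):
    """Fit one base model per bootstrap replicate (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the replicate models by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy three-class data (hypothetical), four points per class.
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2],
     [0.0, 5.1], [0.2, 4.9], [-0.1, 5.0], [0.1, 5.2]]
y = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
models = bagging_fit(X, y, n_replicates=50)
```

Calling `bagging_predict(models, [0.1, 0.2])` returns the majority label across the 50 replicate models. Averaging over replicates is what stabilizes the prediction when any single fit is sensitive to the sample.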
Classification of Breast Cancer Detection Test Results Based on Clinical Symptoms Using the Classification Tree Method at RSUD Nur Hidayah Bantul
Breast cancer is a malignant tumor formed from breast cells that grow and multiply uncontrollably, so that it can spread to tissues or organs around the breast or even to other parts of the body. Breast cancer is the most common cancer among women. The high number of cases calls for women to remain watchful for its occurrence. The purpose of this research is to classify the results of breast cancer detection tests based on clinical symptoms using the Classification Tree and Bootstrap Aggregating (Bagging) methods. Classification performance is evaluated by accuracy, sensitivity, specificity and AUC. The results show that the optimal classification tree model has 4 terminal nodes and a depth of 2, while the best model obtained with the Bootstrap Aggregating method uses 20 bootstrap replications. In this research, Bootstrap Aggregating did not outperform the Classification Tree method: the classification tree achieved the higher classification accuracy. Although Bootstrap Aggregating does not always improve on the accuracy of the Classification Tree method, it can produce a consistent model.
Specimens at the Center: An Informatics Workflow and Toolkit for Specimen-level analysis of Public DNA database data
Major public DNA databases — NCBI GenBank, the DNA DataBank of Japan (DDBJ), and the European Molecular Biology
Laboratory (EMBL) — are invaluable biodiversity libraries. Systematists and other biodiversity scientists commonly mine these databases for
sequence data to use in phylogenetic studies, but such studies generally use only the taxonomic identity of the sequenced tissue, not the
specimen identity. Thus studies that use DNA supermatrices to construct phylogenetic trees with species at the tips typically do not take
advantage of the fact that for many individuals in the public DNA databases, several DNA regions have been sampled; and for many species,
two or more individuals have been sampled. Thus these studies typically do not make full use of the multigene datasets in public DNA
databases to test species coherence and select optimal sequences to represent a species. In this study, we introduce a set of tools developed
in the R programming language to construct individual-based trees from NCBI GenBank data and present a set of trees for the genus Carex
(Cyperaceae) constructed using these methods. For the more than 770 species for which we found sequence data, our approach recovered an
average of 1.85 gene regions per specimen, up to seven for some specimens, and more than 450 species represented by two or more specimens.
Depending on the subset of genes analyzed, we found up to 42% of species monophyletic. We introduce a simple tree statistic—the
Taxonomic Disparity Index (TDI)—to assist in curating specimen-level datasets and provide code for selecting maximally informative (or,
conversely, minimally misleading) sequences as species exemplars. While tailored to the Carex dataset, the approach and code presented in
this paper can readily be generalized to constructing individual-level trees from large amounts of data for any species group.
Bounding Optimality Gap in Stochastic Optimization via Bagging: Statistical Efficiency and Stability
We study a statistical method to estimate the optimal value, and the
optimality gap of a given solution for stochastic optimization as an assessment
of the solution quality. Our approach is based on bootstrap aggregating, or
bagging, resampled sample average approximation (SAA). We show how this
approach leads to valid statistical confidence bounds for non-smooth
optimization. We also demonstrate its statistical efficiency and stability that
are especially desirable in limited-data situations, and compare these
properties with some existing methods. We present our theory that views SAA as
a kernel in an infinite-order symmetric statistic, which can be approximated
via bagging. We substantiate our theoretical findings with numerical results.
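A minimal sketch of the bagged SAA idea follows: solve the sample average approximation on many resampled subsets, average the optimal values, and subtract that average from the estimated cost of a candidate solution. It covers only a point estimate, not the confidence bounds the abstract refers to. The toy problem (minimize E|x - xi|, a non-smooth objective whose SAA is solved by a sample median), the subsets drawn without replacement, and all names and parameter values are assumptions made for illustration.

```python
import random
from statistics import mean, median


def saa_optimal_value(sample):
    """Optimal value of min_x (1/k) * sum_i |x - xi_i| on one subsample.

    A sample median attains the minimum of this toy objective.
    """
    x_star = median(sample)
    return mean(abs(x_star - xi) for xi in sample)


def bagged_optimal_value(data, subsample_size, n_bags=500, seed=0):
    """Average the SAA optimal value over many resampled subsets.

    Assumption: subsets are drawn without replacement (random.sample).
    """
    rng = random.Random(seed)
    return mean(saa_optimal_value(rng.sample(data, subsample_size))
                for _ in range(n_bags))


def estimated_gap(data, candidate_x, subsample_size, n_bags=500, seed=0):
    """Point estimate of the optimality gap of candidate_x."""
    cost_at_candidate = mean(abs(candidate_x - xi) for xi in data)
    return cost_at_candidate - bagged_optimal_value(
        data, subsample_size, n_bags, seed)


# Toy data (hypothetical): 200 draws of the random parameter xi.
_rng = random.Random(2)
data = [_rng.gauss(0.0, 1.0) for _ in range(200)]
gap_far = estimated_gap(data, candidate_x=3.0, subsample_size=20)
gap_near = estimated_gap(data, candidate_x=median(data), subsample_size=20)
```

For a minimization problem, the expected SAA optimal value never exceeds the true optimal value, so the gap estimate errs on the conservative side in expectation. In this toy setup a candidate far from the median (`gap_far`) gets a larger estimated gap than the sample median itself (`gap_near`).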
- …