36 research outputs found

    Evaluation of the Factors Affecting Classification Performance in Class Imbalance Problem

    In binary classification, when the class distribution is imbalanced, the aim is to increase the classification accuracy of the methods used. In our study, both simulated and real data sets are used. In the simulation, the "BinNor" package in R, which produces both numerical and categorical data, was utilized. Three factors that may affect classification performance were considered when planning the simulation: sample size, correlation structure and class imbalance rate. Scenarios were created by considering these factors. Each scenario was repeated 1000 times and 10-fold cross-validation was applied. The CART, SVM and RF methods were used to classify both the simulated and the real data sets. SMOTE, SMOTEBoost and RUSBoost were applied before classification to reduce or completely remove the class imbalance. Specificity, sensitivity, balanced accuracy and F-measure were used as performance measures. According to the simulation results, as the imbalance rate increases from 10 to 30, the three algorithms have a similar effect on the accuracy of the classification methods, because the class imbalance has become balanced.
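
    The four performance measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal Python sketch, not the authors' R code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common measures from binary confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative and one predicted positive).
    """
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    balanced_accuracy = (sensitivity + specificity) / 2
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical imbalanced test set: 10 positives and 90 negatives.
metrics = classification_metrics(tp=8, fn=2, fp=9, tn=81)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    With these made-up counts, plain accuracy is much higher than the F-measure, which illustrates why measures other than accuracy are usually reported for imbalanced classes.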

    Performance Comparison of Independence Tests in Two-Way Contingency Table

    Several test statistics are available for testing the independence of categorical variables in two-way contingency tables. The vast majority of published articles use Pearson’s chi-squared test for this purpose; however, this test statistic may lead to biased conclusions under certain conditions. Therefore, we aimed to compare the performance of test statistics via a comprehensive simulation study considering several factors in contingency tables. We also evaluated the performance of each test statistic on a real-life dataset. This study contributes to the literature by guiding researchers in selecting an appropriate test statistic under different conditions.
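
    For reference, Pearson's chi-squared statistic for a two-way table compares each observed count with the count expected under independence. The sketch below is a plain-Python illustration, not code from the study; the function name and the example table are hypothetical. It also returns the smallest expected count, since small expected counts are a commonly cited condition under which the chi-squared approximation becomes unreliable (the abstract itself does not specify which conditions it examined).

```python
def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom, smallest expected count).

    `table` is a list of rows of observed counts. Assumes every row and
    column total is positive, so each expected count is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, min_expected


# Hypothetical 2x2 table of observed counts.
stat, dof, min_exp = pearson_chi_squared([[10, 20], [20, 10]])
print(f"chi-squared = {stat:.4f}, df = {dof}, min expected = {min_exp}")
```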

    Principal component scores with loading biplot.

    Two principal components explain almost all of the variability in the set of performance measures. The first principal component accounted for 71.50% and the second for 28.40% of the variance of the performance measures data. Seven variables load on the first principal component (AR: Accuracy rate, SP: Specificity, PPV: Positive predictive value, bAR: Balanced accuracy rate, FS: F score, MCC: Matthews correlation coefficient, κ: Kappa), whereas three variables (SE: Sensitivity, NPV: Negative predictive value, DR: Detection rate) load on the second principal component.

    Plot tab of the MLViS web-tool.

    A dendrogram and a heat map can be created based on the compounds’ molecular similarity.

    PubChem tab of the MLViS web-tool.

    Users can create and view molecular structures of compounds.

    Performance assessment of various statistical learning algorithms in virtual screening of compounds.

    AR: Accuracy rate, SE: Sensitivity, SP: Specificity, PPV: Positive predictive value, NPV: Negative predictive value, DR: Detection rate, bAR: Balanced accuracy rate, FS: F score, MCC: Matthews correlation coefficient, κ: Kappa statistic. Bold values indicate the top three algorithms for each performance measure.

    Morphometric study of the true S1 and S2 of the normal and dysmorphic sacralized sacra

    Background/aim: This study aimed to generate data for the S1 and S2 alar pedicle and body and the alar orientations for both dysmorphic and normal sacra. Materials and methods: The study comprised two groups: Group N consisted of 53 normal sacra and Group D included 10 dysmorphic sacra. Various features such as alar pedicle circumference; anterior, middle, and posterior axis of the sacral ala; sacral body height and width; and sagittal thickness were measured. Results: In Group N, the median anterior axis of the alae was 30 degrees on the right and 25 degrees on the left, the median midline axis was 20 degrees on the right and 15 degrees on the left, and the median posterior alar axis was -15 degrees on the right and -20 degrees on the left. The true S1 and S2 alar pedicle circumferences were significantly smaller in Group D, which demonstrated a shorter S1 alar pedicle mean circumference, significantly narrower S1 body mean width, and considerably tapered sagittal thickness. Conclusion: Our analysis indicated that dysmorphic sacra have a lower sagittal thickness and width of bodies and smaller alar pedicles, which explains the difficulties in their percutaneous fixation.

    Hierarchical cluster dendrogram.

    The algorithms used in the study are grouped into five clusters. Clusters 1 and 2 comprise the algorithms (RLDA: Robust linear discriminant analysis, bagKNN: Bagged k-nearest neighbors, MDA: Mixture discriminant analysis, KNN: k-Nearest neighbors, SVMrbf: Support vector machines with radial basis function kernel, FDA: Flexible discriminant analysis, J48, C5.0, NN: Neural networks, SVMlin: Support vector machines with linear kernel, lsSVMrbf: Least squares support vector machines with radial basis function kernel, RF: Random forests, bagSVM: Bagged support vector machines) that load on the positive side of the first principal component, and clusters 3 to 5 include the algorithms (LDA: Linear discriminant analysis, lsSVMlin: Least squares support vector machines with linear kernel, NSC: Nearest shrunken centroids, PLS: Partial least squares, QDA: Quadratic discriminant analysis, RQDA: Robust quadratic discriminant analysis, CIT: Conditional inference tree, NB: Naïve Bayes, LVQ: Learning vector quantization, CART: Classification and regression trees) that load on the negative side of the first principal component.

    Data upload tab of the MLViS web-tool.

    Users can upload their files using the upload file, paste data or single molecule options.

    Statistical Learning Approaches In Diagnosing Patients With Nontraumatic Acute Abdomen

    A quick evaluation is required for patients with acute abdominal pain. It is crucial to differentiate between surgical and nonsurgical pathology. Practical and accurate tests are essential in this differentiation. Lately, D-dimer level has been found to be an important adjuvant in this diagnosis and clearly outperforms leukocyte count, which is widely used for diagnosis of certain cases. Here, we approach this problem from a statistical perspective and combine the information from leukocyte count with D-dimer level to increase the diagnostic accuracy of nontraumatic acute abdomen. For this purpose, various statistical learning algorithms are considered and model performances are assessed using several measures. Our results revealed that the naive Bayes algorithm, robust quadratic discriminant analysis, bagged and boosted support vector machines, and single and bagged k-nearest neighbors provide an increase in diagnostic accuracies of up to 8.93% and 17.86% compared with D-dimer level and leukocyte count, respectively. The highest accuracy, 78.57%, was obtained with the naive Bayes algorithm. The analysis was carried out in the R programming language using code developed by the authors. A user-friendly web-tool was also developed to assist physicians in their decisions to differentially diagnose patients with acute abdomen.