235 research outputs found

    Ada-WHIPS: explaining AdaBoost classification with applications in the health sciences

    Background: Computer Aided Diagnostics (CAD) can help medical practitioners make critical decisions about their patients' disease conditions. Practitioners require access to the chain of reasoning behind CAD to build trust in the CAD advice and to supplement their own expertise. Yet CAD systems might be based on black box machine learning models and high dimensional data sources such as electronic health records, magnetic resonance imaging scans, and cardiotocograms. These foundations make interpretation and explanation of the CAD advice very challenging, a challenge recognised throughout the machine learning research community. eXplainable Artificial Intelligence (XAI) is emerging as one of the most important research areas of recent years because it addresses the interpretability and trust concerns of critical decision makers, including those in clinical and medical practice.

    Methods: In this work, we focus on AdaBoost, a black box model that has been widely adopted in the CAD literature. We address the challenge of explaining AdaBoost classification with a novel algorithm that extracts simple, logical rules from AdaBoost models. Our algorithm, Adaptive-Weighted High Importance Path Snippets (Ada-WHIPS), makes use of AdaBoost's adaptive classifier weights. Using a novel formulation, Ada-WHIPS uniquely redistributes the weights among individual decision nodes of the internal decision trees of the AdaBoost model. Then, a simple heuristic search of the weighted nodes finds a single rule that dominated the model's decision. We compare the explanations generated by our novel approach with the state of the art in an experimental study. We evaluate the derived explanations with simple statistical tests of two well-known quality measures, precision and coverage, and of a novel measure, stability, that is better suited to the XAI setting.

    Results: Experiments on 9 CAD-related data sets showed that Ada-WHIPS explanations consistently generalise better (mean coverage 15%-68%) than the state of the art while remaining competitive for specificity (mean precision 80%-99%). A very small trade-off in specificity is shown to guard against over-fitting, which is a known problem in the state of the art methods.

    Conclusions: The experimental results demonstrate the benefits of using our novel algorithm for explaining CAD AdaBoost classifiers widely found in the literature. Our tightly coupled, AdaBoost-specific approach outperforms model-agnostic explanation methods and should be considered by practitioners looking for an XAI solution for this class of models.
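
The abstract above evaluates rule explanations by precision and coverage. As a minimal sketch, assuming the usual definitions (coverage is the fraction of instances a rule applies to, precision is the fraction of covered instances that carry the explained class), the two measures could be computed as follows. The rule format, feature names, and data are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical rule format: a conjunction of (feature, operator, threshold).
Condition = Tuple[str, str, float]


def rule_applies(rule: Sequence[Condition], instance: Dict[str, float]) -> bool:
    """Return True when the instance satisfies every condition of the rule."""
    for feature, op, threshold in rule:
        value = instance[feature]
        if op == "<=":
            if not value <= threshold:
                return False
        elif op == ">":
            if not value > threshold:
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def coverage_and_precision(
    rule: Sequence[Condition],
    instances: List[Dict[str, float]],
    labels: List[str],
    target: str,
) -> Tuple[float, float]:
    """Coverage: share of instances the rule covers.
    Precision: share of covered instances labelled with the target class."""
    covered = [y for x, y in zip(instances, labels) if rule_applies(rule, x)]
    coverage = len(covered) / len(instances)
    precision = sum(y == target for y in covered) / len(covered) if covered else 0.0
    return coverage, precision


# Toy data, invented for illustration only.
instances = [
    {"heart_rate": 120, "age": 50},
    {"heart_rate": 110, "age": 55},
    {"heart_rate": 105, "age": 40},
    {"heart_rate": 80, "age": 30},
    {"heart_rate": 130, "age": 70},
]
labels = ["abnormal", "abnormal", "normal", "normal", "abnormal"]
rule = [("heart_rate", ">", 100), ("age", "<=", 60)]

coverage, precision = coverage_and_precision(rule, instances, labels, "abnormal")
print(coverage, precision)
```

In this toy data the rule covers three of the five instances, two of which are labelled "abnormal". A more general rule typically gains coverage at some cost in precision, which is the trade-off the abstract describes.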

    BoostFM: Boosted Factorization Machines for Top-N Feature-based Recommendation

    Feature-based matrix factorization techniques such as Factorization Machines (FM) have been shown to achieve impressive accuracy for the rating prediction task. However, most common recommendation scenarios are formulated as a top-N item ranking problem with implicit feedback (e.g., clicks, purchases) rather than explicit ratings. To address this problem, with both implicit feedback and feature information, we propose a feature-based collaborative boosting recommender called BoostFM, which integrates boosting into factorization models during the process of item ranking. Specifically, BoostFM is an adaptive boosting framework that linearly combines multiple homogeneous component recommenders, which are repeatedly constructed on the basis of the individual FM model by a re-weighting scheme. Two ways are proposed to efficiently train the component recommenders, from the perspectives of pairwise and listwise Learning-to-Rank (L2R). The properties of our proposed method are empirically studied on three real-world datasets. The experimental results show that BoostFM outperforms a number of state-of-the-art approaches for top-N recommendation.
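
The phrase "linearly combines multiple homogeneous component recommenders" can be illustrated with a small sketch. This is only the combination and ranking step under assumed inputs: the component scorers below are toy lookup tables standing in for trained FM models, and the weights are made up rather than learned by any re-weighting scheme from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical signature for a component recommender: (user, item) -> score.
Scorer = Callable[[str, str], float]


def combined_score(
    components: Sequence[Scorer], weights: Sequence[float], user: str, item: str
) -> float:
    """Weighted linear combination of the component recommenders' scores."""
    return sum(w * f(user, item) for w, f in zip(weights, components))


def top_n(
    components: Sequence[Scorer],
    weights: Sequence[float],
    user: str,
    items: Sequence[str],
    n: int,
) -> List[str]:
    """Rank candidate items by combined score and keep the best n."""
    ranked = sorted(
        items,
        key=lambda item: combined_score(components, weights, user, item),
        reverse=True,
    )
    return ranked[:n]


# Toy component scorers (stand-ins for FM models; scores are invented).
scores_a = {"i1": 0.9, "i2": 0.2, "i3": 0.5}
scores_b = {"i1": 0.1, "i2": 0.8, "i3": 0.6}
components = [lambda u, i: scores_a[i], lambda u, i: scores_b[i]]
weights = [0.7, 0.3]  # assumed boosting weights, one per component

recommended = top_n(components, weights, "u1", ["i1", "i2", "i3"], n=2)
print(recommended)
```

With these invented numbers the combined scores are 0.66 for i1, 0.38 for i2, and 0.53 for i3, so the top-2 list is i1 followed by i3.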

    Improving Object Detection in Medical Image Analysis through Multiple Expert Annotators: An Empirical Investigation

    The work discusses the use of machine learning algorithms for anomaly detection in medical image analysis and how the performance of these algorithms depends on the number of annotators and the quality of labels. To address the issue of subjectivity in labeling with a single annotator, we introduce a simple and effective approach that aggregates annotations from multiple annotators with varying levels of expertise. We then aim to improve the efficiency of predictive models in abnormality detection tasks by estimating hidden labels from multiple annotations and using a re-weighted loss function to improve detection performance. Our method is evaluated on a real-world medical imaging dataset and outperforms relevant baselines that do not consider disagreements among annotators.
    Comment: This is a short version submitted to the Midwest Machine Learning Symposium (MMLS 2023), Chicago, IL, US.
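
To make the idea of aggregating annotations and re-weighting the loss concrete, here is a deliberately simplified sketch for binary labels. It is not the paper's method: the expertise-weighted vote, the agreement-based sample weight, and all names below are assumptions chosen for illustration, and the paper's object detection setting is reduced to per-sample classification.

```python
import math
from typing import Sequence


def aggregate_label(votes: Sequence[int], expertise: Sequence[float]) -> float:
    """Expertise-weighted average of binary votes (0 or 1), giving a soft label."""
    total = sum(expertise)
    return sum(w * v for w, v in zip(expertise, votes)) / total


def reweighted_bce(
    pred_probs: Sequence[float], soft_labels: Sequence[float], eps: float = 1e-12
) -> float:
    """Binary cross-entropy where each sample is weighted by annotator agreement.

    The weight |2q - 1| is 1 when annotators fully agree (q is 0 or 1) and 0
    when they are evenly split (q = 0.5). This weighting is an assumption.
    """
    total_loss = 0.0
    total_weight = 0.0
    for p, q in zip(pred_probs, soft_labels):
        weight = abs(2.0 * q - 1.0)
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
        total_loss += weight * loss
        total_weight += weight
    return total_loss / total_weight if total_weight > 0 else 0.0


# Three annotators with assumed expertise weights vote on one sample.
soft = aggregate_label(votes=[1, 1, 0], expertise=[3.0, 2.0, 1.0])
print(soft)
print(reweighted_bce([0.9, 0.4], [soft, 0.5]))
```

In this sketch a sample on which the annotators are evenly split contributes nothing to the loss, while a unanimous sample contributes at full weight.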

    Analysis of CART and Random Forest on Statistics Student Status at Universitas Terbuka

    CART and Random Forest are machine learning methods central to this research. CART is used to determine student status indicators, and Random Forest (RF) is used to improve classification accuracy. Based on the results of CART, three parameters can affect student status, namely the year of initial registration, number of rolls, and credits. Meanwhile, based on the classification accuracy results, RF improves accuracy on the student status data relative to CART by 1.44% on the training data and by 2.24% on the testing data.

    Optimal AdaBoost Converges

    The following work is a preprint collection of formal proofs regarding the convergence properties of the AdaBoost machine learning algorithm's classifier and margins. Various math and computer science papers have been written regarding conjectures and special cases of these convergence properties. Furthermore, the margins of AdaBoost feature prominently in the research surrounding the algorithm. As the central result of this paper, we show that AdaBoost's classifier and margins converge on a value that agrees with decades of research. After this, we show how various quantities associated with the combined classifier converge.
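
For readers unfamiliar with the terms, the combined classifier and margins mentioned above are usually defined as follows. This is the standard textbook formulation for binary AdaBoost with labels $y \in \{-1, +1\}$ and weak hypotheses $h_t(x) \in \{-1, +1\}$; the notation is an assumption and may differ from the preprint's.

```latex
% Weight of the weak hypothesis chosen in round t, with weighted error \epsilon_t
\alpha_t = \frac{1}{2} \ln\!\left( \frac{1 - \epsilon_t}{\epsilon_t} \right)

% Combined classifier after T rounds
F_T(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x), \qquad H_T(x) = \operatorname{sign}\big(F_T(x)\big)

% Normalized margin of a labelled example (x, y); it lies in [-1, 1]
\operatorname{margin}_T(x, y) = \frac{y \, F_T(x)}{\sum_{t=1}^{T} \alpha_t}
```

A positive margin means the example is classified correctly, and its magnitude reflects how strongly the weighted vote favours the predicted label. Convergence results of the kind described concern the behaviour of such quantities as $T$ grows.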