
    Automated verbal autopsy classification: using one-against-all ensemble method and Naïve Bayes classifier [version 2; referees: 2 approved]

    Verbal autopsy (VA) deals with post-mortem surveys about deaths, mostly in low- and middle-income countries, where the majority of deaths occur at home rather than in a hospital, for retrospective assignment of causes of death (COD) and subsequently evidence-based health system strengthening. Automated algorithms for VA COD assignment have been developed and their performance has been assessed against physician and clinical diagnoses. Since the performance of automated classification methods remains low, we aimed to enhance the Naïve Bayes Classifier (NBC) algorithm to produce better ranked COD classifications on 26,766 deaths from four globally diverse VA datasets compared to some of the leading VA classification methods, namely Tariff, InterVA-4, InSilicoVA and NBC. We used a different strategy, training multiple NBC algorithms using the one-against-all approach (OAA-NBC). To compare performance, we computed the cumulative cause-specific mortality fraction (CSMF) accuracies for population-level agreement from rank one to five COD classifications. To assess individual-level COD assignments, cumulative partially-chance corrected concordance (PCCC) and sensitivity were measured for up to five ranked classifications. Overall results show that OAA-NBC consistently assigns CODs that are the most similar to physician and clinical COD assignments compared to some of the leading algorithms, based on the cumulative CSMF accuracy, PCCC and sensitivity scores. The results demonstrate that our approach improves the performance of classification (sensitivity) by between 6% and 8% compared with other VA algorithms. Population-level agreements for OAA-NBC and NBC were found to be similar to or higher than those of the other algorithms used in the experiments.
Although OAA-NBC still requires improvement for individual-level COD assignment, the one-against-all approach improved its ability to assign CODs that more closely resemble physician or clinical COD classifications compared to some of the other leading VA classifiers.
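
The one-against-all idea and the population-level metric described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the Bernoulli Naive Bayes model, the toy symptom data, and all function names (`train_binary_nb`, `rank_causes`, `csmf_accuracy`) are assumptions made for illustration. The CSMF accuracy function follows the definition commonly used in the VA literature, 1 - sum|true_j - pred_j| / (2 * (1 - min_j true_j)), which the abstract itself does not spell out.

```python
import math
from collections import Counter

def train_binary_nb(rows, labels, positive):
    """Fit a Bernoulli Naive Bayes model for `positive` vs. all other causes."""
    n_feat = len(rows[0])
    model = {}
    for is_pos in (True, False):
        subset = [r for r, y in zip(rows, labels) if (y == positive) == is_pos]
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        prior = (len(subset) + 1) / (len(rows) + 2)
        probs = [(sum(r[j] for r in subset) + 1) / (len(subset) + 2)
                 for j in range(n_feat)]
        model[is_pos] = (prior, probs)
    return model

def positive_posterior(model, row):
    """Posterior probability that `row` belongs to the model's positive cause."""
    log_scores = {}
    for is_pos, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            score += math.log(p if x else 1 - p)
        log_scores[is_pos] = score
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp[True] / (exp[True] + exp[False])

def rank_causes(models, row, top_k=5):
    """Rank causes by the posterior of their one-against-all model."""
    ranked = sorted(models, key=lambda c: positive_posterior(models[c], row),
                    reverse=True)
    return ranked[:top_k]

def csmf_accuracy(true_labels, pred_labels):
    """CSMF accuracy; assumes at least two causes occur in `true_labels`."""
    n = len(true_labels)
    true_counts, pred_counts = Counter(true_labels), Counter(pred_labels)
    causes = set(true_counts) | set(pred_counts)
    true_frac = {c: true_counts[c] / n for c in causes}
    abs_err = sum(abs(true_frac[c] - pred_counts[c] / n) for c in causes)
    return 1 - abs_err / (2 * (1 - min(true_frac.values())))

# Hypothetical toy data: columns are symptoms (fever, cough, injury).
rows = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0],
        [1, 1, 0], [0, 0, 1], [0, 0, 1]]
labels = ["malaria", "malaria", "pneumonia", "pneumonia",
          "pneumonia", "injury", "injury"]

# One binary model per cause: that cause against all the others.
models = {c: train_binary_nb(rows, labels, c) for c in sorted(set(labels))}
ranking = rank_causes(models, [0, 0, 1])
```

Each cause gets its own binary model, so a death receives a ranked list of candidate causes rather than a single label, which is what makes cumulative rank-one-to-five evaluation possible.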

    On the Use of Software Tracing and Boolean Combination of Ensemble Classifiers to Support Software Reliability and Security Tasks

    In this thesis, we propose an approach that relies on Boolean combination of multiple one-class classification methods based on Hidden Markov Models (HMMs), which are pruned using a weighted kappa coefficient to select and combine accurate and diverse classifiers. Our approach, called WPIBC (Weighted Pruning Iterative Boolean Combination), works in three phases. The first phase selects a subset of the available diverse base soft classifiers by pruning all the redundant soft classifiers based on a weighted version of Cohen’s kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp classifiers from the base soft classifiers (selected in Phase 1) based on the unweighted kappa measure. The selected complementary crisp classifiers are then combined in the final phase using Boolean combinations. We apply the proposed approach to two important problems in software security and reliability: the detection of system anomalies and the prediction of the reassignment of bug report fields. Detecting system anomalies at run-time is a critical component of system reliability and security. Studies in this area focus mainly on the effectiveness of the proposed approaches, that is, the ability to detect anomalies with high accuracy. Less attention has been given to false alarm rates and efficiency. Although ensemble approaches for the detection of anomalies that use Boolean combination of classifier decisions have been shown to be useful in reducing the false alarm rate over that of a single classifier, existing methods rely on an exponential number of combinations, making them impractical even for a small number of classifiers. Our approach not only maintains and even improves the accuracy of existing Boolean combination techniques, but also significantly reduces the combination time and the number of classifiers selected for combination. The second application domain of our approach is the prediction of the reassignment of bug report fields.
Bug reports contain a wealth of information that is used by triaging and development teams to understand the causes of bugs in order to provide fixes. The problem is that, for various reasons, it is common to have bug reports with missing or incorrect information, hindering the bug resolution process. To address this problem, researchers have turned to machine learning techniques. The common practice is to build models that leverage historical bug reports to automatically predict when a given bug report field should be reassigned. Existing approaches have mainly relied upon classifiers that make use of natural language in the title and description of the bug reports. They fail to take advantage of the richly detailed sequential information that is present in stack traces included in bug reports. To address this, we propose an approach called EnHMM, which uses WPIBC and stack traces to predict the reassignment of bug report fields. Another contribution of this thesis is an approach to improve the efficiency of WPIBC by leveraging the Hadoop framework and the MapReduce programming model. We also show how WPIBC can be extended to support heterogeneous classifiers.
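
The role of kappa-based pruning followed by Boolean combination can be sketched in a few lines. This is a heavily simplified illustration, not WPIBC itself: it uses only the unweighted Cohen's kappa on crisp (0/1) decisions, a greedy pruning rule with a hypothetical threshold of 0.9, and a single AND/OR combination of two classifiers. The data and the names `cohen_kappa`, `prune_redundant`, and `rates` are assumptions.

```python
def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two lists of 0/1 decisions."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return 1.0  # both classifiers are constant and identical
    return (observed - expected) / (1 - expected)

def prune_redundant(classifiers, threshold=0.9):
    """Greedily keep classifiers that do not agree too strongly with kept ones."""
    kept = []
    for name, decisions in classifiers.items():
        if all(cohen_kappa(decisions, classifiers[k]) < threshold for k in kept):
            kept.append(name)
    return kept

def rates(decisions, truth):
    """Return (true positive rate, false positive rate)."""
    tp = sum(1 for d, t in zip(decisions, truth) if d and t)
    fp = sum(1 for d, t in zip(decisions, truth) if d and not t)
    positives = sum(truth)
    return tp / positives, fp / (len(truth) - positives)

# Hypothetical ground truth (1 = anomaly) and crisp classifier decisions.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
classifiers = {
    "A": [1, 1, 1, 0, 1, 0, 0, 0],
    "B": [1, 1, 1, 0, 1, 0, 0, 0],  # identical to A, hence redundant
    "C": [1, 1, 0, 1, 0, 1, 0, 0],
}

kept = prune_redundant(classifiers)
first, second = (classifiers[name] for name in kept)

# Two of the possible Boolean combinations of the surviving pair.
and_rule = [x & y for x, y in zip(first, second)]
or_rule = [x | y for x, y in zip(first, second)]
and_rates = rates(and_rule, truth)
or_rates = rates(or_rule, truth)
```

Pruning before combining matters because the number of candidate Boolean combinations grows quickly with the number of classifiers; dropping near-duplicates such as `B` shrinks the search without losing complementary decisions.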