5,678 research outputs found

    Efficient chain structure for high-utility sequential pattern mining

    Get PDF
    High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, which considers both utility and sequence factors to derive the set of high-utility sequential patterns (HUSPs) from the quantitative databases. Several works have been presented to reduce the computational cost by variants of pruning strategies. In this paper, we present an efficient sequence-utility (SU)-chain structure, which can be used to store more relevant information to improve mining performance. Based on the SU-Chain structure, the existing pruning strategies can also be utilized here to early prune the unpromising candidates and obtain the satisfied HUSPs. Experiments are then compared with the state-of-the-art HUSPM algorithms and the results showed that the SU-Chain-based model can efficiently improve the efficiency performance than the existing HUSPM algorithms in terms of runtime and number of the determined candidates

    I-prune: Item selection for associative classification

    Get PDF
    Associative classification is characterized by accurate models and high model generation time. Most time is spent in extracting and postprocessing a large set of irrelevant rules, which are eventually pruned.We propose I-prune, an item-pruning approach that selects uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time, while increasing (or at worst preserving) model accuracy. Experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning

    Generating High Precision Classification Rules for Screening of Irrelevant Studies in Systematic Review Literature Searches

    Get PDF
    Systematic reviews aim to produce repeatable, unbiased, and comprehensive answers to clinical questions. Systematic reviews are an essential component of modern evidence based medicine, however due to the risks of omitting relevant research they are highly time consuming to create and are largely conducted manually. This thesis presents a novel framework for partial automation of systematic review literature searches. We exploit the ubiquitous multi-stage screening process by training the classifier using annotations made by reviewers in previous screening stages. Our approach has the benefit of integrating seamlessly with the existing screening process, minimising disruption to users. Ideally, classification models for systematic reviews should be easily interpretable by users. We propose a novel, rule based algorithm for use with our framework. A new approach for identifying redundant associations when generating rules is also presented. The proposed approach to redundancy seeks to both exclude redundant specialisations of existing rules (those with additional terms in their antecedent), as well as redundant generalisations (those with fewer terms in their antecedent). We demonstrate the ability of the proposed approach to improve the usability of the generated rules. The proposed rule based algorithm is evaluated by simulated application to several existing systematic reviews. Workload savings of up to 10% are demonstrated. There is an increasing demand for systematic reviews related to a variety of clinical disciplines, such as diagnosis. We examine reviews of diagnosis and contrast them against more traditional systematic reviews of treatment. We demonstrate existing challenges such as target class heterogeneity and high data imbalance are even more pronounced for this class of reviews. The described algorithm accounts for this by seeking to label subsets of non-relevant studies with high precision, avoiding the need to generate a high recall model of the minority class
    • …
    corecore