4,182 research outputs found

    Mining Patterns and Rules for Software Specification Discovery

    Get PDF
    Proceedings of the VLDB Endowment121609-161

    A genetic algorithm coupled with tree-based pruning for mining closed association rules

    Get PDF
    Due to the voluminous amount of itemsets that are generated, the association rules extracted from these itemsets contain redundancy, and designing an effective approach to address this issue is of paramount importance. Although multiple algorithms were proposed in recent years for mining closed association rules most of them underperform in terms of run time or memory. Another issue that remains challenging is the nature of the dataset. While some of the existing algorithms perform well on dense datasets others perform well on sparse datasets. This paper aims to handle these drawbacks by using a genetic algorithm for mining closed association rules. Recent studies have shown that genetic algorithms perform better than conventional algorithms due to their bitwise operations of crossover and mutation. Bitwise operations are predominantly faster than conventional approaches and bits consume lesser memory thereby improving the overall performance of the algorithm. To address the redundancy in the mined association rules a tree-based pruning algorithm has been designed here. This works on the principle of minimal antecedent and maximal consequent. Experiments have shown that the proposed approach works well on both dense and sparse datasets while surpassing existing techniques with regard to run time and memory

    Random simulations of a datatable for efficiently mining reliable and non-redundant itemsets

    Get PDF
    International audienceOur goal is twofold: 1) we want to mine the only statistically valid 2-itemsets out of a boolean datatable, 2) on this basis, we want to build the only higher-order non-redundant itemsets compared to their sub-itemsets. For the first task we have designed a randomization test (Tournebool) respectful of the structure of the data variables and independant from the specific distributions of the data. In our test set (959 texts and 8477 terms), this leads to a reduction from 126, 000 2-itemsets to 13, 000 significant ones, at the 99% confidence interval. For the second task, we have devised a hierarchical stepwise procedure (MIDOVA) for evaluating the residual amount of variation devoted to higher-order itemsets, yielding new possible positive or negative high-order relations. On our example, this leads to counts of 7,712 for 2-itemsets to 3 for 6-itemsets, and no higher-order ones, in a computationally efficient way

    Non-Redundant Sequential Rules - Theory and Algorithm

    Get PDF
    A sequential rule expresses a relationship between two series of events happening one after another. Sequential rules are potentially useful for analyzing data in sequential format, ranging from purchase histories, network logs and program execution traces. In this work, we investigate and propose a syntactic characterization of a non-redundant set of sequential rules built upon past work on compact set of representative patterns. A rule is redundant if it can be inferred from another rule having the same support and confidence. When using the set of mined rules as a composite filter, replacing a full set of rules with a non-redundant subset of the rules does not impact the accuracy of the filter. We consider several rule sets based on composition of various types of pattern sets – generators, projected-database generators, closed patterns and projected-database closed patterns. We investigate the completeness and tightness of these rule sets. We characterize a tight and complete set of non
    • …
    corecore