
    A Hypotheses-based Method for Identifying Skewed Itemsets

    Parallel and distributed association rule mining are important research subjects that a substantial body of work has addressed. Data skewness, the degree of non-uniformity of the itemset distribution among database partitions, causes various problems for parallel and distributed association rule mining algorithms, such as the generation of many false candidate itemsets. However, some algorithms employ techniques not only to overcome these problems but also to exploit data skewness to improve their performance; for instance, some algorithms use skewness-based pruning. In the literature, an entropy-based metric has been used to measure data skewness. In this paper we present a method for identifying skewed itemsets that uses tests of statistical hypotheses. This method has some advantages over the entropy-based method and can also be used in environments with privacy-preserving constraints. Our results show that our approach identifies skewed itemsets more accurately than the entropy-based method.
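The abstract does not say which hypothesis test the method uses. As a rough illustration only, the sketch below contrasts an entropy-based skewness score with one possible test, a chi-square goodness-of-fit test whose null hypothesis is that an itemset's support count in each partition is proportional to that partition's size. The entropy formulation (S = (H_max - H) / H_max with H_max = log n for n partitions), the choice of a chi-square test, the significance level `alpha`, and all function names (`entropy_skewness`, `chi2_sf`, `is_skewed`) are assumptions made for this example, not the authors' method. The chi-square approximation is also unreliable when expected counts are small (a common rule of thumb asks for at least 5 per partition).

```python
import math
from typing import Sequence, Tuple


def entropy_skewness(counts: Sequence[int]) -> float:
    """Assumed entropy-based skewness: S = (H_max - H) / H_max, where
    p_i = counts[i] / sum(counts) and H_max = log(n). Treats partitions as equal-sized."""
    n, total = len(counts), sum(counts)
    if n < 2 or total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    h_max = math.log(n)
    return (h_max - h) / h_max


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), integer df >= 1,
    using closed forms of the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: Q(k, z) = sum_{j=0}^{k-1} e^-z z^j / j!
        q = sum(math.exp(-z + j * math.log(z) - math.lgamma(j + 1))
                for j in range(df // 2))
    else:
        # Odd df = 2k+1: Q(k + 1/2, z) = erfc(sqrt z) + sum_{j=1}^{k} e^-z z^(j-1/2) / Gamma(j + 1/2)
        q = math.erfc(math.sqrt(z)) + sum(
            math.exp(-z + (j - 0.5) * math.log(z) - math.lgamma(j + 0.5))
            for j in range(1, df // 2 + 1))
    return min(1.0, q)


def is_skewed(counts: Sequence[int], partition_sizes: Sequence[int],
              alpha: float = 0.05) -> Tuple[bool, float]:
    """Hypothetical skewness test. H0: the itemset's support count in each partition
    is proportional to the partition's number of transactions. Returns (reject H0, p-value)."""
    total = sum(counts)
    if len(counts) < 2 or total == 0:
        return False, 1.0  # nothing to test with one partition or no occurrences
    n_tx = sum(partition_sizes)
    expected = [total * size / n_tx for size in partition_sizes]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    p_value = chi2_sf(stat, len(counts) - 1)
    return p_value < alpha, p_value


if __name__ == "__main__":
    # Support counts of one itemset in four partitions of 1000 transactions each.
    print(is_skewed([90, 5, 3, 2], [1000] * 4), entropy_skewness([90, 5, 3, 2]))
    print(is_skewed([10, 20, 30], [100, 200, 300]), entropy_skewness([10, 20, 30]))
```

In this sketch, support counts of [90, 5, 3, 2] over four equal partitions are flagged as skewed, while [10, 20, 30] over partitions of 100, 200 and 300 transactions are not, because the counts match the partition sizes. The entropy score, which here assumes equally sized partitions, still reports a small nonzero skewness for the latter case.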