research

On-line sampling methods for discovering association rules

Abstract

Association rule discovery is one of the prototypical problems in data mining. In this problem, the input database is assumed to be very large and most of the algorithms are designed to minimize the number of scans of the database. Enumerating association rules is usually an expensive task due to the size of the input database. A proposed approach for reducing the running time of this process is random sampling. Of course, any implementation of an algorithm that uses sampling must solve the problem of determining which sample size is appropriate. Previous research of sampling for association rule mining has approached this problem concluding that, in general, the theoretically obtained sample size bounds are far from what is observed in practice. In this paper, we try to reduce this gap between theory and practice. We propose two on-line sampling algorithms for association rule mining. Our algorithms maintain the same theoretical guarantees of previous approaches while using a much smaller number of transactions in most of the cases. In the experiments we report, this improvement is often by an order of magnitude.Postprint (published version

    Similar works