On-line sampling methods for discovering association rules

Domingo Soriano, Carlos; Gavaldà Mestre, Ricard; Watanabe, Osamu

Authors: Carlos Domingo Soriano, Ricard Gavaldà Mestre, Osamu Watanabe
Publication date: 1 January 1999

Abstract

Association rule discovery is one of the prototypical problems in data mining. In this problem, the input database is assumed to be very large, and most algorithms are designed to minimize the number of scans of the database. Enumerating association rules is usually an expensive task due to the size of the input database. One proposed approach for reducing the running time of this process is random sampling. Of course, any implementation of an algorithm that uses sampling must solve the problem of determining which sample size is appropriate. Previous research on sampling for association rule mining has addressed this problem and concluded that, in general, the theoretically obtained sample size bounds are far from what is observed in practice. In this paper, we try to reduce this gap between theory and practice. We propose two on-line sampling algorithms for association rule mining. Our algorithms maintain the same theoretical guarantees as previous approaches while using a much smaller number of transactions in most cases. In the experiments we report, this improvement is often by an order of magnitude.

Postprint (published version)

Available Versions

UPCommons

oai:upcommons.upc.edu:2117/913...

Last time updated on 17/04/2020

UPCommons. Portal del coneixement obert de la UPC

oai:upcommons.upc.edu:2117/913...

Last time updated on 07/11/2016