As advances in technology allow for the collection, storage, and analysis of
vast amounts of data, the task of screening and assessing the significance of
discovered patterns is becoming a major challenge in data mining applications.
In this work, we address significance in the context of frequent itemset
mining. Specifically, we develop a novel methodology to identify a meaningful
support threshold s* for a dataset, such that the number of itemsets with
support at least s* represents a substantial deviation from what would be
expected in a random dataset with the same number of transactions and the same
individual item frequencies. These itemsets can then be flagged as
statistically significant with a small false discovery rate. We present
extensive experimental results to substantiate the effectiveness of our
methodology.Comment: A preliminary version of this work was presented in ACM PODS 2009. 20
pages, 0 figure