Discovering High-Utility Itemsets at Multiple Abstraction Levels

Abstract

High-Utility Itemset Mining (HUIM) is an established data mining task. Its goal is to discover, from transactional datasets, recurrent combinations of items characterized by high profit. HUIM has a wide range of applications, including market basket analysis and service profiling. Based on the observation that items can be clustered into domain-specific categories, a parallel line of research is generalized itemset mining, which entails discovering correlations among data items at multiple abstraction levels. The extraction of multiple-level patterns affords new insights into the analyzed data from different viewpoints. This paper aims at discovering a novel type of pattern that combines the expressiveness of generalized and high-utility itemsets. According to a user-defined taxonomy, items are first aggregated into semantically related categories. Then, a new type of pattern, namely the Generalized High-Utility Itemset (GHUI), is extracted. It represents a combination of items at different granularity levels characterized by high profit (utility). While profitable combinations of item categories provide interesting high-level information, GHUIs at lower abstraction levels represent more specific correlations among profitable items. A single-phase algorithm is proposed to efficiently discover high-utility itemsets at multiple abstraction levels. The experiments, which were performed on both real and synthetic data, demonstrate the effectiveness and usefulness of the proposed approach.
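To make the notion of a GHUI concrete, the following minimal Python sketch aggregates items into categories and keeps the itemsets whose total utility reaches a threshold. It is an illustration under stated assumptions, not the single-phase algorithm proposed in the paper: it enumerates candidates by brute force, it assumes a one-level taxonomy, and it assumes that the utility of a category in a transaction is the sum of the utilities of its items in that transaction. All names and data (TAXONOMY, PROFIT, TRANSACTIONS, mine_ghuis) are hypothetical.

```python
from itertools import combinations

# Hypothetical one-level taxonomy: item -> category.
TAXONOMY = {"apple": "fruit", "pear": "fruit", "milk": "dairy", "cheese": "dairy"}

# Hypothetical unit profit of each item.
PROFIT = {"apple": 1, "pear": 2, "milk": 3, "cheese": 5}

# Hypothetical transactions: item -> purchased quantity.
TRANSACTIONS = [
    {"apple": 2, "milk": 1},
    {"pear": 1, "cheese": 1},
    {"apple": 1, "pear": 1, "milk": 2},
]


def extend_transaction(tx):
    """Return the utility of each item and of each category in a transaction.

    Assumption: a category's utility is the sum of its items' utilities.
    """
    util = {}
    for item, qty in tx.items():
        u = qty * PROFIT[item]
        util[item] = u
        category = TAXONOMY[item]
        util[category] = util.get(category, 0) + u
    return util


def is_redundant(itemset):
    """True if the itemset contains an item together with its own category."""
    return any(TAXONOMY.get(i) in itemset for i in itemset)


def mine_ghuis(transactions, min_util, max_size=2):
    """Brute-force enumeration of generalized itemsets with utility >= min_util."""
    totals = {}
    for tx in transactions:
        util = extend_transaction(tx)
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(util), k):
                if is_redundant(itemset):
                    continue
                gain = sum(util[i] for i in itemset)
                totals[itemset] = totals.get(itemset, 0) + gain
    return {s: u for s, u in totals.items() if u >= min_util}


if __name__ == "__main__":
    for itemset, utility in sorted(mine_ghuis(TRANSACTIONS, min_util=14).items()):
        print(itemset, utility)
```

With this toy data and a threshold of 14, the category-level pair (dairy, fruit) reaches a total utility of 21 and is retained, while the item-level pair (apple, milk) totals only 12 and is discarded. This mirrors the point made above: a profitable combination may be visible only at a higher abstraction level.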
