3 research outputs found

    FSSD - A Fast and Efficient Algorithm for Subgroup Set Discovery

    Get PDF
    International audienceSubgroup discovery (SD) is the task of discovering interpretable patterns in the data that stand out w.r.t. some property of interest. Discovering patterns that accurately discriminate a class from the others is one of the most common SD tasks. Standard approaches of the literature are based on local pattern discovery, which is known to provide an overwhelmingly large number of redundant patterns. To solve this issue, pattern set mining has been proposed: instead of evaluating the quality of patterns separately, one should consider the quality of a pattern set as a whole. The goal is to provide a small pattern set that is diverse and well-discriminant to the target class. In this work, we introduce a novel formulation of the task of diverse subgroup set discovery where both discriminative power and diversity of the subgroup set are incorporated in the same quality measure. We propose an efficient and parameter-free algorithm dubbed FSSD and based on a greedy scheme. FSSD uses several optimization strategies that enable to efficiently provide a high quality pattern set in a short amount of time

    Evaluating Pattern Set Mining Strategies in a Constraint Programming Framework

    No full text
    Abstract. The pattern mining community has shifted its attention from local pattern mining to pattern set mining. The task of pattern set mining is concerned with finding a set of patterns that satisfies a set of constraints and often also scores best w.r.t. an optimisation criteria. Furthermore, while in local pattern mining the constraints are imposed at the level of individual patterns, in pattern set mining they are also concerned with the overall set of patterns. A wide variety of different pattern set mining techniques is available in literature. The key contribution of this paper is that it studies, compares and evaluates such search strategies for pattern set mining. The investigation employs concept-learning as a benchmark for pattern set mining and employs a constraint programming framework in which key components of pattern set mining are formulated and implemented. The study leads to novel insights into the strong and weak points of different pattern set mining strategies.
    corecore