Subgroup Discovery: Real-World Applications

Author: Carmona C. J.
Elizondo David
Publication venue: Technical Report
Publication date: 01/03/2011

Subgroup discovery is a data mining technique that extracts interesting rules with respect to a target variable. An important characteristic of this task is its combination of predictive and descriptive induction. This paper gives an overview of subgroup discovery. In addition, it presents different real-world applications solved through evolutionary algorithms, which show the suitability and potential of this type of algorithm for the development of subgroup discovery algorithms.

De Montfort University Open Research Archive
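
As a rough illustration of what "interesting rules with respect to a target variable" can mean in practice, the sketch below scores one candidate rule with weighted relative accuracy (WRAcc), a quality measure commonly used in subgroup discovery. The abstract above does not name a quality measure, so the choice of WRAcc is an assumption; the helper `wracc` and the toy records are hypothetical and only serve the example.

```python
from typing import Callable, Mapping, Sequence

Row = Mapping[str, object]


def wracc(
    rows: Sequence[Row],
    rule: Callable[[Row], bool],
    target: Callable[[Row], bool],
) -> float:
    """Weighted relative accuracy of a rule (hypothetical helper).

    WRAcc = coverage * (target rate inside the subgroup - overall target rate).
    Returns 0.0 when the dataset is empty or the rule covers no rows.
    """
    n = len(rows)
    covered = [r for r in rows if rule(r)]
    if n == 0 or not covered:
        return 0.0
    p_subgroup = sum(target(r) for r in covered) / len(covered)
    p_overall = sum(target(r) for r in rows) / n
    return (len(covered) / n) * (p_subgroup - p_overall)


# Hypothetical toy records; attribute names are illustrative only.
rows = [
    {"age": "young", "smoker": True, "ill": True},
    {"age": "young", "smoker": True, "ill": True},
    {"age": "old", "smoker": True, "ill": False},
    {"age": "old", "smoker": False, "ill": False},
]

# Candidate rule "age = young" scored against the target "ill".
score = wracc(
    rows,
    rule=lambda r: r["age"] == "young",
    target=lambda r: bool(r["ill"]),
)
print(score)
```

A positive score means the target is more frequent inside the subgroup than in the whole dataset, weighted by how many records the rule covers. A rule that covers nothing scores zero in this sketch.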

FSSD - A Fast and Efficient Algorithm for Subgroup Set Discovery

Author: Belfodil Adnene
Belfodil Aimene
Bendimerad Anes
Kaytoue Mehdi
Lamarre Philippe
Plantevit Marc
Robardet Céline
Publication venue: HAL CCSD
Publication date: 05/10/2019

Subgroup discovery (SD) is the task of discovering interpretable patterns in the data that stand out w.r.t. some property of interest. Discovering patterns that accurately discriminate a class from the others is one of the most common SD tasks. Standard approaches in the literature are based on local pattern discovery, which is known to provide an overwhelmingly large number of redundant patterns. To solve this issue, pattern set mining has been proposed: instead of evaluating the quality of patterns separately, one should consider the quality of a pattern set as a whole. The goal is to provide a small pattern set that is diverse and discriminates the target class well. In this work, we introduce a novel formulation of the task of diverse subgroup set discovery where both discriminative power and diversity of the subgroup set are incorporated in the same quality measure. We propose an efficient and parameter-free algorithm, dubbed FSSD, based on a greedy scheme. FSSD uses several optimization strategies that enable it to provide a high-quality pattern set in a short amount of time.

Crossref

HAL

Hal-Diderot

Suwan - a Supervised Clustering Algorithm with attributed networks

Author: Bárbara Monteiro Santos
Publication date: 15/11/2021

Repositório Aberto da Universidade do Porto

Anytime Subgroup Discovery in Numerical Domains with Guarantees

Author: Belfodil Adnene
Belfodil Aimene
Kaytoue Mehdi
Publication venue: HAL CCSD
Publication date: 10/09/2018

Subgroup discovery is the task of discovering patterns that accurately discriminate a class label from the others. Existing approaches can uncover such patterns either through an exhaustive or an approximate exploration of the pattern search space. However, an exhaustive exploration is generally infeasible, whereas approximate approaches provide guarantees bounding neither the error of the best pattern quality nor the exploration progression ("How far are we from an exhaustive search?"). We design here an algorithm for mining numerical data with three key properties w.r.t. the state of the art: (i) it progressively yields interval patterns whose quality improves over time; (ii) it can be interrupted anytime and always gives a guarantee bounding the error on the top pattern quality; and (iii) it always bounds a distance to the exhaustive exploration. After reporting experiments showing the effectiveness of our method, we discuss its generalization to other kinds of patterns.

Identifying Exceptional Descriptions of People using Topic Modeling and Subgroup Discovery (Extended Abstract, Resubmission)

Author: Atzmueller Martin
Hendrickson Andrew
Wang Jason
Publication date: 01/01/2018

Tilburg University Repository

Data Driven Discovery of Root Causes in an Internal Logistics Context: A Case Study at Prodrive Technologies

Author: de Groot F.
Publication date: 29/07/2022

Pure OAI Repository

SeqScout: Using a Bandit Model to Discover Interesting Subgroups in Labeled Sequences

Author: Boulicaut Jean-François
Kaytoue Mehdi
Mathonat Romain
Nurbakova Diana
Publication venue: HAL CCSD
Publication date: 05/10/2019

It is extremely useful to exploit labeled datasets not only to learn models but also to improve our understanding of a domain and its available targeted classes. The so-called subgroup discovery task has been considered for a long time. It concerns the discovery of patterns or descriptions whose sets of supporting objects have interesting properties, e.g., they characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for transactional data, discovering subgroups within labeled sequential data, and thus searching for descriptions as sequential patterns, has been much less studied. In that context, exhaustive exploration strategies cannot be used for real-life applications and we have to look for heuristic approaches. We propose the algorithm SeqScout to discover interesting subgroups (w.r.t. a chosen quality measure) from labeled sequences of itemsets. This is a new sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. It is an anytime algorithm that, for a given budget, finds a collection of local optima in the search space of descriptions and thus subgroups. It requires only light configuration and is independent of the quality measure used for pattern scoring. Furthermore, it is fairly simple to implement. We provide qualitative and quantitative experiments on several datasets to illustrate its added value.

Crossref

HAL

Hal-Diderot

Subgroup Discovery through Evolutionary Fuzzy Systems applied to Bioinformatic problems

Author: Carmona C. J.
Elizondo David
Publication venue: Technical Report DMU
Publication date: 01/03/2011

Subgroup discovery is a descriptive data mining technique using supervised learning. This paper presents a summary of the main properties and elements of the subgroup discovery task. In addition, we focus on the suitability and potential of the search performed by evolutionary algorithms for the development of subgroup discovery algorithms, and on the use of fuzzy logic, a soft computing technique very close to human reasoning. The hybridisation of both techniques is known as an evolutionary fuzzy system. The most relevant applications of evolutionary fuzzy systems for subgroup discovery in the bioinformatics domain are outlined in this work. Specifically, these algorithms are applied to a problem based on the Influenza A virus and to the acute sore throat problem.

De Montfort University Open Research Archive

Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily.

Author: Akiva Eyal
Babbitt Patricia C
Copp Janine N
Tokuriki Nobuhiko
Publication venue: eScholarship, University of California
Publication date: 01/11/2017

Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold

eScholarship - University of California