110,759 research outputs found

    Robust subgroup discovery

    We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, as finding optimal subgroup lists is NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration, which is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. We empirically show on 54 datasets that SSD++ outperforms previous subgroup set discovery methods in terms of quality and subgroup list size. Comment: For associated code, see https://github.com/HMProenca/RuleList ; submitted to Data Mining and Knowledge Discovery Journal

    Subjectively Interesting Subgroup Discovery on Real-valued Targets

    Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely many if we consider weighted combinations, even for linear combinations. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, the user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining. Comment: 12 pages, 10 figures, 2 tables, conference submission

    CONTRAST SET MINING MENGGUNAKAN SUBGROUP DISCOVERY (Contrast Set Mining Through Subgroup Discovery)

    ABSTRACT: Data mining is a technology for mining or extracting knowledge from very large collections of data. An alternative data mining approach for comparing multiple groups is called contrast set mining. Contrast set mining can be performed using subgroup discovery, and one subgroup discovery algorithm is Apriori-SD. Testing an implementation of this algorithm yields rules that can be used to classify new data, together with the accuracy of those rules. Keywords: data mining, contrast set mining, subgroup discovery, apriori-sd, rule

    Explainable subgraphs with surprising densities : a subgroup discovery approach

    The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks, for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form 'the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y', in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph. We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns. This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.

    What is the Astrophysical Meaning of the Intermediate Subgroup of GRBs?

    Published articles concerning the intermediate (third) subgroup of GRBs are surveyed. From a statistical perspective this subgroup may exist; however, its significance depends on which data set is used. Its astrophysical meaning is unclear because the occurrence of this subgroup can also be an artificial selection effect. Hence, GRBs from this subgroup need not be given by a physically different phenomenon. The aim of this contribution is to search for the answer to the question in the title. Comment: journal: Proceedings of Science, Swift: 10 Years of Discovery; conference date: 2-5 December 2014; location: La Sapienza University, Rome, Italy; 6 pages, 4 figures, 1 table; accepted for publication in July 9 201

    Model Reuse with Subgroup Discovery

    Subgroup Discovery in Unstructured Data

    Subgroup discovery is a descriptive and exploratory data mining technique to identify subgroups in a population that exhibit interesting behavior with respect to a variable of interest. Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation, yet it remains inapplicable for unstructured, high-dimensional data such as images. This is because subgroup discovery algorithms rely on defining descriptive rules based on (attribute, value) pairs, whereas in unstructured data an attribute is not well defined. Even in cases where the notion of attribute intuitively exists in the data, such as a pixel in an image, the high dimensionality of the data means that these attributes are not informative enough to be used in a rule. In this paper, we introduce the subgroup-aware variational autoencoder, a novel variational autoencoder that learns a representation of unstructured data which leads to subgroups with higher quality. Our experimental results demonstrate the effectiveness of the method at learning subgroups with high quality while supporting the interpretability of the concepts.
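The notion of a descriptive rule built from (attribute, value) pairs can be illustrated with a minimal sketch. This is not the method of any paper listed here: the toy records, the helper names `covers` and `wracc`, and the choice of weighted relative accuracy (WRAcc) as the quality measure are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def covers(description: Dict[str, Hashable], row: Row) -> bool:
    """True if the row matches every (attribute, value) pair in the description."""
    return all(row.get(attr) == val for attr, val in description.items())


def wracc(rows: List[Row], description: Dict[str, Hashable],
          target_attr: str, target_value: Hashable) -> float:
    """Weighted relative accuracy: coverage * (target rate in subgroup - overall rate).

    WRAcc is one common quality measure, used here as an assumption;
    the listed papers define their own measures.
    """
    n = len(rows)
    if n == 0:
        return 0.0
    covered = [r for r in rows if covers(description, r)]
    if not covered:
        return 0.0
    p_all = sum(r[target_attr] == target_value for r in rows) / n
    p_sub = sum(r[target_attr] == target_value for r in covered) / len(covered)
    return (len(covered) / n) * (p_sub - p_all)


# Hypothetical toy data: eight records, target is risk == "high".
rows = [
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "high"},
]

quality = wracc(rows, {"smoker": "yes"}, "risk", "high")
print(quality)  # 0.125: half the data is covered, target rate 0.75 vs 0.5 overall
```

A positive score marks a subgroup in which the target value is over-represented relative to the whole dataset; the empty description covers every record and scores zero.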

    Subgroup Discovery: Real-World Applications

    Subgroup discovery is a data mining technique which extracts interesting rules with respect to a target variable. An important characteristic of this task is the combination of predictive and descriptive induction. This paper gives an overview of subgroup discovery. In addition, it presents different real-world applications solved through evolutionary algorithms, which show the suitability and potential of this type of algorithm for the development of subgroup discovery algorithms.

    Expert-Guided Subgroup Discovery: Methodology and Application

    This paper presents an approach to expert-guided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically significant properties of selected subgroups and subgroup visualization: statistically significant properties are used to enrich the descriptions of induced subgroups, while the visualization shows subgroup properties in the form of distributions of the numbers of examples in the subgroups. The approach is illustrated by the results obtained for a medical problem of early detection of patient risk groups.

    Subgroup Discovery for Defect Prediction
