7 research outputs found

    FSSD - A Fast and Efficient Algorithm for Subgroup Set Discovery

    Get PDF
    International audienceSubgroup discovery (SD) is the task of discovering interpretable patterns in the data that stand out w.r.t. some property of interest. Discovering patterns that accurately discriminate a class from the others is one of the most common SD tasks. Standard approaches of the literature are based on local pattern discovery, which is known to provide an overwhelmingly large number of redundant patterns. To solve this issue, pattern set mining has been proposed: instead of evaluating the quality of patterns separately, one should consider the quality of a pattern set as a whole. The goal is to provide a small pattern set that is diverse and well-discriminant to the target class. In this work, we introduce a novel formulation of the task of diverse subgroup set discovery where both discriminative power and diversity of the subgroup set are incorporated in the same quality measure. We propose an efficient and parameter-free algorithm dubbed FSSD and based on a greedy scheme. FSSD uses several optimization strategies that enable to efficiently provide a high quality pattern set in a short amount of time

    Robust subgroup discovery

    Get PDF
    We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to optimal Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, as finding optimal subgroup lists is NP-hard, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration, which is shown to be equivalent to a Bayesian one-sample proportions, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. We empirically show on 54 datasets that SSD++ outperforms previous subgroup set discovery methods in terms of quality and subgroup list size.Comment: For associated code, see https://github.com/HMProenca/RuleList ; submitted to Data Mining and Knowledge Discovery Journa

    Kernel-based distribution features for statistical tests and Bayesian inference

    Get PDF
    The kernel mean embedding is known to provide a data representation which preserves full information of the data distribution. While typically computationally costly, its nonparametric nature has an advantage of requiring no explicit model specification of the data. At the other extreme are approaches which summarize data distributions into a finite-dimensional vector of hand-picked summary statistics. This explicit finite-dimensional representation offers a computationally cheaper alternative. Clearly, there is a trade-off between cost and sufficiency of the representation, and it is of interest to have a computationally efficient technique which can produce a data-driven representation, thus combining the advantages from both extremes. The main focus of this thesis is on the development of linear-time mean-embedding-based methods to automatically extract informative features of data distributions, for statistical tests and Bayesian inference. In the first part on statistical tests, several new linear-time techniques are developed. These include a new kernel-based distance measure for distributions, a new linear-time nonparametric dependence measure, and a linear-time discrepancy measure between a probabilistic model and a sample, based on a Stein operator. These new measures give rise to linear-time and consistent tests of homogeneity, independence, and goodness of fit, respectively. The key idea behind these new tests is to explicitly learn distribution-characterizing feature vectors, by maximizing a proxy for the probability of correctly rejecting the null hypothesis. We theoretically show that these new tests are consistent for any finite number of features. In the second part, we explore the use of random Fourier features to construct approximate kernel mean embeddings, for representing messages in expectation propagation (EP) algorithm. The goal is to learn a message operator which predicts EP outgoing messages from incoming messages. We derive a novel two-layer random feature representation of the input messages, allowing online learning of the operator during EP inference

    FSSD - A Fast and Efficient Algorithm for Subgroup Set Discovery

    No full text
    International audienceSubgroup discovery (SD) is the task of discovering interpretable patterns in the data that stand out w.r.t. some property of interest. Discovering patterns that accurately discriminate a class from the others is one of the most common SD tasks. Standard approaches of the literature are based on local pattern discovery, which is known to provide an overwhelmingly large number of redundant patterns. To solve this issue, pattern set mining has been proposed: instead of evaluating the quality of patterns separately, one should consider the quality of a pattern set as a whole. The goal is to provide a small pattern set that is diverse and well-discriminant to the target class. In this work, we introduce a novel formulation of the task of diverse subgroup set discovery where both discriminative power and diversity of the subgroup set are incorporated in the same quality measure. We propose an efficient and parameter-free algorithm dubbed FSSD and based on a greedy scheme. FSSD uses several optimization strategies that enable to efficiently provide a high quality pattern set in a short amount of time

    Knowledge and Management Models for Sustainable Growth

    Full text link
    In the last years sustainability has become a topic of global concern and a key issue in the strategic agenda of both business organizations and public authorities and organisations. Significant changes in business landscape, the emergence of new technology, including social media, the pressure of new social concerns, have called into question established conceptualizations of competitiveness, wealth creation and growth. New and unaddressed set of issues regarding how private and public organisations manage and invest their resources to create sustainable value have brought to light. In particular the increasing focus on environmental and social themes has suggested new dimensions to be taken into account in the value creation dynamics, both at organisations and communities level. For companies the need of integrating corporate social and environmental responsibility issues into strategy and daily business operations, pose profound challenges, which, in turn, involve numerous processes and complex decisions influenced by many stakeholders. Facing these challenges calls for the creation, use and exploitation of new knowledge as well as the development of proper management models, approaches and tools aimed to contribute to the development and realization of environmentally and socially sustainable business strategies and practices

    The 15th International CDIO Conference: Proceedings – Full Papers

    Get PDF
    The 15th international CDIO conference was held at Aarhus University from 25 June 2019 to 27 June 2019 with activities on 24 and 28 June. The main theme of the 15th International CDIO Conference was CHANGE in Engineering Education.  The conference programme included: Keynotes General presentations Working groups Workshops Round tables Social events CDIO Academy (A CDIO experience for students

    The 15th International CDIO Conference: Proceedings – Full Papers

    Get PDF
    We discuss a conceptual thesis structure model and visual tool for enhancing the writing process in the context of an engineering Master’s thesis. Our model is based on visualizing the thesis as a series of funnels that adjust the writing focus to the desired scope in each individual chapter. At the end of the thesis, the focus is widened back into the original topic area with a reflection on how the solutions proposed in the thesis have impacted or potentially will impact the field. Using our model gives students the opportunity to write a good Master’s thesis in various engineering disciplines. In our experience, the Focus Funnel approach has been very useful and effective, resulting in an overall improvement in the quality of engineering Master’s theses in our degree program.</p
    corecore