1,662 research outputs found

    Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

    Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable, that is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-k subgroups that are both exceptional and representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time.
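
As a rough illustration of the branch-and-bound idea in the abstract above, the following sketch searches for the top-k subgroups under a plain impact measure (coverage times mean shift of a numeric target). It is not the paper's representativeness-aware objective or its estimator: the quality function, the prefix-scan bound, and all names (`impact`, `optimistic_estimate`, `top_k_subgroups`, the row and selector layout) are assumptions made for this example.

```python
import heapq
from itertools import count


def impact(targets, global_mean, n_total):
    """Assumed quality function: coverage times mean shift of the target."""
    if not targets:
        return 0.0
    return len(targets) / n_total * (sum(targets) / len(targets) - global_mean)


def optimistic_estimate(targets, global_mean, n_total):
    """Upper bound on the impact of any subset of the covered rows.

    For a fixed size k the best subset is the k largest target values,
    so it suffices to scan the prefixes of the sorted targets.
    """
    best, running = 0.0, 0.0
    for k, y in enumerate(sorted(targets, reverse=True), start=1):
        running += y
        best = max(best, k / n_total * (running / k - global_mean))
    return best


def top_k_subgroups(rows, target, selectors, k=3, max_depth=2):
    """Depth-first branch-and-bound over conjunctions of (attribute, value) selectors."""
    n = len(rows)
    mean = sum(r[target] for r in rows) / n
    heap = []      # min-heap of (quality, tiebreak, description)
    tie = count()  # unique tiebreak so descriptions are never compared

    def threshold():
        # Quality a candidate must beat to enter the current top-k.
        return heap[0][0] if len(heap) == k else float("-inf")

    def expand(description, covered, start):
        for i in range(start, len(selectors)):
            attr, value = selectors[i]
            sub = [r for r in covered if r[attr] == value]
            if not sub:
                continue
            ys = [r[target] for r in sub]
            desc = description + [(attr, value)]
            q = impact(ys, mean, n)
            if q > threshold():
                heapq.heappush(heap, (q, next(tie), desc))
                if len(heap) > k:
                    heapq.heappop(heap)
            # Prune: no refinement of this subgroup can beat the bound.
            if len(desc) < max_depth and optimistic_estimate(ys, mean, n) > threshold():
                expand(desc, sub, i + 1)

    expand([], rows, 0)
    return [(q, d) for q, _, d in sorted(heap, reverse=True)]
```

The pruning step is what the optimistic estimator buys: every refinement covers a subset of the current rows, so if the bound cannot beat the k-th best quality found so far, the whole branch can be skipped without evaluating it.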

    Subgroup discovery for structured target concepts

    The main object of study in this thesis is subgroup discovery, a theoretical framework for finding subgroups in data (i.e., named sub-populations) whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset. This is a powerful tool that conveys crucial information to a human audience, but despite past advances it has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. We introduce the concept of representative subgroups, which we use not only to ensure the fairness of a sub-population with regard to a sensitive trait, such as race or gender, but also to go beyond known trends in the data. For entities with additional relational information that can be encoded as a graph, we introduce a novel measure of robust connectedness which improves on established alternative measures of density; we then provide a method that uses this measure to discover which named sub-populations are more well-connected. Our contributions within subgroup discovery culminate in the introduction of kernelised subgroup discovery: a novel framework that enables the discovery of subgroups on i.i.d. target concepts with virtually any kind of structure. Importantly, our framework additionally provides a concrete and efficient tool that works out-of-the-box without any modification, apart from specifying the Gramian of a positive definite kernel. For use within kernelised subgroup discovery, but also in any other kind of kernel method, we additionally introduce a novel random walk graph kernel. Our kernel allows the fine tuning of the alignment between the vertices of the two compared graphs during the count of the random walks, and we also propose meaningful structure-aware vertex labels to utilise this new capability. With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel method.
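
For readers unfamiliar with random walk graph kernels, the sketch below shows a generic, truncated version that counts common label-matching walks in two graphs and down-weights longer walks. It is not the thesis's kernel and does not implement its alignment tuning or structure-aware labels; the function name `random_walk_kernel`, the adjacency-dict layout, and the `decay` and `max_len` parameters are assumptions made for this example.

```python
def random_walk_kernel(adj1, labels1, adj2, labels2, decay=0.1, max_len=5):
    """Truncated random walk kernel on two labelled, undirected graphs.

    adj1/adj2 map each vertex to a list of its neighbours; labels1/labels2
    map each vertex to a label. A common walk is a pair of walks, one per
    graph, whose vertex labels agree at every step.
    """
    # Vertices of the direct product graph: pairs with equal labels.
    pairs = [(u, v) for u in adj1 for v in adj2 if labels1[u] == labels2[v]]
    # walks[p] = number of common walks of the current length ending at p.
    walks = {p: 1.0 for p in pairs}
    total, weight = 0.0, 1.0
    for _ in range(max_len + 1):
        total += weight * sum(walks.values())
        # Extend every common walk by one step along matching neighbours.
        walks = {
            (u, v): sum(walks.get((a, b), 0.0) for a in adj1[u] for b in adj2[v])
            for (u, v) in pairs
        }
        weight *= decay
    return total
```

Restricting the product graph to label-matching vertex pairs is the simplest form of vertex alignment; a kernel with tunable alignment would presumably replace this hard match with a weighted one.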

    Mining subjectively interesting patterns in rich data


    Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

    Subgroup discovery is a local pattern mining technique to find interpretable descriptions of sub-populations that stand out on a given target variable. That is, these sub-populations are exceptional with regard to the global distribution. In this paper we argue that in many applications, such as scientific discovery, subgroups are only useful if they are additionally representative of the global distribution with regard to a control variable, that is, when the distribution of this control variable is the same, or almost the same, as over the whole data. We formalise this objective function and give an efficient algorithm to compute its tight optimistic estimator for the case of a numeric target and a binary control variable. This enables us to use the branch-and-bound framework to efficiently discover the top-k subgroups that are both exceptional and representative. Experimental evaluation on a wide range of datasets shows that with this algorithm we discover meaningful representative patterns and are up to orders of magnitude faster in terms of node evaluations as well as time. Comment: 10 pages, To appear in ICDM1

    Learning subjectively interesting data representations


    Cultural Integration in Organizational Partnership with Statutory and Quasi Implications

    The current academic literature is inadequate on the possibility of applying a typological model of effective cultural integration within the context of public-private partnerships, particularly when governments collaborate with multinational corporations. Using Schein's organizational cultural framework as the foundation, the purpose of this case study of a partnership between a West African government and a multinational petroleum corporation is to understand clearly how synergistic cultural integration coupled with statutory requirements could catalyze public-private partnership success. Data for this study came from interviews with American or Nigerian individuals who were familiar with the partnership in the West African country, a review of documents related to the partnership, and observational notes compiled during interviews. The Organizational Cultural Assessment Instrument inspired the interview questions. Data was coded and analyzed using a modification of Strauss and Corbin's 3-tiered analytic procedure. Key findings revealed the need for culturally based positive change dynamics to maximize evolving partnership growth and success. There were also indicators that an effective cultural integration synergistic typology would propel evolving competitive service delivery, efficient policy implementation, workforce motivation, economic and financial profitability, efficient communication channels and technological innovativeness, and managerial and administrative expertise. The knowledge of organizational cultural integration dynamics is useful to academicians, public administrators, policy makers, and executives in structuring public and private partnerships in a culturally sensitive way for long-term organizational growth and success.

    Cultural Industries and Innovation-An Empirical Analysis

    The multitude of research on creative industries speaks to the importance of this sector of the knowledge-based economy. Creative industries worldwide have witnessed rapid growth in the past decade, and this has prompted more interest in the sector. Research on innovation in creative industries, on the other hand, has been rather limited. Although several studies have indicated useful approaches to the management and organization of innovation relevant to the creative industries, empirical studies in this respect are still far from comprehensive, which prompts this empirical research on the impact of innovation on productivity in creative industries, with the Chinese online game industry as its focal point. The paper studies the links between innovation and productivity at the firm level. It bases its analysis on the recommendations of the Oslo Manual; this approach provides a way to achieve a high level of comparability within the industry, and it also provides standard definitions and indicators of innovation. The paper further adopts the scoring matrix approach in order to capture and delineate the various dimensions, dynamics and key features of online gaming enterprises in China. Indicators adopted in the analysis were selected based on literature review and statistical analysis. The empirical approach is based on data obtained from enterprise-based surveys of innovative activity in Chinese online game firms. The paper applies an econometric model of research and development, innovation and productivity interrelations at the firm level, similar to that of Crépon, Duguet, and Mairesse (1998) for France, to the micro data obtained for the Chinese online gaming industry.

    Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort, Bangladesh.

    INTRODUCTION: WHO has set a goal to reduce the prevalence of stunted child growth by 40% by the year 2025. To reach this goal, it is imperative to establish the relative importance of risk factors for stunting to deliver appropriate interventions. Currently, most interventions take place in late infancy and early childhood. This study aimed to identify the most critical prenatal and postnatal determinants of linear growth 0-24 months and the risk factors for stunting at 2 years, and to identify subgroups with different growth trajectories and levels of stunting at 2 years. METHODS: Conditional inference tree-based methods were applied to the extensive Maternal and Infant Nutrition Interventions in Matlab trial database with 309 variables of 2723 children, their parents and living conditions, including socioeconomic, nutritional and other biological characteristics of the parents; maternal exposure to violence; household food security; breast and complementary feeding; and measurements of morbidity of the mothers during pregnancy and repeatedly of their children up to 24 months of age. Child anthropometry was measured monthly from birth to 12 months, thereafter quarterly to 24 months. RESULTS: Birth length and weight were the most critical factors for linear growth 0-24 months and stunting at 2 years, followed by maternal anthropometry and parental education. Conditions after birth, such as feeding practices and morbidity, were less strongly associated with linear growth trajectories and stunting at 2 years. CONCLUSION: The results of this study emphasise the benefit of interventions before conception and during pregnancy to reach a substantial reduction in stunting

    Topological data analysis and geometry in quantum field dynamics

    Many non-perturbative phenomena in quantum field theories are driven or accompanied by non-local excitations, whose dynamical effects can be intricate but difficult to study. Amongst others, this includes diverse phases of matter, anomalous chiral behavior, and non-equilibrium phenomena such as non-thermal fixed points and thermalization. Topological data analysis can provide non-local order parameters sensitive to numerous such collective effects, giving access to the topology of a hierarchy of complexes constructed from given data. This dissertation contributes to the study of topological data analysis and geometry in quantum field dynamics. A first part is devoted to far-from-equilibrium time evolutions and the thermalization of quantum many-body systems. We discuss the observation of dynamical condensation and thermalization of an easy-plane ferromagnet in a spinor Bose gas, which goes along with the build-up of long-range order and superfluidity. In real-time simulations of an over-occupied gluonic plasma we show that observables based on persistent homology provide versatile probes for universal dynamics off equilibrium. Related mathematical effects such as a packing relation between the occurring persistent homology scaling exponents are proven in a probabilistic setting. In a second part, non-Abelian features of gauge theories are studied via topological data analysis and geometry. The structure of confining and deconfining phases in non-Abelian lattice gauge theory is investigated using persistent homology, which allows for a comprehensive picture of confinement. More fundamentally, four-dimensional space-time geometries are considered within real projective geometry, to which canonical quantum field theory constructions can be extended. This leads to a derivation of much of the particle content of the Standard Model. 
The works discussed in this dissertation provide a step towards a geometric understanding of non-perturbative phenomena in quantum field theories, and showcase the promising versatility of topological data analysis for statistical and quantum physics studies