    Structural Clustering and Visualization for Multi-Objective Decision Making

    Subjectively Interesting Subgroup Discovery on Real-valued Targets

    Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and infinitely many if we consider weighted combinations, even when restricted to linear ones. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes, for example, to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining. Comment: 12 pages, 10 figures, 2 tables, conference submission

    The limits of process: On (re)reading Henri Bergson

    This article offers a reading of the work of Henri Bergson as it pertains to organizations through the lens of ideas drawn from critical realism. It suggests an alternative to interpretations based on a stark division between process and realist perspectives. Much of the existing literature presents a rather partial view of Bergson’s work. A review suggests some interesting parallels with themes in critical realism, notably the emergence of mind. Critical realism has a focus on process at its heart, but is also concerned with how the products of such processes become stabilized and form the conditions for action. This suggests that attention might usefully be paid to the relationship between organizational action and the sedimented practices grouped under the heading of ‘routines’. More attention to Bergson’s account of the relationship between instinct, intuition and intelligence provides a link to the social character of thought, something which can be mapped onto Archer’s work on reflexivity and the ‘internal conversation’. This suggests that our analyses need to pay attention to both memory and history, to building and dwelling, rather than the one-sided focus found in some process theory accounts.

    A new set of cluster driven composite development indicators

    Composite development indicators used in policy making often subjectively aggregate a restricted set of indicators. We show, using dimensionality reduction techniques, including Principal Component Analysis (PCA) and, for the first time, information filtering and hierarchical clustering, that these composite indicators miss key information on the relationship between different indicators. In particular, the grouping of indicators via topics is not reflected in the data at a global and local level. We overcome these issues by using the clustering of indicators to build a new set of cluster driven composite development indicators that are objective, data driven, comparable between countries, and retain interpretability. We discuss their consequences on informing policy makers about country development, comparing them with the top PageRank indicators as a benchmark. Finally, we demonstrate that our new set of composite development indicators outperforms the benchmark on a dataset reconstruction task. Comment: Accepted in EPJ Data Science
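The idea of grouping indicators by how they co-vary, then building one composite per group, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes plain single-linkage agglomerative clustering on the correlation distance sqrt(2(1 - rho)) and a composite defined as the mean of z-scored indicators in each cluster. The indicator names and values in `toy` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_indicators(data, n_clusters):
    """Single-linkage agglomerative clustering of indicators.

    data maps indicator name -> list of values (one per country).
    Assumption: distance between indicators is sqrt(2 * (1 - rho)).
    """
    names = sorted(data)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = pearson(data[a], data[b])
            dist[(a, b)] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

    def linkage(c1, c2):
        # Single linkage: distance of the closest pair across two clusters.
        return min(dist[tuple(sorted((a, b)))] for a in c1 for b in c2)

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters

def zscores(x):
    """Standardize a list to zero mean and unit (population) variance."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / s for v in x]

def composite(data, cluster):
    """Composite indicator: per-country mean of z-scored cluster members."""
    cols = [zscores(data[name]) for name in cluster]
    return [sum(vals) / len(vals) for vals in zip(*cols)]

# Hypothetical toy data: five countries, four indicators.
toy = {
    "gdp_per_capita": [1.0, 2.0, 3.0, 4.0, 5.0],
    "life_expectancy": [1.1, 2.1, 2.9, 4.2, 5.0],
    "rainfall": [5.0, 1.0, 4.0, 2.0, 3.0],
    "forest_cover": [4.8, 1.2, 4.1, 2.2, 2.9],
}
clusters = cluster_indicators(toy, 2)
composites = [composite(toy, c) for c in clusters]
```

On this toy data the two strongly correlated pairs end up in separate clusters, and each composite is a single score per country that remains traceable to its member indicators.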

    Configurations of Control: A Transaction Cost Approach

    In this paper, I present a theory of management control based on Transaction Cost Economics. This theory seeks to integrate into a single framework a set of insights as to the nature of the organization's activities, the control problems that are inherent in these activities, and the unique problem solving potential of various archetypal control structures. The gist of the argument is that activities predictably differ in the control problems to which they give rise, whereas control archetypes differ in their problem-solving ability, and that alignments between the two can be explained by delineating the efficiency properties of the match. This is a contingent configuration approach. It is a configuration theory in that it offers a set of ideal types, conceived of as internally consistent and discriminating clusters of attributes from multiple dimensions that have a specific effect on control structure effectiveness as the variable to be explained. But it is also a contingent approach in that it specifies the conditions in which each of the archetypes is most effective. Keywords: transaction cost economics; management control theory; configuration theory

    Bayesian Cluster Analysis

    Multivariate Approaches to Classification in Extragalactic Astronomy

    Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is no exception and is now facing a deluge of data. For galaxies, the century-old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono- or bivariate classifications most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We emphasize the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools to investigate the physics and evolution of galaxies. Comment: Open Access paper. http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract. DOI: 10.3389/fspas.2015.00003

    Multidimensional Graphical Representations by Chernoff-Type Faces in Color: Assigning Data Coordinates to Face Parameters through Principal Component Analysis

    A new Chernoff-type face in color has been developed for purposes of representing and analyzing multidimensional data. This cartoon-like but fairly realistic face is defined by 20 parameters, including 4 color parameters. The programming was done in extended BASIC on the Hewlett-Packard 9845C color graphics computer. A method based on the mean pooled variances of parameter values within observed clusters was developed in order to establish an empirical rank order of importance among the face parameters. It was found experimentally that the smile, the outline of the face, and certain eye parameters were among the most important. Using a model consisting of a mixture of multivariate normal distributions, data were generated artificially from four known populations in order to compare different schemes for assigning data coordinates to face parameters. Five different schemes were experimentally evaluated with regard to their ability to recover known clusterings. The five methods were compared with one another, with random clusterings, and with the results of applying numerical algorithms to the artificial data. The assignment scheme best able experimentally to recover the known clustering was one where principal component scores were used to construct the faces rather than the original, raw data. Numerical algorithms which operated on the component scores were also generally superior to those operating on the original data. Using the new faces, a method was developed to cluster variables rather than the customary clustering of cases. This was compared with the clustering of variables through principal component analysis (varimax orthogonal rotations), and with numerical clustering algorithms which use the product moment correlation as a similarity measure. 
    A data set consisting of psychological profiles of nine entering classes of physicians in a Family Medicine residency was utilized to illustrate some of the foregoing, and also to depict and analyze changes over time of entering class characteristics.
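The ranking of face parameters by pooled within-cluster variance can be sketched as follows. The abstract does not give the exact formula, so this sketch makes two assumptions: the pooled variance is the usual sum of within-cluster squared deviations divided by (N - K), and a parameter counts as more important when its pooled within-cluster variance is small relative to its total variance. The parameter `nose_length` and all numbers are hypothetical.

```python
def pooled_within_variance(values, labels):
    """Pooled variance of one parameter within clusters.

    Sum of squared deviations from each cluster mean, divided by
    (number of observations - number of clusters).
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    ss = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss += sum((v - mean) ** 2 for v in members)
    return ss / (len(values) - len(groups))

def rank_parameters(table, labels):
    """Order parameters from most to least important.

    Assumption: importance is judged by the ratio of pooled within-cluster
    variance to total sample variance (smaller ratio = more important).
    """
    scores = {}
    for name, values in table.items():
        n = len(values)
        mean = sum(values) / n
        total = sum((v - mean) ** 2 for v in values) / (n - 1)
        scores[name] = pooled_within_variance(values, labels) / total
    return sorted(scores, key=scores.get)

# Hypothetical data: six faces sorted by an observer into clusters A and B.
faces = {
    "smile": [0.10, 0.20, 0.15, 0.90, 0.80, 0.85],
    "nose_length": [0.10, 0.90, 0.50, 0.20, 0.80, 0.40],
}
labels = ["A", "A", "A", "B", "B", "B"]
ranking = rank_parameters(faces, labels)
```

Here `smile` varies little inside each observed cluster while `nose_length` varies as much within clusters as overall, so `smile` is ranked first.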