Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes, for example, to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
information-theoretic sense) with respect to a single real-valued target
attribute or a set of them. The subgroups are described by a succinct set of
other attributes of arbitrary type. The approach is based on the Subjective
Interestingness framework FORSIED, which enables the use of prior knowledge when
finding the most informative non-redundant patterns; hence the method also
supports iterative data mining. Comment: 12 pages, 10 figures, 2 tables, conference submission
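To make the FORSIED idea of scoring a pattern by its information content (IC) relative to its description length (DL) more concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: the user's prior belief is that every target value is i.i.d. normal with a known mean and variance, a pattern states the mean target of a subgroup, IC is the negative log-density of that mean, and DL is a hypothetical fixed cost plus a cost per condition. All function names, the `alpha`/`beta` weights and the toy crime-rate rows are illustrative; the paper's actual background model and description-length definition may differ.

```python
import math
from statistics import fmean, pvariance


def gaussian_log_density(x, mean, var):
    """Log of the normal density N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def information_content(targets, mask, prior_mean, prior_var):
    """IC (in nats) of the pattern 'the subgroup's mean target is m'.

    Assumed background model: every target is i.i.d. N(prior_mean, prior_var),
    so the mean of n targets is N(prior_mean, prior_var / n). Because this is
    a density, the IC can be negative when prior_var / n is very small.
    """
    subgroup = [t for t, keep in zip(targets, mask) if keep]
    if not subgroup:
        raise ValueError("empty subgroup")
    return -gaussian_log_density(fmean(subgroup), prior_mean, prior_var / len(subgroup))


def description_length(num_conditions, alpha=1.0, beta=1.0):
    """Hypothetical DL: a fixed overhead plus a cost per condition."""
    return alpha + beta * num_conditions


def subjective_interestingness(targets, mask, num_conditions, prior_mean, prior_var):
    """SI = IC / DL, favouring informative patterns with short descriptions."""
    ic = information_content(targets, mask, prior_mean, prior_var)
    return ic / description_length(num_conditions)


def rank_single_conditions(rows, target, attributes, prior_mean, prior_var):
    """Score every one-condition subgroup 'attribute == value', best first."""
    targets = [row[target] for row in rows]
    scored = []
    for attr in attributes:
        for value in sorted({row[attr] for row in rows}):
            mask = [row[attr] == value for row in rows]
            si = subjective_interestingness(targets, mask, 1, prior_mean, prior_var)
            scored.append((si, f"{attr} == {value!r}"))
    return sorted(scored, reverse=True)


# Toy data (illustrative only): crime rate per area plus two descriptors.
rows = [
    {"crime_rate": 10, "region": "rural", "coastal": "yes"},
    {"crime_rate": 12, "region": "rural", "coastal": "yes"},
    {"crime_rate": 11, "region": "rural", "coastal": "no"},
    {"crime_rate": 30, "region": "urban", "coastal": "no"},
    {"crime_rate": 32, "region": "urban", "coastal": "no"},
    {"crime_rate": 31, "region": "urban", "coastal": "yes"},
    {"crime_rate": 9, "region": "rural", "coastal": "no"},
    {"crime_rate": 29, "region": "urban", "coastal": "yes"},
]
targets = [row["crime_rate"] for row in rows]
# Assumed prior belief: the user already knows the overall mean and variance.
prior_mean, prior_var = fmean(targets), pvariance(targets)
ranking = rank_single_conditions(rows, "crime_rate", ["region", "coastal"], prior_mean, prior_var)
for si, label in ranking:
    print(f"{si:.3f}  {label}")
```

In this toy data the two `region` subgroups have means far from what the background model expects, so they outrank the `coastal` subgroups, whose means coincide with the overall mean; adding conditions raises DL and therefore lowers SI for the same subgroup.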
The limits of process: On (re)reading Henri Bergson
This article offers a reading of the work of Henri Bergson as it pertains to organizations through the lens of ideas drawn from critical realism. It suggests an alternative to interpretations based on a stark division between process and realist perspectives. Much of the existing literature presents a rather partial view of Bergson's work. A review suggests some interesting parallels with themes in critical realism, notably the emergence of mind. Critical realism has a focus on process at its heart, but is also concerned with how the products of such processes become stabilized and form the conditions for action. This suggests that attention might usefully be paid to the relationship between organizational action and the sedimented practices grouped under the heading of 'routines'. More attention to Bergson's account of the relationship between instinct, intuition and intelligence provides a link to the social character of thought, something which can be mapped onto Archer's work on reflexivity and the 'internal conversation'. This suggests that our analyses need to pay attention to both memory and history, to building and dwelling, rather than the one-sided focus found in some process theory accounts.
A new set of cluster driven composite development indicators
Composite development indicators used in policy making often subjectively
aggregate a restricted set of indicators. We show, using dimensionality
reduction techniques, including Principal Component Analysis (PCA) and for the
first time information filtering and hierarchical clustering, that these
composite indicators miss key information on the relationship between different
indicators. In particular, the grouping of indicators via topics is not
reflected in the data at a global and local level. We overcome these issues by
using the clustering of indicators to build a new set of cluster driven
composite development indicators that are objective, data driven, comparable
between countries, and retain interpretability. We discuss their implications for
informing policy makers about country development, comparing them with the top
PageRank indicators as a benchmark. Finally, we demonstrate that our new set of
composite development indicators outperforms the benchmark on a dataset
reconstruction task. Comment: Accepted in EPJ Data Science
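The cluster-then-aggregate idea can be sketched in a few lines of Python. This is a hedged illustration, not the authors' pipeline: it swaps the paper's information-filtering step for plain average-linkage agglomerative clustering on the correlation distance sqrt(2(1 - rho)), and it assumes that each cluster's composite is simply the mean of its members' z-scores. The function names and the four toy indicators (`ind_a` to `ind_d`) are hypothetical.

```python
import math
from statistics import fmean, pstdev


def zscore(values):
    """Standardise one indicator across countries (assumes non-constant values)."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def correlation_distance(x, y):
    """sqrt(2 * (1 - rho)), where rho is the Pearson correlation of x and y."""
    rho = fmean([a * b for a, b in zip(zscore(x), zscore(y))])
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))  # clamp tiny negative rounding


def average_linkage(dist, k):
    """Merge the two closest clusters (mean pairwise distance) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = fmean([dist[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters


def cluster_composites(indicators, k):
    """Return (indicator names, composite score per country) for each cluster."""
    names = list(indicators)
    dist = [[correlation_distance(indicators[p], indicators[q]) for q in names]
            for p in names]
    result = []
    for members in average_linkage(dist, k):
        zs = [zscore(indicators[names[i]]) for i in members]
        composite = [fmean(country) for country in zip(*zs)]
        result.append(([names[i] for i in members], composite))
    return result


# Toy values for four countries (illustrative only).
indicators = {
    "ind_a": [1.0, 2.0, 3.0, 4.0],
    "ind_b": [1.1, 2.0, 3.2, 3.9],
    "ind_c": [4.0, 1.0, 3.0, 2.0],
    "ind_d": [3.9, 1.2, 2.8, 2.1],
}
groups = cluster_composites(indicators, k=2)
for members, composite in groups:
    print(members, [round(c, 2) for c in composite])
```

Because strongly correlated indicators land in the same cluster, averaging their z-scores keeps each composite readable as what those indicators jointly measure, rather than mixing indicators that the data show to be unrelated.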
Configurations of Control: A Transaction Cost Approach
In this paper, I present a theory of management control based on Transaction Cost Economics. This theory seeks to integrate into a single framework a set of insights as to the nature of the organization's activities, the control problems that are inherent in these activities, and the unique problem-solving potential of various archetypal control structures. The gist of the argument is that activities predictably differ in the control problems to which they give rise, whereas control archetypes differ in their problem-solving ability, and that alignments between the two can be explained by delineating the efficiency properties of the match. This is a contingent configuration approach. It is a configuration theory in that it offers a set of ideal types, conceived of as internally consistent and discriminating clusters of attributes from multiple dimensions that have a specific effect on control structure effectiveness as the variable to be explained. But it is also a contingent approach in that it specifies the conditions in which each of the archetypes is most effective.
Keywords: transaction cost economics; management control theory; configuration theory
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the century-old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono- or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We emphasize the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies. Comment: Open Access paper.
<http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract>.
<10.3389/fspas.2015.00003>
Multidimensional Graphical Representations by Chernoff-Type Faces in Color: Assigning Data Coordinates to Face Parameters through Principal Component Analysis
A new Chernoff-type face in color has been developed for representing and analyzing multidimensional data. This cartoon-like but fairly realistic face is defined by 20 parameters, including 4 color parameters. The programming was done in extended BASIC on the Hewlett-Packard 9845C color graphics computer. A method based on the mean pooled variances of parameter values within observed clusters was developed to establish an empirical rank order of importance among the face parameters. It was found experimentally that the smile, the outline of the face, and certain eye parameters were among the most important. Using a model consisting of a mixture of multivariate normal distributions, data were generated artificially from four known populations in order to compare different schemes for assigning data coordinates to face parameters. Five different schemes were experimentally evaluated with regard to their ability to recover known clusterings. The five methods were compared with one another, with random clusterings, and with the results of applying numerical algorithms to the artificial data. The assignment scheme that best recovered the known clustering in these experiments constructed the faces from principal component scores rather than from the original raw data. Numerical algorithms that operated on the component scores were also generally superior to those operating on the original data. Using the new faces, a method was developed to cluster variables rather than the customary clustering of cases. This was compared with the clustering of variables through principal component analysis (varimax orthogonal rotations) and with numerical clustering algorithms that use the product-moment correlation as a similarity measure.
A data set consisting of psychological profiles of nine entering classes of physicians in a Family Medicine residency was used to illustrate some of the foregoing, and also to depict and analyze changes in entering-class characteristics over time.
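The abstract's central finding, that faces built from principal component scores recover known clusters better than faces built from raw data, can be illustrated with a small Python sketch. It assumes one plausible assignment rule (the j-th principal component drives the j-th most important face parameter, min-max rescaled to [0, 1]); the parameter names `smile_curvature` and `face_outline` are hypothetical labels echoing features the study ranked highly, and the PCA here is a plain power-iteration stand-in, not anything from the original BASIC program.

```python
import math
from statistics import fmean


def center_and_covariance(rows):
    """Mean-center the data and return it with its sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [fmean([r[j] for r in rows]) for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    return centered, cov


def leading_components(cov, k, iters=500):
    """Top-k eigenvectors via power iteration with deflation.

    Uses a fixed start vector; this can fail if the start vector is orthogonal
    to an eigenvector, and convergence is slow when eigenvalues are close.
    """
    p = len(cov)
    mat = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]
        norm0 = math.sqrt(sum(x * x for x in v))
        v = [x / norm0 for x in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        # Deflate: remove the found component so the next one dominates.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def face_parameters(rows, ranked_params):
    """Drive the j-th ranked face parameter by the j-th component score in [0, 1]."""
    centered, cov = center_and_covariance(rows)
    comps = leading_components(cov, len(ranked_params))
    scores = [[sum(a * b for a, b in zip(c, v)) for v in comps] for c in centered]
    faces = [{} for _ in rows]
    for j, name in enumerate(ranked_params):
        col = [s[j] for s in scores]
        lo, hi = min(col), max(col)
        for face, s in zip(faces, col):
            face[name] = 0.5 if hi == lo else (s - lo) / (hi - lo)
    return faces


# Toy observations (illustrative only); most variance lies along one direction.
rows = [[1, 2, 0.1], [2, 4, -0.1], [3, 6, 0.05], [4, 8, -0.05], [5, 10, 0.0]]
faces = face_parameters(rows, ["smile_curvature", "face_outline"])
for face in faces:
    print({name: round(value, 2) for name, value in face.items()})
```

Drawing the faces is out of scope here; the sketch only shows how, under this assumed rule, the most salient facial features end up tracking the directions of greatest variance in the data.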