7 research outputs found
A tool for subjective and interactive visual data exploration
We present SIDE, a tool for Subjective and Interactive Visual Data Exploration, which lets users explore high dimensional data via subjectively informative 2D data visualizations. Many existing visual analytics tools are either restricted to specific problems and domains or they aim to find visualizations that align with user’s belief about the data. In contrast, our generic tool computes data visualizations that are surprising given a user’s current understanding of the data. The user’s belief state is represented as a set of projection tiles. Hence, this user-awareness offers users an efficient way to interactively explore yet-unknown features of complex high dimensional datasets
Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and there are
infinitely many if we consider weighted combinations, even for linear
combinations. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
Information Theoretic sense) with respect to a single or set of real-valued
target attributes. The subgroup descriptions are in terms of a succinct set of
arbitrarily-typed other attributes. The approach is based on the Subjective
Interestingness framework FORSIED to enable the use of prior knowledge when
finding most informative non-redundant patterns, and hence the method also
supports iterative data mining.Comment: 12 pages, 10 figures, 2 tables, conference submissio
Interactive visual data exploration with subjective feedback : an information-theoretic approach
Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing projection methods for data visualization use predefined criteria to choose the representation of data. There is a lack of methods that (i) use information on what the user has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as "this set of points forms a cluster", and requires no knowledge of maths. This background knowledge is used to find a maximum entropy distribution of the data, after which the user is provided with data projections for which the data and the maximum entropy distribution differ the most, hence showing the user aspects of data that are maximally informative given the background knowledge. We study the computational performance of our model and present use cases on synthetic and real data. We find that the model allows the user to learn information efficiently from various data sources and works sufficiently fast in practice. In addition, we provide an open source EDA demonstrator system implementing our model with tailored interactive visualizations. We conclude that the information theoretic approach to EDA where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system.Peer reviewe