Post-processing of association rules.
In this paper, we situate and motivate the need for a post-processing phase following the association rule mining algorithm when it is plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. Effectively extracting the most interesting knowledge nuggets from the standard output of these algorithms remains a considerable challenge, since running them commonly produces a vast number of association rules. The sheer multitude of generated rules often clouds the perception of the interpreters. Properly assessing the usefulness of the generated output requires dealing effectively with different forms of redundancy and with output that is plainly uninteresting. To this end, we give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.
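As an illustration of the kind of post-processing the abstract above refers to, the following minimal sketch filters a rule set in two steps: it drops rules whose lift does not exceed a threshold (treated here as "uninteresting"), then drops rules that are redundant with respect to a more general rule. The `Rule` class, the `prune_rules` function, the lift threshold, and the specific redundancy criterion (same consequent, strictly smaller antecedent, at least equal confidence) are assumptions chosen for illustration; the paper itself may survey different or additional criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one mined association rule."""
    antecedent: frozenset
    consequent: frozenset
    confidence: float
    lift: float


def prune_rules(rules, min_lift=1.0):
    """Return the rules that survive two illustrative post-processing steps."""
    # Step 1: discard rules whose lift does not exceed the threshold.
    interesting = [r for r in rules if r.lift > min_lift]

    # Step 2: discard a rule if a more general rule (strict subset
    # antecedent, same consequent) is at least as confident.
    kept = []
    for r in interesting:
        redundant = any(
            g.consequent == r.consequent
            and g.antecedent < r.antecedent  # strict subset test on frozensets
            and g.confidence >= r.confidence
            for g in interesting
        )
        if not redundant:
            kept.append(r)
    return kept
```

In this sketch a specialised rule is kept only when the extra antecedent items raise its confidence above that of every more general rule, which is one common way to reduce the rule set an analyst has to inspect.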
Subjectively Interesting Subgroup Discovery on Real-valued Targets
Deriving insights from high-dimensional data is one of the core problems in
data mining. The difficulty mainly stems from the fact that there are
exponentially many variable combinations to potentially consider, and
infinitely many if we consider weighted combinations, even when these are
restricted to linear ones. Hence, an obvious question is whether we can automate the search
for interesting patterns and visualizations. In this paper, we consider the
setting where a user wants to learn as efficiently as possible about
real-valued attributes. For example, a user may want to understand the distribution of crime
rates in different geographic areas in terms of other (numerical, ordinal
and/or categorical) variables that describe the areas. We introduce a method to
find subgroups in the data that are maximally informative (in the formal
information-theoretic sense) with respect to a single real-valued target
attribute or a set of them. The subgroup descriptions are in terms of a succinct set of
arbitrarily-typed other attributes. The approach is based on the Subjective
Interestingness framework FORSIED to enable the use of prior knowledge when
finding the most informative non-redundant patterns, and hence the method also
supports iterative data mining.
Comment: 12 pages, 10 figures, 2 tables, conference submissio
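To make the idea of an informative subgroup with respect to prior knowledge more concrete, here is a rough sketch and not the method of the paper. It assumes the user's prior belief is that each target value is independently Gaussian with a known mean and variance, measures the information content of a subgroup as the negative log density of its observed mean under that belief, and divides by a hypothetical description length that grows with the number of conditions in the subgroup description. The function names and the parameters `alpha` and `beta` are illustrative assumptions.

```python
import math


def information_content(values, prior_mean, prior_var):
    """Negative log density of the subgroup mean under a Gaussian prior.

    Assumes independent points, so the mean of n points has
    variance prior_var / n under the prior belief.
    """
    n = len(values)
    mean = sum(values) / n
    var_of_mean = prior_var / n
    return (0.5 * math.log(2 * math.pi * var_of_mean)
            + (mean - prior_mean) ** 2 / (2 * var_of_mean))


def interestingness(values, n_conditions, prior_mean, prior_var,
                    alpha=1.0, beta=1.0):
    """Information content per unit of (hypothetical) description length."""
    description_length = alpha * n_conditions + beta
    return information_content(values, prior_mean, prior_var) / description_length
```

Under these assumptions, a subgroup whose mean deviates strongly from what the user expects scores higher than one that matches expectations, and a longer description lowers the score of an otherwise equally surprising subgroup.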
Towards Role Based Hypothesis Evaluation for Health Data Mining
Data mining researchers have long been concerned with the application of tools to facilitate and
improve data analysis on large, complex data sets. The current challenge is to make data mining
and knowledge discovery systems applicable to a wider range of domains, among them health.
Early work was performed over transactional, retail-based data sets, but the attraction of finding
previously unknown knowledge from the ever increasing amounts of data collected from the health
domain is an emerging area of interest and specialisation. The problem is finding a solution that is
suitably flexible to allow for generalised application whilst being specific enough to provide functionality
that caters for the nuances of each role within the domain. The need for a more granular
approach to problem solving in other areas of information technology has resulted in the use of
role-based solutions. This paper discusses the progress to date in developing a role-oriented solution to
the problem of providing for the diverse requirements of health domain data miners and defining the
foundation for determining what constitutes an interesting discovery in an area as complex as
health.