12 research outputs found

    Applications and Research Problems of Subgroup Mining

    Knowledge Discovery in Databases (KDD) is a data analysis process which, in contrast to conventional data analysis, automatically generates and evaluates very many hypotheses, deals with complex (i.e. large, high-dimensional, multi-relational, dynamic, or heterogeneous) data, and produces understandable results for those who "own the data". With these objectives, subgroup mining searches for hypotheses that can be supported or confirmed by the given data and that are represented as a specialization of one of three general hypothesis types: deviating subgroups, associations between two subgroups, and partially ordered sets of subgroups where the partial ordering usually relates to time. This paper gives a short introduction to the methods of subgroup mining. In particular, the main preprocessing, data mining, and postprocessing steps are discussed in more detail for two applications. We conclude with some problems of the current state of the art of subgroup mining. 1 Introduc..

    Deviation Analysis

    The two general data analytic questions of subgroup mining (B2.2) deal with deviations and associations (C5.2.3, C5.2.4). A deviation pattern describes a deviating behavior (distribution) of a target variable in a subgroup. The analyst selects the target variable and behavior type for an individual mining task; the mining method determines the deviating subgroups. Deviation patterns rely on statistical tests and thus capture knowledge about a subgroup in the form of a verified alternative hypothesis on the distribution of a target variable. Typically, the rejected null hypothesis assumes an uninteresting, non-deviating subgroup. The search for deviating subgroups is organized in two phases. In a first brute-force search, different search heuristics can be applied to find a set of deviating subgroups. In a second refinement phase, redundancy elimination operators construct a best system of subgroups from the brute-force search results. We discuss the role of tests for subgroup minin..
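The first, brute-force phase described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the method of the listed work: the binomial z-statistic, the threshold of 2.0, the function names, and the toy data are all assumptions made here, and the second (redundancy elimination) phase is not shown.

```python
from itertools import combinations
from math import sqrt


def z_score(n, p, p0):
    """Binomial test statistic: subgroup share p (size n) against overall share p0."""
    return (p - p0) * sqrt(n) / sqrt(p0 * (1 - p0))


def deviating_subgroups(rows, target, attributes, max_depth=2, threshold=2.0):
    """Enumerate conjunctions of attribute=value selectors and keep those
    whose share of the binary target deviates from the overall share."""
    p0 = sum(r[target] for r in rows) / len(rows)
    if p0 in (0.0, 1.0):
        return []  # constant target: no deviation can be tested
    selectors = sorted({(a, r[a]) for r in rows for a in attributes})
    results = []
    for depth in range(1, max_depth + 1):
        for conj in combinations(selectors, depth):
            if len({a for a, _ in conj}) < depth:
                continue  # skip conjunctions that use one attribute twice
            members = [r for r in rows if all(r[a] == v for a, v in conj)]
            if not members:
                continue
            p = sum(r[target] for r in members) / len(members)
            z = z_score(len(members), p, p0)
            if abs(z) >= threshold:
                results.append((conj, len(members), p, z))
    return sorted(results, key=lambda t: -abs(t[3]))


def toy_rows():
    """Hypothetical data: 40 objects, two descriptive attributes, binary target."""
    spec = [("north", "young", 9), ("north", "old", 5),
            ("south", "young", 1), ("south", "old", 5)]
    rows = []
    for region, age, positives in spec:
        for i in range(10):
            rows.append({"region": region, "age": age, "y": int(i < positives)})
    return rows


if __name__ == "__main__":
    for conj, n, p, z in deviating_subgroups(toy_rows(), "y", ["region", "age"]):
        print(conj, n, round(p, 2), round(z, 2))
```

In this toy data no single selector passes the threshold; only the two-attribute conjunctions for young objects do, which is why the sketch searches beyond depth 1.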

    Change Analysis

    Micro data are often available for several time points, especially when new data are collected incrementally, for instance by regularly adding new batches of objects (e.g. daily or monthly). In this section, we summarize subgroup mining approaches for analysing several cross sections of data, each representing a specific time point. We assume the more general case of independent cross sections that do not necessarily contain the same objects. Change patterns are then typically more useful for an analyst, since in this regularly proceeding or incremental situation, the main static patterns related to a specific time point (C5.3.1) are often quite stable over time and mostly well known. Specifically, we deal with analysing change (two time points) or trend (a sequence of equidistant time points), and we discuss pattern elaboration to refine or combine diverse types of patterns and to deal with some peculiarities (Simpson's paradox). Keywords: subgroup mining, change tests, trend tests, Simpson parad..
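Simpson's paradox in a change setting can be illustrated with a small sketch. The counts below are invented for illustration, and the function names are hypothetical: the target share rises in every subgroup between two independent cross sections, yet the pooled share falls because the subgroup sizes shift.

```python
def rate(successes, n):
    """Share of target-positive objects in a group."""
    return successes / n


def change_by_stratum(t1, t2):
    """Change in target share per subgroup between two cross sections.
    Each cross section maps subgroup name -> (successes, size)."""
    return {k: rate(*t2[k]) - rate(*t1[k]) for k in t1}


def overall_change(t1, t2):
    """Change in the pooled target share, ignoring subgroup membership."""
    def pooled(t):
        return sum(s for s, _ in t.values()) / sum(n for _, n in t.values())
    return pooled(t2) - pooled(t1)


def is_simpson_reversal(t1, t2):
    """True if every subgroup moves in one direction and the pooled share in the other."""
    strata = list(change_by_stratum(t1, t2).values())
    total = overall_change(t1, t2)
    return (all(d > 0 for d in strata) and total < 0) or \
           (all(d < 0 for d in strata) and total > 0)


# Hypothetical cross sections: subgroup -> (successes, size)
TIME_1 = {"A": (640, 800), "B": (40, 200)}
TIME_2 = {"A": (170, 200), "B": (200, 800)}

if __name__ == "__main__":
    print(change_by_stratum(TIME_1, TIME_2))
    print(overall_change(TIME_1, TIME_2))
    print(is_simpson_reversal(TIME_1, TIME_2))
```

Here both subgroups gain about five percentage points while the pooled share drops from 0.68 to 0.37, so a change pattern reported only at the pooled level would point in the wrong direction.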

    Exploration of Simulation Experiments by Discovery

    In this paper, we exemplify how a discovery system is applied to the analysis of simulation experiments in practical political planning, and show what kind of new knowledge can be discovered in an application area that differs from others in the large amount of knowledge that the analyst already holds about the process that generates the data. Subgoals like "low classification accuracy", "high homogeneity", "disjoint rules", etc. are introduced into Explora to select between different statistical tests for each pattern and between several search algorithms, allowing the user to adapt the discovery process to the special requirements of the application. The combination of discovery with simulation is endowed with the main characteristics of both Knowledge Discovery in Databases (KDD) and Automated Scientific Discovery (ASD), i.e. discovery in large databases and experimentation. Analysing a real system with simulation models allows the experimental conditions to be set freely. In distinction to ..

    MiSoSouP
