
    Local Pattern Detection: International Seminar Dagstuhl Castle, Germany, April 12-16, 2004, Revised Selected Papers

    Introduction. The dramatic increase in available computer storage capacity over the last 10 years has led to the creation of very large databases of scientific and commercial information. The need to analyze these masses of data has led to the evolution of the new field knowledge discovery in databases (KDD) at the intersection of machine learning, statistics and database technology. Being interdisciplinary by nature, the field offers the opportunity to combine the expertise of different fields into a common objective. Moreover, within each field diverse methods have been developed and justified with respect to different quality criteria. We have to investigate how these methods can contribute to solving the problem of KDD. Traditionally, KDD was seeking to find global models for the data that explain most of the instances of the database and describe the general structure of the data. Examples are statistical time series models, cluster models, logic programs with high coverage or classification models like decision trees or linear decision functions. In practice, though, the use of these models often is very limited, because global models tend to find only the obvious patterns in the data, which domain experts already are aware of. What is really of interest to the users are the local patterns that deviate from the already-known background knowledge. David Hand, who organized a workshop in 2002, proposed the new field of local patterns

    Experiments with Two Approaches for Tracking Drifting Concepts

    This paper addresses the task of learning a classifier from a stream of labelled data. In this setting, the underlying concepts can change over time. The paper studies two mechanisms developed for dealing with changing concepts. Both are based on the time window idea. The first one forgets gradually, by assigning to the examples a weight that decreases over time. The second one uses a statistical test to detect changes in the concept and then optimizes the size of the time window, aiming to maximise the classification accuracy on the new examples. Both methods are general in nature and can be used with any learning algorithm. The objectives of the conducted experiments were to compare the mechanisms and explore whether they can be combined to achieve a synergetic effect. Results from experiments with three basic learning algorithms (kNN, ID3 and NBC) using four datasets are reported and discussed.
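The first mechanism, gradual forgetting, can be illustrated with a minimal sketch. The abstract does not state which decay function is used, so the linear decay below, the `oldest` parameter, and the function names `forgetting_weights` and `weighted_knn_predict` are all assumptions made for illustration, not the paper's implementation. The sketch weights each example in the time window by its age and lets a kNN vote use those weights.

```python
from collections import defaultdict


def forgetting_weights(n, oldest=0.2):
    """Weights for n examples ordered oldest to newest.

    Assumption: weights rise linearly from `oldest` to 1.0, so older
    examples count less. The decay shape is illustrative only.
    """
    if n == 1:
        return [1.0]
    step = (1.0 - oldest) / (n - 1)
    return [oldest + i * step for i in range(n)]


def weighted_knn_predict(window, query, k=3):
    """Predict a label for `query` from a time window of examples.

    window: list of (feature_tuple, label), ordered oldest to newest.
    Each of the k nearest neighbours votes with its age-based weight.
    """
    weights = forgetting_weights(len(window))
    scored = []
    for (features, label), weight in zip(window, weights):
        # Squared Euclidean distance between the example and the query.
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        scored.append((dist, weight, label))
    scored.sort(key=lambda item: item[0])

    votes = defaultdict(float)
    for _, weight, label in scored[:k]:
        votes[label] += weight
    return max(votes, key=votes.get)


# Hypothetical stream: the same region of feature space was labelled "a"
# in the past and is labelled "b" more recently (a drifted concept).
window = [((0.0,), "a"), ((0.1,), "a"), ((0.0,), "b"), ((0.1,), "b")]
prediction = weighted_knn_predict(window, (0.05,), k=4)
```

With equal weights the four neighbours would tie; with age-based weights the more recent label wins. The second mechanism (a statistical test that triggers a change in window size) is not covered by this sketch.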

    Subjectively Interesting Subgroup Discovery on Real-valued Targets

    Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely many if we consider weighted combinations, even for linear combinations. Hence, an obvious question is whether we can automate the search for interesting patterns and visualizations. In this paper, we consider the setting where a user wants to learn as efficiently as possible about real-valued attributes. For example, a user may want to understand the distribution of crime rates in different geographic areas in terms of other (numerical, ordinal and/or categorical) variables that describe the areas. We introduce a method to find subgroups in the data that are maximally informative (in the formal information-theoretic sense) with respect to a single real-valued target attribute or a set of them. The subgroup descriptions are in terms of a succinct set of arbitrarily-typed other attributes. The approach is based on the Subjective Interestingness framework FORSIED to enable the use of prior knowledge when finding the most informative non-redundant patterns, and hence the method also supports iterative data mining. Comment: 12 pages, 10 figures, 2 tables, conference submission
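The idea of scoring a subgroup by how informative it is relative to prior beliefs can be sketched in a heavily simplified form. Everything below is an assumption made for illustration and is not the paper's method: the prior belief about the target is modelled as a single Gaussian, the pattern is reduced to the subgroup mean, and the score divides information content by the number of conditions in the description. The names `information_content` and `interestingness` are hypothetical.

```python
import math


def information_content(subgroup_values, prior_mean, prior_var):
    """Negative log density of the subgroup mean under an assumed prior.

    Assumption: target values are believed to be i.i.d. Gaussian with
    the given mean and variance, so the mean of m values is Gaussian
    with variance prior_var / m. Surprising means get larger scores.
    """
    m = len(subgroup_values)
    mean = sum(subgroup_values) / m
    var_of_mean = prior_var / m
    return (0.5 * math.log(2 * math.pi * var_of_mean)
            + (mean - prior_mean) ** 2 / (2 * var_of_mean))


def interestingness(subgroup_values, prior_mean, prior_var, n_conditions):
    """Information content per condition in the subgroup description.

    Assumption: description length is simply the number of conditions,
    so shorter descriptions are preferred at equal information content.
    """
    return information_content(subgroup_values, prior_mean, prior_var) / n_conditions


# Hypothetical crime-rate targets for two candidate subgroups, with a
# prior belief of mean 0 and variance 1 (e.g. standardized rates).
surprising = interestingness([2.5, 3.0, 2.8], 0.0, 1.0, n_conditions=1)
expected = interestingness([0.1, -0.2, 0.0], 0.0, 1.0, n_conditions=1)
```

Under these assumptions, the subgroup whose mean deviates strongly from the prior scores higher than the one that matches it. Iterative mining would additionally update the prior after each pattern is shown, which this sketch omits.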