17 research outputs found

    Rule Learning: From Local Patterns to Global Models

    Get PDF
    In many areas of daily life (e.g. in e-commerce or social networks), massive amounts of data are collected and stored in databases for future use. Even though the specific information contained in the collected data may already be interesting, more general insights into the data are usually more useful. A data analysis should therefore aim at discovering such pieces of knowledge, but human inspection becomes less and less feasible as the databases grow ever larger. To this end, the KDD process (short for "Knowledge Discovery in Databases") provides the tools for a semi-automatic data analysis. Data mining, the main component of the KDD process, searches the stored facts for regularities that represent pieces of knowledge. Usually, these regularities are formulated either as local patterns, which describe only local characteristics of the data, or as global models, which explain the data as a whole. In our work, we concentrate on local patterns and global models that can be used to predict a feature of interest, or class attribute, for future, unseen data. Predictive local patterns may be used to obtain global predictions in two ways: the integrative approach treats the local patterns as building blocks from which a global model is constructed, while the decoding approach aggregates the predictions of the local patterns into a single global prediction. Although both approaches are promising, the question of how local patterns may be employed for global modelling has not yet been answered satisfactorily. We therefore consider three important aspects of this question in this work.
    The first aspect is how a set of local patterns may be employed to obtain optimal global predictions. The LeGo framework (an acronym for "from local patterns to global models") provides an approach to this question. It divides the data mining process into three consecutive steps: local pattern discovery generates a set of local patterns, pattern set discovery selects a smaller subset from this set, and global modelling employs the reduced pattern set to build a global model. Many methods are available for each step, so we employ a selection of methods for each step and empirically evaluate their performance with respect to this first aspect.
    The second aspect is how a set of local patterns may be utilised to obtain optimal class probabilities. Class probabilities are often more useful than a plain prediction because they can serve as a confidence measure for the prediction (e.g. in voting schemes). We divide this aspect into two subtasks: probability estimation and probability aggregation. Probability estimation calculates class probabilities given a single local pattern; for this task, we consider basic probability estimation methods as well as shrinkage, a technique for smoothing the basic probability estimates. Furthermore, we examine the effect of the local pattern discovery on the quality of the probability estimates. Probability aggregation decodes the probability estimates of multiple patterns into a single estimate; for this purpose, we evaluate the performance of a selection of aggregation methods.
    The third aspect is how a set of local patterns may be transformed into a compact and understandable model. Local pattern sets are usually hard to interpret, and using them for prediction requires additional machinery (e.g. voting schemes). Both issues can be resolved at once if the local patterns are employed to obtain a global model. To this end, we introduce rule stacking, a novel approach to global modelling. Rule stacking advances the standard stacking approach in two respects: the meta data generation and an additional retransformation of the meta model. In this way, we obtain a compressed and interpretable global model that is directly applicable to future data.
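
    The three LeGo steps can be pictured as a simple pipeline. The following Python sketch is only an illustration under assumptions of our own (the Pattern class, the quality-based top-k selection, and the function parameters are hypothetical placeholders, not the methods evaluated in this work):

        # Hypothetical sketch of the LeGo pipeline; representation and learners are assumptions.
        from dataclasses import dataclass
        from typing import Callable, List, Sequence

        @dataclass
        class Pattern:
            """A local pattern: a coverage test, a predicted class, and a quality score."""
            covers: Callable[[dict], bool]
            predicted_class: str
            quality: float               # e.g. precision of the pattern on the training data

        def lego_pipeline(examples: Sequence[dict], labels: Sequence[str],
                          mine_local_patterns, build_global_model, k: int = 50):
            # Step 1: local pattern discovery
            patterns: List[Pattern] = mine_local_patterns(examples, labels)
            # Step 2: pattern set discovery -- here simply the k best patterns by quality
            pattern_set = sorted(patterns, key=lambda p: p.quality, reverse=True)[:k]
            # Step 3: global modelling on the reduced pattern set
            return build_global_model(pattern_set, examples, labels)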

    Pairwise Naive Bayes Classifier

    Get PDF
    Class binarizations are effective methods that break a multi-class problem down into several two-class (binary) problems in order to improve weak learners. This paper analyzes the effects these methods have when a Naive Bayes learner is chosen as the base classifier. We consider the known unordered and pairwise class binarizations and propose an alternative approach for a pairwise calculation of a modified Naive Bayes classifier.
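
    As a point of reference for the pairwise setting, the following sketch shows the standard one-vs-one (pairwise) class binarization with a Naive Bayes base learner in scikit-learn; it illustrates the plain decomposition only, not the modified pairwise Naive Bayes calculation proposed in the paper, and the dataset is chosen merely for illustration:

        # Pairwise (one-vs-one) binarization with Naive Bayes base classifiers.
        from sklearn.datasets import load_iris
        from sklearn.model_selection import cross_val_score
        from sklearn.multiclass import OneVsOneClassifier
        from sklearn.naive_bayes import GaussianNB

        X, y = load_iris(return_X_y=True)

        # One binary Naive Bayes model is trained for every pair of classes;
        # their votes are combined into a single multi-class prediction.
        pairwise_nb = OneVsOneClassifier(GaussianNB())
        print(cross_val_score(pairwise_nb, X, y, cv=5).mean())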

    Paarweiser Naive Bayes Klassifizierer (Pairwise Naive Bayes Classifier)

    No full text

    Rule Stacking: An approach for compressing an ensemble of rule sets into a single classifier

    No full text
    In this paper, we present an approach for compressing a rule-based pairwise classifier ensemble into a single rule set that can be directly used for classification. The key idea is to re-encode the training examples using information about which of the original rules cover the example, and to use these re-encoded examples for training a rule-based meta-level classifier. We not only show that this approach is more accurate than using the same classifier at the base level (which could have been expected for such a variant of stacking), but also demonstrate that the resulting meta-level rule set can be straightforwardly translated back into a rule set at the base level. Our key result is that the rule sets obtained in this way are of comparable complexity to those of the original rule learner, but considerably more accurate.
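
    The core mechanics of this approach, re-encoding examples by rule coverage and translating a meta-level rule back into base-level conditions, can be sketched as follows. This is a simplified illustration under our own assumptions (string rule bodies, lambda coverage tests, and a meta rule given as a list of rule indices), not the paper's implementation:

        # Coverage-based re-encoding and back-translation of meta-level rules (illustrative).
        from typing import Callable, List, Sequence, Tuple

        BaseRule = Tuple[str, Callable[[dict], bool]]   # (readable rule body, coverage test)

        def reencode(examples: Sequence[dict], rules: Sequence[BaseRule]) -> List[List[int]]:
            """Meta-level data: one binary feature per base rule (1 = the rule covers the example)."""
            return [[int(test(x)) for _, test in rules] for x in examples]

        def retranslate(meta_rule: List[int], rules: Sequence[BaseRule]) -> str:
            """Expand a meta-level rule, given as indices of required base rules, into base-level conditions."""
            return " AND ".join(rules[i][0] for i in meta_rule)

        # Two hypothetical base rules and one meta-level rule that requires both of them.
        rules: List[BaseRule] = [
            ("outlook = sunny", lambda x: x["outlook"] == "sunny"),
            ("humidity <= 75",  lambda x: x["humidity"] <= 75),
        ]
        print(reencode([{"outlook": "sunny", "humidity": 80}], rules))   # [[1, 0]]
        print(retranslate([0, 1], rules))    # outlook = sunny AND humidity <= 75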

    A Study of Probability Estimation Techniques for Rule Learning

    No full text
    Rule learning is known for its descriptive and therefore comprehensible classification models, which also yield good class predictions. However, in some application areas, we also need good class probability estimates. For other classification models, such as decision trees, a variety of techniques for obtaining good probability estimates have been proposed and evaluated. However, so far there has been no systematic empirical study of how these techniques can be adapted to probabilistic rules and how they affect probability-based rankings. In this paper, we apply several basic methods for the estimation of class membership probabilities to classification rules. We also study the effect of a shrinkage technique for merging the probability estimates of rules with those of their generalizations.
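
    To make the two ingredients concrete, the following sketch shows two widely used basic estimates (the Laplace and m-estimate) for a single rule, together with a simple shrinkage step towards the rule's generalization. The linear weighting used here is an assumption for illustration, not the shrinkage scheme evaluated in the paper:

        # Basic probability estimates for a rule and an illustrative shrinkage step.
        def laplace(covered_pos: int, covered_total: int, n_classes: int = 2) -> float:
            """Laplace estimate of P(class | rule fires)."""
            return (covered_pos + 1) / (covered_total + n_classes)

        def m_estimate(covered_pos: int, covered_total: int, prior: float, m: float = 2.0) -> float:
            """m-estimate: interpolates between the rule's precision and the class prior."""
            return (covered_pos + m * prior) / (covered_total + m)

        def shrink(rule_estimate: float, general_estimate: float, weight: float = 0.7) -> float:
            """Shrinkage (illustrative): smooth a rule's estimate with that of its generalization."""
            return weight * rule_estimate + (1.0 - weight) * general_estimate

        # A rule covering 8 positive out of 10 examples, with a class prior of 0.4,
        # and a (hypothetical) generalization covering 40 positive out of 80 examples.
        p_rule = m_estimate(8, 10, prior=0.4)
        p_general = m_estimate(40, 80, prior=0.4)
        print(shrink(p_rule, p_general))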