
    Ontology-Based Knowledge Model for Multi-View KDD Process

    Knowledge Discovery in Databases (KDD) is a highly complex, iterative and interactive process that involves several types of knowledge and expertise. In this paper we propose to support users of a multi-view analysis, that is, a KDD process conducted by several experts who analyze the same data from different viewpoints. Our objective is to enhance both the reusability of the process and the coordination between users. To do so, we propose a formalization of the notion of viewpoint in KDD and a Knowledge Model that structures the domain knowledge involved in a multi-view analysis. Our formalization of the viewpoint notion uses OWL ontologies and is based on the CRISP-DM standard, through the identification of a set of generic criteria that characterize a viewpoint in KDD.

    Goal-Driven Approach to Model Interaction between Viewpoints of a Multi-View KDD process

    A data mining project usually involves several actors (domain experts, data analysts, KDD experts, ...), each with a different viewpoint. In this paper we propose to enhance coordination and knowledge sharing between the actors of a multi-view KDD analysis through goal-driven modeling of the interactions between viewpoints. After a brief review of our approach to viewpoints in KDD, we first develop a Goal Model that allows the identification and representation of business objectives during the business understanding step of the KDD process. Then, based on this goal model, we define a set of relations between the viewpoints of a multi-view analysis, namely equivalence, inclusion, conflict and requirement.

    Cost sensitive meta-learning

    Classification is one of the primary tasks of data mining and aims to assign a class label to unseen examples by using a model learned from a training dataset. Most established classifiers are designed to minimize the error rate, but in practice data mining involves costs such as the cost of obtaining the data and the cost of making an error. Hence the following question arises: among all the available classification algorithms, and considering a specific type of data and cost, which is the best algorithm for my problem? It is well known in the machine learning community that no single algorithm performs best in all domains. This observation motivates the need for an "algorithm selector" that automates the choice between different algorithms for a given application domain. Thus, this research develops a new meta-learning system for recommending cost-sensitive classification methods. The system is based on the idea of applying machine learning to discover knowledge about the performance of different data mining algorithms. It includes components that repeatedly apply different classification methods to data sets and measure their performance. The characteristics of the data sets, combined with the algorithm and its performance, provide the training examples. A decision tree algorithm is applied to these training examples to induce knowledge that can then be used to recommend algorithms for new data sets, and active learning is used to automatically choose the most informative data set to enter the learning process next. This thesis contributes to both meta-learning and cost-sensitive learning in that it develops a new meta-learning approach for recommending cost-sensitive methods. Although meta-learning is not new, accelerating the learning process remains an open problem, and the thesis develops a novel active learning strategy based on clustering that lets the learner choose which data to learn from and thereby speeds up the meta-learning process. Both the meta-learning system and the use of active learning are implemented in the WEKA system and evaluated by applying them to different datasets and comparing the results with existing studies in the literature. The results show that the meta-learning system produces better results than METAL, a well-known meta-learning system, and that the use of clustering and active learning has a positive effect on accelerating the meta-learning process, with all tested datasets showing a reduction in the prediction error rate of 75%.
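
The meta-learning loop described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation (which is built in WEKA and uses a decision tree as the meta-learner): the function names, the two meta-features, the cost matrix and all confusion matrices below are invented for illustration, and a nearest-neighbour lookup stands in for the decision tree to keep the example short.

```python
import math

def expected_cost(confusion, cost):
    """Average misclassification cost per example.

    confusion[i][j]: number of examples of true class i predicted as class j.
    cost[i][j]: cost of predicting class j when the true class is i.
    """
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total

def build_meta_examples(runs, cost):
    """Turn per-dataset results into (meta_features, best_algorithm) pairs.

    runs maps a dataset name to (meta_features, {algorithm: confusion matrix}).
    The "best" algorithm is the one with the lowest expected cost.
    """
    examples = []
    for features, results in runs.values():
        best = min(results, key=lambda name: expected_cost(results[name], cost))
        examples.append((features, best))
    return examples

def recommend(meta_examples, new_features):
    """Recommend the algorithm of the closest known dataset (1-nearest neighbour).

    This is a stand-in for the decision tree meta-learner named in the abstract.
    """
    _, best = min(meta_examples, key=lambda ex: math.dist(ex[0], new_features))
    return best

# Hypothetical cost matrix: missing a positive (row 1, column 0) costs 5x more.
COST = [[0, 1], [5, 0]]

# Hypothetical meta-features: (minority class ratio, fraction of numeric attributes).
RUNS = {
    "toy_a": ((0.1, 0.8), {"algorithm_a": [[85, 5], [6, 4]],
                           "algorithm_b": [[75, 15], [2, 8]]}),
    "toy_b": ((0.5, 0.2), {"algorithm_a": [[45, 5], [4, 46]],
                           "algorithm_b": [[40, 10], [8, 42]]}),
}

meta_examples = build_meta_examples(RUNS, COST)
print(recommend(meta_examples, (0.15, 0.7)))
```

On `toy_a`, `algorithm_a` makes fewer errors overall (11 versus 17) but has the higher expected cost, because its errors fall mostly on the expensive class. That gap between error rate and cost is what motivates a cost-sensitive recommender.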