4 research outputs found

    A combined data mining approach using rough set theory and case-based reasoning in medical datasets

    Get PDF
    Case-based reasoning (CBR) is the process of solving new cases by retrieving the most relevant ones from an existing knowledge-base. Since, irrelevant or redundant features not only remarkably increase memory requirements but also the time complexity of the case retrieval, reducing the number of dimensions is an issue worth considering. This paper uses rough set theory (RST) in order to reduce the number of dimensions in a CBR classifier with the aim of increasing accuracy and efficiency. CBR exploits a distance based co-occurrence of categorical data to measure similarity of cases. This distance is based on the proportional distribution of different categorical values of features. The weight used for a feature is the average of co-occurrence values of the features. The combination of RST and CBR has been applied to real categorical datasets of Wisconsin Breast Cancer, Lymphography, and Primary cancer. The 5-fold cross validation method is used to evaluate the performance of the proposed approach. The results show that this combined approach lowers computational costs and improves performance metrics including accuracy and interpretability compared to other approaches developed in the literature

    Using Bayesian Networks for Selecting Classifiers in GP Ensembles

    No full text
    Ensemble techniques have been widely used to improve classification performance also in the case of GP-based systems. These techniques should improve classification accuracy by using voting strategies to combine the responses of different classifiers. However, even reducing the number of classifiers composing the ensemble, by selecting only those appropriately "diverse" according to a given measure, gives no guarantee of obtaining significant improvements in both classification accuracy and generalization capacity. This paper presents a novel approach for combining GP-based ensembles by means of a Bayesian Network. The proposed system is able to learn and combine decision tree ensembles effectively by using two different strategies: in the first, decision tree ensembles are learned by means of a boosted GP algorithm; in the second, the responses of the ensemble are combined using a Bayesian network, which also implements a selection strategy to reduce the number of classifiers. Experiments on several data sets show that the approach obtains comparable or better accuracy with respect to other methods proposed in the literature, considerably reducing the number of classifiers used. In addition, a comparison with similar approaches, confirmed the goodness of our method and its superiority with respect to other selection techniques based on diversity
    corecore