Discovery of Decision Rules by Matching New Objects Against Data Tables
In this paper we present an exemplary algorithm that classifies new objects by matching them directly against a data table to generate the relevant decision, instead of matching them against all rules generated from the data table (see [1]). We report results of experiments on three medical data sets, concerning lymphography, breast cancer and primary tumor (see [8]). We compare standard methods for extracting laws from decision tables (see e.g. [17], [1]), based on rough sets (see [13]) and Boolean reasoning (see [2]), with the method based on algorithms calculating relevant decision rules for new objects. We also compare the results of computer experiments on those data sets obtained by applying our system, based on rough set methods, with the results on the same data sets obtained with the help of several data analysis systems known from the literature. 1 Introduction A classification algorithm is an algorithm which permits us to repeatedly make a forecast on the basis of accumulated knowledge i..
Suppressing microdata to prevent classification based inference
The revolution of the Internet, together with progress in computer technology, makes it easy for institutions to collect an unprecedented amount of personal data. This pervasive data collection, coupled with the increasing need to disseminate and share non-aggregated data, i.e., microdata, has raised many concerns about privacy. One method to ensure privacy is to selectively hide the confidential, i.e., sensitive, information before disclosure. However, with data mining techniques, it is now possible for an adversary to predict the hidden confidential information from the disclosed data sets. In this paper, we concentrate on one such data mining technique, classification. We extend our previous work on microdata suppression to prevent both probabilistic and decision tree classification-based inference. We also provide experimental results on real-life data sets showing the effectiveness of not only the proposed methods but also the hybrid methods, i.e., methods that suppress microdata against both classification models.