5 research outputs found

    A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets

    In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect: either inconsistent or incomplete. The idea of lower and upper approximations, or more generally the probabilistic approximation, provides an effective way to induce rules from inconsistent and incomplete data sets. However, rule sets induced from imperfect data sets are expected to be less accurate. The objective of this project is to investigate which kind of imperfection (inconsistency or incompleteness) is worse in terms of the quality of rule induction. Experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross validation was carried out, and the average error rate was used as the criterion for comparison. Mann-Whitney nonparametric tests were performed to compare incompleteness with inconsistency, separately for certain and possible rules. The results show that there is no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.
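The lower and upper approximations mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the project's code: it assumes the standard rough-set definition based on indiscernibility blocks, and the function name `approximations` and the small flu table are hypothetical. Certain rules are conventionally induced from the lower approximation of a concept and possible rules from the upper approximation.

```python
from collections import defaultdict


def approximations(cases, attributes, decision, concept_value):
    """Return (lower, upper) approximations of a concept as sets of case indices.

    Hypothetical helper for illustration; assumes complete (no missing) values.
    """
    # Group cases that are indiscernible on the chosen attributes.
    blocks = defaultdict(set)
    for i, case in enumerate(cases):
        key = tuple(case[a] for a in attributes)
        blocks[key].add(i)

    # The concept is the set of cases carrying the given decision value.
    concept = {i for i, case in enumerate(cases) if case[decision] == concept_value}

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper


# Made-up inconsistent table: cases 0 and 1 agree on all attributes
# but disagree on the decision.
cases = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "yes", "flu": "no"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
    {"temperature": "high", "headache": "no", "flu": "yes"},
]

lower, upper = approximations(cases, ["temperature", "headache"], "flu", "yes")
print(sorted(lower))  # [3]
print(sorted(upper))  # [0, 1, 3]
```

In this toy table the conflicting cases 0 and 1 are excluded from the lower approximation of `flu = yes` but included in the upper one, which is why the certain and possible rule sets can differ on inconsistent data.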

    A rough set-based effective rule generation method for classification with an application in intrusion detection

    Abstract: In this paper, we use Rough Set Theory (RST) to address the important problem of generating decision rules for data mining. In particular, we propose a rough set-based approach to mine rules from inconsistent data. It computes the lower and upper approximations for each concept, and then builds concise classification rules for each concept that satisfy the required classification accuracy. Estimating lower and upper approximations substantially reduces the computational complexity of the algorithm. We use UCI ML Repository data sets to test and validate the approach. We also apply our approach to network intrusion data sets built from network flows captured on our local network. The results show that our approach produces effective and minimal rules and provides satisfactory accuracy. Keywords: rough set; LEM2; inconsistency; minimal; redundant; PCS; intrusion detection; network flow data. Reference to this paper should be made as follows: Gogoi, P., Bhattacharyya, D.K. and Kalita, J.K. (2013) 'A rough set-based effective rule generation method for classification with an application in intrusion detection', Int

    Topological and algebraic characterization of coverings sets obtained in rough sets discretization and attribute reduction algorithms

    Abstract. A systematic study of approximation operators in covering-based rough sets, and of their relations with approximation operators in relation-based rough sets, is presented. Two different frameworks of approximation operators in covering-based rough sets are unified in a general framework of dual pairs. This work establishes some relationships between the approximation operators of two of the most important generalizations of rough set theory: covering-based and relation-based rough sets. Finally, a structured genetic algorithm to discretize, to find reducts and to select approximation operators for classification problems is presented. Doctorad

    A semantical and computational approach to covering-based rough sets

    Rough Set Models and Knowledge Acquisition for Incomplete Information Systems

    国立大学法人長岡技術科学大