115 research outputs found

    HANDLING MISSING ATTRIBUTE VALUES IN DECISION TABLES USING VALUED TOLERANCE APPROACH

    Rule induction is one of the key areas in data mining, as it is applied to a large number of real-life data sets. However, in such real-life data, the information is incompletely specified most of the time. To induce rules from these incomplete data, more powerful algorithms are necessary. This research work mainly focuses on a probabilistic approach based on the valued tolerance relation. This thesis is divided into two parts. The first part describes the implementation of the valued tolerance relation. The induced rules are then evaluated based on the error rate due to incorrectly classified and unclassified examples. The second part compares the rules induced by a previously implemented MLEM2 algorithm with the rules induced by the valued tolerance based approach implemented as part of this research. The error rates for the MLEM2 algorithm and the valued tolerance based approach are thus compared and the results documented.

    Rough set and rule-based multicriteria decision aiding

    The aim of multicriteria decision aiding is to give the decision maker a recommendation concerning a set of objects evaluated from multiple points of view called criteria. Since a rational decision maker acts with respect to his/her value system, in order to recommend the most-preferred decision, one must identify the decision maker's preferences. In this paper, we focus on preference discovery from data concerning some past decisions of the decision maker. We consider the preference model in the form of a set of "if..., then..." decision rules discovered from the data by inductive learning. To structure the data prior to induction of rules, we use the Dominance-based Rough Set Approach (DRSA). DRSA is a methodology for reasoning about data, which handles ordinal evaluations of objects on considered criteria and monotonic relationships between these evaluations and the decision. We review applications of DRSA to a large variety of multicriteria decision problems.

    A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm

    In data mining, rule induction is a process of extracting formal rules from decision tables, where the latter are tabulated observations that typically consist of a few attributes, i.e., independent variables, and a decision, i.e., a dependent variable. Each tuple in the table is considered a case, and a table may contain any number n of cases, each specifying one observation. The efficiency of rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., the ruleset. There are different rule induction algorithms, such as LEM1, LEM2, and MLEM2. In the real world, datasets are often imperfect, inconsistent, and incomplete. MLEM2 is an efficient algorithm for dealing with such data, but the quality of rule induction largely depends on the chosen classification strategy. We compared 16 classification strategies of rule induction using MLEM2 on incomplete data. For this, we implemented MLEM2 for inducing rulesets based on the selection of the type of approximation, i.e., singleton, subset, or concept, and the value of alpha for calculating probabilistic approximations. A program called rule checker is used to calculate the error rate based on the classification strategy specified. To reduce anomalies, we used ten-fold cross-validation to measure the error rate for each classification. Error rates for the above strategies are calculated for different datasets, compared, and presented.

    NMGRS: Neighborhood-based multigranulation rough sets

    Recently, the multigranulation rough set (MGRS), which is based on multiple binary relations on the universe, has become a new direction in rough set theory. However, it is worth noticing that the original MGRS cannot be used to discover knowledge from information systems with various domains of attributes. In order to extend the theory of MGRS, the objective of this study is to develop a so-called neighborhood-based multigranulation rough set (NMGRS) in the framework of multigranulation rough sets. Furthermore, by using two different approximating strategies, i.e., seeking common reserving difference and seeking common rejecting difference, we first present optimistic and pessimistic 1-type neighborhood-based multigranulation rough sets and optimistic and pessimistic 2-type neighborhood-based multigranulation rough sets, respectively. Through analyzing several important properties of neighborhood-based multigranulation rough sets, we find that the new rough sets degenerate to the original MGRS when the size of the neighborhood equals zero. To obtain covering reducts under neighborhood-based multigranulation rough sets, we then propose a new definition of covering reduct to describe the smallest attribute subset that preserves the consistency of the neighborhood decision system, which can be calculated by Chen’s discernibility matrix approach. These results show that the proposed NMGRS largely extends the theory and application of classical MGRS in the context of multiple granulations.

    The structure of oppositions in rough set theory and formal concept analysis - Toward a new bridge between the two settings

    Rough set theory (RST) and formal concept analysis (FCA) are two formal settings in information management, which have found applications in learning and in data mining. Both rely on a binary relation. FCA starts with a formal context, which is a relation linking a set of objects with their properties. Besides, a rough set is a pair of lower and upper approximations of a set of objects induced by an indistinguishability relation; in the simplest case, this relation expresses that two objects are indistinguishable because their known properties are exactly the same. It has been recently noticed, with different concerns, that any binary relation on a Cartesian product of two possibly equal sets induces a cube of oppositions, which extends the classical Aristotelian square of oppositions structure, and has remarkable properties. Indeed, a relation applied to a given subset gives birth to four subsets, and to their complements, that can be organized into a cube. These four subsets are nothing but the usual image of the subset by the relation, together with similar expressions where the subset and / or the relation are replaced by their complements. The eight subsets corresponding to the vertices of the cube can receive remarkable interpretations, both in the RST and the FCA settings. One facet of the cube corresponds to the core of RST, while basic FCA operators are found on another facet. The proposed approach both provides an extended view of RST and FCA, and suggests a unified view of both of them. © 2014 Springer International Publishing
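
The lower and upper approximations mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the simplest case the abstract describes, where two objects are indistinguishable when their known properties are exactly the same. The function name `approximations`, the toy table, and the attribute names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of `target`.

    Objects are grouped into indiscernibility classes: two objects fall
    into the same class when they agree on every listed attribute.
    """
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy data: x1 and x2 are indistinguishable.
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "blue", "size": "large"},
    "x4": {"color": "blue", "size": "small"},
}
target = {"x1", "x3"}
lower, upper = approximations(table, ["color", "size"], target)
print(sorted(lower), sorted(upper))
```

In this toy table, `x1` cannot be told apart from `x2`, so it is excluded from the lower approximation but pulls `x2` into the upper one.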

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at a very high resolution. However, the deployment of a large number of unattended sensor nodes in hostile environments, frequent changes of environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues of spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.

    Rough set decision algorithms for modeling with uncertainty

    The use of decision rules makes it possible to extract information and to infer conclusions from relational databases in a reliable way, thanks to indicators such as support and certainty. Moreover, decision algorithms collect a group of decision rules that satisfies desirable properties to describe the relational system. However, when a decision table is considered within a fuzzy environment, it is necessary to extend all notions related to decision algorithms to this framework. This paper presents a generalization of these notions, highlighting the new definitions of indicators of relevance to describe decision rules and decision algorithms.

    CLUSTERING ALGORITHMS FOR CATEGORICAL DATA USING CONCEPTS OF SIGNIFICANCE AND DEPENDENCE OF ATTRIBUTES

    Clustering categorical data is an essential and integral part of data mining. In this paper, we propose two new algorithms for clustering categorical data, namely, the Standard Deviation of Standard Deviation Significance and Standard Deviation of Standard Deviation Dependence algorithms. The proposed techniques are based mainly on rough set theory, taking into account the significance and dependence of attributes of database concepts. Analysis of the performance of the proposed algorithms compared with others shows their efficiency as well as their ability to handle uncertainty together with categorical data.