437 research outputs found

    A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets

    In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect: either inconsistent or incomplete. The idea of lower and upper approximations, or more generally, the probabilistic approximation, provides an effective way to induce rules from inconsistent and incomplete data sets, but the accuracies of rule sets induced from imperfect data sets are expected to be lower. The objective of this project is to investigate which kind of imperfect data set (inconsistent or incomplete) is worse in terms of the quality of rule induction. Experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross-validation was carried out, and the average error rate was used as the criterion for comparison. Mann-Whitney nonparametric tests were performed to compare incompleteness with inconsistency, separately for certain and possible rules. The results show no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.
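
    The machinery behind certain and possible rules is the rough-set lower and upper approximation of a concept: certain rules are induced from the lower approximation, possible rules from the upper one. The sketch below illustrates that step on a hypothetical toy decision table; it is not the authors' MLEM2 implementation, and the attribute names are invented.

    ```python
    # Lower/upper approximations over a toy decision table (illustrative only).
    from collections import defaultdict

    def blocks(table, attributes):
        """Group case indices into indiscernibility classes over `attributes`."""
        groups = defaultdict(set)
        for i, case in enumerate(table):
            groups[tuple(case[a] for a in attributes)].add(i)
        return groups.values()

    def approximations(table, attributes, concept):
        """Return (lower, upper) approximations of `concept`, a set of case indices."""
        lower, upper = set(), set()
        for block in blocks(table, attributes):
            if block <= concept:      # block lies entirely inside the concept
                lower |= block
            if block & concept:       # block overlaps the concept
                upper |= block
        return lower, upper

    # Cases 1 and 2 agree on all attributes but differ in decision: inconsistent data.
    table = [
        {"temp": "high", "headache": "yes", "flu": "yes"},
        {"temp": "high", "headache": "no",  "flu": "yes"},
        {"temp": "high", "headache": "no",  "flu": "no"},
        {"temp": "low",  "headache": "no",  "flu": "no"},
    ]
    concept = {i for i, c in enumerate(table) if c["flu"] == "yes"}
    print(approximations(table, ["temp", "headache"], concept))  # ({0}, {0, 1, 2})
    ```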

    Learning Fuzzy β-Certain and β-Possible rules from incomplete quantitative data by rough sets

    The rough-set theory proposed by Pawlak has been widely used in dealing with data classification problems. The original rough-set model is, however, quite sensitive to noisy data. Tzung therefore proposed a model that combines the variable precision rough-set model and fuzzy set theory to produce a set of fuzzy certain and fuzzy possible rules from quantitative data with a predefined tolerance degree of uncertainty and misclassification. This paper deals with the same problem for incomplete quantitative data. A new method, which combines a rough-set model for incomplete quantitative data with fuzzy set theory, is thus proposed to solve it. The method first transforms each quantitative value into a fuzzy set of linguistic terms using membership functions, and then calculates the fuzzy β-lower and fuzzy β-upper approximations from the incomplete data. The certain and possible rules are then generated based on these fuzzy approximations, and can be used to classify unknown objects.
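
    The first step the abstract describes, turning each quantitative value into a fuzzy set of linguistic terms via membership functions, can be sketched as follows. The triangular shapes, term names, and breakpoints are illustrative assumptions, not taken from the paper.

    ```python
    # Fuzzification of a quantitative value into linguistic terms (illustrative).

    def triangular(x, a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    # Hypothetical linguistic terms over a 0-100 scale.
    TERMS = {"low": (0, 25, 50), "medium": (25, 50, 75), "high": (50, 75, 100)}

    def fuzzify(value):
        """Nonzero memberships of `value` in each linguistic term."""
        memberships = {t: triangular(value, *abc) for t, abc in TERMS.items()}
        return {t: m for t, m in memberships.items() if m > 0.0}

    print(fuzzify(60.0))  # {'medium': 0.6, 'high': 0.4}
    ```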

    A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm

    In data mining, rule induction is the process of extracting formal rules from decision tables, where the latter are tabulated observations that typically consist of a few attributes (independent variables) and a decision (a dependent variable). Each tuple in the table is considered a case, and a table may contain any number of cases, one per observation. The efficiency of rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., the ruleset. There are different rule induction algorithms, such as LEM1, LEM2 and MLEM2. In the real world, datasets are often imperfect, inconsistent, or incomplete. MLEM2 is an efficient algorithm for dealing with such data, but the quality of rule induction largely depends on the chosen classification strategy. We compared 16 classification strategies of rule induction using MLEM2 on incomplete data. For this, we implemented MLEM2 to induce rulesets based on the selected type of approximation, i.e., singleton, subset or concept, and the value of alpha used for calculating probabilistic approximations. A program called rule checker is used to calculate the error rate for the specified classification strategy. To reduce anomalies, we used ten-fold cross-validation to measure the error rate for each classification. Error rates for the above strategies were calculated for different datasets, compared, and presented.
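
    The alpha parameter mentioned above controls a probabilistic approximation: a block of cases is included when the conditional probability of the concept within it reaches alpha. A minimal sketch of that idea, independent of the authors' implementation:

    ```python
    # Probabilistic approximation of a concept (illustrative).

    def probabilistic_approximation(blocks, concept, alpha):
        """Union of blocks B with |B & concept| / |B| >= alpha."""
        approx = set()
        for block in blocks:
            if len(block & concept) / len(block) >= alpha:
                approx |= block
        return approx

    blocks = [{0}, {1, 2}, {3}]    # characteristic blocks of four hypothetical cases
    concept = {0, 1}
    print(probabilistic_approximation(blocks, concept, 1.0))  # {0}: lower approximation
    print(probabilistic_approximation(blocks, concept, 0.5))  # {0, 1, 2}: upper-like
    ```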

    Rough Set Models and Knowledge Acquisition for Incomplete Information Systems

    Nagaoka University of Technology

    Heuristic algorithms for finding distribution reducts in probabilistic rough set model

    Attribute reduction is one of the most important topics in rough set theory, and heuristic attribute reduction algorithms have been presented to solve the attribute reduction problem. Fitness functions play a key role in developing such algorithms, and their monotonicity guarantees the algorithms' validity. In the probabilistic rough set model, distribution reducts ensure that the decision rules derived from the reducts are compatible with those derived from the original decision table. However, there are few studies on heuristic attribute reduction algorithms for finding distribution reducts, partly because no monotonic fitness functions have been available for designing such algorithms in the probabilistic rough set model. The main objective of this paper is to develop heuristic attribute reduction algorithms for finding distribution reducts in the probabilistic rough set model. First, two monotonic fitness functions are constructed, from which equivalent definitions of distribution reducts can be obtained. Second, two modified monotonic fitness functions are proposed to evaluate the significance of attributes more effectively. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed, based on the addition-deletion method and the deletion method. In particular, the monotonicity of the fitness functions guarantees the rationality of the proposed algorithms. Results of experimental analysis are included to quantify the effectiveness of the proposed fitness functions and distribution reducts.
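
    The addition-deletion scheme the paper builds on can be sketched generically: grow a candidate reduct with the attribute that most improves a monotone fitness function, then prune attributes whose removal does not hurt it. The fitness function below is a stand-in toy, not one of the paper's distribution-reduct measures.

    ```python
    # Addition-deletion attribute reduction with a monotone fitness (illustrative).

    def addition_deletion(attributes, fitness):
        target = fitness(frozenset(attributes))   # fitness of the full attribute set
        reduct = set()
        # Addition phase: add the best attribute until the target is reached.
        while fitness(frozenset(reduct)) < target:
            best = max(set(attributes) - reduct,
                       key=lambda a: fitness(frozenset(reduct | {a})))
            reduct.add(best)
        # Deletion phase: drop attributes that do not lower the fitness.
        for a in sorted(reduct):
            if fitness(frozenset(reduct - {a})) >= target:
                reduct.remove(a)
        return reduct

    # Toy monotone fitness: how many of {'a', 'b', 'c'} are present.
    print(addition_deletion(["a", "b", "c", "d"], lambda s: len(s & {"a", "b", "c"})))
    # {'a', 'b', 'c'} (printed order may vary)
    ```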

    Extended Tolerance Relation to Define a New Rough Set Model in Incomplete Information Systems

    This paper proposes a rough set model for incomplete information systems that defines an extended tolerance relation using the frequency of attribute values in such systems. It first reviews some rough set extensions for incomplete information systems. Next, a "probability of matching" is defined from the data in the information system and used to measure the degree of tolerance. A rough set model is then developed using a tolerance relation defined with a threshold on this probability. The paper discusses the mathematical properties of the newly developed rough set model and also introduces a method to derive reducts and the core.
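
    A sketch of the central idea: score how likely two objects with missing values ('?') are to match, using attribute value frequencies, then threshold that score to decide tolerance. The scoring rule here is an illustrative guess, not the paper's exact definition.

    ```python
    # Tolerance via an estimated probability of matching (illustrative).

    def matching_probability(x, y, freq):
        """Probability that x and y agree on every attribute, treating '?' as a
        value drawn from the attribute's empirical frequency distribution."""
        p = 1.0
        for a in freq:
            if x[a] == "?" and y[a] == "?":
                p *= sum(f * f for f in freq[a].values())  # both unknown
            elif x[a] == "?":
                p *= freq[a].get(y[a], 0.0)                # only x unknown
            elif y[a] == "?":
                p *= freq[a].get(x[a], 0.0)                # only y unknown
            elif x[a] != y[a]:
                return 0.0                                 # known mismatch
        return p

    def tolerant(x, y, freq, threshold):
        return matching_probability(x, y, freq) >= threshold

    freq = {"color": {"red": 0.5, "blue": 0.5}}            # observed value frequencies
    print(tolerant({"color": "?"}, {"color": "red"}, freq, 0.4))  # True (p = 0.5)
    ```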

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at very high resolution. However, the deployment of large numbers of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data and provides a framework for characterizing such patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.

    High Granular Operator Spaces, and Less-Contaminated General Rough Mereologies

    Granular operator spaces and variants have been introduced and used in theoretical investigations on the foundations of general rough sets by the present author over the last few years. In this research, higher order versions of these are presented uniformly as partial algebraic systems. They are also adapted for practical applications when the data is representable by data table-like structures, according to a minimalist schema for avoiding contamination. Issues relating to valuations used in information systems or tables are also addressed. The concept of contamination, introduced and studied by the present author across a number of her papers, concerns the mixing up of information across semantic domains (or domains of discourse). Rough inclusion functions (RIFs), their variants, and numeric functions often have a direct or indirect role in contaminating algorithms. Some solutions that seek to replace or avoid them have been proposed and investigated by the present author in her earlier papers. Because multiple kinds of solution are of interest for the contamination problem, granular generalizations of RIFs are proposed and investigated. Interesting representation results are proved, and a core algebraic strategy for generalizing the Skowron-Polkowski style of rough mereology (though for a very different purpose) is formulated. A number of examples illustrate key parts of the proposal in higher order variants of granular operator spaces. Further algorithms grounded in mereological nearness, suited for decision-making in human-machine interaction contexts, are proposed, and applications of granular RIFs to partial/soft solutions of the inverse problem are also developed.
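
    For readers unfamiliar with the rough inclusion functions the abstract refers to, the standard RIF from the literature measures the degree to which one set is included in another; it is this kind of numeric function that the paper argues can contaminate granular reasoning. A minimal sketch of the classical definition:

    ```python
    # The standard rough inclusion function k(X, Y) = |X & Y| / |X|.

    def kappa(X: set, Y: set) -> float:
        """Degree of inclusion of X in Y; by convention 1.0 when X is empty."""
        if not X:
            return 1.0
        return len(X & Y) / len(X)

    print(kappa({1, 2, 3}, {2, 3, 4}))  # 0.666...: X is two-thirds inside Y
    ```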

    Some contributions to decision making in complex information settings with imprecise probabilities and incomplete preferences


    A semantical and computational approach to covering-based rough sets
