A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets
In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect: either inconsistent or incomplete. The idea of lower and upper approximations, or more generally the probabilistic approximation, provides an effective way to induce rules from inconsistent and incomplete data sets, but the accuracy of rule sets induced from imperfect data sets is expected to be lower. The objective of this project is to investigate which kind of imperfect data set (inconsistent or incomplete) is worse in terms of the quality of rule induction. Experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross-validation was carried out and the average error rate was used as the criterion for comparison. Mann-Whitney nonparametric tests were performed to compare incompleteness with inconsistency, separately for certain and possible rules. The results show that there is no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.
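The lower and upper approximations underlying this comparison can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the project's implementation): an inconsistent table produces an equivalence class that straddles a concept, so the concept becomes rough.

```python
from collections import defaultdict

def approximations(universe, attribute_of, concept):
    """Compute the rough-set lower and upper approximations of a concept.

    `attribute_of` maps each case to its (hashable) attribute-value tuple;
    cases with identical attribute values fall into one equivalence class.
    """
    # Group cases into equivalence classes (elementary sets).
    classes = defaultdict(set)
    for x in universe:
        classes[attribute_of(x)].add(x)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= concept:      # block entirely inside the concept
            lower |= block
        if block & concept:       # block overlaps the concept
            upper |= block
    return lower, upper

# Toy inconsistent table: cases 1 and 2 share attribute values but belong
# to different decision classes, so the concept {1, 3} is rough.
attrs = {1: ("a",), 2: ("a",), 3: ("b",)}
lower, upper = approximations({1, 2, 3}, attrs.__getitem__, {1, 3})
# lower == {3}, upper == {1, 2, 3}
```

Certain rules are induced from the lower approximation, possible rules from the upper one.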
Learning Fuzzy β-Certain and β-Possible rules from incomplete quantitative data by rough sets
The rough-set theory proposed by Pawlak has been widely used in dealing with
data classification problems. The original rough-set model is, however, quite
sensitive to noisy data. Tzung thus proposed a model, combining the variable
precision rough-set model and fuzzy set theory, for producing a set of fuzzy
certain and fuzzy possible rules from quantitative data with a predefined
tolerance degree of uncertainty and misclassification. This paper deals with
the problem of producing a set of fuzzy certain and fuzzy possible rules from
incomplete quantitative data with a predefined tolerance degree of uncertainty
and misclassification. A new method, which extends the variable precision
rough-set model and fuzzy set theory to incomplete quantitative data, is thus
proposed to solve this problem. It first transforms each quantitative value
into a fuzzy set of linguistic terms using membership functions, and then
calculates the fuzzy β-lower and fuzzy β-upper approximations from the
incomplete quantitative data. The certain and possible rules are then
generated based on these fuzzy approximations. These rules can then be used to
classify unknown objects.
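The first step of such a method, turning a quantitative value into a fuzzy set of linguistic terms, can be illustrated with triangular membership functions. The term names and breakpoints below are invented for illustration and are not taken from the paper.

```python
def triangular(a, b, c):
    """Return a triangular membership function peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical linguistic terms for an attribute ranging over [0, 100].
terms = {
    "Low": triangular(-1, 0, 50),
    "Middle": triangular(0, 50, 100),
    "High": triangular(50, 100, 101),
}

def fuzzify(value):
    """Map a quantitative value to its membership degree in each term."""
    return {term: mu(value) for term, mu in terms.items()}

fuzzify(30)  # {"Low": 0.4, "Middle": 0.6, "High": 0.0}
```

The fuzzy β-lower and β-upper approximations are then computed over these membership degrees rather than over crisp equivalence classes.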
A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm
In data mining, rule induction is a process of extracting formal rules from decision tables, where the latter are tabulated observations that typically consist of a few attributes (independent variables) and a decision (a dependent variable). Each tuple in the table is considered a case, and a table may contain n cases, one per observation. The efficiency of rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., the ruleset. There are different rule induction algorithms, such as LEM1, LEM2 and MLEM2. In the real world, datasets are often imperfect, inconsistent, and incomplete. MLEM2 is an efficient algorithm for dealing with such data, but the quality of rule induction largely depends on the chosen classification strategy. We compared sixteen classification strategies of rule induction using MLEM2 on incomplete data. For this, we implemented MLEM2 for inducing rulesets based on the selected type of approximation, i.e., singleton, subset or concept, and the value of alpha for calculating probabilistic approximations. A program called rule checker is used to calculate the error rate for the specified classification strategy. To reduce anomalies, we used ten-fold cross-validation to measure the error rate for each classification strategy. Error rates for the above strategies were calculated for different datasets, compared, and presented.
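The alpha-parameterized probabilistic approximation mentioned above can be sketched as follows. The blocks and concept are toy data; this is not the authors' MLEM2 implementation, only an illustration of the thresholding idea.

```python
def probabilistic_approximation(blocks, concept, alpha):
    """Union of the blocks whose conditional probability
    P(concept | block) = |block ∩ concept| / |block| is at least alpha."""
    result = set()
    for block in blocks:
        if len(block & concept) / len(block) >= alpha:
            result |= block
    return result

blocks = [{1, 2}, {3, 4}, {5}]
concept = {1, 2, 3}
# alpha = 1 yields the lower approximation; small alpha > 0 the upper one.
probabilistic_approximation(blocks, concept, 1.0)  # {1, 2}
probabilistic_approximation(blocks, concept, 0.5)  # {1, 2, 3, 4}
```

For incomplete data the blocks would be characteristic sets (singleton, subset or concept style) rather than plain equivalence classes.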
A Rough Set Model and Knowledge Acquisition for Incomplete Information Systems
National University Corporation, Nagaoka University of Technology
Heuristic algorithms for finding distribution reducts in probabilistic rough set model
Attribute reduction is one of the most important topics in rough set theory.
Heuristic attribute reduction algorithms have been presented to solve the
attribute reduction problem. It is generally known that fitness functions play
a key role in developing heuristic attribute reduction algorithms, and that
the monotonicity of fitness functions can guarantee the validity of such
algorithms. In the probabilistic rough set model, distribution reducts can
ensure that the decision rules derived from the reducts are compatible with
those derived from the original decision table. However, there are few studies
on developing heuristic attribute reduction algorithms for finding
distribution reducts, partly because no monotonic fitness functions have been
available for designing such algorithms in the probabilistic rough set model.
The main objective of this paper is to develop heuristic attribute reduction
algorithms for finding distribution reducts in the probabilistic rough set
model. First, two monotonic fitness functions are constructed, from which
equivalent definitions of distribution reducts can be obtained. Second, two
modified monotonic fitness functions are proposed to evaluate the significance
of attributes more effectively. On this basis, two heuristic attribute
reduction algorithms for finding distribution reducts are developed, based on
the addition-deletion method and the deletion method. In particular, the
monotonicity of the fitness functions guarantees the rationality of the
proposed heuristic attribute reduction algorithms. Results of experimental
analysis are included to quantify the effectiveness of the proposed fitness
functions and distribution reducts.
Comment: 44 pages, 24 figures
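The addition-deletion method referred to above can be sketched generically. The fitness function below is a made-up monotonic stand-in, not one of the paper's constructions; it only illustrates why monotonicity makes the deletion pass sound.

```python
def addition_deletion_reduct(attributes, fitness):
    """Greedy addition-deletion search for a reduct-like attribute subset.

    `fitness` maps a frozenset of attributes to a number and is assumed to
    be monotonic (adding attributes never lowers it); monotonicity makes
    stopping at the full-set fitness, and the deletion pass, sound.
    """
    target = fitness(frozenset(attributes))
    subset = set()
    # Addition phase: greedily add the attribute that raises fitness most.
    while fitness(frozenset(subset)) < target:
        best = max(attributes - subset,
                   key=lambda a: fitness(frozenset(subset | {a})))
        subset.add(best)
    # Deletion phase: drop attributes whose removal keeps fitness at target.
    for a in sorted(subset):
        if fitness(frozenset(subset - {a})) >= target:
            subset.discard(a)
    return subset

# Made-up monotonic fitness: how many "regions" a subset of attributes covers.
cover = {"a": {1}, "b": {2}, "c": {1, 2}, "d": {3}}
fit = lambda s: len(set().union(*(cover[x] for x in s)))
addition_deletion_reduct({"a", "b", "c", "d"}, fit)  # {"c", "d"}
```

With a non-monotonic fitness the addition loop might never reach the full-set value, which is exactly the failure mode the paper's monotonic constructions rule out.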
Extended Tolerance Relation to Define a New Rough Set Model in Incomplete Information Systems
This paper discusses and proposes a rough set model for an incomplete information system, which defines an extended tolerance relation using the frequency of attribute values in such a system. It first discusses some rough set extensions in incomplete information systems. Next, a “probability of matching” is defined from the data in the information system and is used to measure the degree of tolerance. A rough set model is then developed using a tolerance relation defined with a threshold on this probability. The paper discusses the mathematical properties of the newly developed rough set model and also introduces a method to derive reducts and the core.
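The probability-of-matching idea can be illustrated as follows, under the simplifying assumption that a missing value (written '?') matches a known value with the observed frequency of that value. The attribute names, frequencies, and threshold are hypothetical, not the paper's.

```python
def matching_probability(x, y, value_freq):
    """Estimate the probability that cases x and y match on every attribute.

    `value_freq[attr]` gives the observed frequency of each known value of
    `attr`; '?' denotes a missing value.
    """
    p = 1.0
    for attr, xv in x.items():
        yv = y[attr]
        if xv == "?" and yv == "?":
            # Both unknown: probability that two independent draws agree.
            p *= sum(f * f for f in value_freq[attr].values())
        elif xv == "?" or yv == "?":
            known = yv if xv == "?" else xv
            p *= value_freq[attr].get(known, 0.0)
        elif xv != yv:
            return 0.0  # two known, different values never match
    return p

def tolerant(x, y, value_freq, threshold=0.3):
    """x and y are tolerant when their matching probability reaches the threshold."""
    return matching_probability(x, y, value_freq) >= threshold

freq = {"color": {"red": 0.5, "blue": 0.5}}
tolerant({"color": "?"}, {"color": "red"}, freq)  # True (0.5 >= 0.3)
```

Raising the threshold shrinks the tolerance classes, which in turn tightens the resulting approximations.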
Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks
Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at very high resolution. However, the deployment of large numbers of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
High Granular Operator Spaces, and Less-Contaminated General Rough Mereologies
Granular operator spaces and their variants have been introduced and used in
theoretical investigations on the foundations of general rough sets by the
present author over the last few years. In this research, higher order
versions of these are presented uniformly as partial algebraic systems. They
are also adapted for practical applications in which the data is representable
by data-table-like structures, according to a minimalist schema for avoiding
contamination. Issues relating to valuations used in information systems or
tables are also addressed. The concept of contamination, introduced and
studied by the present author across a number of her papers, concerns the
mixing of information across semantic domains (or domains of discourse). Rough
inclusion functions (RIFs), their variants, and numeric functions often have a
direct or indirect role in contaminating algorithms. Some solutions that seek
to replace or avoid them have been proposed and investigated by the present
author in earlier papers. Because multiple kinds of solutions to the
contamination problem are of interest, granular generalizations of RIFs are
proposed and investigated. Interesting representation results are proved, and
a core algebraic strategy for generalizing the Skowron-Polkowski style of
rough mereology (though for a very different purpose) is formulated. A number
of examples are added to illustrate key parts of the proposal in higher order
variants of granular operator spaces. Further algorithms grounded in
mereological nearness, suited for decision-making in human-machine interaction
contexts, are proposed. Applications of granular RIFs to partial/soft
solutions of the inverse problem are also developed in this paper.
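For context, the standard (non-granular) rough inclusion function that granular generalizations typically start from is κ(A, B) = |A ∩ B| / |A|, with κ(∅, B) = 1 by convention. A minimal sketch, purely for orientation and not the paper's granular variant:

```python
def rif(a, b):
    """Standard rough inclusion function: the degree to which set a is
    included in set b, i.e. |a ∩ b| / |a|; the empty set is fully included."""
    if not a:
        return 1.0
    return len(a & b) / len(a)

rif({1, 2, 3, 4}, {2, 3, 5})  # 0.5
```

The contamination concern is that such a numeric degree mixes set-theoretic information into semantic domains where it may not belong, which is what motivates the granular replacements.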