A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets
In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect: either inconsistent or incomplete. The idea of lower and upper approximations, or more generally the probabilistic approximation, provides an effective way to induce rules from inconsistent and incomplete data sets, but the accuracy of rule sets induced from imperfect data sets is expected to be lower. The objective of this project is to investigate which kind of imperfect data set (inconsistent or incomplete) is worse in terms of the quality of rule induction. Experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and the local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross-validation was carried out and the average error rate was used as the criterion for comparison. Mann-Whitney nonparametric tests were performed to compare incompleteness with inconsistency, separately for certain and possible rules. The results show that there is no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.
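The lower and upper approximations underlying this comparison can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the project's implementation): an inconsistent table produces an equivalence class that straddles a concept, so the concept becomes rough.

```python
from collections import defaultdict

def approximations(universe, attribute_of, concept):
    """Compute the rough-set lower and upper approximations of a concept.

    `attribute_of` maps each case to its (hashable) attribute-value tuple;
    cases with identical attribute values fall into one equivalence class.
    """
    # Group cases into equivalence classes (elementary sets).
    classes = defaultdict(set)
    for x in universe:
        classes[attribute_of(x)].add(x)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= concept:      # block entirely inside the concept
            lower |= block
        if block & concept:       # block overlaps the concept
            upper |= block
    return lower, upper

# Toy inconsistent table: cases 1 and 2 share attribute values but belong
# to different decision classes, so the concept {1, 3} is rough.
attrs = {1: ("a",), 2: ("a",), 3: ("b",)}
lower, upper = approximations({1, 2, 3}, attrs.__getitem__, {1, 3})
# lower == {3}, upper == {1, 2, 3}
```

Certain rules are induced from the lower approximation, possible rules from the upper one.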
Learning Fuzzy β-Certain and β-Possible rules from incomplete quantitative data by rough sets
The rough-set theory proposed by Pawlak has been widely used in dealing with
data classification problems. The original rough-set model is, however, quite
sensitive to noisy data. Tzung thus proposed a model, combining the variable
precision rough-set model and fuzzy set theory, for producing a set of fuzzy
certain and fuzzy possible rules from quantitative data with a predefined
tolerance degree of uncertainty and misclassification. This paper deals with
the problem of producing a set of fuzzy certain and fuzzy possible rules from
incomplete quantitative data with a predefined tolerance degree of uncertainty
and misclassification. A new method, which extends the variable precision
rough-set model and fuzzy set theory to incomplete quantitative data, is thus
proposed to solve this problem. It first transforms each quantitative value
into a fuzzy set of linguistic terms using membership functions, and then
calculates the fuzzy β-lower and fuzzy β-upper approximations from the
incomplete quantitative data. The certain and possible rules are then
generated based on these fuzzy approximations. These rules can then be used to
classify unknown objects.
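The first step of such a method, turning a quantitative value into a fuzzy set of linguistic terms, can be illustrated with triangular membership functions. The term names and breakpoints below are invented for illustration and are not taken from the paper.

```python
def triangular(a, b, c):
    """Return a triangular membership function peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical linguistic terms for an attribute ranging over [0, 100].
terms = {
    "Low": triangular(-1, 0, 50),
    "Middle": triangular(0, 50, 100),
    "High": triangular(50, 100, 101),
}

def fuzzify(value):
    """Map a quantitative value to its membership degree in each term."""
    return {term: mu(value) for term, mu in terms.items()}

fuzzify(30)  # {"Low": 0.4, "Middle": 0.6, "High": 0.0}
```

The fuzzy β-lower and β-upper approximations are then computed over these membership degrees rather than over crisp equivalence classes.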
A comparison of sixteen classification strategies of rule induction from incomplete data using the MLEM2 algorithm
In data mining, rule induction is a process of extracting formal rules from decision tables, where the latter are tabulated observations that typically consist of a few attributes (independent variables) and a decision (a dependent variable). Each tuple in the table is considered a case, and a table may contain n cases, one per observation. The efficiency of rule induction depends on how many cases are successfully characterized by the generated set of rules, i.e., the ruleset. There are different rule induction algorithms, such as LEM1, LEM2 and MLEM2. In the real world, datasets are often imperfect, inconsistent, and incomplete. MLEM2 is an efficient algorithm for dealing with such data, but the quality of rule induction largely depends on the chosen classification strategy. We compared sixteen classification strategies of rule induction using MLEM2 on incomplete data. For this, we implemented MLEM2 for inducing rulesets based on the selected type of approximation, i.e., singleton, subset or concept, and the value of alpha for calculating probabilistic approximations. A program called rule checker is used to calculate the error rate for the specified classification strategy. To reduce anomalies, we used ten-fold cross-validation to measure the error rate for each classification strategy. Error rates for the above strategies were calculated for different datasets, compared, and presented.
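The alpha-parameterized probabilistic approximation mentioned above can be sketched as follows. The blocks and concept are toy data; this is not the authors' MLEM2 implementation, only an illustration of the thresholding idea.

```python
def probabilistic_approximation(blocks, concept, alpha):
    """Union of the blocks whose conditional probability
    P(concept | block) = |block ∩ concept| / |block| is at least alpha."""
    result = set()
    for block in blocks:
        if len(block & concept) / len(block) >= alpha:
            result |= block
    return result

blocks = [{1, 2}, {3, 4}, {5}]
concept = {1, 2, 3}
# alpha = 1 yields the lower approximation; small alpha > 0 the upper one.
probabilistic_approximation(blocks, concept, 1.0)  # {1, 2}
probabilistic_approximation(blocks, concept, 0.5)  # {1, 2, 3, 4}
```

For incomplete data the blocks would be characteristic sets (singleton, subset or concept style) rather than plain equivalence classes.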
A Rough Set Model and Knowledge Acquisition for Incomplete Information Systems
National University Corporation, Nagaoka University of Technology
Heuristic algorithms for finding distribution reducts in probabilistic rough set model
Attribute reduction is one of the most important topics in rough set theory.
Heuristic attribute reduction algorithms have been presented to solve the
attribute reduction problem. It is generally known that fitness functions play
a key role in developing heuristic attribute reduction algorithms, and that
the monotonicity of fitness functions can guarantee the validity of such
algorithms. In the probabilistic rough set model, distribution reducts can
ensure that the decision rules derived from the reducts are compatible with
those derived from the original decision table. However, there are few studies
on developing heuristic attribute reduction algorithms for finding
distribution reducts, partly because no monotonic fitness functions have been
available for designing such algorithms in the probabilistic rough set model.
The main objective of this paper is to develop heuristic attribute reduction
algorithms for finding distribution reducts in the probabilistic rough set
model. First, two monotonic fitness functions are constructed, from which
equivalent definitions of distribution reducts can be obtained. Second, two
modified monotonic fitness functions are proposed to evaluate the significance
of attributes more effectively. On this basis, two heuristic attribute
reduction algorithms for finding distribution reducts are developed, based on
the addition-deletion method and the deletion method. In particular, the
monotonicity of the fitness functions guarantees the rationality of the
proposed heuristic attribute reduction algorithms. Results of experimental
analysis are included to quantify the effectiveness of the proposed fitness
functions and distribution reducts.
Comment: 44 pages, 24 figures
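The addition-deletion method referred to above can be sketched generically. The fitness function below is a made-up monotonic stand-in, not one of the paper's constructions; it only illustrates why monotonicity makes the deletion pass sound.

```python
def addition_deletion_reduct(attributes, fitness):
    """Greedy addition-deletion search for a reduct-like attribute subset.

    `fitness` maps a frozenset of attributes to a number and is assumed to
    be monotonic (adding attributes never lowers it); monotonicity makes
    stopping at the full-set fitness, and the deletion pass, sound.
    """
    target = fitness(frozenset(attributes))
    subset = set()
    # Addition phase: greedily add the attribute that raises fitness most.
    while fitness(frozenset(subset)) < target:
        best = max(attributes - subset,
                   key=lambda a: fitness(frozenset(subset | {a})))
        subset.add(best)
    # Deletion phase: drop attributes whose removal keeps fitness at target.
    for a in sorted(subset):
        if fitness(frozenset(subset - {a})) >= target:
            subset.discard(a)
    return subset

# Made-up monotonic fitness: how many "regions" a subset of attributes covers.
cover = {"a": {1}, "b": {2}, "c": {1, 2}, "d": {3}}
fit = lambda s: len(set().union(*(cover[x] for x in s)))
addition_deletion_reduct({"a", "b", "c", "d"}, fit)  # {"c", "d"}
```

With a non-monotonic fitness the addition loop might never reach the full-set value, which is exactly the failure mode the paper's monotonic constructions rule out.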
Extended Tolerance Relation to Define a New Rough Set Model in Incomplete Information Systems
This paper discusses and proposes a rough set model for an incomplete information system, which defines an extended tolerance relation using the frequency of attribute values in such a system. It first discusses some rough set extensions in incomplete information systems. Next, a “probability of matching” is defined from the data in the information system and is used to measure the degree of tolerance. A rough set model is then developed using a tolerance relation defined with a threshold on this probability. The paper discusses the mathematical properties of the newly developed rough set model and also introduces a method to derive reducts and the core.
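The probability-of-matching idea can be illustrated as follows, under the simplifying assumption that a missing value (written '?') matches a known value with the observed frequency of that value. The attribute names, frequencies, and threshold are hypothetical, not the paper's.

```python
def matching_probability(x, y, value_freq):
    """Estimate the probability that cases x and y match on every attribute.

    `value_freq[attr]` gives the observed frequency of each known value of
    `attr`; '?' denotes a missing value.
    """
    p = 1.0
    for attr, xv in x.items():
        yv = y[attr]
        if xv == "?" and yv == "?":
            # Both unknown: probability that two independent draws agree.
            p *= sum(f * f for f in value_freq[attr].values())
        elif xv == "?" or yv == "?":
            known = yv if xv == "?" else xv
            p *= value_freq[attr].get(known, 0.0)
        elif xv != yv:
            return 0.0  # two known, different values never match
    return p

def tolerant(x, y, value_freq, threshold=0.3):
    """x and y are tolerant when their matching probability reaches the threshold."""
    return matching_probability(x, y, value_freq) >= threshold

freq = {"color": {"red": 0.5, "blue": 0.5}}
tolerant({"color": "?"}, {"color": "red"}, freq)  # True (0.5 >= 0.3)
```

Raising the threshold shrinks the tolerance classes, which in turn tightens the resulting approximations.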
Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks
Wireless sensor networks (WSN) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at very high resolution. However, the deployment of large numbers of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSN in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSN. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-off among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
High Granular Operator Spaces, and Less-Contaminated General Rough Mereologies
Granular operator spaces and their variants have been introduced and used in
theoretical investigations on the foundations of general rough sets by the
present author over the last few years. In this research, higher order
versions of these are presented uniformly as partial algebraic systems. They
are also adapted for practical applications in which the data is representable
by data-table-like structures, according to a minimalist schema for avoiding
contamination. Issues relating to valuations used in information systems or
tables are also addressed. The concept of contamination, introduced and
studied by the present author across a number of her papers, concerns the
mixing of information across semantic domains (or domains of discourse). Rough
inclusion functions (RIFs), their variants, and numeric functions often have a
direct or indirect role in contaminating algorithms. Some solutions that seek
to replace or avoid them have been proposed and investigated by the present
author in earlier papers. Because multiple kinds of solutions to the
contamination problem are of interest, granular generalizations of RIFs are
proposed and investigated. Interesting representation results are proved, and
a core algebraic strategy for generalizing the Skowron-Polkowski style of
rough mereology (though for a very different purpose) is formulated. A number
of examples are added to illustrate key parts of the proposal in higher order
variants of granular operator spaces. Further algorithms grounded in
mereological nearness, suited for decision-making in human-machine interaction
contexts, are proposed. Applications of granular RIFs to partial/soft
solutions of the inverse problem are also developed in this paper.
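For context, the standard (non-granular) rough inclusion function that granular generalizations typically start from is κ(A, B) = |A ∩ B| / |A|, with κ(∅, B) = 1 by convention. A minimal sketch, purely for orientation and not the paper's granular variant:

```python
def rif(a, b):
    """Standard rough inclusion function: the degree to which set a is
    included in set b, i.e. |a ∩ b| / |a|; the empty set is fully included."""
    if not a:
        return 1.0
    return len(a & b) / len(a)

rif({1, 2, 3, 4}, {2, 3, 5})  # 0.5
```

The contamination concern is that such a numeric degree mixes set-theoretic information into semantic domains where it may not belong, which is what motivates the granular replacements.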