Generalized Discernibility Function Based Attribute Reduction in Incomplete Decision Systems
A rough set approach to attribute reduction is an important research subject in data mining and machine learning. However, most attribute reduction methods operate on a complete decision table. In this paper, we propose methods for attribute reduction in static incomplete decision systems and in dynamic incomplete decision systems whose conditional attributes increase or decrease over time. Our methods use a generalized discernibility matrix and function in tolerance-based rough sets.
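The core construction can be sketched as follows: a minimal discernibility matrix over an incomplete decision table under a tolerance relation. The table and attribute names are invented for illustration; the paper's generalized function covers more cases than this toy.

```python
# Minimal sketch of a discernibility matrix for an incomplete decision
# table (missing values marked None); attribute names are illustrative.
MISSING = None

def discerns(u, v, a):
    """Attribute a discerns u and v only if both values are known and differ."""
    return u[a] is not MISSING and v[a] is not MISSING and u[a] != v[a]

def discernibility_matrix(objects, attrs, decision):
    """For each pair with different decisions, collect the discerning attributes."""
    matrix = {}
    for i, u in enumerate(objects):
        for j in range(i + 1, len(objects)):
            v = objects[j]
            if u[decision] != v[decision]:
                matrix[(i, j)] = {a for a in attrs if discerns(u, v, a)}
    return matrix

table = [
    {"headache": "yes", "temp": "high",   "flu": "yes"},
    {"headache": "no",  "temp": MISSING,  "flu": "no"},
    {"headache": "yes", "temp": "normal", "flu": "no"},
]
m = discernibility_matrix(table, ["headache", "temp"], "flu")
```

Note that the missing `temp` value of the second object cannot discern it from anything, which is exactly how a tolerance relation treats unknown values.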
Approximations from Anywhere and General Rough Sets
Not all approximations arise from information systems. The problem of fitting approximations, subjected to some rules (and related data), to information systems in a rough scheme of things is known as the \emph{inverse problem}. The inverse problem is more general than the duality (or abstract representation) problems and was introduced by the present author in her earlier papers. From the practical perspective, a few (as opposed to one) theoretical frameworks may be suitable for formulating the problem itself. \emph{Granular operator spaces} have recently been introduced and investigated by the present author in the context of antichain-based and dialectical semantics for general rough sets. The nature of the inverse problem is examined from number-theoretic and combinatorial perspectives in a higher order variant of granular operator spaces, and some necessary conditions are proved. The results and the novel approach would be useful in a number of unsupervised and semi-supervised learning contexts and algorithms.
Comment: 20 pages. Scheduled to appear in IJCRS'2017 LNCS Proceedings, Springer.
New Learning Models for Generating Classification Rules Based on Rough Set Approach
Data sets, static or dynamic, are very important and useful for representing real-life features in different aspects of industry, medicine, economy, and others. Recently, different models have been used to generate knowledge from vague and uncertain data sets, such as induction decision trees, neural networks, fuzzy logic, genetic algorithms, rough set theory, and others. All of these models take a long time to learn on a huge and dynamic data set. Thus, the challenge is how to develop an efficient model that can decrease the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets usually have some missing values due to unavailable data, which affects the quality of the generated classification rules. Missing values make it difficult to extract useful information from the data set. Another challenge is therefore how to solve the problem of missing data. Rough set theory is a mathematical tool for dealing with vagueness and uncertainty. It is a useful approach for uncovering classificatory knowledge and building classification rules. The application of the theory as part of the learning models was therefore proposed in this thesis.
Two different models for learning from data sets were proposed, based on two different reduction algorithms. The split-condition-merge-reduct algorithm (SCMR) consists of three modules: partitioning the data set vertically into subsets, applying rough set reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced split-condition-merge-reduct algorithm (E-SCMR) runs the same three modules followed by a fourth, which applies rough set reduction again to the reduct generated by SCMR in order to produce a best reduct that plays the same role as the full attribute set. Classification rules were then generated from the best reduct.
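The split/reduce/merge flow, plus the extra E-SCMR pass, can be sketched roughly as follows. The greedy consistency-based reduction below is a simple stand-in, not the thesis's reduction algorithm, and the table is invented.

```python
# Rough sketch of the SCMR/E-SCMR flow: split attributes vertically,
# reduce each part, merge the partial reducts, then reduce once more.
# greedy_reduct is a simple stand-in for the thesis's reduction step.

def consistent(objects, attrs, decision):
    """True when objects that agree on attrs always share a decision."""
    seen = {}
    for obj in objects:
        key = tuple(obj[a] for a in attrs)
        if seen.setdefault(key, obj[decision]) != obj[decision]:
            return False
    return True

def greedy_reduct(objects, attrs, decision):
    """Drop each attribute in turn if consistency survives without it."""
    reduct = list(attrs)
    for a in attrs:
        rest = [b for b in reduct if b != a]
        if rest and consistent(objects, rest, decision):
            reduct = rest
    return reduct

def e_scmr(objects, attrs, decision, n_parts=2):
    chunks = [attrs[i::n_parts] for i in range(n_parts)]   # vertical split
    merged = [a for c in chunks for a in greedy_reduct(objects, c, decision)]
    return greedy_reduct(objects, merged, decision)        # extra E-SCMR pass

table = [
    {"a": 0, "b": 0, "c": 0, "d": 0, "dec": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0, "dec": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1, "dec": 1},
]
red = e_scmr(table, ["a", "b", "c", "d"], "dec")
```

The point of the final pass is visible even at this scale: merging per-chunk reducts can leave redundant attributes that only a second reduction removes.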
For the problem of missing data, a new approach was proposed based on data partitioning and the mode function. In this new approach, the data set was partitioned horizontally into subsets such that all objects in each subset shared a single classification value. The mode function was applied to each subset that has missing values in order to find the most frequently occurring value of each attribute, and missing values in that attribute were replaced by the mode value.
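A minimal sketch of this imputation scheme (function and attribute names are assumed, not taken from the thesis): partition rows by class label, then fill each missing value with the within-class mode of its attribute.

```python
# Sketch of class-partitioned mode imputation: missing values (None) are
# replaced by the most frequent value of that attribute within the same class.
from collections import Counter

def impute_by_class_mode(rows, attrs, label):
    filled = [dict(r) for r in rows]
    for c in {r[label] for r in rows}:
        subset = [r for r in filled if r[label] == c]   # horizontal partition
        for a in attrs:
            known = [r[a] for r in subset if r[a] is not None]
            if not known:
                continue                                 # nothing to vote with
            mode = Counter(known).most_common(1)[0][0]   # most frequent value
            for r in subset:
                if r[a] is None:
                    r[a] = mode
    return filled

rows = [
    {"color": "red",  "size": "big",   "label": "A"},
    {"color": None,   "size": "big",   "label": "A"},
    {"color": "red",  "size": None,    "label": "A"},
    {"color": "blue", "size": "small", "label": "B"},
]
out = impute_by_class_mode(rows, ["color", "size"], "label")
```

Restricting the mode to the object's own class is what distinguishes this from plain column-wise mode imputation: a missing value is never filled from objects with a different classification.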
The proposed approach to missing values produced better results than other approaches. The proposed learning models also generated classification rules faster than other methods, and the accuracy of the resulting classification rules was high compared to that of other models.
An improved moth flame optimization algorithm based on rough sets for tomato diseases detection
Plant disease is one of the major bottlenecks in agricultural production and has bad effects on the economy of any country. Automatic detection of such diseases could minimize these effects. Feature selection is a common pre-processing step in automatic disease detection systems. It is an important process for detecting and eliminating noisy, irrelevant, and redundant data, and thus can improve detection performance. In this paper, an improved moth-flame approach to automatically detect tomato diseases is proposed. The moth-flame fitness function depends on the rough set dependency degree and takes into consideration the number of selected features. The proposed algorithm combines the exploration power of moth-flame optimization with the high performance of rough sets for the feature selection task, finding the set of features that maximizes the classification accuracy as evaluated by a support vector machine (SVM). The performance of the resulting MFORSFS algorithm was evaluated on many benchmark datasets from the UCI machine learning repository and compared with feature selection approaches based on Particle Swarm Optimization (PSO) and Genetic Algorithms (GA) with rough sets. The algorithm was then applied to a real-life problem, detecting two tomato diseases (powdery mildew and early blight): a real dataset of tomato diseases was manually built, and a tomato disease detection approach was proposed and evaluated on it. The experimental results showed that the proposed algorithm was efficient in terms of recall, precision, accuracy, and F-score, as well as feature-set size reduction and execution time.
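The fitness idea can be sketched as follows: the rough-set dependency degree of the decision on a candidate feature subset, combined with a penalty on subset size. The weighting `alpha` and the toy table are assumptions of this sketch, not the paper's values.

```python
# Sketch of a rough-set dependency-degree fitness for a feature subset.
# dependency_degree is the classical gamma measure; alpha is an assumed weight.

def dependency_degree(rows, feats, label):
    """|POS_feats(label)| / |U|: fraction of rows whose feature values
    determine the label unambiguously."""
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[f] for f in feats), set()).add(r[label])
    pos = sum(1 for r in rows
              if len(groups[tuple(r[f] for f in feats)]) == 1)
    return pos / len(rows)

def fitness(rows, feats, all_feats, label, alpha=0.9):
    """Trade classification power (gamma) against subset size."""
    gamma = dependency_degree(rows, feats, label)
    return alpha * gamma + (1 - alpha) * (1 - len(feats) / len(all_feats))

rows = [{"a": 0, "b": 0, "y": 0},
        {"a": 1, "b": 0, "y": 1}]
```

In a wrapper like the paper's, each candidate moth position encodes a feature subset and this value is what the swarm maximizes.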
Fuzzy-Rough Sets Assisted Attribute Selection
Attribute selection (AS) refers to the problem of selecting those input attributes or features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition and signal processing. Unlike other dimensionality reduction methods, attribute selectors preserve the original meaning of the attributes after reduction. This has found application in tasks that involve datasets containing huge numbers of attributes (in the order of tens of thousands) which, for some learning algorithms, might be impossible to process further. Recent examples include text processing and web content classification. AS techniques have also been applied to small and medium-sized datasets in order to locate the most informative attributes for later use. One of the many successful applications of rough set theory has been to this area. The rough set ideology of using only the supplied data and no other information has many benefits in AS, where most other methods require supplementary knowledge. However, the main limitation of rough set-based attribute selection in the literature is the restrictive requirement that all data is discrete. In classical rough set theory, it is not possible to consider real-valued or noisy data. This paper investigates a novel approach based on fuzzy-rough sets, fuzzy rough feature selection (FRFS), that addresses these problems and retains dataset semantics. FRFS is applied to two challenging domains where a feature reducing step is important; namely, web content classification and complex systems monitoring. The utility of this approach is demonstrated and is compared empirically with several dimensionality reducers. In the experimental studies, FRFS is shown to equal or improve classification accuracy when compared to the results from unreduced data. 
Classifiers that use the lower-dimensional set of attributes retained by fuzzy-rough reduction outperform those that employ the larger set returned by the existing crisp rough reduction method. In addition, FRFS is shown to be more powerful than the other AS techniques in the comparative study.
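The measure at the heart of fuzzy-rough feature selection can be sketched in simplified form. Crisp decision classes, the min t-norm, and the linear similarity kernel below are all simplifying assumptions of this sketch, not the full FRFS formulation.

```python
# Simplified fuzzy-rough dependency: each object's membership in the fuzzy
# positive region is inf over y of max(1 - sim(x, y), [y shares x's label]).

def similarity(a, b, span):
    """Fuzzy similarity of two real attribute values, scaled by the range."""
    return max(0.0, 1.0 - abs(a - b) / span) if span else 1.0

def fuzzy_dependency(rows, feats, label):
    spans = {f: max(r[f] for r in rows) - min(r[f] for r in rows)
             for f in feats}
    pos = 0.0
    for x in rows:
        lower = 1.0
        for y in rows:
            sim = min(similarity(x[f], y[f], spans[f]) for f in feats)
            in_class = 1.0 if y[label] == x[label] else 0.0
            lower = min(lower, max(1.0 - sim, in_class))
        pos += lower
    return pos / len(rows)

separable   = [{"a": 0.0, "y": 0}, {"a": 1.0, "y": 1}]
overlapping = [{"a": 0.0, "y": 0}, {"a": 0.0, "y": 1}]
g1 = fuzzy_dependency(separable, ["a"], "y")
g0 = fuzzy_dependency(overlapping, ["a"], "y")
```

Unlike the crisp dependency degree, this measure degrades gradually as real-valued features separate the classes less cleanly, which is what lets FRFS work on continuous data without discretization.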
Fuzzy-Rough Data Reduction with Ant Colony Optimization
Feature selection refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. In particular, solutions to this problem have found successful application in tasks that involve datasets containing huge numbers of features (in the order of tens of thousands), which would otherwise be impossible to process further. Recent examples include text processing and web content classification. Rough set theory has been used as such a dataset pre-processor with much success, but current methods are inadequate at finding minimal reductions, the smallest possible sets of features. To alleviate this difficulty, a feature selection technique that employs a hybrid variant of rough sets, fuzzy-rough sets, has been developed recently and shown to be effective. However, this method is still not able to find optimal subsets consistently. This paper proposes a new feature selection mechanism based on Ant Colony Optimization in an attempt to combat this. The method is applied to the problem of finding optimal feature subsets in the fuzzy-rough data reduction process. The present work is applied to complex systems monitoring and compared experimentally with the original fuzzy-rough method, an entropy-based feature selector, and a transformation-based reduction method, PCA. Comparisons using a support vector classifier are also included.
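A compact sketch of an ant-colony search over feature subsets, in the spirit of the hybrid described above. The evaluator here is a toy score and every parameter value is an assumption; the paper's mechanism scores subsets with the fuzzy-rough dependency measure instead.

```python
# Sketch of ACO feature-subset search: ants sample k-feature subsets with
# probability weighted by per-feature pheromone; the best subset found so
# far deposits pheromone on its features after each iteration.
import random

def aco_select(features, score, k, n_ants=10, n_iters=30, rho=0.2, seed=0):
    rng = random.Random(seed)
    tau = {f: 1.0 for f in features}              # pheromone per feature
    best, best_val = None, float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            pool, subset = list(features), []
            for _ in range(k):                    # weighted pick, no replacement
                pick = rng.choices(pool, weights=[tau[f] for f in pool])[0]
                pool.remove(pick)
                subset.append(pick)
            val = score(subset)
            if val > best_val:
                best, best_val = subset, val
        for f in tau:
            tau[f] *= 1.0 - rho                   # evaporation
        for f in best:
            tau[f] += best_val                    # reinforce current best
    return best, best_val

# Toy evaluator: reward overlap with a known-good pair of features.
target = {"f1", "f3"}
best, val = aco_select(["f1", "f2", "f3", "f4"],
                       lambda s: len(set(s) & target), k=2)
```

Real ACO-based selectors typically let ants grow variable-length subsets guided by both pheromone and a heuristic desirability term; the fixed-size sampling above keeps the sketch short.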
Proceedings of the Seventh International Symposium on Methodologies for Intelligent Systems (Poster Session)
This report contains the following papers: implications in vivid logic; a self-learning Bayesian expert system; a natural language generation system for a heterogeneous distributed database system; "competence switching" managed by intelligent systems; strategy acquisition by an artificial neural network: experiments in learning to play a stochastic game; viewpoints and selective inheritance in object-oriented modeling; multivariate discretization of continuous attributes for machine learning; utilization of the case-based reasoning method to resolve dynamic problems; formalization of an ontology of ceramic science in CLASSIC; linguistic tools for intelligent systems; an application of rough sets in knowledge synthesis; and a relational model for imprecise queries. These papers have been indexed separately.
The usefulness of a machine learning approach to knowledge acquisition
This paper presents the results of experiments showing how machine learning methods are useful for rule induction in the process of knowledge acquisition for expert systems. Four machine learning methods were used: ID3, ID3 with dropping conditions, and two options of the system LERS (Learning from Examples based on Rough Sets): LEM1 and LEM2. Two knowledge acquisition options of LERS were used as well. All six methods were used for rule induction from six real-life data sets. The main objective was to test how an expert system supplied with these rule sets performs when information on a few attributes is unavailable; that is, the expert system attempts to classify examples in which all values of some attributes are missing. The experiments make clear that all the machine learning methods performed much worse than the knowledge acquisition options of LERS. Thus, machine learning methods used for knowledge acquisition should be replaced by other methods of rule induction that generate complete sets of rules. The knowledge acquisition options of LERS are examples of such appropriate ways of inducing rules for building knowledge bases.