New Learning Models for Generating Classification Rules Based on Rough Set Approach
Data sets, static or dynamic, are very important and useful for representing real-life features in various domains such as industry, medicine, and economics. Recently, different models have been used to generate knowledge from vague and uncertain data sets, such as decision tree induction, neural networks, fuzzy logic, genetic algorithms, and rough set theory. All of these models take a long time to learn from a huge, dynamic data set. Thus, the challenge is to develop an efficient model that decreases the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets usually contain missing values, caused by unavailable data, that degrade the quality of the generated classification rules and make it difficult to extract useful information from the data set. Another challenge, therefore, is how to handle missing data. Rough set theory is a new mathematical tool for dealing with vagueness and uncertainty. It is a useful approach for uncovering classificatory knowledge and building classification rules. This thesis therefore proposes applying the theory as part of the learning models.
Two models for learning from data sets were proposed, based on two different reduction algorithms. The split-condition-merge-reduct algorithm (SCMR) consists of three modules: partitioning the data set vertically into subsets, applying the rough set concept of reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced split-condition-merge-reduct algorithm (E-SCMR) applies the same three modules, followed by a fourth module that applies rough set reduction again to the reduct generated by SCMR. This produces a best reduct that plays the same role as the complete set of attributes. Classification rules were then generated from the best reduct.
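The SCMR and E-SCMR steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say how each subset is reduced or how the partial reducts are merged. The sketch therefore assumes three things: the greedy QuickReduct heuristic, based on the rough set dependency degree γ, as the per-subset reduction; a plain union as the merge step; and an even vertical split of the condition attributes. The helper names (`quick_reduct`, `scmr`, `escmr`, `rules`) and the toy table are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes of IND(attrs): row indices grouped by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """gamma(attrs) = |POS_attrs(decision)| / |U|: share of objects whose
    equivalence class carries a single decision value."""
    pos = sum(len(block) for block in partition(rows, attrs)
              if len({rows[i][decision] for i in block}) == 1)
    return pos / len(rows)


def quick_reduct(rows, attrs, decision):
    """Greedy QuickReduct (assumed reduction step): add the attribute that raises
    gamma the most until gamma equals that of `attrs`; result may not be minimal."""
    target = dependency(rows, attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


def scmr(rows, cond_attrs, decision, n_splits=2):
    """Split condition attributes vertically, reduce each subset, merge by union (assumption)."""
    size = -(-len(cond_attrs) // n_splits)  # ceiling division
    merged = []
    for start in range(0, len(cond_attrs), size):
        merged += quick_reduct(rows, cond_attrs[start:start + size], decision)
    return merged


def escmr(rows, cond_attrs, decision, n_splits=2):
    """Enhanced variant: apply reduction once more to the merged SCMR reduct."""
    return quick_reduct(rows, scmr(rows, cond_attrs, decision, n_splits), decision)


def rules(rows, reduct, decision):
    """One rule per consistent equivalence class of IND(reduct)."""
    out = []
    for block in partition(rows, reduct):
        labels = {rows[i][decision] for i in block}
        if len(labels) == 1:
            out.append(({a: rows[block[0]][a] for a in reduct}, labels.pop()))
    return out


# Hypothetical toy decision table; "cls" is the decision attribute.
rows = [
    {"a": 1, "b": 0, "c": 1, "d": 0, "cls": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": 0, "cls": "no"},
    {"a": 0, "b": 0, "c": 1, "d": 1, "cls": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": 1, "cls": "no"},
]
best = escmr(rows, ["a", "b", "c", "d"], "cls")
print(scmr(rows, ["a", "b", "c", "d"], "cls"), best, rules(rows, best, "cls"))
```

On this toy table the split-and-merge step keeps `["b", "c", "d"]`, and the extra reduction pass of the enhanced variant shrinks it to `["b"]`, which alone determines the class.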
To handle missing data, a new approach based on data partitioning and the mode function was proposed. In this approach, the data set was partitioned horizontally into subsets such that all objects in each subset have the same classification value. The mode function was applied to each subset containing missing values to find the most frequently occurring value of each attribute, and the missing values of that attribute were replaced by this mode.
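A short Python sketch of the class-wise mode imputation described above. It assumes that rows are dictionaries, that missing values are encoded as `None`, and that ties between equally frequent values go to the value seen first (the abstract does not say how ties are handled). The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # assumption: missing values are stored as None


def class_modes(rows, decision):
    """Mode of every condition attribute within each decision class (horizontal partition)."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for row in rows:
        for attr, value in row.items():
            if attr != decision and value is not MISSING:
                counts[row[decision]][attr][value] += 1
    return {cls: {attr: c.most_common(1)[0][0] for attr, c in per_attr.items()}
            for cls, per_attr in counts.items()}


def impute_by_class_mode(rows, decision):
    """Return copies of rows with each missing value replaced by its class-wise mode;
    a value stays missing if the attribute is missing throughout that class."""
    modes = class_modes(rows, decision)
    filled = []
    for row in rows:
        new_row = dict(row)
        for attr, value in row.items():
            if value is MISSING:
                new_row[attr] = modes.get(row[decision], {}).get(attr, MISSING)
        filled.append(new_row)
    return filled


# Hypothetical toy data; "play" is the classification attribute.
rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": None, "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "no"},
    {"outlook": None, "wind": "strong", "play": "yes"},
    {"outlook": "overcast", "wind": "strong", "play": "yes"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
]
for row in impute_by_class_mode(rows, "play"):
    print(row)
```

Computing the mode per class rather than over the whole column is what lets the missing `outlook` of a `yes` object become `overcast` even though `sunny` is equally common in the full data set.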
The proposed approach for missing values produced better results than other approaches. In addition, the proposed learning models generated classification rules faster than other methods, and the accuracy of the rules they generated was high compared with other models.
Class Association Rules Mining based Rough Set Method
This paper investigates the mining of class association rules with a rough set approach. In data mining, an association exists between two sets of elements when one set occurs together with the other. A class association rule set (CARs) is a subset of association rules whose consequents are classes. We present an efficient algorithm, inspired by the Apriori algorithm, for mining the finest class rule set, in which support and confidence are computed from the elementary sets of the lower approximation, a property of rough set theory. The proposed approach has been shown to be very effective, and the rough set approach to class association discovery is much simpler than the classic association method. Comment: 10 pages, 2 figures
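The Python sketch below shows one way support and confidence could be tied to the rough set lower approximation in an Apriori-style search. It is a hypothetical illustration, not the paper's algorithm. It assumes that an object counts toward a rule X → c only if it matches the itemset X and lies in the lower approximation of class c, with support = hits / |U| and confidence = hits / |objects matching X|. Apriori's subset-pruning step is omitted for brevity; only the join step and the support threshold are applied. Function names and data are made up.

```python
from collections import defaultdict
from itertools import combinations


def lower_approximation(rows, cond_attrs, decision, cls):
    """Indices of objects whose whole IND(cond_attrs) class has decision value cls."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in cond_attrs)].append(i)
    return {i for block in blocks.values()
            if all(rows[j][decision] == cls for j in block)
            for i in block}


def mine_cars(rows, cond_attrs, decision, min_sup, min_conf):
    """Level-wise search for rules itemset -> class; only objects in the class's
    lower approximation count as hits (assumed definition)."""
    n = len(rows)
    found = []
    for cls in sorted({row[decision] for row in rows}):
        lower = lower_approximation(rows, cond_attrs, decision, cls)
        level = {frozenset([(a, row[a])]) for row in rows for a in cond_attrs}
        while level:
            frequent = []
            for items in level:
                covered = [i for i, row in enumerate(rows)
                           if all(row[a] == v for a, v in items)]
                if not covered:
                    continue
                hits = sum(1 for i in covered if i in lower)
                sup, conf = hits / n, hits / len(covered)
                if sup >= min_sup:  # hits only shrink as itemsets grow, so pruning is safe
                    frequent.append(items)
                    if conf >= min_conf:
                        found.append((dict(items), cls, sup, conf))
            # Join step: unite frequent k-itemsets that differ in one item and
            # still use each attribute at most once.
            level = {x | y for x, y in combinations(frequent, 2)
                     if len(x | y) == len(x) + 1
                     and len({a for a, _ in x | y}) == len(x) + 1}
    return found


# Hypothetical toy table; the last two objects conflict (same conditions, different class).
rows = [
    {"a": 1, "b": 0, "cls": "yes"},
    {"a": 1, "b": 1, "cls": "no"},
    {"a": 0, "b": 0, "cls": "yes"},
    {"a": 0, "b": 1, "cls": "no"},
    {"a": 0, "b": 1, "cls": "yes"},
]
for rule in mine_cars(rows, ["a", "b"], "cls", min_sup=0.2, min_conf=0.6):
    print(rule)
```

Because the last two objects share condition values but differ in class, they fall outside both lower approximations, so no rule is built from `{a: 0, b: 1}`.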
Set-oriented data mining in relational databases
Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.
In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized, and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily, and its performance makes it feasible to build interactive data mining tools for large databases.
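A rough Python sketch of the set-oriented idea, using the standard-library `sqlite3` module. Frequent single items and frequent item pairs are computed with plain SQL joins, `GROUP BY`, and `HAVING` filters, following the general shape of SETM: each transaction's large items are extended by joining with the sales relation on the transaction id, and the results are counted. It stops at pairs and is not the paper's SQL. The table and column names are assumptions, and SQLite's query planner, not this code, decides whether a sort-merge join is actually used.

```python
import sqlite3


def setm_frequent_pairs(transactions, min_count):
    """Frequent 1- and 2-itemsets via set-oriented SQL (hypothetical SETM-style sketch)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sales (tid INTEGER, item TEXT);
        CREATE TABLE c1 (item TEXT, cnt INTEGER);
        CREATE TABLE r1 (tid INTEGER, item TEXT);
        CREATE TABLE c2 (item1 TEXT, item2 TEXT, cnt INTEGER);
    """)
    # SALES(tid, item): one row per distinct item in each transaction.
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [(tid, item) for tid, items in enumerate(transactions)
                     for item in sorted(set(items))])
    # C1: items whose support count reaches the threshold.
    con.execute("""INSERT INTO c1
                   SELECT item, COUNT(*) FROM sales
                   GROUP BY item HAVING COUNT(*) >= ?""", (min_count,))
    # R1: (tid, item) rows restricted to large items.
    con.execute("""INSERT INTO r1
                   SELECT s.tid, s.item FROM sales s JOIN c1 ON s.item = c1.item""")
    # R2' and C2: extend each R1 row with a larger item of the same transaction,
    # count each candidate pair, and keep the large ones.
    con.execute("""INSERT INTO c2
                   SELECT p.item, q.item, COUNT(*)
                   FROM r1 p JOIN sales q ON p.tid = q.tid AND q.item > p.item
                   GROUP BY p.item, q.item HAVING COUNT(*) >= ?""", (min_count,))
    singles = dict(con.execute("SELECT item, cnt FROM c1"))
    pairs = {(a, b): n for a, b, n in con.execute("SELECT item1, item2, cnt FROM c2")}
    con.close()
    return singles, pairs


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs"],
]
print(setm_frequent_pairs(transactions, min_count=2))
```

Every step is a join, a grouping, or a filter, which is what lets a relational engine and its optimizer do the work instead of a special-purpose counting structure.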
Decision table for classifying point sources based on FIRST and 2MASS databases
With the availability of multiwavelength, multiscale, and multiepoch astronomical catalogues, the number of features describing astronomical objects has increased. The better the features selected to classify objects, the higher the classification accuracy. In this paper, we used data sets of stars and quasars from the near-infrared and radio bands. The best-first search method was then applied to select features, and the decision table algorithm was applied to the data with the selected features. The classification accuracy is more than 95.9%. As a result, the feature selection method improves the effectiveness and efficiency of the classification method. Moreover, the results show that the decision table is robust and effective for discriminating celestial objects and can be used to preselect quasar candidates for large survey projects. Comment: 10 pages. Accepted by Advances in Space Research
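To make the two steps concrete, here is a minimal Python sketch of a decision table majority classifier (exact lookup on the selected features, with the overall majority class as fallback) combined with best-first forward feature selection. The abstract does not give the paper's settings. The evaluation criterion (leave-one-out accuracy), the stopping rule (`max_stale` non-improving expansions), and the star/quasar toy data are therefore assumptions. The data is invented and is not drawn from FIRST or 2MASS.

```python
import heapq
from collections import Counter, defaultdict


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


class DecisionTable:
    """Decision table majority classifier: look up the exact values of the selected
    features; fall back to the overall majority class for unseen keys."""

    def __init__(self, features):
        self.features = list(features)

    def _key(self, row):
        return tuple(row[f] for f in self.features)

    def fit(self, X, y):
        cells = defaultdict(list)
        for row, label in zip(X, y):
            cells[self._key(row)].append(label)
        self.table = {key: majority(labels) for key, labels in cells.items()}
        self.default = majority(y)
        return self

    def predict(self, row):
        return self.table.get(self._key(row), self.default)


def loo_accuracy(features, X, y):
    """Leave-one-out accuracy of a decision table restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        model = DecisionTable(features).fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += model.predict(X[i]) == y[i]
    return correct / len(X)


def best_first(X, y, n_features, max_stale=3):
    """Best-first forward search over feature subsets; stop after `max_stale`
    consecutive expansions that do not improve the best score."""
    best_set, best_score = frozenset(), loo_accuracy([], X, y)
    frontier = [(-best_score, 0, best_set)]  # max-heap via negated score, tie counter
    seen, tie, stale = {best_set}, 1, 0
    while frontier and stale < max_stale:
        _, _, subset = heapq.heappop(frontier)
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if child in seen:
                continue
            seen.add(child)
            score = loo_accuracy(sorted(child), X, y)
            heapq.heappush(frontier, (-score, tie, child))
            tie += 1
            if score > best_score:
                best_set, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return sorted(best_set), best_score


# Invented toy data: three discretized features per object.
X = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
y = ["star", "star", "quasar", "quasar", "star", "quasar"]
features, score = best_first(X, y, n_features=3)
print(features, score, DecisionTable(features).fit(X, y).predict((1, 0, 0)))
```

In the toy data, feature 0 alone separates the two classes, so the search settles on `[0]` with a leave-one-out accuracy of 1.0.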
An overview of decision table literature 1982-1995.
This report gives an overview of the literature on decision tables over the past 15 years. As far as possible, each reference is given an author-supplied abstract, a number of keywords, and a classification. In some cases, our own comments are added to show where, how, and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country of origin (not necessarily the country of publication), and language of the document. After a description of the scope of the overview, the classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstracts, classifications, and comments.