Empirical analysis of rough set categorical clustering techniques based on rough purity and value set
Clustering a set of objects into homogeneous groups is a fundamental operation
in data mining. Recently, attention has been put on categorical data clustering,
where data objects are made up of non-numerical attributes. The implementation of
several existing categorical clustering techniques is challenging as some are unable
to handle uncertainty and others have stability issues. For dealing with
categorical data and handling uncertainty, rough set theory has become a
well-established mechanism in a wide variety of applications, including databases.
The recent techniques such as Information-Theoretic Dependency Roughness (ITDR),
Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA)
outperformed their predecessor approaches like Bi-Clustering (BC), Total Roughness
(TR), Min-Min Roughness (MMR), and standard-deviation roughness (SDR). This
work explores the limitations and issues of the ITDR, MDA and MSA techniques on
data sets where these techniques fail to select, or have difficulty selecting, their
best clustering attribute. Accordingly, two alternative techniques named Rough Purity
Approach (RPA) and Maximum Value Attribute (MVA) are proposed. The novelty
of the proposed approaches is that RPA presents a new uncertainty definition
based on the purity of a rough relational database, whereas MVA, unlike other rough
set theory techniques, uses domain knowledge such as the value set combined with
the number of clusters (NoC). To show the significance and the mathematical and
theoretical basis of the proposed approaches, several propositions are illustrated. Moreover, the
recent rough categorical techniques MDA, MSA and ITDR and a classical clustering
technique, simple K-means, are used for comparison, and the results are presented
in tabular and graphical form. For the experiments, data sets from previously utilized
research cases, a real supply base management (SBM) data set and the UCI repository
are used. The results reveal significant improvement by the proposed techniques for
categorical clustering in terms of purity (21%), entropy (9%), accuracy (16%), rough
accuracy (11%), iterations (99%) and time (93%).
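
The abstract reports purity and entropy without defining them. As a rough illustration only, the sketch below uses the definitions commonly found in the clustering literature (majority-class fraction for purity, size-weighted class entropy per cluster); the paper may define them differently, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def _class_counts(clusters, labels):
    """Count the true class labels inside each cluster."""
    counts = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        counts[cluster_id][label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of objects that belong to the majority class of their cluster."""
    counts = _class_counts(clusters, labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def clustering_entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) of each cluster."""
    counts = _class_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for c in counts.values():
        size = sum(c.values())
        h = -sum((k / size) * log2(k / size) for k in c.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: six objects, two clusters, two classes.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]
toy_purity = purity(toy_clusters, toy_labels)
toy_entropy = clustering_entropy(toy_clusters, toy_labels)
```

Under these definitions, higher purity and lower entropy both indicate clusters that agree more closely with the known classes.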
Rough sets theory for travel demand analysis in Malaysia
This study integrates rough sets theory into tourism demand analysis. Originating in the area of Artificial Intelligence, rough sets theory was introduced to disclose important structures and to classify objects. The rough sets methodology provides definitions and methods for finding which attributes separate one class or classification from another. Based on this theory, one can propose a formal framework for the automated transformation of data into knowledge. This makes the rough sets approach a useful classification and pattern recognition technique. This study introduces a new rough sets approach for deriving rules from an information table of tourists in Malaysia. The induced rules were able to forecast change in demand with a certain accuracy
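
The abstract does not spell out the mechanics of rough sets classification. As a minimal sketch of the standard idea, assuming Pawlak-style lower and upper approximations, the code below groups objects that are indiscernible on the chosen attributes and then approximates a target class. The tourist table, attribute names and target set are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group objects that share the same values on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the target set."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical information table of four tourists.
tourists = {
    "t1": {"season": "high", "price": "low"},
    "t2": {"season": "high", "price": "low"},
    "t3": {"season": "low", "price": "high"},
    "t4": {"season": "low", "price": "low"},
}
demand_up = {"t1", "t3"}  # hypothetical class to approximate

lower, upper = approximations(tourists, ["season", "price"], demand_up)
```

Here t1 and t2 are indiscernible but only t1 is in the target, so both fall in the boundary region (upper minus lower); rules induced from the lower approximation are certain, while rules from the boundary are only possible.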
Change detection in categorical evolving data streams
Detecting change in evolving data streams is a central issue for accurate adaptive learning. In real world applications, data streams have categorical features, and changes induced in the data distribution of these categorical features have not been considered extensively so far. Previous work on change detection focused on detecting changes in the accuracy of the learners, but without considering changes in the data distribution.
To cope with these issues, we propose a new unsupervised change detection method, called CDCStream (Change Detection in Categorical Data Streams), that is well suited to categorical data streams. The proposed method is able to detect changes in a batch-incremental scenario. It is based on the following two components: (i) a summarization strategy that compresses the current batch by extracting a descriptive summary and (ii) a new segmentation algorithm that highlights changes and issues warnings for a data stream. To evaluate our proposal, we employ it in a learning task over real-world data and compare its results with state-of-the-art methods. We also report a qualitative evaluation to show the behavior of CDCStream
Interpretations of Association Rules by Granular Computing
We present interpretations for association rules. We first introduce Pawlak's method and the corresponding algorithm for finding decision rules (a kind of association rule). We then use extended random sets to present a new algorithm for finding interesting rules. We prove that the new algorithm is faster than Pawlak's algorithm. The extended random sets can easily include more than one criterion for determining interesting rules. We also provide two measures for dealing with uncertainties in association rules