    Empirical analysis of rough set categorical clustering techniques based on rough purity and value set

    Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, attention has turned to categorical data clustering, where data objects are made up of non-numerical attributes. Implementing several existing categorical clustering techniques is challenging, as some cannot handle uncertainty and others have stability issues. For dealing with categorical data and handling uncertainty, rough set theory has become a well-established mechanism in a wide variety of applications, including databases. Recent techniques such as Information-Theoretic Dependency Roughness (ITDR), Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) outperformed predecessor approaches such as Bi-Clustering (BC), Total Roughness (TR), Min-Min Roughness (MMR) and Standard-Deviation Roughness (SDR). This work explores the limitations of the ITDR, MDA and MSA techniques on data sets where these techniques fail to select, or have difficulty selecting, their best clustering attribute. Accordingly, two alternative techniques, named Rough Purity Approach (RPA) and Maximum Value Attribute (MVA), are proposed. The novelty of the proposed approaches is that RPA presents a new uncertainty definition based on the purity of a rough relational database, whereas MVA, unlike other rough set theory techniques, uses domain knowledge such as the value set combined with the number of clusters (NoC). To show the significance and the mathematical and theoretical basis of the proposed approaches, several propositions are illustrated. Moreover, recent rough categorical techniques (MDA, MSA and ITDR) and a classical clustering technique (simple K-means) are used for comparison, and the results are presented in tabular and graphical forms. For the experiments, data sets from previously utilized research cases, a real supply base management (SBM) data set and the UCI repository are used. The results reveal significant improvement by the proposed techniques for categorical clustering in terms of purity (21%), entropy (9%), accuracy (16%), rough accuracy (11%), iterations (99%) and time (93%).
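Purity, one of the evaluation measures named in the abstract above, can be illustrated with a minimal sketch. It assumes the common definition (for each cluster, count the objects of its majority class, sum over clusters, divide by the number of objects); the abstract does not state the exact formula used, and the function name and toy labels are hypothetical.

```python
from collections import Counter

def purity(cluster_labels, class_labels):
    """Fraction of objects that belong to the majority class of their cluster.

    Assumed definition: (1/N) * sum over clusters of the largest class count.
    """
    per_cluster = {}
    for cluster, klass in zip(cluster_labels, class_labels):
        # Count how often each true class appears inside each cluster.
        per_cluster.setdefault(cluster, Counter())[klass] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(class_labels)

# Hypothetical toy example: six objects, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "b"]
score = purity(clusters, classes)
```

Here cluster 0 contributes two objects of its majority class and cluster 1 contributes three, so five of the six objects count toward purity.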

    Rough set approach for categorical data clustering

    A few rough categorical data clustering techniques exist to group objects having similar characteristics. However, their performance is an issue because of low accuracy, high computational complexity and low cluster purity. This work proposes a new technique called Maximum Dependency Attributes (MDA) to address these issues. The proposed technique is based on rough set theory and takes into account the dependency of attributes in an information system. Its main contribution is a new way to classify objects from categorical datasets that performs better than the baseline techniques. The algorithm is implemented in MATLAB® version 7.6.0.324 (R2008a) and executed sequentially on an Intel Core 2 Duo processor with 1 gigabyte of main memory, running Windows XP Professional SP3. Results collected during experiments on four small datasets and thirteen UCI benchmark datasets for selecting a clustering attribute show that MDA is an efficient approach in terms of accuracy and computational complexity compared with the BC, TR and MMR techniques. For cluster purity, the results on the Soybean and Zoo datasets show that MDA provided better purity, by up to 17% and 9%, respectively. The experimental result on supplier chain management clustering also demonstrates how MDA can contribute to a practical system, improving computational complexity and cluster purity by up to 90% and 23%, respectively.
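The idea of attribute dependency in an information system can be sketched as follows. This is a minimal illustration in Python (not the MATLAB implementation the abstract mentions), assuming the standard rough-set dependency degree: the share of objects whose equivalence class under one attribute falls entirely inside a single equivalence class of another. The selection rule shown (pick the attribute whose best dependency on any other attribute is highest) is a simplifying assumption, and all names and data are hypothetical.

```python
from collections import defaultdict

def partition(rows, attr):
    """Group object indices into equivalence classes by their value of `attr`."""
    classes = defaultdict(set)
    for index, row in enumerate(rows):
        classes[row[attr]].add(index)
    return list(classes.values())

def dependency(rows, target, source):
    """Degree to which `target` depends on `source` (size of positive region / |U|)."""
    target_classes = partition(rows, target)
    source_classes = partition(rows, source)
    # A source class is in the positive region if it lies wholly inside one target class.
    positive = sum(
        len(s) for s in source_classes if any(s <= t for t in target_classes)
    )
    return positive / len(rows)

def select_attribute(rows, attrs):
    """Assumed rule: choose the attribute with the highest maximum dependency degree."""
    best = {
        a: max(dependency(rows, a, b) for b in attrs if b != a) for a in attrs
    }
    return max(attrs, key=lambda a: best[a]), best

# Hypothetical categorical information system with four objects.
rows = [
    {"a": "x", "b": 1, "c": "p"},
    {"a": "x", "b": 1, "c": "q"},
    {"a": "y", "b": 2, "c": "p"},
    {"a": "y", "b": 3, "c": "q"},
]
chosen, degrees = select_attribute(rows, ["a", "b", "c"])
```

In this toy table every class of `b` sits inside a class of `a`, so `a` depends fully on `b` and is the attribute selected.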

    Analysing imperfect temporal information in GIS using the Triangular Model

    Rough sets and fuzzy sets are two frequently used approaches for modelling and reasoning about imperfect time intervals. In this paper, we focus on imperfect time intervals that can be modelled by rough sets and use an innovative graphic model, the triangular model (TM), to represent this kind of imperfect time interval. This work shows that TM is potentially advantageous in visualizing and querying imperfect time intervals, and that its analytical power can be better exploited when it is implemented in a computer application with graphical user interfaces and interactive functions. Moreover, a probabilistic framework is proposed to handle the uncertainty issues in temporal queries. We use a case study to illustrate how the unique insights gained by TM can assist a geographical information system for exploratory spatio-temporal analysis.