424 research outputs found

    Enabling decision trend analysis with interactive scatter plot matrices visualization

    © 2015 Elsevier Ltd. This paper presents a new interactive scatter plot visualization for multi-dimensional data analysis. We apply Rough Set Theory (RST) to reduce visual complexity through dimensionality reduction. We use an innovative point-to-region mouse click concept to enable direct interaction with scatter points, which is otherwise theoretically impossible. To show the decision trend, we use a virtual Z dimension to display a set of linear flows that approximate it. We conducted case studies to demonstrate the effectiveness and usefulness of our new technique for analyzing the properties of three popular data sets: wine quality, wages and cars. The paper also includes a pilot usability study that compares parallel coordinate visualization with scatter plot matrices visualization using RST results.

    A Rough Set Approach to Dimensionality Reduction for Performance Enhancement in Machine Learning

    Machine learning uses complex mathematical algorithms to turn a data set into a model for a problem domain. Analysing high dimensional data in raw form usually causes computational overhead, because the larger the data, the longer it takes to process. Therefore, there is a need for a more robust dimensionality reduction approach, alongside existing methods, for feature projection (extraction) and selection from a data set, whose output can be passed to a machine learning algorithm for optimal performance. This paper presents a generic mathematical approach for transforming data from a high dimensional space to a low dimensional space in such a manner that the intrinsic dimension of the original data is preserved, using the concepts of indiscernibility, reducts, and the core from rough set theory. The flue detection dataset available on the Kaggle website was used in this research for demonstration purposes. The original and reduced datasets were tested using a logistic regression machine learning algorithm, yielding the same accuracy of 97% with training times of 25 min and 11 min respectively.
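
The rough set notions named in this abstract (indiscernibility, reducts, the core) can be illustrated with a minimal sketch. It is not the paper's procedure: the function names, the brute-force search, and the four-row toy table `TOY` are all assumptions made for illustration, and the exhaustive search only suits a handful of attributes.

```python
from collections import defaultdict
from itertools import combinations


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def positive_region(rows, attrs, decision):
    """Rows whose indiscernibility class carries a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def reducts(rows, attrs, decision):
    """Minimal attribute subsets that keep the full positive region (brute force)."""
    full = positive_region(rows, attrs, decision)
    found = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            # A superset of a known reduct cannot be minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if positive_region(rows, subset, decision) == full:
                found.append(subset)
    return found


def core(rows, attrs, decision):
    """Attributes shared by every reduct."""
    rs = reducts(rows, attrs, decision)
    return set.intersection(*(set(r) for r in rs)) if rs else set()


# Hypothetical toy decision table: "c" duplicates "a", and "b" alone fixes "d".
TOY = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]

print(reducts(TOY, ["a", "b", "c"], "d"))
print(core(TOY, ["a", "b", "c"], "d"))
```

In this toy table the single attribute `b` already separates the decision classes, so the reduced data set would keep one column out of three.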

    A Distance-Based Method for Attribute Reduction in Incomplete Decision Systems

    Recent research on attribute reduction in incomplete decision systems has limitations. In this paper, we propose a distance-based method for attribute reduction in an incomplete decision system. In addition, we prove theoretically that our method is more effective than some other methods.

    A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique

    In the context of big data, granular computing has recently been implemented by some mathematical tools, especially Rough Set Theory (RST). As a key topic of rough set theory, feature selection has been investigated to adapt the related granular concepts of RST to deal with large amounts of data, leading to the development of the distributed RST version. However, despite its scalability, the distributed RST version faces a key challenge tied to the partitioning of the feature search space in the distributed environment while guaranteeing data dependency. Therefore, in this manuscript, we propose a new distributed RST version based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data feature selection. LSH-dRST uses LSH to match similar features into the same bucket and maps the generated buckets into partitions to enable the splitting of the universe in a more efficient way. More precisely, in this paper, we perform a detailed analysis of the performance of LSH-dRST by comparing it to the standard distributed RST version, which is based on a random partitioning of the universe. We demonstrate that our LSH-dRST is scalable when dealing with large amounts of data. We also demonstrate that LSH-dRST ensures the partitioning of the high dimensional feature search space in a more reliable way, hence better preserving data dependency in the distributed environment and ensuring a lower computational cost.
    * This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 702527 (Z. Chelly Dagdia, C. Zarges).
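
The bucketing step described in this abstract (hash similar features into the same bucket, then map buckets to partitions) can be sketched as follows. This is not the authors' LSH-dRST implementation: the abstract does not name a hash family, so the random-hyperplane (sign) hash, the round-robin assignment, the function names, and the toy `FEATURES` columns are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def lsh_buckets(columns, n_planes=4, seed=0):
    """Bucket feature columns by a random-hyperplane (sign) signature."""
    rng = random.Random(seed)
    n_rows = len(next(iter(columns.values())))
    # One random Gaussian hyperplane per signature bit.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(n_rows)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for name, col in columns.items():
        mean = sum(col) / len(col)
        centered = [v - mean for v in col]
        signature = tuple(
            sum(p * v for p, v in zip(plane, centered)) >= 0 for plane in planes
        )
        buckets[signature].append(name)
    return list(buckets.values())


def assign_partitions(buckets, n_partitions):
    """Place whole buckets on partitions round-robin so bucket members stay together."""
    parts = [[] for _ in range(n_partitions)]
    for i, bucket in enumerate(buckets):
        parts[i % n_partitions].extend(bucket)
    return parts


# Hypothetical feature columns: f1 and f2 are identical, f3 is f1 reversed.
FEATURES = {
    "f1": [1, 2, 3, 4],
    "f2": [1, 2, 3, 4],
    "f3": [4, 3, 2, 1],
    "f4": [0, 5, 1, 7],
}

buckets = lsh_buckets(FEATURES)
partitions = assign_partitions(buckets, n_partitions=2)
print(buckets)
print(partitions)
```

Identical columns always share a signature and therefore a bucket, whereas a random partitioning of the features, as in the standard distributed version the abstract compares against, gives no such guarantee.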