1,472 research outputs found

    A Method Non-Deterministic and Computationally Viable for Detecting Outliers in Large Datasets

    This paper presents an outlier detection method that is based on a Variable Precision Rough Set Model (VPRSM). This method generalizes the standard set inclusion relation, which is the foundation of the Rough Sets Basic Model (RSBM). The main contribution of this research is an improvement in the quality of detection, because this generalization allows us to classify when there is some degree of uncertainty. From the proposed method, a computationally viable algorithm for large volumes of data is also introduced. The experiments performed in a real scenario and a comparison of the results with the RSBM-based method demonstrate the efficiency of both the method and the algorithm in diverse contexts that involve large volumes of data. This work has been supported by grant TIN2016-78103-C2-2-R, and University of Alicante projects GRE14-02 and Smart University.
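The generalized inclusion relation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual variable precision formulation, in which a set X counts as included in Y when the relative misclassification 1 - |X ∩ Y| / |X| does not exceed a threshold beta with 0 <= beta < 0.5. The function names and the toy data are hypothetical, and this is not the paper's detection algorithm.

```python
def misclassification(x, y):
    """Relative degree of misclassification of set x with respect to set y."""
    x, y = set(x), set(y)
    if not x:
        return 0.0  # convention: the empty set is included in every set
    return 1.0 - len(x & y) / len(x)


def beta_included(x, y, beta):
    """Generalized inclusion: x is 'mostly' inside y, up to an error of beta."""
    if not 0.0 <= beta < 0.5:
        raise ValueError("beta is assumed to lie in [0, 0.5)")
    return misclassification(x, y) <= beta


def beta_lower_approximation(classes, target, beta):
    """Union of the equivalence classes that are beta-included in target."""
    result = set()
    for cls in classes:
        if beta_included(cls, target, beta):
            result |= set(cls)
    return result


# Hypothetical toy partition of the universe {1, ..., 8} and a target concept.
classes = [{1, 2, 3, 4, 5}, {6, 7}, {8}]
target = {1, 2, 3, 4, 6}

strict = beta_lower_approximation(classes, target, beta=0.0)
tolerant = beta_lower_approximation(classes, target, beta=0.25)
```

With beta = 0 the relation reduces to ordinary set inclusion, so no class qualifies here. With beta = 0.25 the first class is accepted even though one of its five elements lies outside the target.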

    GBG++: A Fast and Stable Granular Ball Generation Method for Classification

    Granular ball computing (GBC), as an efficient, robust, and scalable learning method, has become a popular research topic of granular computing. GBC includes two stages: granular ball generation (GBG) and multi-granularity learning based on the granular ball (GB). However, the stability and efficiency of existing GBG methods need to be further improved due to their strong dependence on k-means or k-division. In addition, GB-based classifiers only unilaterally consider the GB's geometric characteristics to construct classification rules, but the GB's quality is ignored. Therefore, in this paper, based on the attention mechanism, a fast and stable GBG (GBG++) method is proposed first. Specifically, the proposed GBG++ method only needs to calculate the distances from the data-driven center to the undivided samples when splitting each GB instead of randomly selecting the center and calculating the distances between it and all samples. Moreover, an outlier detection method is introduced to identify local outliers. Consequently, the GBG++ method can significantly improve effectiveness, robustness, and efficiency while being absolutely stable. Second, considering the influence of the sample size within the GB on the GB's quality, based on the GBG++ method, an improved GB-based k-nearest neighbors algorithm (GBkNN++) is presented, which can reduce misclassification at the class boundary. Finally, the experimental results indicate that the proposed method outperforms several existing GB-based classifiers and classical machine learning classifiers on 24 public benchmark datasets.
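To make the notions of a ball's geometric characteristics and quality concrete, the following is a rough sketch of how a single granular ball might be summarized. It assumes one common convention (center as the sample mean, radius as the mean distance to the center, purity as the share of the majority label). The abstract does not state these definitions, the function name is hypothetical, and the GBG++ splitting procedure itself is not reproduced.

```python
import math
from collections import Counter


def summarize_granular_ball(samples, labels):
    """Summarize one group of labeled samples as a ball (assumed definitions)."""
    n = len(samples)
    dim = len(samples[0])
    # Center: coordinate-wise mean of the samples.
    center = tuple(sum(s[d] for s in samples) / n for d in range(dim))
    # Radius: mean Euclidean distance from the center (an assumption;
    # the maximum distance is another convention).
    radius = sum(math.dist(center, s) for s in samples) / n
    # Quality: majority label, its share of the ball, and the sample count.
    majority, count = Counter(labels).most_common(1)[0]
    return {
        "center": center,
        "radius": radius,
        "label": majority,
        "purity": count / n,
        "size": n,
    }


# Hypothetical toy data: four points on a square, one with a different label.
ball = summarize_granular_ball(
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    [1, 1, 1, 0],
)
```

A classifier that weighs both geometry and quality could, for example, consult `size` and `purity` alongside the distance from a query point to `center`.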

    Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review

    Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba, and Zadeh [33], in which they proposed a prototype clustering algorithm based on fuzzy sets theory.

    An Advanced Conceptual Diagnostic Healthcare Framework for Diabetes and Cardiovascular Disorders

    Data mining, along with emerging computing techniques, has strongly influenced the healthcare industry. Researchers have used different data mining and Internet of Things (IoT) techniques to build automated solutions for diabetes and heart patients. However, a more advanced and unified solution is still needed that can offer a therapeutic opinion to individual diabetic and cardiac patients. Therefore, a smart data mining and IoT (SMDIoT) based advanced healthcare system for diabetes and cardiovascular diseases is proposed here. The hybridization of data mining and IoT with other emerging computing techniques is expected to give an effective and economical solution for diabetes and cardiac patients. SMDIoT hybridizes the ideas of data mining, the Internet of Things, chatbots, contextual entity search (CES), bio-sensors, semantic analysis, and granular computing (GC). The bio-sensors of the proposed system help obtain the current and precise status of the patients concerned, so that the necessary medical assistance can be provided in an emergency. The novelty lies in the hybrid framework and the support of chatbots, granular computing, contextual entity search, and semantic analysis. The practical implementation of this system is very challenging and costly. However, it appears to be a more workable and economical solution for diabetes and cardiac patients. Comment: 11 pages.

    GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing

    Most of the existing clustering methods are based on a single granularity of information, such as the distance and density of each data point. This finest-grained approach is usually inefficient and susceptible to noise. Therefore, we propose a clustering algorithm that combines multi-granularity granular-balls and the minimum spanning tree (MST). We construct coarse-grained granular-balls, and then use the granular-balls and the MST to implement a clustering method based on "large-scale priority", which can greatly reduce the influence of outliers and accelerate the construction of the MST. Experimental results on several data sets demonstrate the power of the algorithm. All code has been released at https://github.com/xjnine/GBMST
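The general idea of clustering coarse summaries with a minimum spanning tree can be sketched as follows. This is a simplified illustration under stated assumptions, not the released GBMST code: ball centers are taken as given, the MST is built with a plain O(n^2) Prim's algorithm, and clusters are formed by cutting the k - 1 longest tree edges. All function names are hypothetical.

```python
import math


def prim_mst(points):
    """Return MST edges (weight, u, v) over points using Euclidean distance."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the growing tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Cut the k - 1 longest MST edges and label the resulting components."""
    edges = sorted(prim_mst(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]
    root = list(range(len(points)))  # union-find forest

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, a, b in kept:
        root[find(a)] = find(b)
    return [find(i) for i in range(len(points))]


# Hypothetical ball centers forming two well-separated groups.
centers = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
labels = mst_clusters(centers, k=2)
```

Because the tree is built over a small number of ball centers rather than over every data point, the quadratic cost of this naive Prim's loop applies only to the coarse representation.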

    A Modified Distance Dynamics Model for Improvement of Community Detection

    © 2018 IEEE. Community detection is a key technique for identifying the intrinsic community structures of complex networks. The distance dynamics model has been proven effective in finding communities with arbitrary size and shape and identifying outliers. However, to simulate distance dynamics, the model requires manual parameter specification and is sensitive to the cohesion threshold parameter, which is difficult to determine. Furthermore, it has difficulty handling rough outliers and ignores hubs (nodes that bridge communities). In this paper, we propose a robust distance dynamics model, namely, Attractor++, which uses a dynamic membership degree. In Attractor++, the dynamic membership degree is used to determine the influence of exclusive neighbors on the distance instead of setting the cohesion threshold. In addition, considering its inefficiency and low accuracy in handling outliers and identifying hubs, we design an outlier optimization model that is based on triangle adjacency. By using optimization rules, a postprocessing method further judges whether a singleton node should be merged into the same community as its triangles or regarded as a hub or an outlier. Extensive experiments on both real-world and synthetic networks demonstrate that our algorithm more accurately identifies nodes that have special roles (hubs and outliers) and more effectively identifies community structures

    A Rough Set Approach to Spatio-temporal Outlier Detection

    Detecting outliers that are grossly different from or inconsistent with the rest of a spatio-temporal dataset is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we deal with the outlier detection problem in spatio-temporal data and describe a rough set approach that finds the top outliers in an unlabeled spatio-temporal dataset. The proposed method, called Rough Outlier Set Extraction (ROSE), relies on a rough set theoretic representation of the outlier set using the rough set approximations, i.e. lower and upper approximations. A new set, called the kernel set, is also introduced: a representative subset of the original dataset that is significant for outlier detection. Experimental results on real-world datasets demonstrate its superiority over results obtained by various clustering algorithms. It is also shown that the kernel set is able to detect the same outlier set but with much less computational time.
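The lower and upper approximations that this abstract relies on follow the standard rough set definitions, which the short sketch below illustrates. It is not the ROSE method itself. The grouping key, the function names, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, key):
    """Partition objects into classes that are indiscernible under key."""
    groups = defaultdict(set)
    for obj in objects:
        groups[key(obj)].add(obj)
    return list(groups.values())


def rough_approximations(classes, target):
    """Return the (lower, upper, boundary) approximations of target."""
    target = set(target)
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class overlaps the target
            upper |= cls
    return lower, upper, upper - lower


# Hypothetical example: objects 0..5 are indiscernible in pairs.
classes = equivalence_classes(range(6), key=lambda x: x // 2)
lower, upper, boundary = rough_approximations(classes, target={0, 1, 2})
```

Objects in the lower approximation certainly belong to the target set, while those in the boundary region only possibly belong to it. This is the kind of uncertainty a rough representation of an outlier set is meant to capture.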