4 research outputs found

    Clustering large-scale data based on modified affinity propagation algorithm

    Traditional clustering algorithms are no longer suitable for data mining applications that use large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate for clustering data of ordinary size, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods are designed to ensure both low time complexity and good clustering accuracy. Firstly, a data set is divided into several subsets using one of two methods: random fragmentation or K-means. Secondly, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Thirdly, the inverse weighted clustering
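    The first two steps named in this abstract (splitting the data set, then picking local exemplars in each subset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the function names `random_fragmentation` and `local_exemplars` are hypothetical, and a simple alternating k-medoids loop stands in for KAP, which is not reproduced here. The third step is cut off in the abstract, so it is not sketched.

```python
import random

def random_fragmentation(points, n_subsets, seed=0):
    """Shuffle the points and deal them into n_subsets roughly equal subsets."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def local_exemplars(subset, k, max_iters=20):
    """Pick k exemplars that are actual data points of the subset.

    Assumption: alternating k-medoids is used as a stand-in for KAP.
    """
    exemplars = list(subset[:k])
    for _ in range(max_iters):
        # Assign every point to its nearest current exemplar.
        clusters = [[] for _ in exemplars]
        for p in subset:
            nearest = min(range(len(exemplars)),
                          key=lambda i: sq_dist(p, exemplars[i]))
            clusters[nearest].append(p)
        # Move each exemplar to the member with the smallest total distance.
        updated = [
            min(members, key=lambda m: sum(sq_dist(m, q) for q in members))
            if members else old
            for members, old in zip(clusters, exemplars)
        ]
        if updated == exemplars:
            break
        exemplars = updated
    return exemplars

# Toy data (made up for illustration): two small groups of 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0)]
subsets = random_fragmentation(data, n_subsets=2)
exemplars_per_subset = [local_exemplars(s, k=2) for s in subsets]
```

    Each subset is processed independently, which is what keeps the cost of the exemplar search bounded by the subset size rather than by the full data set.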

    Intrusion detection system using a modified affinity propagation algorithm and classification algorithms

    Network security is one of the most serious problems in the world because of the continuing increase in malicious activities and network attacks. The increasing use of web services in many systems, such as e-government services, banking services, e-mail, and e-commerce, exposes these services to several types of malicious attacks. Intrusion Detection Systems (IDSs) are widely used to protect information systems and reduce the damage caused by these attacks. Some malicious activities remain hidden, and there is an urgent need to keep developing new, effective, and adaptive approaches to counter such activities. Many studies try to find the best model for IDSs, one that achieves the highest detection rate and the lowest false alarm rate. Various artificial intelligence and data mining algorithms have been used in this field, such as clustering algorithms, Neural Networks, Naïve Bayes, and Decision Trees. IDSs are divided into two main types: misuse detection and anomaly detection. The former detects known attacks by extracting features from network traffic and matching them against a list of signatures, while the latter identifies anomalous behavior by computing its deviation from normal behavior. This study proposes a new clustering algorithm called IWC-KAP for large-scale data sets. IWC-KAP can directly generate K clusters, as specified by the user. It retains the advantages of the K-Affinity Propagation and Inverse Weighted Clustering algorithms. Experiments on IWC-KAP show that it can generate K clusters directly without any parameter tuning and can cluster large-scale data more efficiently than other related algorithms. Moreover, given a specified cluster number, the results show that the proposed clustering method significantly reduces clustering time and produces more accurate clustering results than the AP, KAP, and HAP algorithms.
Furthermore, the study uses IWC-KAP to propose two hybrid anomaly detection models that improve the performance of intrusion detection systems in terms of detection, accuracy, and false alarm rate. The first model combines the IWC-KAP clustering algorithm with the Naïve Bayes algorithm. In the first phase, IWC-KAP is used to group all the data into clusters based on their behavior, such as malicious and non-malicious activity. In the second phase, a Naïve Bayes classifier is used to classify the clustered data into the correct categories. The second model combines the IWC-KAP algorithm with a Decision Tree algorithm instead of the Naïve Bayes classifier. The KDD Cup '99 dataset is used for training and evaluating the performance of the proposed models
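
    The anomaly-detection idea mentioned in this abstract, flagging behavior by its deviation from normal behavior, can be illustrated with a very small sketch. It is not the IWC-KAP hybrid model described above. It assumes a single numeric feature (a hypothetical "bytes per connection" value), a z-score as the deviation measure, and an arbitrary threshold of 3; all names and numbers are made up for illustration.

```python
from statistics import mean, pstdev

def fit_normal_profile(normal_values):
    """Summarise normal behaviour as (mean, population standard deviation)."""
    return mean(normal_values), pstdev(normal_values)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose z-score against the normal profile exceeds threshold."""
    mu, sigma = profile
    if sigma == 0:
        # Degenerate profile: any departure from the constant value is flagged.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical feature: bytes per connection seen during normal operation.
normal_bytes = [480, 510, 495, 505, 520, 490, 500, 515]
profile = fit_normal_profile(normal_bytes)

# One value close to the normal range, one far outside it.
flags = [is_anomalous(v, profile) for v in (502, 5000)]
```

    A misuse detector would instead compare extracted features against a list of known signatures, so it would not need a profile of normal traffic at all.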

    Traditional clustering algorithms are no longer suitable for use in data mining applications that make use of large-scale data. There have been many large-scale data clustering …

    The purpose of this study was to extract more concise rules with the Recursive-Rule Extraction (Re-RX) algorithm by replacing the C4.5 program currently employed in Re-RX with the J48graft algorithm. Experiments were subsequently conducted to determine rules for six different two-class mixed datasets having discrete and continuous attributes and to compare the resulting accuracy, comprehensibility..

    Overcoming the problem of different density-regions using the Inter-Connectivity and the Closeness.

    Density-based algorithms are considered among the most common and powerful algorithms in data clustering. This paper presents a new way to solve the problem of detecting clusters of varying density, which most density-based algorithms cannot handle correctly. Our approach merges the Inter-Connectivity and Closeness techniques, which are applied to the subclusters produced by a density-based clustering technique in order to merge them into new clusters. The proposed algorithm helps decide whether regions of different density belong to the same cluster. The experimental results show that the proposed clustering algorithm gives satisfactory results