6,947 research outputs found

    Implementation of the K-Means and K-Medoids Methods for Grouping Indonesian Provinces Based on Youth Education Aspects

    The quality of education in Indonesia remains a concern: a number of problems hinder efforts to improve it and affect the quality of Indonesian youth. This study groups the provinces of Indonesia by aspects of youth education using the K-Means and K-Medoids methods. The average silhouette method is used to determine the optimum k, and the ratio of the within-cluster standard deviation (SW) to the between-cluster standard deviation (SB) is used to evaluate the clustering results. The optimum number of clusters obtained is 2. With K-Means, cluster 1 consists of 19 provinces and cluster 2 of 14 provinces; with K-Medoids, cluster 1 consists of 22 provinces and cluster 2 of 11 provinces. The K-Means method is better than the K-Medoids method, with a ratio of 0.527941 against 0.5612719 for K-Medoids.
    Keywords: K-Means; K-Medoids; Education; Average Silhouette; Standard Deviation
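The average-silhouette criterion used above to pick the optimum k can be sketched in a few lines (a minimal numpy sketch, not the paper's code; the function name and toy data below are ours):

```python
import numpy as np

def mean_silhouette(X, labels):
    """Average silhouette width of a flat clustering.

    For each point: a = mean distance to its own cluster (excluding itself),
    b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    The k with the highest average silhouette is chosen as optimum.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        if not same.any():  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = d[i, same].mean()
        b = min(d[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

A labeling that matches the true grouping scores close to 1; a scrambled labeling scores markedly lower.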

    Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

    We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.
    Comment: 10 pages, 3 figures
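The dynamic program described here, specialized to 1D Euclidean k-means, can be sketched as follows (an illustrative O(k·n²) implementation with a hypothetical function name, not the authors' code):

```python
def optimal_1d_kmeans(xs, k):
    """Exact 1D k-means by dynamic programming over interval clusterings.

    Sorts the data, then D[m][j] = best SSE of splitting the first j points
    into m contiguous intervals. Returns (total SSE, list of intervals).
    """
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums give O(1) interval SSE: sse(i, j) over xs[i:j].
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(i, j):  # sum of squared deviations of xs[i:j] from its mean
        s, s2, m = ps[j] - ps[i], ps2[j] - ps2[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1][i] + sse(i, j)
                if c < D[m][j]:
                    D[m][j], back[m][j] = c, i
    # Recover the interval boundaries by walking the backpointers.
    cuts, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        cuts.append((i, j))
        j = i
    return D[k][n], [xs[i:j] for i, j in reversed(cuts)]
```

Because clusters of an optimal 1D Euclidean k-means are intervals of the sorted data, this DP is exact, unlike Lloyd-style iteration.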

    Unsupervised clustering approach for network anomaly detection

    This paper describes the advantages of the anomaly detection approach over the misuse detection technique in detecting unknown network intrusions or attacks, and investigates the performance of various clustering algorithms applied to anomaly detection. Five algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering, and a distance-based outlier detection algorithm. Our experiments show that the misuse detection techniques, which implemented four different classifiers (naïve Bayes, rule induction, decision tree, and nearest neighbour), failed on network traffic containing a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. The anomaly detection module, on the other hand, showed promising results, with the distance-based outlier detection algorithm outperforming the others at 80.15% accuracy, followed by EM clustering at 78.06%, k-Medoids at 76.71%, improved k-Means at 65.40%, and k-Means at 57.81%. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Future work will therefore focus on reducing the false positive rate and improving accuracy using more advanced machine learning techniques.
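The distance-based outlier detection idea that performed best in this study can be sketched as follows (a minimal numpy sketch of the common k-nearest-neighbour formulation; the function name and thresholding rule are our assumptions, not the paper's exact method):

```python
import numpy as np

def knn_outliers(X, k=3, quantile=0.95):
    """Distance-based outlier detection: score each point by the distance
    to its k-th nearest neighbour, then flag points whose score exceeds
    the given quantile of all scores.
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    kth = np.sort(d, axis=1)[:, k - 1]     # distance to k-th nearest neighbour
    return kth > np.quantile(kth, quantile)
```

Points in dense regions have small k-NN distances; isolated points (candidate intrusions) stand out with large ones.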

    Comparison of Hybrid Genetic K-Means++ and Hybrid Genetic K-Medoids for Clustering the EEG Eyestate Dataset

    K-Means++ and K-Medoids are data clustering methods. Clustering speed is determined by the iteration count: the lower the number of iterations, the faster the clustering. Clustering performance can therefore be optimized to obtain better results, and one algorithm that can optimize clustering speed is the Genetic Algorithm (GA). The dataset used in this study is the EEG Eyestate dataset. With the GA hybrid, the average iteration count for K-Means++ decreased from 11.6 to 5.15, and for K-Medoids from 5.9 to 5.2. Based on this comparison of GA K-Means++ and GA K-Medoids iterations, it can be concluded that GA K-Means++ is better.
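The K-Means++ seeding that this hybrid starts from can be sketched as follows (a standard k-means++ initializer in numpy; this is illustrative and not the paper's GA-hybrid code):

```python
import numpy as np

def kmeanspp_init(X, k, rng=None):
    """k-means++ seeding: the first centre is uniform at random; each
    further centre is drawn with probability proportional to the squared
    distance to the nearest centre chosen so far. Good seeds tend to cut
    the number of Lloyd iterations needed afterwards.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.stack(centres)
```

On well-separated data the squared-distance weighting makes it overwhelmingly likely that each cluster contributes one seed.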

    Performance analysis in text clustering using k-means and k-medoids algorithms for Malay crime documents

    Few studies on text clustering for the Malay language have been conducted, due to limitations that need to be addressed. This article compares the two clustering algorithms, k-means and k-medoids, using Euclidean distance similarity to determine which method is better for clustering documents. Both algorithms are applied to 1000 documents on housebreaking crimes involving a variety of modus operandi. The comparison indicates that the k-means algorithm clusters the relevant documents best, with a 78% accuracy rate, and also achieves the best average within-cluster distance compared with the k-medoids algorithm. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the number of initial clusters, and an appropriate cluster number can be determined using the elbow method.
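A plain alternating k-medoids, of the kind compared here, can be sketched over a precomputed dissimilarity matrix (a minimal sketch, not the paper's implementation; this is the simple Voronoi-style update, not full PAM):

```python
import numpy as np

def k_medoids(D, k, iters=100, rng=None):
    """Alternating k-medoids on a precomputed dissimilarity matrix D:
    assign each object to its nearest medoid, then move each medoid to
    the member object minimising total dissimilarity within its cluster.
    """
    rng = np.random.default_rng(rng)
    n = len(D)
    medoids = rng.choice(n, size=k, replace=False)
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new = np.array([
            members[np.argmin(D[np.ix_(members, members)].sum(axis=0))]
            for c in range(k)
            for members in [np.where(labels == c)[0]]
        ])
        if set(new.tolist()) == set(medoids.tolist()):
            break
        medoids = new
    return medoids, labels
```

Because centres are restricted to actual objects, only pairwise dissimilarities are needed, which is what makes k-medoids convenient for documents.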

    A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

    Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time series specific distance measure, and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as k-means. Our focus is on distance-based time series clustering that employs elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned is no better. Using k-medoids rather than k-means improved the clusterings for all nine distance measures. DTW is not significantly better than Euclidean distance with k-medoids. Generally, distance measures that employ editing in conjunction with warping perform better, and one distance measure, the move-split-merge (MSM) method, is the best performing measure of this study. We also compare to clustering with DTW using barycentre averaging (DBA). We find that DBA does improve DTW k-means, but that the standard DBA is still worse than using MSM. Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures. We provide implementations in the aeon toolkit, results and guidance on reproducing results on the associated GitHub repository.
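The dynamic time warping distance evaluated in this study has a standard O(nm) recursion, sketched here with squared pointwise cost (an illustrative implementation; the aeon toolkit mentioned above provides optimized versions):

```python
def dtw(a, b):
    """Dynamic time warping distance between two scalar series.

    D[i][j] = cost of aligning the first i points of a with the first j
    points of b; each step may advance either series or both, which is
    the "realignment" that makes the distance elastic.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # stretch b
                                 D[i][j - 1],      # stretch a
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]
```

A stretched copy of a series has DTW distance 0 to the original, whereas its Euclidean distance would be nonzero; that invariance is the elastic property.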

    Finding Similar Documents Using Different Clustering Techniques

    Text clustering is an important application of data mining, concerned with grouping similar text documents together. In this paper, several models are built to cluster capstone project documents using three clustering techniques: k-means, k-means fast, and k-medoids. Our dataset is obtained from the library of the College of Computer and Information Sciences, King Saud University, Riyadh. Three similarity measures are tested: cosine similarity, Jaccard similarity, and the correlation coefficient. The quality of the obtained models is evaluated and compared. The results indicate that the best performance is achieved using k-means and k-medoids combined with cosine similarity. We observe variation in clustering quality depending on the evaluation measure used. In addition, as the value of k increases, the quality of the resulting clusters improves. Finally, we reveal the categories of graduation projects offered in the Information Technology department for female students.
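The cosine similarity behind the best-performing models here reduces to a normalized dot product over term-weight vectors (a minimal sketch; the vectors below are hypothetical):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors: the dot
    product divided by the product of their Euclidean norms. It ignores
    document length, which is why it suits text clustering."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents with no terms in common score 0; scaled copies of the same vector score 1 regardless of magnitude.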

    Non-Exhaustive, Overlapping k-medoids for Document Clustering

    Manual document categorization is time consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when documents belong to only one group; however, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm called non-exhaustive overlapping k-medoids, inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity instead of distance measures and applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are primarily poor but show promise. Future research to improve the proposed algorithm and to explore alternative evaluation measures is described.
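The zero/one/many assignment behaviour described here can be illustrated with a thresholded membership rule (our simplification for illustration, not the paper's actual neo-k-medoids update):

```python
def overlapping_assign(dissim_to_medoids, threshold):
    """Non-exhaustive, overlapping assignment: give each object membership
    in every cluster whose medoid is within `threshold` dissimilarity.
    An object may land in zero clusters (an outlier), exactly one, or
    several (a multi-topic document).
    """
    return [
        [c for c, d in enumerate(row) if d <= threshold]
        for row in dissim_to_medoids
    ]
```

Contrast this with ordinary k-medoids, which always assigns each object to exactly one nearest medoid.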