6,947 research outputs found

    Implementation of the K-Means and K-Medoids Methods for Grouping Indonesian Provinces Based on Youth Education Aspects

    The quality of education in Indonesia remains a concern: a number of problems hinder efforts to improve it and affect the quality of Indonesian youth. This study groups the provinces of Indonesia by aspects of youth education using the K-Means and K-Medoids methods. The average silhouette method is used to determine the optimum k, and the ratio of the within-cluster standard deviation (SW) to the between-cluster standard deviation (SB) is used to evaluate the clustering results. The optimum number of clusters obtained is 2. With K-Means, cluster 1 consists of 19 provinces and cluster 2 of 14 provinces; with K-Medoids, cluster 1 consists of 22 provinces and cluster 2 of 11 provinces. The K-Means method is better than the K-Medoids method, with a ratio of 0.527941 against 0.5612719 for K-Medoids.
    Keywords: K-Means; K-Medoids; Education; Average Silhouette; Standard Deviation
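The average-silhouette criterion used above to pick the optimum k can be sketched in a few lines (a minimal numpy sketch, not the paper's code; the function name and toy data below are ours):

```python
import numpy as np

def mean_silhouette(X, labels):
    """Average silhouette width of a flat clustering.

    For each point: a = mean distance to its own cluster (excluding itself),
    b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    The k with the highest average silhouette is chosen as optimum.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a(i)
        if not same.any():  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = d[i, same].mean()
        b = min(d[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

A labeling that matches the true grouping scores close to 1; a scrambled labeling scores markedly lower.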

    Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

    We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.
    Comment: 10 pages, 3 figures
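The dynamic program described here, specialized to 1D Euclidean k-means, can be sketched as follows (an illustrative O(k·n²) implementation with a hypothetical function name, not the authors' code):

```python
def optimal_1d_kmeans(xs, k):
    """Exact 1D k-means by dynamic programming over interval clusterings.

    Sorts the data, then D[m][j] = best SSE of splitting the first j points
    into m contiguous intervals. Returns (total SSE, list of intervals).
    """
    xs = sorted(xs)
    n = len(xs)
    # Prefix sums give O(1) interval SSE: sse(i, j) over xs[i:j].
    ps = [0.0] * (n + 1)
    ps2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(i, j):  # sum of squared deviations of xs[i:j] from its mean
        s, s2, m = ps[j] - ps[i], ps2[j] - ps2[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1][i] + sse(i, j)
                if c < D[m][j]:
                    D[m][j], back[m][j] = c, i
    # Recover the interval boundaries by walking the backpointers.
    cuts, j = [], n
    for m in range(k, 0, -1):
        i = back[m][j]
        cuts.append((i, j))
        j = i
    return D[k][n], [xs[i:j] for i, j in reversed(cuts)]
```

Because clusters of an optimal 1D Euclidean k-means are intervals of the sorted data, this DP is exact, unlike Lloyd-style iteration.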

    Unsupervised clustering approach for network anomaly detection

    This paper describes the advantages of the anomaly detection approach over the misuse detection technique in detecting unknown network intrusions or attacks, and investigates the performance of various clustering algorithms applied to anomaly detection. Five algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering, and a distance-based outlier detection algorithm. Our experiments show that the misuse detection techniques, which implemented four different classifiers (naïve Bayes, rule induction, decision tree, and nearest neighbour), failed on network traffic containing a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. The anomaly detection module, on the other hand, showed promising results, with the distance-based outlier detection algorithm outperforming the others at 80.15% accuracy, followed by EM clustering at 78.06%, k-Medoids at 76.71%, improved k-Means at 65.40%, and k-Means at 57.81%. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Future work will therefore focus on reducing the false positive rate and improving accuracy using more advanced machine learning techniques.
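The distance-based outlier detection idea that performed best in this study can be sketched as follows (a minimal numpy sketch of the common k-nearest-neighbour formulation; the function name and thresholding rule are our assumptions, not the paper's exact method):

```python
import numpy as np

def knn_outliers(X, k=3, quantile=0.95):
    """Distance-based outlier detection: score each point by the distance
    to its k-th nearest neighbour, then flag points whose score exceeds
    the given quantile of all scores.
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    kth = np.sort(d, axis=1)[:, k - 1]     # distance to k-th nearest neighbour
    return kth > np.quantile(kth, quantile)
```

Points in dense regions have small k-NN distances; isolated points (candidate intrusions) stand out with large ones.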

    Comparison of Hybrid Genetic K-Means++ and Hybrid Genetic K-Medoids for Clustering the EEG Eyestate Dataset

    K-Means++ and K-Medoids are data clustering methods. Clustering speed is determined by the iteration count: the lower the number of iterations, the faster the clustering. Clustering performance can therefore be optimized to obtain better results, and one algorithm that can optimize clustering speed is the Genetic Algorithm (GA). The dataset used in this study is the EEG Eyestate dataset. With the GA hybrid, the average iteration count for K-Means++ decreased from 11.6 to 5.15, and for K-Medoids from 5.9 to 5.2. Based on this comparison of GA K-Means++ and GA K-Medoids iterations, it can be concluded that GA K-Means++ is better.
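The K-Means++ seeding that this hybrid starts from can be sketched as follows (a standard k-means++ initializer in numpy; this is illustrative and not the paper's GA-hybrid code):

```python
import numpy as np

def kmeanspp_init(X, k, rng=None):
    """k-means++ seeding: the first centre is uniform at random; each
    further centre is drawn with probability proportional to the squared
    distance to the nearest centre chosen so far. Good seeds tend to cut
    the number of Lloyd iterations needed afterwards.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.stack(centres)
```

On well-separated data the squared-distance weighting makes it overwhelmingly likely that each cluster contributes one seed.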

    Performance analysis in text clustering using k-means and k-medoids algorithms for Malay crime documents

    Few studies on text clustering for the Malay language have been conducted, due to limitations that need to be addressed. This article compares the two clustering algorithms, k-means and k-medoids, using Euclidean distance similarity to determine which method is better for clustering documents. Both algorithms are applied to 1000 documents on housebreaking crimes involving a variety of modus operandi. The comparison indicates that the k-means algorithm clusters the relevant documents best, with a 78% accuracy rate, and also achieves the best average within-cluster distance compared with the k-medoids algorithm. However, k-medoids performs exceptionally well on the Davies-Bouldin index (DBI). Furthermore, the accuracy of k-means depends on the number of initial clusters, and an appropriate cluster number can be determined using the elbow method.
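A plain alternating k-medoids, of the kind compared here, can be sketched over a precomputed dissimilarity matrix (a minimal sketch, not the paper's implementation; this is the simple Voronoi-style update, not full PAM):

```python
import numpy as np

def k_medoids(D, k, iters=100, rng=None):
    """Alternating k-medoids on a precomputed dissimilarity matrix D:
    assign each object to its nearest medoid, then move each medoid to
    the member object minimising total dissimilarity within its cluster.
    """
    rng = np.random.default_rng(rng)
    n = len(D)
    medoids = rng.choice(n, size=k, replace=False)
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)
        new = np.array([
            members[np.argmin(D[np.ix_(members, members)].sum(axis=0))]
            for c in range(k)
            for members in [np.where(labels == c)[0]]
        ])
        if set(new.tolist()) == set(medoids.tolist()):
            break
        medoids = new
    return medoids, labels
```

Because centres are restricted to actual objects, only pairwise dissimilarities are needed, which is what makes k-medoids convenient for documents.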

    A Review and Evaluation of Elastic Distance Functions for Time Series Clustering

    Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time series specific distance measure, and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as k-means. Our focus is on distance-based time series clustering that employs elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with k-means and k-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with k-means, and even when tuned is no better. Using k-medoids rather than k-means improved the clusterings for all nine distance measures. DTW is not significantly better than Euclidean distance with k-medoids. Generally, distance measures that employ editing in conjunction with warping perform better, and one distance measure, the move-split-merge (MSM) method, is the best performing measure of this study. We also compare to clustering with DTW using barycentre averaging (DBA). We find that DBA does improve DTW k-means, but that the standard DBA is still worse than using MSM. Our conclusion is to recommend MSM with k-medoids as the benchmark algorithm for clustering time series with elastic distance measures. We provide implementations in the aeon toolkit, results and guidance on reproducing results on the associated GitHub repository.
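The dynamic time warping distance evaluated in this study has a standard O(nm) recursion, sketched here with squared pointwise cost (an illustrative implementation; the aeon toolkit mentioned above provides optimized versions):

```python
def dtw(a, b):
    """Dynamic time warping distance between two scalar series.

    D[i][j] = cost of aligning the first i points of a with the first j
    points of b; each step may advance either series or both, which is
    the "realignment" that makes the distance elastic.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # stretch b
                                 D[i][j - 1],      # stretch a
                                 D[i - 1][j - 1])  # advance both
    return D[n][m]
```

A stretched copy of a series has DTW distance 0 to the original, whereas its Euclidean distance would be nonzero; that invariance is the elastic property.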

    Finding Similar Documents Using Different Clustering Techniques

    Text clustering is an important application of data mining, concerned with grouping similar text documents together. In this paper, several models are built to cluster capstone project documents using three clustering techniques: k-means, k-means fast, and k-medoids. Our dataset is obtained from the library of the College of Computer and Information Sciences, King Saud University, Riyadh. Three similarity measures are tested: cosine similarity, Jaccard similarity, and the correlation coefficient. The quality of the obtained models is evaluated and compared. The results indicate that the best performance is achieved using k-means and k-medoids combined with cosine similarity. We observe variation in clustering quality depending on the evaluation measure used. In addition, as the value of k increases, the quality of the resulting clusters improves. Finally, we reveal the categories of graduation projects offered in the Information Technology department for female students.
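The cosine similarity behind the best-performing models here reduces to a normalized dot product over term-weight vectors (a minimal sketch; the vectors below are hypothetical):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-weight vectors: the dot
    product divided by the product of their Euclidean norms. It ignores
    document length, which is why it suits text clustering."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents with no terms in common score 0; scaled copies of the same vector score 1 regardless of magnitude.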

    Non-Exhaustive, Overlapping k-medoids for Document Clustering

    Manual document categorization is time consuming, expensive, and difficult to manage for large collections. Unsupervised clustering algorithms perform well when documents belong to only one group; however, individual documents may be outliers or span multiple topics. This paper proposes a new clustering algorithm called non-exhaustive overlapping k-medoids, inspired by k-medoids and non-exhaustive overlapping k-means. The proposed algorithm partitions a set of objects into k clusters based on pairwise similarity. Each object is assigned to zero, one, or many groups to emulate manual results. The algorithm uses dissimilarity instead of distance measures and applies to text and other abstract data. Neo-k-medoids is tested against manually tagged movie descriptions and Wikipedia comments. Initial results are primarily poor but show promise. Future research to improve the proposed algorithm and to explore alternative evaluation measures is described.
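The zero/one/many assignment behaviour described here can be illustrated with a thresholded membership rule (our simplification for illustration, not the paper's actual neo-k-medoids update):

```python
def overlapping_assign(dissim_to_medoids, threshold):
    """Non-exhaustive, overlapping assignment: give each object membership
    in every cluster whose medoid is within `threshold` dissimilarity.
    An object may land in zero clusters (an outlier), exactly one, or
    several (a multi-topic document).
    """
    return [
        [c for c, d in enumerate(row) if d <= threshold]
        for row in dissim_to_medoids
    ]
```

Contrast this with ordinary k-medoids, which always assigns each object to exactly one nearest medoid.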