730 research outputs found

Automatic identification of the number of clusters in hierarchical clustering

Author: Gibert Karina
Karna Ashutosh
Publication venue: Springer Nature
Publication date: 01/01/2022

Hierarchical clustering is one of the most suitable tools for discovering the underlying true structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on the inner structure of the data and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, which is a major challenge for building fully automatic systems such as those required for decision support in Industry 4.0. This research proposes a general criterion to cut a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by the experts, and a winning criterion is determined. This research is framed within a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers, and the large reduction in CPU time is shown by comparing the CPU time of the entire clustering method before and after this modification. It also reduces the dependence on human experts to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to be applied automatically in real-life industrial applications, allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns. Peer Reviewed. Postprint (author's final draft).
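As a rough illustration of cutting a dendrogram automatically with the Calinski-Harabasz index, the sketch below tries every cut from 2 to 8 clusters and keeps the one with the highest index. This is a plain "maximise CH" rule, not any of the six criteria compared in the paper; the toy data, the Ward linkage, the range of cuts and the helper name `cut_by_calinski_harabasz` are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import calinski_harabasz_score

# Hypothetical toy data: three well-separated 2-D blobs (not the 3D-printer data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in centers])

# Build the dendrogram once; Ward linkage is an assumption of this sketch.
Z = linkage(X, method="ward")


def cut_by_calinski_harabasz(X, Z, k_min=2, k_max=8):
    """Return (best_k, scores) for the cut that maximises the CH index."""
    scores = {}
    for k in range(k_min, k_max + 1):
        # Cut the same dendrogram into at most k flat clusters.
        labels = fcluster(Z, t=k, criterion="maxclust")
        scores[k] = calinski_harabasz_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = cut_by_calinski_harabasz(X, Z)
print("chosen number of clusters:", best_k)
```

Because the linkage matrix is computed once and only re-cut for each candidate, the loop adds little cost on top of building the dendrogram.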

UPCommons. Portal del coneixement obert de la UPC

Student Academic Mark Clustering Analysis and Usability Scoring on Dashboard Development Using K-Means Algorithm and System Usability Scale

Author: Amalia Nur Laita Rizki
Ramdan Ade
Setiawan Nanang Yudi
Supianto Ahmad Afif
Yuliani Asri Rizki
Zilvan Vicky
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 04/07/2021

Learning activities are a process of delivering information or messages from teachers to students. SMPN 4 Sidoarjo is a State Junior High School (JHS) located in Sidoarjo Regency. During the learning process, the collected academic score data were not well organized by teachers and school principals for monitoring student learning performance. The score data come from the Bahasa Indonesia subject taught by one teacher, with 222 records from the 2019/2020 school year. The method used for student clustering is K-Means. The number of clusters is determined using the elbow method and displayed in graphic form. The clustering results can be used as a reference for teachers in determining study groups and the best treatment for each cluster. The best clustering results are confirmed by validation scores using the Davies-Bouldin Index, Silhouette Width, and Calinski-Harabasz Index. Three clusters were obtained for the data of each class level, while the number of clusters ranges from two to five for the data of each study group. A dashboard is used to visualize the clustering results. Usability testing using the System Usability Scale (SUS) yielded a score of 87.5, which means that the dashboard can be accepted by SMPN 4 Sidoarjo.
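A minimal sketch of the kind of workflow the abstract describes: K-Means is run for several values of k, the inertia is kept for an elbow plot, and each partition is scored with the three validation indices. The data are synthetic stand-ins (222 hypothetical students with two scores each), and the candidate range and all variable names are assumptions, not taken from the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Hypothetical stand-in for the marks: 222 students, two scores each.
rng = np.random.default_rng(42)
groups = [((55, 55), 3.0), ((75, 75), 3.0), ((90, 90), 2.0)]
X = np.vstack([rng.normal(loc=c, scale=s, size=(74, 2)) for c, s in groups])

rows = {}
for k in range(2, 6):  # candidate numbers of clusters (an assumption)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    rows[k] = {
        "inertia": km.inertia_,  # plotted against k for the elbow method
        "silhouette": silhouette_score(X, km.labels_),
        "davies_bouldin": davies_bouldin_score(X, km.labels_),
        "calinski_harabasz": calinski_harabasz_score(X, km.labels_),
    }

# Silhouette and Calinski-Harabasz: higher is better; Davies-Bouldin: lower is better.
picks = {
    "silhouette": max(rows, key=lambda k: rows[k]["silhouette"]),
    "davies_bouldin": min(rows, key=lambda k: rows[k]["davies_bouldin"]),
    "calinski_harabasz": max(rows, key=lambda k: rows[k]["calinski_harabasz"]),
}
print(picks)
```

Note the direction of each index when comparing them: a single "best k" only emerges when the maximum of Silhouette and Calinski-Harabasz coincides with the minimum of Davies-Bouldin.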

Jurnal Ilmu Komputer dan Informasi

Calinski-Harabasz Index of Fuzzy C-Means and K-Means Cluster Analysis of Regencies/Cities in Jambi Province by Mining, Quarrying, Electricity Supply, and Gas Potential

Author: Elisa Edi
Fadli Amril
Mardhotillah Bunga
Zurweni Zurweni
Publication venue: Universitas Jambi
Publication date: 12/06/2023

This study aims to compare Fuzzy C-Means and K-Means cluster analysis by computing the Calinski-Harabasz index, where the higher the Calinski-Harabasz index of a cluster analysis, the better the clusters it forms. The data were analysed with the JASP software and consist of the mining, quarrying, electricity supply, and gas potential, expressed as the contribution of these sectors to the gross regional domestic product (PDRB) of the regencies/cities in Jambi Province. The results show that Fuzzy C-Means cluster analysis forms two clusters, whereas K-Means analysis forms three clusters. The Calinski-Harabasz index of K-Means is higher than that of Fuzzy C-Means. The study concludes that, based on the comparison of the Calinski-Harabasz index, K-Means cluster analysis is better than Fuzzy C-Means clustering.
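For reference, the standard definition of the Calinski-Harabasz index can be written as below. The notation is an assumption of this note, not taken from the abstract: n observations are split into k clusters C_j, each with n_j members and centroid c_j, and c is the centroid of all observations.

```latex
\mathrm{CH}(k) = \frac{B(k)/(k-1)}{W(k)/(n-k)}, \qquad
B(k) = \sum_{j=1}^{k} n_j \,\lVert c_j - c \rVert^2, \qquad
W(k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert^2
```

The index grows when clusters lie far apart (large between-cluster dispersion B) and are compact (small within-cluster dispersion W), which is why a higher value is read as a better partition.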

Jurnal Online Universitas Jambi

School motivation profiles of Dutch 9th graders

Author: Blom Denise M.
Faber Meike
Warrens Matthijs J.
Publication venue: Informa UK Limited
Publication date: 01/01/2021

The aim of this study was to identify school motivation profiles of Dutch 9th grade students in a four-dimensional motivation space, including mastery, performance, social and extrinsic motivation. Multiple clustering methods (K-means, K-medoids, restricted latent profile analysis) and multiple indices for selecting the optimal number of clusters were applied. The statistical selection methods did not completely concur on the optimal number of clusters, but a clear common denominator was provided by the Calinski-Harabasz index and the minimum and mean Silhouette values. All three indices indicated two clusters as the optimal number, regardless of the clustering method used: one cluster of 9th graders with high mean scores on all dimensions and one cluster with low mean scores on all dimensions. In addition, we explored the substantive interpretation of multiple cluster solutions. It was discovered that most students are in clusters that can be classified into one of three profile types that may differ in level: (1) approximately equal mean scores on all dimensions, (2) relatively high mean scores on mastery and social motivation, and (3) a relatively low mean score on performance motivation. The latter profile type is believed to be a new discovery.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

An application of a hybrid intelligent system for diagnosing primary headaches

Author: Calvo-Rolle José Luis
Sekulić Slobodan R.
Simić Dragan
Simić Svetislav D.
Simić Svetlana
Villar José R.
Publication venue: MDPI AG
Publication date: 01/01/2021

(1) Background: Modern medicine generates a great deal of information that is stored in medical databases. At the same time, extracting useful knowledge and making scientific decisions for the diagnosis and treatment of diseases becomes increasingly necessary. Headache disorders are the most prevalent of all neurological conditions. Headaches have not only medical but also great socioeconomic significance. The aim of this research is to develop an intelligent system for diagnosing primary headache disorders. (2) Methods: This research applied various mathematical, statistical and artificial intelligence techniques, the most important of which are the Calinski-Harabasz index, the Analytical Hierarchy Process, and the Weighted Fuzzy C-means Clustering Algorithm. These methods, techniques and methodologies are used to create a hybrid intelligent system for diagnosing primary headache disorders. The proposed intelligent diagnostic system is tested on an original real-world data set with different metrics. (3) Results: First, nine of the 20 attributes (features) from the International Headache Society (IHS) criteria are selected, and then only the five most important attributes from the IHS criteria are selected. Based on the Calinski-Harabasz index (value 178), the optimal number of clusters is three, representing three classes of headaches: (i) migraine, (ii) tension-type headaches (TTHs), and (iii) other primary headaches (OPHs). The proposed hybrid intelligent system shows the following quality metrics: Accuracy 75%; Precision 67% for migraine, 74% for TTHs, 86% for OPHs, and Average Precision 77%; Recall 86% for migraine, 73% for TTHs, 67% for OPHs, and Average Recall 75%; F1 score 75% for migraine, 74% for TTHs, 75% for OPHs, and Average F1 score 75%. (4) Conclusions: The hybrid intelligent system presents qualitative and respectable experimental results.
The implementation of existing diagnostic systems and the development of new diagnostic systems in medicine are necessary in order to help physicians make quality diagnoses and decide on the best treatments for patients. Ministerio de Ciencia e Innovación; MINECO-TIN2017-84804-R. Gobierno del Principado de Asturias; FCGRUPIN-IDI/2018/000226. Serbia. Ministry of Education, Science and Technological Development; 451-03-68/2020-14/20015.

Multidisciplinary Digital Publishing Institute

Repositorio da Universidade da Coruña

Towards expert-inspired automatic criterion to cut a dendrogram for real-industrial applications

Author: Gibert Karina
Karna Ashutosh
Suman Shikha
Publication venue: IOS Press
Publication date: 01/01/2021

Hierarchical clustering is one of the preferred choices for understanding the underlying structure of a dataset and defining typologies, with multiple applications in real life. Among the existing clustering algorithms, the hierarchical family is one of the most popular, as it makes it possible to understand the inner structure of the dataset and to obtain the number of clusters as an output, unlike popular methods such as k-means. One can adjust the granularity of the final clustering to the goals of the analysis. The number of clusters in a hierarchical method relies on the analysis of the resulting dendrogram. Experts have criteria to visually inspect the dendrogram and determine the number of clusters. Finding automatic criteria that imitate experts in this task is still an open problem. However, dependence on the expert to cut the tree is a limitation in real applications in fields such as Industry 4.0 and additive manufacturing. This paper analyses several cluster validity indexes in the context of determining a suitable number of clusters in hierarchical clustering. A new Cluster Validity Index (CVI) is proposed that properly captures the implicit criteria used by experts when analyzing dendrograms. The proposal has been applied to a range of datasets and validated against the experts' ground truth, outperforming the results obtained by the state of the art while also significantly reducing the computational cost. Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

A new approach for evaluating internal cluster validation indices

Author: Botta-Dukát Zoltán
Publication date: 02/08/2023

A vast number of different methods are available for unsupervised classification. Since no algorithm and parameter setting performs best on all types of data, there is a need for cluster validation to select the algorithm that actually performs best. Several indices have been proposed for this purpose that do not use any additional (external) information. These internal validation indices can be evaluated by applying them to classifications of datasets with a known cluster structure. Evaluation approaches differ in how they use the information on the ground-truth classification. This paper reviews these approaches, considering their advantages and disadvantages, and then suggests a new approach.
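One simple way to evaluate an internal index on data with a known cluster structure is to count how often the index selects the true number of clusters. The sketch below does this for the Calinski-Harabasz index on synthetic blobs; it illustrates only this basic "hit rate" idea and is not the new approach the paper suggests. The data generator, the use of k-means, the candidate range and the helper names `make_dataset` and `index_choice` are assumptions of the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score


def make_dataset(true_k, seed, n_per_cluster=30):
    """Hypothetical benchmark: true_k tight blobs evenly spaced on a circle."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(true_k) / true_k
    centers = 10.0 * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.vstack(
        [rng.normal(loc=c, scale=0.5, size=(n_per_cluster, 2)) for c in centers]
    )


def index_choice(X, index, candidates=range(2, 7)):
    """Number of clusters that maximises `index` over k-means partitions."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = index(X, labels)
    return max(scores, key=scores.get)


# Nine datasets: three random draws for each known number of clusters.
trials = [(true_k, seed) for true_k in (2, 3, 4) for seed in range(3)]
hits = [
    index_choice(make_dataset(k, s), calinski_harabasz_score) == k
    for k, s in trials
]
hit_rate = sum(hits) / len(hits)
print(f"CH index recovered the known k in {hit_rate:.0%} of datasets")
```

Passing the index as a function makes it easy to swap in another index that is also maximised, such as `silhouette_score`, and compare hit rates on the same datasets.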

arXiv.org e-Print Archive

Service quality dealer identification: the optimization of K-Means clustering

Author: Che Hussin Ab Razak
Enza Wella Yolanda
Insani Fitri
Okfalisa Okfalisa
Saeed Faisal
Publication venue: Universitas Mercu Buana
Publication date: 12/09/2023

Service quality and customer satisfaction directly influence company branding, reputation and customer loyalty. As a liaison between producers and consumers, dealers must preserve valuable consumer relationships to increase customer satisfaction and adherence. The lack of comprehensive measurement and standardization of service quality is an issue for the company's service excellence. Therefore, identifying and grouping service quality performance is a valuable contribution to decision-making for controlling and enhancing the company's intention. This study applies the K-Means algorithm, optimizing the number of clusters, to identify dealer service quality performance and form the final service quality grouping. The analysis found three dealer categories: Cluster One, with 125 dealers grouped as good performance; Cluster Two, with 30 dealers grouped as very good performance; and Cluster Three, with 38 dealers grouped as not good performance. To evaluate the optimum k value, several testing approaches were conducted and compared; Calinski-Harabasz, Elbow, Silhouette Score, and the Davies-Bouldin Index (DBI) all indicate k=3. As a result, the optimum number of clusters is determined to be three. These three clusters effectively identified the service quality level of dealers and informed the company guidelines for corrective actions and improvements in customer service quality, instead of the standardized normal distribution grouping calculation.

SINERGI

Birmingham City University Open Access Repository

BCU Open Access