730 research outputs found

    Automatic identification of the number of clusters in hierarchical clustering

    Hierarchical clustering is one of the most suitable tools for discovering the underlying structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not applicable. In many real applications, it provides a perspective on the inner structure of the data and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, which is a major obstacle to building fully automatic systems such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from the 3D printers, and a large reduction in CPU time is shown by comparing the CPU time before and after this modification of the entire clustering method. It also reduces the dependence on human experts to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to be applied automatically in real-life industrial applications, allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns. Peer Reviewed. Postprint (author's final draft)
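
The idea of cutting a dendrogram automatically with a Calinski-Harabasz (CH) based rule can be illustrated with a minimal sketch. It is not one of the six criteria proposed in the paper, which the abstract does not spell out. It simply merges clusters bottom-up (centroid linkage is an assumption made here for brevity) and keeps the partition with the highest plain CH index. The function names `calinski_harabasz` and `auto_cut` are hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(clusters):
    """CH index of a partition given as a list of lists of points (2 <= k < n)."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return float("inf")  # every cluster is a single repeated point
    return (between / (k - 1)) / (within / (n - k))


def auto_cut(points, max_k=None):
    """Agglomerate bottom-up and return (best_k, partition) maximizing CH.

    Assumption: centroid linkage, chosen only to keep the sketch short.
    """
    if len(points) < 3:
        raise ValueError("need at least 3 points")
    if max_k is None:
        max_k = len(points) - 1
    if max_k < 2:
        raise ValueError("max_k must be at least 2")
    clusters = [[p] for p in points]
    best_score, best_partition = None, None
    while len(clusters) > 2:
        # Merge the two clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                   centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        # Score every level of the hierarchy that has at most max_k clusters.
        if len(clusters) <= max_k:
            score = calinski_harabasz(clusters)
            if best_score is None or score > best_score:
                best_score = score
                best_partition = [list(c) for c in clusters]
    return len(best_partition), best_partition


# Toy data (invented for illustration): three well-separated groups.
blobs = [
    [(0, 0), (0, 1), (1, 0)],
    [(10, 0), (10, 1), (11, 0)],
    [(0, 10), (0, 11), (1, 10)],
]
points = [p for blob in blobs for p in blob]
best_k, partition = auto_cut(points)
print(best_k, partition)
```

In practice one would more likely build the dendrogram with a library routine and score each cut level there. The loop above only makes the "score every level, keep the best" logic explicit.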

    Student Academic Mark Clustering Analysis and Usability Scoring on Dashboard Development Using K-Means Algorithm and System Usability Scale

    Learning activities are one of the processes by which teachers deliver information or messages to students. SMPN 4 Sidoarjo is a state junior high school (JHS) located in Sidoarjo Regency. The academic score data collected during the learning process were not well organized, which made it difficult for teachers and school principals to monitor student learning performance. The score data come from one teacher's Bahasa Indonesia subject and comprise 222 records from the 2019/2020 school year. Students are clustered with K-Means. The number of clusters is determined using the elbow method and displayed graphically. The clustering results can be used as a reference for teachers in forming study groups and determining the best treatment for each cluster. The best clustering results are confirmed by validation scores using the Davies-Bouldin Index, Silhouette Width, and Calinski-Harabasz Index. Three clusters were obtained for the data at each class level, while the number of clusters ranged from two to five for the data of each study group. A dashboard is used to visualize the clustering results. Usability testing with the System Usability Scale (SUS) yielded a score of 87.5, which means that the dashboard is acceptable to SMPN 4 Sidoarjo.

    Calinski-Harabasz Index for Fuzzy C-Means and K-Means Cluster Analysis of Regencies/Cities in Jambi Province by Potential in Mining, Quarrying, and Electricity and Gas Supply

    This study aims to compare Fuzzy C-Means and K-Means cluster analysis by computing the Calinski-Harabasz index, where a higher Calinski-Harabasz index indicates better-formed clusters. The data were analyzed with the JASP software. They describe the potential of mining, quarrying, and electricity and gas supply, measured as the contribution of these sectors to the gross regional domestic product (PDRB) of the regencies/cities in Jambi Province. The results show that Fuzzy C-Means cluster analysis formed two clusters, whereas K-Means formed three. The Calinski-Harabasz index of K-Means was higher than that of Fuzzy C-Means. The study concludes that, based on the comparison of Calinski-Harabasz indices, K-Means cluster analysis performs better than Fuzzy C-Means.
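
For reference, the index compared in this abstract is usually defined as below, for n observations split into k clusters. The notation is chosen here for illustration and is not taken from the paper: C_j is the j-th cluster, n_j its size, c_j its centroid, and c the centroid of all the data. A larger value means more between-cluster dispersion relative to within-cluster dispersion, which is why the higher-scoring method is judged better.

```latex
\[
\mathrm{CH}(k) \;=\; \frac{\operatorname{tr}(B_k)/(k-1)}{\operatorname{tr}(W_k)/(n-k)},
\qquad
\operatorname{tr}(B_k) = \sum_{j=1}^{k} n_j \,\lVert c_j - c \rVert^2,
\qquad
\operatorname{tr}(W_k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert^2 .
\]
```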

    School motivation profiles of Dutch 9th graders

    The aim of this study was to identify school motivation profiles of Dutch 9th grade students in a four-dimensional motivation space comprising mastery, performance, social, and extrinsic motivation. Multiple clustering methods (K-means, K-medoids, restricted latent profile analysis) and multiple indices for selecting the optimal number of clusters were applied. The statistical selection methods did not completely concur on the optimal number of clusters, but a clear common denominator was provided by the Calinski-Harabasz index and the minimum and mean Silhouette values. All three indices indicated two clusters as the optimal number, regardless of the clustering method used: one cluster of 9th graders with high mean scores on all dimensions and one cluster with low mean scores on all dimensions. In addition, we explored the substantive interpretation of multiple cluster solutions. Most students were found to be in clusters that can be classified into one of three profile types that may differ in level: (1) approximately equal mean scores on all dimensions, (2) relatively high mean scores on mastery and social motivation, and (3) a relatively low mean score on performance motivation. The latter profile type is believed to be a new discovery.

    An application of a hybrid intelligent system for diagnosing primary headaches

    [Abstract] (1) Background: Modern medicine generates a great deal of information that is stored in medical databases. At the same time, extracting useful knowledge and making scientific decisions for the diagnosis and treatment of diseases is becoming increasingly necessary. Headache disorders are the most prevalent of all neurological conditions, and headaches have not only medical but also great socioeconomic significance. The aim of this research is to develop an intelligent system for diagnosing primary headache disorders. (2) Methods: This research applied various mathematical, statistical, and artificial intelligence techniques, the most important of which are the Calinski-Harabasz index, the Analytical Hierarchy Process, and the Weighted Fuzzy C-means Clustering Algorithm. These methods are combined into a hybrid intelligent system for diagnosing primary headache disorders. The proposed diagnostic system is tested on an original real-world data set using different metrics. (3) Results: First, nine of the 20 attributes (features) from the International Headache Society (IHS) criteria are selected, and then only the five most important attributes are retained. Based on the Calinski–Harabasz index value (178), the optimal number of clusters is three, representing three classes of headaches: (i) migraine, (ii) tension-type headaches (TTHs), and (iii) other primary headaches (OPHs). The proposed hybrid intelligent system shows the following quality metrics: Accuracy 75%; Precision 67% for migraine, 74% for TTHs, 86% for OPHs, and Average Precision 77%; Recall 86% for migraine, 73% for TTHs, 67% for OPHs, and Average Recall 75%; F1 score 75% for migraine, 74% for TTHs, 75% for OPHs, and Average F1 score 75%. (4) Conclusions: The hybrid intelligent system presents qualitative and respectable experimental results. The implementation of existing diagnostic systems and the development of new ones in medicine are necessary to help physicians make quality diagnoses and decide on the best treatments for patients.
    Funding: Ministerio de Ciencia e Innovación; MINECO-TIN2017-84804-R. Gobierno del Principado de Asturias; FCGRUPIN-IDI/2018/000226. Serbia. Ministry of Education, Science and Technological Development; 451-03-68/2020-14/20015
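
The per-class F1 scores quoted in this abstract can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below uses the rounded percentages from the abstract as inputs, so agreement is expected only to within about one percentage point. The function name `f1_score` is hypothetical and is not part of the authors' system.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, as rounded in the abstract.
reported = {
    "migraine": (0.67, 0.86),
    "TTHs": (0.74, 0.73),
    "OPHs": (0.86, 0.67),
}

f1_by_class = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, value in f1_by_class.items():
    print(f"{name}: F1 = {value:.3f}")
```

The recomputed values land at roughly 0.75, 0.73 to 0.74, and 0.75, in line with the reported 75%, 74%, and 75% once the rounding of the inputs is taken into account.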

    Towards expert-inspired automatic criterion to cut a dendrogram for real-industrial applications

    Hierarchical clustering is one of the preferred choices for understanding the underlying structure of a dataset and defining typologies, with multiple real-life applications. Among existing clustering algorithms, the hierarchical family is one of the most popular, because it reveals the inner structure of the dataset and yields the number of clusters as an output, unlike popular methods such as k-means. The granularity of the final clustering can be adjusted to the goals of the analysis. The number of clusters in a hierarchical method is derived from the analysis of the resulting dendrogram. Experts have criteria for visually inspecting the dendrogram and determining the number of clusters, and finding automatic criteria that imitate experts in this task is still an open problem. However, dependence on an expert to cut the tree is a limitation in real applications in fields such as Industry 4.0 and additive manufacturing. This paper analyses several cluster validity indexes in the context of determining a suitable number of clusters in hierarchical clustering. A new Cluster Validity Index (CVI) is proposed that captures the implicit criteria used by experts when analyzing dendrograms. The proposal has been applied to a range of datasets and validated against expert ground truth, where it outperforms the state of the art and also significantly reduces the computational cost. Peer Reviewed. Postprint (published version)

    A new approach for evaluating internal cluster validation indices

    A vast number of different methods are available for unsupervised classification. Since no algorithm and parameter setting performs best in all types of data, there is a need for cluster validation to select the actually best-performing algorithm. Several indices were proposed for this purpose without using any additional (external) information. These internal validation indices can be evaluated by applying them to classifications of datasets with a known cluster structure. Evaluation approaches differ in how they use the information on the ground-truth classification. This paper reviews these approaches, considering their advantages and disadvantages, and then suggests a new approach

    Service quality dealer identification: the optimization of K-Means clustering

    Service quality and customer satisfaction directly influence company branding, reputation, and customer loyalty. As a liaison between producers and consumers, dealers must preserve valuable consumer relationships to increase customer satisfaction and adherence. The lack of comprehensive measurement and standardization of service quality is an issue for a company's service excellence. Identifying and grouping service quality performance is therefore a valuable contribution to decision-making aimed at controlling and improving the company's performance. This study applies the K-Means algorithm, optimizing the number of clusters, to identify dealer service quality performance. The analysis found three dealer categories: Cluster One, with 125 dealers grouped as good performance; Cluster Two, with 30 dealers grouped as very good performance; and Cluster Three, with 38 dealers grouped as not good performance. To evaluate the choice of k, several testing approaches are conducted and compared: the Calinski-Harabasz index, the Elbow method, the Silhouette Score, and the Davies-Bouldin Index (DBI) all point to k=3, so the optimal number of clusters is taken to be three. These three clusters identify the service quality level of dealers effectively and support company guidelines for corrective actions and improvements in customer service quality, in place of the standardized normal distribution grouping calculation.