59,717 research outputs found

    Perbedaan tingkat akurasi metode k-means dan hierarchical clustering di bidang peramalan dan klasifikasi

    Abstract: K-Means is a non-hierarchical clustering method that partitions data into one or more clusters, so that data with similar characteristics fall into the same cluster and data with different characteristics fall into different clusters. Hierarchical methods are clustering techniques that build a hierarchy of groups level by level, resembling a tree structure, so the grouping proceeds in stages. This study reviews research published in national journals on the difference in accuracy between the k-means and hierarchical clustering methods in forecasting and classification. Its purpose was to determine whether the accuracy of forecasting and classification results differs significantly between the K-Means and hierarchical clustering methods. The study uses meta-analysis, reviewing articles from 2012-2022 on this topic. Data were collected from indexing databases such as Scopus, DOAJ, WorldCat, and Google Scholar. The data used are research results that report a correlation value (r) and the number of data subjects (N). The search yielded 60 publications that met the criteria. Analysis with JASP software gave a forest-plot summary effect of 0.67 for the k-means method; in other words, the effect of the k-means forecasting model on accuracy was 67%, in the moderate category. For the hierarchical method the summary effect was 0.61; in other words, the effect of the hierarchical forecasting model on accuracy was 61%, also in the moderate category
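The partitioning idea described in the abstract above can be illustrated with a minimal sketch of the standard k-means loop (Lloyd's algorithm). This is not the code used in any of the reviewed studies; the function name `kmeans`, the toy points, and the fixed random seed are assumptions made for illustration only.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two visibly separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; only the grouping matters, so the first three points should share one label and the last three the other.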

    Clustering and Visualizing the Status of Child Health in Kenya: A Data Mining Approach.

    The inauguration of the new constitution in Kenya has led to the devolution of health care to the counties. This has created a need for a model that groups these regions into natural groups with similar characteristics that can influence child health, for the purpose of health care planning and regulation. Little research has explored the methodology that can be used to create such groupings in Kenya. The purpose of this research was to develop and explore a methodology for clustering and visualizing the status of child health in Kenya. In this research we propose a new model that clusters the counties based on the UNICEF indicators of child health. The cluster analysis used the k-means clustering algorithm. Both hierarchical and non-hierarchical clustering algorithms were used to build a consensus with the clusters obtained by k-means. The number of clusters was selected using a heuristic that integrates a statistical measure of cluster fit. Using data from the literature, the methodology grouped the 47 counties into three distinct clusters of 12, 8 and 27 observations respectively. The study classified the clusters as well-off, most marginalized and moderately marginalized counties. The methodology was objective, replicable and sustainable. It was developed on theoretically sound principles and can generalize across applications requiring clustering. An examination of several clustering algorithms revealed similar results

    Unsupervised learning algorithms applied to grouping problems

    One of the tasks of great interest within process mining is the discovery of business process models, which consists of using an event log as input and producing a business process model by analyzing the data contained in the log and applying a process mining method, task and/or technique. The discovery allows the identification of the behaviors contained in the cases of the event log in order to detect possible deviations and/or validate that the business process is executed according to the business requirements. This paper presents an approach based on unsupervised learning techniques for the grouping of traces to generate simpler and more understandable models. The algorithms implemented for clustering are K-means, hierarchical agglomerative and density-based spatial clustering of applications with noise (DBSCAN)

    Root cause analysis of COVID-19 cases by enhanced text mining process

    The main focus of this research is to find the reasons behind the fresh cases of COVID-19 from the public’s perception, using data specific to India. The analysis uses machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis is accomplished in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improvised feature engineering (FE) process that uses a weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used for grouping, based on public posts from Twitter®. In the last step, latent Dirichlet allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improvised two-step FE process, the LDA model improved by 14% in terms of coherence score, and by 19% and 15% when compared with latent semantic analysis (LSA) and hierarchical Dirichlet process (HDP) respectively, resulting in 14 root causes for the spike in the disease
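The abstract above reports its clustering gain as a change in the Dunn index. As a rough sketch of what that index measures, the code below uses one common definition: the smallest distance between points in different clusters, divided by the largest distance between points in the same cluster. Several variants of the index exist and the abstract does not say which one was used, so the definition, the function name `dunn_index`, and the toy clusters are all assumptions.

```python
from itertools import combinations, product
from math import dist


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of coordinate tuples.

    Assumes at least two clusters and at least one cluster with two or
    more distinct points (otherwise the diameter is undefined or zero).
    """
    # Separation: closest pair of points drawn from two different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    )
    # Diameter: farthest pair of points inside any single cluster.
    max_diameter = max(
        dist(p, q)
        for cluster in clusters
        for p, q in combinations(cluster, 2)
    )
    return min_separation / max_diameter


# Hypothetical toy clusterings: compact and far apart vs. stretched and closer.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 3.0)], [(4.0, 0.0), (4.0, 3.0)]]
print(dunn_index(tight), dunn_index(loose))
```

A higher value indicates compact, well-separated clusters, which is why an increase in the index is read as an improvement.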

    A single currency for Asia? Evaluation and comparison using hierarchical and model-based cluster analysis

    Today, there is increased speculation on the possibility of an Asian currency, as the region begins to show increased promise as a region of nascent economic activity. Any monetary integration scheme in East Asia would likely have to include both China and India though, so this paper attempts to assess the evolution of convergence among the East Asian countries, including China and India, according to the optimum currency area theory criteria, which is operationalized through the use of cluster analysis. In this paper we use both traditional "hierarchical" clustering as well as the more recently developed "model-based" clustering techniques and compare the outcome in each case. As the East Asian crisis of 1997-98 is likely to affect the results, the exercise is done for pre-crisis, crisis, and post-crisis periods. The results reveal some structure among the countries, an increase in the degree of subregional homogeneity, and a robust relationship between Malaysia and Singapore

    APPLICATION OF CLUSTERING ANALYSIS TO DATA DISTRIBUTION OF COVID-19 IN BENGKULU PROVINCE

    Bengkulu Province is one of the provinces in Indonesia. According to the September 2020 Population Census (SP) carried out by BPS, Bengkulu Province had 2,010,670 inhabitants. The province covers 19,813 km² and consists of 10 regencies/cities. The large area and population call for efforts to anticipate the sharply rising transmission of COVID-19 in the province. One such effort is to group the regencies/cities in Bengkulu Province using a clustering method. This study aimed to group the regencies/cities in Bengkulu Province based on several variables that characterize them in relation to the spread of COVID-19. The data were secondary data on variables describing the spread of COVID-19 in Bengkulu Province from January 1, 2021, to May 31, 2021, accessed through the official website the Bengkulu Province government uses to inform the public about the increase in COVID-19 cases. Grouping with the hierarchical clustering method gave complete linkage as the best model, with K = 2 clusters, and the K-Means method also used K = 2. The results are considered good because the variability within each cluster is relatively small and the variability between the two clusters is relatively large
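The abstract above names complete linkage with K = 2 as its best hierarchical model. A minimal sketch of agglomerative clustering with complete linkage follows: every point starts as its own cluster, and the two clusters whose farthest members are closest are merged until K clusters remain. The function name `complete_linkage` and the six two-dimensional points are invented for illustration; they are not the study's regencies or variables.

```python
from math import dist


def complete_linkage(points, k):
    """Agglomerative clustering that merges until k clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the distance
                # between the two farthest members.
                d = max(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical toy data, not taken from the study.
points = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (8.0, 9.0), (9.0, 8.0), (8.5, 8.5)]
clusters = complete_linkage(points, k=2)
print(clusters)
```

Because the distance between clusters is set by their farthest members, complete linkage tends to favor compact groups, which fits the small within-cluster variability the abstract reports.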

    Clustering Methods for Electricity Consumers: An Empirical Study in Hvaler-Norway

    The development of Smart Grid in Norway in particular, and in Europe and the US in general, will shortly lead to the availability of massive amounts of fine-grained spatio-temporal consumption data from domestic households. This enables the application of data mining techniques to traditional problems in power systems. Clustering customers into appropriate groups is extremely useful for operators or retailers, who can address each group differently through dedicated tariffs or customer-tailored services. Currently, the task is done based on demographic data collected through questionnaires, which is error-prone. In this paper, we used three different clustering techniques (together with their variants) to automatically segment electricity consumers based on their consumption patterns. We also proposed a way to extract consumption patterns for each consumer. The grouping results were assessed using four common internal validity indexes. We found that the combination of Self Organizing Map (SOM) and k-means algorithms produces the most insightful and useful grouping. We also discovered that grouping quality cannot be measured effectively by automatic indicators, which goes against common suggestions in the literature. Comment: 12 pages, 3 figures