5 research outputs found

    A fuzzy c-means bi-sonar-based Metaheuristic Optimization Algorithm

    Fuzzy clustering is an important problem that is the subject of active research in several real-world applications. The fuzzy c-means (FCM) algorithm is one of the most popular fuzzy clustering techniques because it is efficient, straightforward, and easy to implement. Fuzzy clustering methods allow objects to belong to several clusters simultaneously, with different degrees of membership. Objects on the boundaries between several classes are not forced to belong fully to one of the classes; instead, they are assigned membership degrees between 0 and 1 that indicate their partial membership. However, FCM is sensitive to initialization and is easily trapped in local optima. Bi-sonar optimization (BSO) is a relatively new stochastic global metaheuristic optimization tool. In this paper, a hybrid fuzzy clustering method, FCB, based on FCM and BSO is proposed, which makes use of the merits of both algorithms. Experimental results show that the proposed method is efficient and yields encouraging results.
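
The membership idea in this abstract can be illustrated with a minimal sketch of plain fuzzy c-means, written in standard-library Python. This is not the hybrid FCB method from the paper, and it contains no BSO step. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialization, and the stopping tolerance are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, U), where U[i][j] is the degree to which
    point i belongs to cluster j. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])

    centers = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / total
                for k in range(dim)
            ))

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, U
```

Because the starting memberships are random, a different `seed` can lead to a different local optimum on harder data, which is the sensitivity to initialization that the abstract describes.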

    Metode Hibridasi Artificial Bee Colony dan Fuzzy K-Modes untuk Klasterisasi Data Kategorikal (A Hybrid Artificial Bee Colony and Fuzzy K-Modes Method for Clustering Categorical Data)

    Fuzzy K-Modes is an effective method for clustering categorical data. It extends the fuzzy k-means algorithm by using modes, together with a matching dissimilarity measure, to update the cluster centroids and obtain the optimal solution. Nevertheless, Fuzzy K-Modes has the disadvantage that it may stop at a local optimum. Artificial Bee Colony (ABC) is an optimization method that has been proven effective and is able to obtain global solutions. This study proposes a hybridization of the Artificial Bee Colony algorithm and Fuzzy K-Modes for clustering categorical data. The hybrid of Artificial Bee Colony and Fuzzy K-Modes (ABC-FKMO) has been shown to improve the performance of categorical data clustering, especially in terms of Objective Function, F-Measure, and Accuracy. Tests on the Soybean Disease, Breast Cancer, and Congressional Voting Records datasets from the UCI data repository showed average Accuracy values of 0.991, 0.615, and 0.867, respectively. On average, the Objective Function improved by 2.73%, the F-Measure by 4.31%, and Accuracy by 5.16%.

    Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering

    The conventional k-modes algorithm and its variants have been extensively used for categorical data clustering. However, these algorithms have some drawbacks, e.g., they can be trapped in local optima and are sensitive to the initial clusters/modes. Our numerical experiments even showed that the k-modes algorithm could not identify the optimal clustering results for some special datasets regardless of the selection of the initial centers. In this paper, we developed an integer linear programming (ILP) approach for k-modes clustering, which is independent of the initial solution and can directly obtain the optimal results for small-sized datasets. We also developed a heuristic algorithm, known as IPO-ILP-VNS, that implements iterative partial optimization in the ILP approach within a variable neighborhood search framework, to search for near-optimal results on medium- and large-sized datasets with controlled computing time. Experiments on 38 datasets, comprising 27 synthesized small datasets and 11 known benchmark datasets from the UCI site, were carried out to test the proposed ILP approach and the IPO-ILP-VNS algorithm. The proposed methods outperformed the conventional k-modes algorithm and other existing enhanced k-modes algorithms in the literature, and produced new and improved results for 9 of the UCI benchmark datasets.
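
For context, the conventional k-modes baseline that this abstract criticises can be sketched in a few lines of standard-library Python. This is not the ILP or IPO-ILP-VNS method from the paper. The function names, the random choice of initial modes, and the tie-breaking behaviour are assumptions made for illustration.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, n_iter=20, seed=0):
    """Conventional k-modes on a list of equal-length tuples.

    Returns (modes, labels). Illustrative sketch only: initial modes
    are k records drawn at random, so results depend on the seed.
    """
    rng = random.Random(seed)
    modes = rng.sample(records, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record joins its nearest mode
        # (ties go to the lowest cluster index).
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        # Update step: each mode takes the most frequent category
        # per attribute among its members; empty clusters keep their mode.
        for j in range(k):
            members = [r for r, lab in zip(records, new_labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
        # Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return modes, labels
```

Running the sketch with several values of `seed` and comparing the summed dissimilarity between records and their modes is one simple way to observe the dependence on initial modes that motivates an initialization-independent formulation.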

    Development of a modeling algorithm to predict lean implementation success

    “Lean has become a common term and goal in organizations throughout the world. The approach of eliminating waste and continuous improvement may seem simple on the surface but can be more complex when it comes to implementation. Some firms implement lean with great success, getting complete organizational buy-in and realizing the efficiencies foundational to lean. Other organizations struggle to implement lean and are never able to get the buy-in or traction needed to institute the sort of cultural change that implementation often requires. It would be beneficial to have a tool that organizations could use to assess their ability to implement lean, the degree to which they have implemented lean, and what specific areas they should focus on to improve their readiness or implementation level. This research investigates and proposes two methods for assessing lean implementation. The first uses standard statistical regression. A regression model was developed that can be used to assess the implementation of lean within an organization. The second method is based on artificial intelligence. It uses an unsupervised learning algorithm to develop a training set corresponding to low, medium, and high implementation. This training set could then be used along with a supervised learning algorithm to dynamically monitor an organization's readiness or implementation level and make recommendations on areas to focus on to improve implementation success”--Abstract, page iv

    Categorical and Fuzzy Ensemble-Based Algorithms for Cluster Analysis

    This dissertation focuses on improving multivariate methods of cluster analysis. In Chapter 3 we discuss methods relevant to the categorical clustering of tertiary data, while Chapter 4 considers the clustering of quantitative data using ensemble algorithms. Lastly, in Chapter 5, future research plans are discussed to investigate the clustering of spatial binary data. Cluster analysis is an unsupervised methodology whose results may be influenced by the types of variables recorded on observations. When dealing with the clustering of categorical data, solutions produced may not accurately reflect the structure of the process that generated them. Increased variability within the latent structure of the data and the presence of noisy observations are two issues that may be obscured within the categories. It is also the presence of these issues that may cause clustering solutions produced in categorical cases to be less accurate. To remedy this, in Chapter 3, a method is proposed that utilizes concepts from statistics to improve the accuracy of clustering solutions produced in tertiary data objects. By pre-smoothing the dissimilarities used in traditional clustering algorithms, we show it is possible to produce clustering solutions more reflective of the latent process from which observations arose. To do this, the Fienberg-Holland estimator, a shrinkage-based statistical smoother, is used along with three choices of smoothing. We show the method results in more accurate clusters via simulation and an application to diabetes. Solutions produced from clustering algorithms may vary regardless of the type of variables observed. Such variations may be due to the clustering algorithm used, the initial starting point of an algorithm, or the type of algorithm used to produce such solutions. Furthermore, it may sometimes be of interest to produce clustering solutions that allow observations to share similarities with more than one cluster.
One method proposed to combat these problems and add flexibility to clustering solutions is fuzzy ensemble-based clustering. In Chapter 4, three fuzzy ensemble-based clustering algorithms are introduced for the clustering of quantitative data objects and compared with the traditional Fuzzy C-Means algorithm. The ensembles proposed in this case, however, differ from traditional ensemble-based methods of clustering in that the clustering solutions produced within the generation process result from supervised classifiers and not from clustering algorithms. A simulation study and two data applications suggest that, in certain settings, the proposed fuzzy ensemble-based clustering algorithms produce more accurate clusters than the Fuzzy C-Means algorithm. In both of the aforementioned cases, only the types of variables recorded on each object were of importance in the clustering process. In Chapter 5, the types of variables recorded and their spatial nature are both of importance. An idea is presented that combines applications to geodesics with categorical cluster analysis to deal with the spatial and categorical nature of observations. The focus in this chapter is on producing an accurate method of clustering the binary and spatial data objects found in the Global Terrorism Database.