15,357 research outputs found

    Complete agglomerative hierarchy document’s clustering based on fuzzy luhn’s gibbs latent dirichlet allocation

    Agglomerative hierarchical clustering is a bottom-up method in which the distances between documents can be obtained from feature values extracted with topic-based latent Dirichlet allocation (LDA). To reduce the number of features, terms can be selected using Luhn’s Idea. Together, these methods can build better document clusters, but little research has examined the combination. In this research, therefore, the term weighting calculation uses Luhn’s Idea to select terms by defining upper and lower cut-offs, and term features are then extracted using Gibbs sampling LDA combined with term frequency and the fuzzy Sugeno method. The feature values are used to compute the distances between documents, which are clustered with the single-, complete-, and average-link algorithms. The evaluations show little difference between feature extraction with and without the lower cut-off. However, determining the topic of each term from term frequency and the fuzzy Sugeno method finds more relevant documents than the Tsukamoto method. Using the lower cut-off and fuzzy Sugeno Gibbs LDA for complete-link agglomerative hierarchical clustering yields consistent metric values. This clustering method is suggested as a better method for producing document clusters that are more relevant to the gold standard.
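    The two outer steps of this pipeline, Luhn-style term selection and complete-link agglomerative clustering, can be sketched in a few lines. This is only an illustration under stated assumptions: the toy corpus, the cut-off values, and the function names `luhn_select` and `complete_link` are hypothetical, and plain term-count vectors stand in for the Gibbs LDA and fuzzy Sugeno features described in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import dist as euclidean

def luhn_select(docs, lower, upper):
    """Keep terms whose corpus frequency lies within [lower, upper]."""
    freq = Counter(term for doc in docs for term in doc)
    return {t for t, c in freq.items() if lower <= c <= upper}

def complete_link(dist, n_clusters):
    """Agglomerative clustering with complete linkage on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        # Complete linkage: the distance between two clusters is the
        # largest pairwise distance between their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: max(dist[i][j]
                              for i in clusters[p[0]]
                              for j in clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]

# Hypothetical toy corpus (already tokenized).
docs = [
    ["fuzzy", "cluster", "the", "topic"],
    ["fuzzy", "cluster", "the", "model"],
    ["river", "water", "the", "quality"],
    ["river", "water", "the", "pollution"],
]

# The upper cut-off drops the very frequent "the"; the lower cut-off
# drops terms that occur only once.
selected = luhn_select(docs, lower=2, upper=3)
vocab = sorted(selected)

# Stand-in features: term counts over the selected vocabulary
# (the paper uses LDA-based features instead).
vectors = [[doc.count(t) for t in vocab] for doc in docs]
distances = [[euclidean(u, v) for v in vectors] for u in vectors]

clusters = complete_link(distances, n_clusters=2)
print(vocab)
print(clusters)
```

    Swapping `max` for `min` in the linkage key would give single-link clustering, which is one way to compare the linkage criteria mentioned in the abstract.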

    Comparison of Weighted Criteria and Selection Criteria for Employee Performance Grouping with Fuzzy C-Means

    The development of information technology makes it easier for companies to do many things and affects company operations. One of the factors affecting company operations is employee performance. Employee performance is assessed on four criteria: discipline, honesty, cooperation, and work quality. The purpose of this study is to group employees by their performance using fuzzy c-means (FCM). Two kinds of clustering are carried out: clustering with feature weighting and clustering with feature selection. Using feature weights of 25%, 30%, 25%, and 20% for discipline, honesty, cooperation, and work quality, respectively, clustering with feature weighting gives an accuracy of 0.8462. With feature selection, fuzzy c-means gives an accuracy of 1, with discipline and honesty as the two critical features for clustering. Comparing the two kinds of clustering, we find that honesty is the most essential feature for clustering employees by their performance.
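    A minimal sketch of fuzzy c-means with feature weighting follows. The abstract does not say how the weights enter the algorithm, so this sketch assumes they scale each criterion in a weighted squared Euclidean distance. The employee scores, the initial centers, and the function name `fuzzy_c_means` are hypothetical; only the weights 25%, 30%, 25%, and 20% come from the abstract.

```python
def fuzzy_c_means(points, weights, init_centers, m=2.0, iters=50, eps=1e-9):
    """Fuzzy c-means with a per-feature weighted squared Euclidean distance.

    Returns (memberships, centers); memberships[k][i] is the degree to
    which point k belongs to cluster i, and each row sums to 1.
    """
    centers = [list(c) for c in init_centers]
    power = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update from weighted squared distances to each center.
        memberships = []
        for p in points:
            d2 = [sum(w * (x - c) ** 2 for w, x, c in zip(weights, p, ctr)) + eps
                  for ctr in centers]
            memberships.append(
                [1.0 / sum((dk / dj) ** power for dj in d2) for dk in d2]
            )
        # Center update: mean of the points weighted by membership**m.
        for i in range(len(centers)):
            um = [row[i] ** m for row in memberships]
            total = sum(um)
            centers[i] = [sum(u * p[f] for u, p in zip(um, points)) / total
                          for f in range(len(weights))]
    return memberships, centers

# Hypothetical scores (1-5) for discipline, honesty, cooperation, work quality.
employees = [
    [5, 5, 4, 5], [4, 5, 5, 4], [5, 4, 5, 5],   # stronger performers
    [2, 1, 2, 2], [1, 2, 2, 1], [2, 2, 1, 2],   # weaker performers
]
criteria_weights = [0.25, 0.30, 0.25, 0.20]      # weights quoted in the abstract

memberships, centers = fuzzy_c_means(
    employees, criteria_weights,
    init_centers=[employees[0], employees[3]],   # deterministic start (assumption)
)

# Harden the fuzzy partition by taking the cluster with the largest membership.
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
print(labels)
```

    Feature selection, the second variant in the abstract, would correspond here to passing only the chosen columns (for example discipline and honesty) with equal weights.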

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology for identifying clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely conventional SSC (CSSC), independent SSC (ISSC), and extended SSC (XSSC). The characteristics of these algorithms are highlighted, and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of knowledge discovery in environmental systems.

    Taming Wild High Dimensional Text Data with a Fuzzy Lash

    The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row of the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address the sparsity and high-dimensionality issues. Among the strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular one that maps all words onto a new basis to represent the BOW. The recent increase in text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features compared with Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
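    To make the sparsity problem concrete, the sketch below builds a small BOW matrix and counts its zero entries. The three-sentence corpus and the helper name `bag_of_words` are hypothetical and are not taken from the paper; the fuzzy clustering step itself is not reproduced here.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, matrix); matrix[d][w] is the count of word w in doc d."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    matrix = []
    for doc in docs:
        counts = Counter(doc.split())
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix

# Hypothetical toy corpus.
corpus = [
    "fuzzy clustering of text",
    "text data is sparse",
    "fuzzy fuzzy logic",
]

vocab, bow = bag_of_words(corpus)

# Each row has one column per vocabulary word, so most entries are zero.
zeros = sum(row.count(0) for row in bow)
total = len(bow) * len(vocab)

print(vocab)
for row in bow:
    print(row)
print(f"{zeros} of {total} entries are zero")
```

    Even with three short documents, more than half of the matrix is empty, and the number of columns grows with every new word. A DR method in the sense of the abstract would replace these vocabulary-sized rows with a much shorter vector per document, for example one membership degree per fuzzy cluster.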