Complete agglomerative hierarchy document's clustering based on fuzzy Luhn's gibbs latent dirichlet allocation
Agglomerative hierarchical clustering is a bottom-up clustering method in which the distances between documents can be obtained from feature values extracted with a topic-based latent Dirichlet allocation (LDA) method. To reduce the number of features, terms can be selected using Luhn's idea. Together, these methods can be used to build better document clusters, but little research has examined this combination. In this research, the term weighting uses Luhn's idea to select terms by defining upper and lower cut-offs, and then extracts term features using Gibbs sampling LDA combined with term frequency and the fuzzy Sugeno method. The feature values are used to compute the distances between documents, which are then clustered with the single-, complete-, and average-link algorithms. The evaluations show little difference between feature extraction with and without the lower cut-off. However, determining the topic of each term from term frequency with the fuzzy Sugeno method finds more relevant documents than the Tsukamoto method does. Using the lower cut-off and fuzzy Sugeno Gibbs LDA with complete-link agglomerative hierarchical clustering yields consistent metric values, so this combination is suggested as a better method for clustering documents in closer agreement with their gold standard.
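The term-selection and clustering steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: Luhn's cut-offs are taken as absolute corpus-frequency thresholds (`lower`, `upper`), the Gibbs-sampling LDA and fuzzy Sugeno features are replaced by plain term-frequency vectors, and Euclidean distance is assumed. The function names `luhn_select`, `vectorize`, and `complete_link` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def luhn_select(docs, lower, upper):
    """Keep terms whose corpus frequency lies within [lower, upper] (Luhn-style cut-offs)."""
    counts = Counter(term for doc in docs for term in doc.split())
    return sorted(t for t, c in counts.items() if lower <= c <= upper)


def vectorize(docs, vocab):
    """Term-frequency vector per document, restricted to the selected vocabulary."""
    return [[doc.split().count(t) for t in vocab] for doc in docs]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def complete_link(vectors, k):
    """Bottom-up merging until k clusters remain; cluster distance = largest pairwise distance."""
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1, c2):
        return max(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        # pick the closest pair of clusters under complete linkage (a < b always)
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


docs = [
    "the apple the banana fruit",
    "the banana the apple juice",
    "the car the engine road",
    "the engine the car wheel",
]
vocab = luhn_select(docs, lower=2, upper=4)   # drops "the" (8 hits) and one-off terms
clusters = complete_link(vectorize(docs, vocab), k=2)
print(vocab, clusters)
```

Complete linkage scores two clusters by their farthest pair of documents; replacing `max` with `min` in `dist` gives single linkage, and averaging the pairwise distances gives average linkage, the other two variants the abstract compares.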
Comparison of Weighted Criteria and Selection Criteria for Employee Performance Grouping with Fuzzy C-Means
The development of information technology makes it easier for companies to do many
things and affects company operations. One of the factors affecting company operations
is employee performance, which is assessed on four criteria: discipline, honesty,
cooperation, and work quality. The purpose of this study is to group employees based on
their performance using fuzzy c-means (FCM). Two kinds of clustering are compared in this
paper: clustering with feature weighting and clustering with feature selection. Using
feature weights of 25%, 30%, 25%, and 20% for work discipline, honesty,
cooperation, and work quality, respectively, clustering with feature weighting gives an
accuracy of 0.8462. With feature selection, FCM reaches an accuracy of 1, with work
discipline and honesty identified as the critical features. Comparing the two approaches,
we find that honesty is the most essential feature for clustering employees based on
their performance.
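The abstract does not state how the criterion weights enter fuzzy c-means. One common option, assumed here, is to scale each feature's squared difference by its weight inside the FCM distance; criterion selection then corresponds to weights of 0 or 1. The sketch below is pure Python with fuzzifier `m = 2`; the function names and the 1-5 scores in `X` are hypothetical, and the accuracy computation against known labels is omitted.

```python
import random


def weighted_fcm(X, c, weights, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means where each feature's squared difference is scaled by its weight."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # random initial memberships; each row sums to 1
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centres = []
    for _ in range(iters):
        # centres are means of the data weighted by membership ** m
        centres = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            s = sum(w)
            centres.append([sum(w[i] * X[i][j] for i in range(n)) / s for j in range(d)])
        # feature-weighted squared distances, floored to avoid division by zero
        D = [[max(sum(weights[j] * (X[i][j] - centres[k][j]) ** 2 for j in range(d)), 1e-12)
              for k in range(c)] for i in range(n)]
        # standard FCM membership update, written for squared distances
        new_U = [[1.0 / sum((D[i][k] / D[i][l]) ** (1.0 / (m - 1)) for l in range(c))
                  for k in range(c)] for i in range(n)]
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return U, centres


def hard_labels(U):
    """Assign each row to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in U]


# hypothetical 1-5 scores: discipline, honesty, cooperation, work quality
X = [[5, 5, 4, 5], [4, 5, 5, 4], [5, 4, 4, 5],
     [2, 1, 2, 2], [1, 2, 2, 1], [2, 2, 1, 2]]
U_weighted, _ = weighted_fcm(X, 2, [0.25, 0.30, 0.25, 0.20])
U_selected, _ = weighted_fcm(X, 2, [1, 1, 0, 0])   # selection = 0/1 weights
print(hard_labels(U_weighted), hard_labels(U_selected))
```

The weighted run uses the 25%, 30%, 25%, and 20% weights from the abstract; the second run keeps only discipline and honesty, mirroring the feature-selection variant.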
A Survey on Soft Subspace Clustering
Subspace clustering (SC) is a promising clustering technology for identifying
clusters based on their associations with subspaces in high-dimensional spaces.
SC can be classified into hard subspace clustering (HSC) and soft subspace
clustering (SSC). While HSC algorithms have been extensively studied and well
accepted by the scientific community, SSC algorithms are relatively new but have
gained more attention in recent years owing to their better adaptability. This
paper presents a comprehensive survey of existing SSC algorithms and their recent
developments. The SSC algorithms are classified systematically
into three main categories, namely conventional SSC (CSSC), independent SSC
(ISSC), and extended SSC (XSSC). The characteristics of these algorithms are
highlighted, and the potential future development of SSC is also discussed.
Comment: This paper has been published in Information Sciences Journal in 201
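What makes a method "soft" is that each cluster gets a continuous weight for every dimension instead of a hard subset of dimensions. As one illustration (chosen here, not taken from the survey), entropy-weighted k-means (EWKM) sets a cluster's weight for dimension j proportional to exp(-D_j / γ), where D_j is the cluster's dispersion along that dimension. The minimal sketch below computes those weights for a single, already-assigned cluster; the function name and the toy data are hypothetical.

```python
import math


def entropy_feature_weights(points, centre, gamma):
    """Per-dimension soft subspace weights for one cluster, entropy-weighting style.

    Dimensions along which the cluster is compact (low dispersion) get larger weights;
    gamma > 0 controls how sharply the weights concentrate.
    """
    d = len(centre)
    disp = [sum((p[j] - centre[j]) ** 2 for p in points) for j in range(d)]
    # shifting by the minimum dispersion leaves the normalised weights unchanged
    # but keeps math.exp from underflowing to zero for every dimension
    lowest = min(disp)
    scores = [math.exp(-(dj - lowest) / gamma) for dj in disp]
    total = sum(scores)
    return [s / total for s in scores]


# a cluster that is tight in dimension 0 and spread out in dimension 1
cluster = [[1.0, 0.0], [1.1, 5.0], [0.9, -5.0]]
weights = entropy_feature_weights(cluster, centre=[1.0, 0.0], gamma=10.0)
print(weights)  # dimension 0 receives almost all of the weight
```

Larger `gamma` spreads the weight more evenly across dimensions; smaller `gamma` pushes it toward the single most compact dimension.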
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends highly on data quality. Unfortunately, real data usually contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of knowledge discovery in environmental systems.
Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) represents a corpus as a matrix whose elements are
word frequencies. However, each row in the matrix is a very high-dimensional
sparse vector. Dimension reduction (DR) is a popular way to address sparsity
and high dimensionality. Among the different strategies for developing DR
methods, Unsupervised Feature Transformation (UFT) is a popular one that maps
all words onto a new basis to represent the BOW. The recent growth of text data and
its challenges imply that the DR area still needs new perspectives. Although a wide
range of methods based on the UFT strategy has been developed, the fuzzy
approach has not been considered for DR based on this strategy. This research
investigates fuzzy clustering as a DR method based on the
UFT strategy, collapsing the BOW matrix into a lower-dimensional
representation of the documents instead of the words in a corpus. The quantitative
evaluation shows that fuzzy clustering produces superior performance and
features compared with Principal Components Analysis (PCA) and Singular Value
Decomposition (SVD), two popular DR methods based on the UFT strategy.
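A minimal sketch of how fuzzy clustering can collapse a BOW matrix, under an assumption the abstract does not spell out: each word has a fuzzy membership in every word cluster, and a document's reduced vector is the membership-weighted sum of its word counts (the BOW matrix multiplied by the word-by-cluster membership matrix). The membership values below are made up for illustration rather than produced by a clustering run, and `bag_of_words` and `collapse` are hypothetical names.

```python
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and a (documents x words) count matrix."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    matrix = []
    for doc in docs:
        counts = Counter(doc.split())
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix


def collapse(bow, membership):
    """Map (documents x words) counts to (documents x clusters) via word memberships."""
    n_clusters = len(membership[0])
    return [[sum(row[j] * membership[j][k] for j in range(len(row)))
             for k in range(n_clusters)]
            for row in bow]


docs = ["cat dog cat", "dog pet", "stock market", "market price stock"]
vocab, bow = bag_of_words(docs)   # vocab: cat, dog, market, pet, price, stock
# hypothetical fuzzy memberships of each word (in vocab order) in 2 word clusters;
# in practice these would come from a fuzzy clustering of the BOW columns
membership = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
reduced = collapse(bow, membership)
print(reduced)  # 4 documents, now 2 features each instead of 6
```

Each reduced feature is a soft group of words, so the two pet-related documents load mainly on the first column and the two market-related documents on the second.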
- …