3,192 research outputs found
Measures for unsupervised fuzzy-rough feature selection
For supervised learning, feature selection algorithms at-tempt to maximise a given function of predictive accuracy. This function usually considers the ability of feature vectors to reflect decision class labels. It is therefore intuitive to re-tain only those features that are related to or lead to these decision classes. However, in unsupervised learning, deci-sion class labels are not provided, which poses questions such as; which features should be retained? and, why not use all of the information? The problem is that not all fea-tures are important. Some of the features may be redundant, and others may be irrelevant and noisy. In this paper, some new fuzzy-rough set-based approaches to unsupervised fea-ture selection are proposed. These approaches require no thresholding or domain information, can operate on real-valued data, and result in a significant reduction in dimen-sionality whilst retaining the semantics of the data. 1
Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) represents a corpus in a matrix whose elements are the
frequency of words. However, each row in the matrix is a very high-dimensional
sparse vector. Dimension reduction (DR) is a popular method to address sparsity
and high-dimensionality issues. Among different strategies to develop DR
method, Unsupervised Feature Transformation (UFT) is a popular strategy to map
all words on a new basis to represent BOW. The recent increase of text data and
its challenges imply that DR area still needs new perspectives. Although a wide
range of methods based on the UFT strategy has been developed, the fuzzy
approach has not been considered for DR based on this strategy. This research
investigates the application of fuzzy clustering as a DR method based on the
UFT strategy to collapse BOW matrix to provide a lower-dimensional
representation of documents instead of the words in a corpus. The quantitative
evaluation shows that fuzzy clustering produces superior performance and
features to Principal Components Analysis (PCA) and Singular Value
Decomposition (SVD), two popular DR methods based on the UFT strategy
- …