Search CORE

25,883 research outputs found

Rough feature selection for intelligent classifiers

Author: Shen Qiang
Publication venue: Springer Nature
Publication date: 01/01/2007
Field of study

Abstract. The last two decades have seen many powerful classification systems being built for large-scale real-world applications. However, for all their accuracy, one of the persistent obstacles facing these systems is that of data dimensionality. To enable such systems to be effective, a redundancy-removing step is usually required to pre-process the given data. Rough set theory offers a useful, and formal, methodology that can be employed to reduce the dimensionality of datasets. It helps select the most information rich features in a dataset, without transforming the data, all the while attempting to minimise information loss during the selection process. Based on this observation, this paper discusses an approach for semantics-preserving dimensionality reduction, or feature selection, that simplifies domains to aid in developing fuzzy or neural classifiers. Computationally, the approach is highly efficient, relying on simple set operations only. The success of this work is illustrated by applying it to addressing two real-world problems: industrial plant monitoring and medical image analysis.

CiteSeerX

Aberystwyth Research Portal

Scalable approximate FRNN-OWA classification

Author: Cornelis Chris
Lenz Oliver Urs
Peralta Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study

Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximative nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes within query times that are up to several orders of magnitude faster

Ghent University Academic Bibliography

Taming Wild High Dimensional Text Data with a Fuzzy Lash

Author: Karami Amir
Publication venue
Publication date: 01/11/2017
Field of study

The bag of words (BOW) represents a corpus in a matrix whose elements are the frequency of words. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular method to address sparsity and high-dimensionality issues. Among different strategies to develop DR method, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on a new basis to represent BOW. The recent increase of text data and its challenges imply that DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse BOW matrix to provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features to Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina