15,245 research outputs found

Taming Wild High Dimensional Text Data with a Fuzzy Lash

Author: Karami Amir
Publication date: 01/11/2017

The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address sparsity and high dimensionality. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy that maps all words onto a new basis to represent the BOW. The recent increase in text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces performance and features superior to those of Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
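As a rough illustration of the idea in this abstract, the sketch below runs a plain fuzzy c-means on a tiny made-up BOW matrix and uses the resulting membership matrix (documents by clusters) as the reduced representation. This is an assumption about how such a reduction could look, not the author's implementation; the function name `fuzzy_cmeans`, the fuzzifier `m=2.0`, and the toy counts are all hypothetical.

```python
import numpy as np

def fuzzy_cmeans(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (memberships, centers)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centres are membership-weighted means of the documents.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every document to every centre.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Standard membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Hypothetical toy BOW matrix: 4 documents (rows) x 4 words (columns).
bow = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
], dtype=float)

# Each document is now described by 2 membership degrees instead of 4 counts.
memberships, centers = fuzzy_cmeans(bow, n_clusters=2)
print(memberships.round(3))
```

Each row of `memberships` sums to 1, so a document is represented by its degree of belonging to each cluster rather than by raw word counts.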

arXiv.org e-Print Archive

Crossref

Scholar Commons - Institutional Repository of the University of South Carolina

Multi-Criterion Mammographic Risk Analysis Supported with Multi-Label Fuzzy-Rough Feature Selection

Author: Qu Yanpeng
Shang Changjing
Shen Qiang
Yang Longzhi
Yue Guanli
Zwiggelaar Reyer
Publication date: 01/09/2019

Context and background: Breast cancer is one of the most common diseases threatening human lives globally, requiring effective and early risk analysis, for which learning classifiers supported with automated feature selection offer a potentially robust solution. Motivation: Computer-aided risk analysis of breast cancer typically works with a set of extracted mammographic features which may contain significant redundancy and noise, thereby requiring technical developments to improve runtime performance in both computational efficiency and classification accuracy. Hypothesis: Use of advanced feature selection methods based on multiple diagnosis criteria may lead to improved results for mammographic risk analysis. Methods: An approach for multi-criterion based mammographic risk analysis is proposed, by adapting the recently developed multi-label fuzzy-rough feature selection mechanism. Results: A system for multi-criterion mammographic risk analysis is implemented with the aid of multi-label fuzzy-rough feature selection, and its performance is positively verified experimentally, in comparison with representative popular mechanisms. Conclusions: The novel approach for mammographic risk analysis based on multiple criteria helps improve classification accuracy using selected informative features, without suffering from the redundancy caused by such complex criteria, with the implemented system demonstrating practical efficacy.

Northumbria Research Link

Aberystwyth Research Portal

Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods

Author: Vluymans Sarah
Publication venue: Ghent University. Faculty of Medicine and Health Sciences ; University of Granada. Department of Computer Science and Artificial Intelligence
Publication date: 01/01/2018

Ghent University Academic Bibliography

CERN Document Server

Grooming Detection using Fuzzy-Rough Feature Selection and Text Classification

Author: Anderson Philip
Li Jie
Naik Nitin
Yang Longzhi
Zuo Zheming
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/07/2018

Online child grooming detection has recently attracted intensive research interest from both the machine learning community and the digital forensics community due to its great social impact. The existing data-driven approaches usually face the challenges of a lack of training data and the uncertainty of classes in terms of the classification or decision boundary. This paper proposes a grooming detection approach in an effort to address such uncertainty, based on a data set derived from a publicly available profiling data set. In particular, the approach first applies the conventional text feature extraction approach to identify the most significant words in the data set. This is followed by the application of a fuzzy-rough feature selection approach to reduce the high dimensionality of the selected words for fast processing, while at the same time addressing the uncertainty of class boundaries. The experimental results demonstrate the efficiency and efficacy

Northumbria Research Link

Crossref

Teesside University's Research Repository

Feature Selection Inspired Classifier Ensemble Reduction

Author: Chao Fei
Diao Ren
Peng Taoxin
Shen Qiang
Snooke Neal
晁飞
Publication date: 15/07/2014

Classifier ensembles constitute one of the main research directions in machine learning and data mining. The use of multiple classifiers generally allows better predictive performance than that achievable with a single model. Several approaches exist in the literature that provide means to construct and aggregate such ensembles. However, these ensemble systems contain redundant members that, if removed, may further increase group diversity and produce better results. Smaller ensembles also relax the memory and storage requirements, reducing the system's run-time overhead while improving overall efficiency. This paper extends the ideas developed for feature selection problems to support classifier ensemble reduction, by transforming ensemble predictions into training samples, and treating classifiers as features. Also, the global heuristic harmony search is used to select a reduced subset of such artificial features, while attempting to maximize the feature subset evaluation. The resulting technique is systematically evaluated using high-dimensional and large-sized benchmark datasets, showing a superior classification performance against both original, unreduced ensembles, and randomly formed subsets. © 2013 IEEE
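The core transformation described above, where each classifier's predictions become one column of an artificial feature matrix, can be sketched as follows. Note the assumptions: the paper selects the subset with harmony search, whereas this sketch swaps in a brute-force search over small subsets scored by majority-vote accuracy, purely to keep the example short. The function names and the toy predictions are hypothetical.

```python
from itertools import combinations

import numpy as np

def majority_vote(preds):
    """Combine binary (0/1) predictions column-wise; ties go to class 1."""
    return (preds.mean(axis=1) >= 0.5).astype(int)

def reduce_ensemble(preds, y, max_size):
    """Return the classifier subset (up to max_size members) with the best
    majority-vote accuracy. Brute force stands in for harmony search here."""
    best_subset, best_acc = (), -1.0
    for size in range(1, max_size + 1):
        for subset in combinations(range(preds.shape[1]), size):
            acc = float(np.mean(majority_vote(preds[:, list(subset)]) == y))
            if acc > best_acc:  # strict: keep the first, smallest best subset
                best_subset, best_acc = subset, acc
    return best_subset, best_acc

# Hypothetical data: rows are samples, columns are classifiers ("features").
# Classifier 1 is an exact duplicate of classifier 0, i.e. a redundant member.
y = np.array([0, 1, 0, 1, 1, 0])
preds = np.array([
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
])

full_acc = float(np.mean(majority_vote(preds) == y))
subset, acc = reduce_ensemble(preds, y, max_size=3)
print("full ensemble accuracy:", full_acc)
print("reduced subset:", subset, "accuracy:", acc)
```

In this toy case the selected subset drops one of the two duplicate classifiers, which matches the abstract's point that removing redundant members can improve the combined result.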

Crossref

Aberystwyth Research Portal

Repository@Napier

Xiamen University Institutional Repository

Fuzzy-rough-learn 0.1 : a Python library for machine learning with fuzzy rough sets

Author: C Cornelis
D Dubois
F Pedregosa
G Nguyen
G van Rossum
LS Riza
M Hall
N Verbiest
R Jensen
RR Yager
S Vluymans
YA Malkov
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

We present fuzzy-rough-learn, the first Python library of fuzzy rough set machine learning algorithms. It contains three algorithms previously implemented in R and Java, as well as two new algorithms from the recent literature. We briefly discuss the use cases of fuzzy-rough-learn and the design philosophy guiding its development, before providing an overview of the included algorithms and their parameters.

Crossref

Ghent University Academic Bibliography

A Survey on Feature Selection Algorithms

Author: Dr. Amit Kumar Saxena, Vimal Kumar Dubey
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/04/2015

One major component of machine learning is feature analysis, which comprises mainly two processes: feature selection and feature extraction. Due to its applications in several areas, including data mining, soft computing and big data analysis, feature selection has gained considerable importance. This paper presents an introductory concept of feature selection with various inherent approaches. The paper surveys historic developments reported in feature selection with supervised and unsupervised methods. Recent developments in state-of-the-art feature selection algorithms, including their hybridizations, are also summarized. DOI: 10.17762/ijritcc2321-8169.16043

International Journal on Recent and Innovation Trends in Computing and Communication