An Effective Feature Selection Method Based on Pair-Wise Feature Proximity for High Dimensional Low Sample Size Data
Feature selection has been studied widely in the literature. However, the
efficacy of the selection criteria for low sample size applications is
neglected in most cases. Most of the existing feature selection criteria are
based on the sample similarity. However, the distance measures become
insignificant for high dimensional low sample size (HDLSS) data. Moreover, the
variance of a feature with a few samples is pointless unless it represents the
data distribution efficiently. Instead of looking at the samples in groups, we
evaluate their efficiency in a pairwise fashion. In our investigation, we
noticed that considering a pair of samples at a time and selecting the features
that bring them closer or push them farther apart is a better choice for feature
selection. Experimental results on benchmark data sets demonstrate the
effectiveness of the proposed method with low sample size, which outperforms
many other state-of-the-art feature selection methods.
Comment: European Signal Processing Conference 201
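The pairwise idea in the abstract above can be illustrated with a small sketch. The abstract does not give the paper's actual criterion, so the score used here is an assumption: for each feature, the mean absolute gap over different-class pairs minus the mean absolute gap over same-class pairs. The function names `pairwise_feature_scores` and `select_top_k` are hypothetical.

```python
from itertools import combinations


def pairwise_feature_scores(X, y):
    """Score each feature from sample pairs (assumed criterion, not the paper's).

    score = mean |gap| over different-class pairs
            - mean |gap| over same-class pairs.
    A higher score means the feature keeps same-class pairs close and
    different-class pairs far apart.
    """
    n_features = len(X[0])
    same_sum = [0.0] * n_features
    diff_sum = [0.0] * n_features
    same_n = diff_n = 0
    for i, j in combinations(range(len(X)), 2):
        gaps = [abs(a - b) for a, b in zip(X[i], X[j])]
        if y[i] == y[j]:
            same_n += 1
            same_sum = [s + g for s, g in zip(same_sum, gaps)]
        else:
            diff_n += 1
            diff_sum = [s + g for s, g in zip(diff_sum, gaps)]
    # Guard against data with no pairs of one kind.
    same_n = max(same_n, 1)
    diff_n = max(diff_n, 1)
    return [d / diff_n - s / same_n for s, d in zip(same_sum, diff_sum)]


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = pairwise_feature_scores(X, y)
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 2.0]]
y = [0, 0, 1, 1]
print(select_top_k(X, y, 1))
```

This sketch assumes the features are on comparable scales; otherwise the absolute gaps would need normalizing before scores are compared.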
The pertinent single-attribute-based classifier for small datasets classification
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used for such cases due to its simplicity and efficiency. In this paper, we revealed the power of a single attribute by introducing the pertinent single-attribute-based-heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. SAB-HR uses a feature selection method based on the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute among the attributes in the set. Our empirical results on 12 benchmark datasets from the UCI Machine Learning Repository showed that the SAB-HR classifier significantly outperformed the classical OneR classifier for small datasets. In addition, using the H-Ratio as a feature selection criterion for selecting the single attribute was more effective than other traditional criteria, such as Information Gain (IG) and Gain Ratio (GR).
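For context, the classical OneR baseline mentioned in the abstract above can be sketched as follows. This is a minimal illustration for categorical attributes only. It is not SAB-HR: the abstract does not define the H-Ratio measure, so the attribute is chosen here by the usual OneR criterion of fewest training errors. The names `one_r` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Classical OneR: return (attribute_index, rule, training_errors).

    For each attribute, map every observed value to its majority class,
    then keep the attribute whose rule makes the fewest training errors.
    """
    best = None
    for a in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[a]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[a]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best


def predict(model, row, default=None):
    """Apply the single-attribute rule; unseen values fall back to `default`."""
    attribute, rule, _ = model
    return rule.get(row[attribute], default)


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes"]
model = one_r(rows, labels)
print(model[0], predict(model, ("rainy", "hot")))
```

Swapping the error count for a different attribute-quality measure would change only the comparison inside the loop, which is where a criterion such as H-Ratio, IG, or GR would plug in.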
A Matlab Toolbox for Feature Importance Ranking
More attention is being paid to feature importance ranking (FIR), in
particular when thousands of features can be extracted for intelligent
diagnosis and personalized medicine. A large number of FIR approaches have been
proposed, while few are integrated for comparison and real-life applications.
In this study, a MATLAB toolbox is presented and a total of 30 algorithms are
collected. Moreover, the toolbox is evaluated on a database of 163 ultrasound
images. For each breast mass lesion, 15 features are extracted. To figure out
the optimal subset of features for classification, all combinations of features
are tested and a linear support vector machine is used for the malignancy
prediction of lesions annotated in ultrasound images. Finally, the
effectiveness of FIR is analyzed according to performance comparison. The
toolbox is online (https://github.com/NicoYuCN/matFIR). In our future work,
more FIR methods, feature selection methods and machine learning classifiers
will be integrated.
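To give a sense of the exhaustive search described in the abstract above: testing all combinations of 15 features means 2^15 - 1 = 32767 non-empty subsets. The sketch below enumerates them in Python rather than MATLAB and does not use the matFIR toolbox. The `score` callable is a hypothetical placeholder for whatever evaluation is used (the abstract uses a linear support vector machine); here it is left as a parameter.

```python
from itertools import combinations


def best_subset(n_features, score):
    """Score every non-empty feature subset exhaustively.

    `score` is a caller-supplied function taking a tuple of feature
    indices and returning a number (higher is better).
    Returns (best_subset, best_score, n_subsets_tested).
    """
    best, best_score, tested = None, float("-inf"), 0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            tested += 1
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score, tested


# Toy scorer (assumption): features 1 and 3 are useful, each extra feature costs a little.
def toy_score(subset):
    return len(set(subset) & {1, 3}) - 0.1 * len(subset)


subset, value, tested = best_subset(4, toy_score)
print(subset, tested)
```

The subset count doubles with every added feature, so this brute-force approach is practical for 15 features but not for the thousands of features mentioned at the start of the abstract.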