Value reducts and bireducts: A comparative study
In Rough Set Theory, the notion of a bireduct allows the sets of objects and attributes contained in a dataset to be reduced simultaneously. In addition, value reducts are used to remove unnecessary values of certain attributes for a specific object. Combining both notions therefore yields a greater reduction of unnecessary data. This paper focuses on the study of bireducts and value reducts of information and decision tables. We present theoretical results capturing different aspects of the relationship between bireducts and reducts, offering new insights at a conceptual level. We also analyze the relationship between bireducts and value reducts. The connections established among these notions offer significant benefits for efficient information analysis, as well as for the detection of unnecessary or redundant information.
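To make the idea of reducing objects and attributes at the same time concrete, the following is a minimal sketch of a decision-bireduct check. It assumes the commonly used definition, which the abstract itself does not spell out: a pair (B, X) is a decision bireduct when the attributes B tell apart every pair of objects in X that have different decisions, no attribute can be dropped from B, and no object can be added to X. The toy table and the names `TABLE`, `discerns` and `is_bireduct` are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition attribute values, decision).
TABLE = [
    ({"a": 0, "b": 0, "c": 1}, "yes"),
    ({"a": 0, "b": 1, "c": 1}, "no"),
    ({"a": 1, "b": 0, "c": 0}, "yes"),
    ({"a": 1, "b": 1, "c": 0}, "no"),
    ({"a": 0, "b": 0, "c": 0}, "no"),
]


def discerns(attrs, objects, table=TABLE):
    """True if attrs tell apart every pair in objects with different decisions."""
    for i, j in combinations(sorted(objects), 2):
        (vi, di), (vj, dj) = table[i], table[j]
        if di != dj and all(vi[a] == vj[a] for a in attrs):
            return False
    return True


def is_bireduct(attrs, objects, table=TABLE):
    """Check the assumed bireduct conditions for the pair (attrs, objects)."""
    attrs, objects = set(attrs), set(objects)
    if not discerns(attrs, objects, table):
        return False
    # Attribute minimality: removing any one attribute must break discernibility.
    # Single removals suffice because discernibility is monotone in attrs.
    if any(discerns(attrs - {a}, objects, table) for a in attrs):
        return False
    # Object maximality: adding any one outside object must break discernibility.
    outside = set(range(len(table))) - objects
    return not any(discerns(attrs, objects | {o}, table) for o in outside)
```

In this toy table, attribute `b` alone is enough once object 4 is left out, whereas keeping all five objects requires all three attributes. That is the trade-off between discarding objects and discarding attributes that a bireduct captures.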
From fuzzy-rough to crisp feature selection
A central problem in machine learning and pattern recognition is the process of
recognizing the most important features in a dataset. This process plays a decisive
role in big data processing by reducing the size of datasets. One major drawback of
existing feature selection methods is the high chance of redundant features appearing
in the final subset, where in most cases, finding and removing them can greatly
improve the resulting classification accuracy. To tackle this problem on two different
fronts, we employed fuzzy-rough set theory and perturbation theory. On the fuzzy-rough side, we used
three strategies to improve the performance of fuzzy-rough set-based feature selection
methods. The first strategy was to encode both features and samples in one binary
vector and use a shuffled frog leaping algorithm to choose the best combination using
fuzzy dependency degree as the fitness function. In the second strategy, we designed
a measure to evaluate features based on fuzzy-rough dependency degree in a fashion
where redundant features are given less priority to be selected. In the last strategy,
we designed a new binary version of the shuffled frog leaping algorithm that employs a
fuzzy positive region as its similarity measure to work in complete harmony with the
fitness function (i.e. fuzzy-rough dependency degree). To extend the applicability of
fuzzy-rough set-based feature selection to multi-party medical datasets, we designed
a privacy-preserving version of the original method. In addition, we studied the
feasibility and applicability of perturbation theory to feature selection, which to the
best of our knowledge has not previously been investigated. We introduced a new feature
selection method based on perturbation theory that is not only capable of detecting and
discarding redundant features but is also very fast and flexible in accommodating the
special needs of the application. It employs a clustering algorithm to group features that
behave similarly, based on the sensitivity of each feature to perturbation, the angle
between each feature and the outcome, and the effect of removing each feature on the outcome.
It then chooses the feature closest to the centre of each cluster and returns those
features as the final subset. To assess the effectiveness of the proposed methods,
we compared the results of each method with well-known feature selection methods
on a series of artificially generated datasets and on biological, medical and cancer
datasets taken from the University of California Irvine machine learning repository,
the Arizona State University repository and the Gene Expression Omnibus repository.
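As an illustration of the fitness function mentioned above, the following is a minimal sketch of a fuzzy-rough dependency degree, together with the decoding of a single binary vector that selects both features and samples. It is not the authors' implementation. The similarity relation (one minus the range-normalised distance), the minimum t-norm, crisp class labels, and the names `similarity`, `dependency_degree` and `fitness` are all assumptions made for the example.

```python
def similarity(data, feature):
    """Fuzzy similarity per feature: 1 - |x - y| / range (an assumed choice)."""
    col = [row[feature] for row in data]
    span = (max(col) - min(col)) or 1.0  # avoid dividing by zero on constant columns
    return [[1.0 - abs(x - y) / span for y in col] for x in col]


def dependency_degree(data, labels, features):
    """Mean membership of the objects in the fuzzy positive region."""
    n = len(data)
    if n == 0:
        return 0.0
    sims = [similarity(data, f) for f in features]
    total = 0.0
    for i in range(n):
        pos = 1.0
        for j in range(n):
            if labels[j] != labels[i]:
                # Minimum t-norm combines the per-feature similarities;
                # with no features selected, every pair counts as fully similar.
                r = min((s[i][j] for s in sims), default=1.0)
                # With crisp classes the lower approximation reduces to
                # the infimum of 1 - r over objects of other classes.
                pos = min(pos, 1.0 - r)
        total += pos
    return total / n


def fitness(bits, data, labels):
    """Decode one binary vector: feature bits first, then one bit per sample."""
    m = len(data[0])
    feats = [f for f in range(m) if bits[f]]
    rows = [i for i in range(len(data)) if bits[m + i]]
    return dependency_degree([data[i] for i in rows],
                             [labels[i] for i in rows], feats)
```

A search procedure such as the shuffled frog leaping algorithm would then score candidate binary vectors with `fitness`. The search itself is omitted here because the abstract does not describe it in enough detail to reproduce.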