8 research outputs found

    Value reducts and bireducts: A comparative study

    In Rough Set Theory, the notion of a bireduct allows the sets of objects and attributes in a dataset to be reduced simultaneously. In addition, value reducts remove unnecessary values of certain attributes for a specific object. Combining both notions therefore achieves a greater reduction of unnecessary data. This paper studies bireducts and value reducts of information and decision tables. We present theoretical results capturing different aspects of the relationship between bireducts and reducts, offering new insights at a conceptual level. We also analyze the relationship between bireducts and value reducts. The connections established among these notions offer significant benefits for efficient information analysis, as well as for the detection of unnecessary or redundant information.
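
To make the bireduct notion in the abstract above concrete, here is a minimal Python sketch that checks whether a pair (attribute subset, object subset) is a decision bireduct. It assumes a commonly used definition that the abstract itself does not spell out: the attributes must discern every pair of selected objects with different decisions, no attribute can be dropped, and no object can be added. The function names and the toy table are hypothetical, and the paper's own definitions (in particular for information tables and value reducts) may differ.

```python
from itertools import combinations

def discerns(table, decisions, attrs, objs):
    """True if every pair of objects in objs with different decisions
    differs on at least one attribute in attrs."""
    for i, j in combinations(sorted(objs), 2):
        if decisions[i] != decisions[j] and all(
            table[i][a] == table[j][a] for a in attrs
        ):
            return False
    return True

def is_decision_bireduct(table, decisions, attrs, objs):
    """Check the assumed bireduct conditions for the pair (attrs, objs)."""
    attrs, objs = set(attrs), set(objs)
    if not discerns(table, decisions, attrs, objs):
        return False
    # Attribute minimality: discernibility is monotone in attrs, so it is
    # enough to test the removal of each single attribute.
    for a in attrs:
        if discerns(table, decisions, attrs - {a}, objs):
            return False
    # Object maximality: by the same monotonicity argument, it is enough
    # to test the addition of each single outside object.
    for o in set(range(len(table))) - objs:
        if discerns(table, decisions, attrs, objs | {o}):
            return False
    return True

# Hypothetical toy decision table: objects 0 and 3 are identical on all
# attributes but have different decisions, so the table is inconsistent.
table = [
    {"a": 0, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
    {"a": 0, "b": 0},
]
decisions = [0, 1, 1, 1]

print(is_decision_bireduct(table, decisions, {"b"}, {0, 1, 2}))
print(is_decision_bireduct(table, decisions, {"a", "b"}, {0, 1, 2}))
```

In this toy table, the pair ({"b"}, {0, 1, 2}) satisfies the assumed conditions: attribute "a" is dropped, and so is object 3, which conflicts with object 0. Adding "a" back violates attribute minimality.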

    Fuzzy rough and evolutionary approaches to instance selection

    From fuzzy-rough to crisp feature selection

    A central problem in machine learning and pattern recognition is identifying the most important features in a dataset. This process plays a decisive role in big data processing by reducing the size of datasets. One major drawback of existing feature selection methods is the high chance that redundant features appear in the final subset; in most cases, finding and removing them can greatly improve the resulting classification accuracy. To tackle this problem on two different fronts, we employed fuzzy-rough sets and perturbation theory. On one side, we used three strategies to improve the performance of fuzzy-rough set-based feature selection methods. The first strategy was to encode both features and samples in one binary vector and use a shuffled frog leaping algorithm to choose the best combination, with the fuzzy dependency degree as the fitness function. In the second strategy, we designed a measure that evaluates features based on the fuzzy-rough dependency degree in such a way that redundant features are given lower priority for selection. In the last strategy, we designed a new binary version of the shuffled frog leaping algorithm that employs a fuzzy positive region as its similarity measure, so that it works in harmony with the fitness function (i.e. the fuzzy-rough dependency degree). To extend the applicability of fuzzy-rough set-based feature selection to multi-party medical datasets, we designed a privacy-preserving version of the original method. In addition, we studied the feasibility and applicability of perturbation theory to feature selection, which to the best of our knowledge has never been researched. We introduced a new feature selection method based on perturbation theory that is not only capable of detecting and discarding redundant features but is also very fast and flexible in accommodating the special needs of the application. It employs a clustering algorithm to group similarly behaving features based on the sensitivity of each feature to perturbation, the angle of each feature to the outcome and the effect of removing each feature on the outcome; it then chooses the feature closest to the centre of each cluster and returns all those features as the final subset. To assess the effectiveness of the proposed methods, we compared the results of each method with well-known feature selection methods on a series of artificially generated datasets, and on biological, medical and cancer datasets taken from the University of California Irvine machine learning repository, the Arizona State University repository and the Gene Expression Omnibus repository.
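
The fuzzy-rough dependency degree that the abstract above uses as a fitness function can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not specify the similarity relation or the implicator, so the sketch assumes a range-normalised similarity per feature, combines features with the minimum t-norm, and assumes crisp decision classes (for which the Łukasiewicz and Kleene-Dienes implicators give the same positive region). All names and the toy data are hypothetical.

```python
def fuzzy_similarity(X, features):
    """Pairwise fuzzy similarity over the chosen features (assumed form:
    1 - |difference| / range per feature, combined with min)."""
    n = len(X)
    ranges = {
        f: (max(row[f] for row in X) - min(row[f] for row in X)) or 1.0
        for f in features
    }
    R = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for f in features:
                s = 1.0 - abs(X[i][f] - X[j][f]) / ranges[f]
                R[i][j] = min(R[i][j], s)
    return R

def dependency_degree(X, y, features):
    """Average membership of the objects in the fuzzy positive region."""
    R = fuzzy_similarity(X, features)
    n = len(X)
    total = 0.0
    for i in range(n):
        # With crisp classes, membership in the lower approximation of the
        # object's own class is the smallest dissimilarity to any object
        # of a different class (1.0 if there is no such object).
        total += min(
            (1.0 - R[i][j] for j in range(n) if y[j] != y[i]),
            default=1.0,
        )
    return total / n

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [0.9, 5.0], [1.0, 1.0]]
y = [0, 0, 1, 1]

print(dependency_degree(X, y, [0]))
print(dependency_degree(X, y, [1]))
```

On this toy data the informative feature 0 scores a dependency degree of about 0.85, while feature 1 scores 0, because every object has an identical-valued neighbour in the other class. A search procedure such as the one described in the abstract would then prefer subsets with a higher degree.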

    Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods
