14,438 research outputs found

    A comprehensive study of implicator-conjunctor based and noise-tolerant fuzzy rough sets: definitions, properties and robustness analysis

    Get PDF
    © 2014 Elsevier B.V. Both rough and fuzzy set theories offer interesting tools for dealing with imperfect data: while the former allows us to work with uncertain and incomplete information, the latter provides a formal setting for vague concepts. The two theories are highly compatible, and since the late 1980s many researchers have studied their hybridization. In this paper, we critically evaluate most relevant fuzzy rough set models proposed in the literature. To this end, we establish a formally correct and unified mathematical framework for them. Both implicator-conjunctor-based definitions and noise-tolerant models are studied. We evaluate these models on two different fronts: firstly, we discuss which properties of the original rough set model can be maintained and secondly, we examine how robust they are against both class and attribute noise. By highlighting the benefits and drawbacks of the different fuzzy rough set models, this study appears a necessary first step to propose and develop new models in future research.Lynn D’eer has been supported by the Ghent University Special Research Fund, Chris Cornelis was partially supported by the Spanish Ministry of Science and Technology under the project TIN2011-28488 and the Andalusian Research Plans P11-TIC-7765 and P10-TIC-6858, and by project PYR-2014-8 of the Genil Program of CEI BioTic GRANADA and Lluis Godo has been partially supported by the Spanish MINECO project EdeTRI TIN2012-39348-C02-01Peer Reviewe

    Computing fuzzy rough approximations in large scale information systems

    Get PDF
    Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is however required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state of the art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem

    Taming Wild High Dimensional Text Data with a Fuzzy Lash

    Full text link
    The bag of words (BOW) represents a corpus in a matrix whose elements are the frequency of words. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular method to address sparsity and high-dimensionality issues. Among different strategies to develop DR method, Unsupervised Feature Transformation (UFT) is a popular strategy to map all words on a new basis to represent BOW. The recent increase of text data and its challenges imply that DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse BOW matrix to provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features to Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy
    • …
    corecore