7,616 research outputs found

    A comprehensive study of implicator-conjunctor based and noise-tolerant fuzzy rough sets: definitions, properties and robustness analysis

    Get PDF
    © 2014 Elsevier B.V. Both rough and fuzzy set theories offer interesting tools for dealing with imperfect data: while the former allows us to work with uncertain and incomplete information, the latter provides a formal setting for vague concepts. The two theories are highly compatible, and since the late 1980s many researchers have studied their hybridization. In this paper, we critically evaluate most relevant fuzzy rough set models proposed in the literature. To this end, we establish a formally correct and unified mathematical framework for them. Both implicator-conjunctor-based definitions and noise-tolerant models are studied. We evaluate these models on two different fronts: firstly, we discuss which properties of the original rough set model can be maintained and secondly, we examine how robust they are against both class and attribute noise. By highlighting the benefits and drawbacks of the different fuzzy rough set models, this study appears a necessary first step to propose and develop new models in future research.Lynn D’eer has been supported by the Ghent University Special Research Fund, Chris Cornelis was partially supported by the Spanish Ministry of Science and Technology under the project TIN2011-28488 and the Andalusian Research Plans P11-TIC-7765 and P10-TIC-6858, and by project PYR-2014-8 of the Genil Program of CEI BioTic GRANADA and Lluis Godo has been partially supported by the Spanish MINECO project EdeTRI TIN2012-39348-C02-01Peer Reviewe

    Data granulation by the principles of uncertainty

    Full text link
    Researches in granular modeling produced a variety of mathematical models, such as intervals, (higher-order) fuzzy sets, rough sets, and shadowed sets, which are all suitable to characterize the so-called information granules. Modeling of the input data uncertainty is recognized as a crucial aspect in information granulation. Moreover, the uncertainty is a well-studied concept in many mathematical settings, such as those of probability theory, fuzzy set theory, and possibility theory. This fact suggests that an appropriate quantification of the uncertainty expressed by the information granule model could be used to define an invariant property, to be exploited in practical situations of information granulation. In this perspective, a procedure of information granulation is effective if the uncertainty conveyed by the synthesized information granule is in a monotonically increasing relation with the uncertainty of the input data. In this paper, we present a data granulation framework that elaborates over the principles of uncertainty introduced by Klir. Being the uncertainty a mesoscopic descriptor of systems and data, it is possible to apply such principles regardless of the input data type and the specific mathematical setting adopted for the information granules. The proposed framework is conceived (i) to offer a guideline for the synthesis of information granules and (ii) to build a groundwork to compare and quantitatively judge over different data granulation procedures. To provide a suitable case study, we introduce a new data granulation technique based on the minimum sum of distances, which is designed to generate type-2 fuzzy sets. We analyze the procedure by performing different experiments on two distinct data types: feature vectors and labeled graphs. Results show that the uncertainty of the input data is suitably conveyed by the generated type-2 fuzzy set models.Comment: 16 pages, 9 figures, 52 reference

    Computing fuzzy rough approximations in large scale information systems

    Get PDF
    Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is however required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state of the art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem

    Scalable approximate FRNN-OWA classification

    Get PDF
    Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximative nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes within query times that are up to several orders of magnitude faster
    • …
    corecore