14,348 research outputs found

    Computing fuzzy rough approximations in large scale information systems

    Get PDF
    Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is however required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state of the art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem

    Decision Analysis via Granulation Based on General Binary Relation

    Get PDF
    Decision theory considers how best to make decisions in the light of uncertainty about data. There are several methodologies that may be used to determine the best decision. In rough set theory, the classification of objects according to approximation operators can be fitted into the Bayesian decision-theoretic model, with respect to three regions (positive, negative, and boundary region). Granulation using equivalence classes is a restriction that limits the decision makers. In this paper, we introduce a generalization and modification of decision-theoretic rough set model by using granular computing on general binary relations. We obtain two new types of approximation that enable us to classify the objects into five regions instead of three regions. The classification of decision region into five areas will enlarge the range of choice for decision makers

    Characteristic of partition-circuit matroid through approximation number

    Full text link
    Rough set theory is a useful tool to deal with uncertain, granular and incomplete knowledge in information systems. And it is based on equivalence relations or partitions. Matroid theory is a structure that generalizes linear independence in vector spaces, and has a variety of applications in many fields. In this paper, we propose a new type of matroids, namely, partition-circuit matroids, which are induced by partitions. Firstly, a partition satisfies circuit axioms in matroid theory, then it can induce a matroid which is called a partition-circuit matroid. A partition and an equivalence relation on the same universe are one-to-one corresponding, then some characteristics of partition-circuit matroids are studied through rough sets. Secondly, similar to the upper approximation number which is proposed by Wang and Zhu, we define the lower approximation number. Some characteristics of partition-circuit matroids and the dual matroids of them are investigated through the lower approximation number and the upper approximation number.Comment: 12 page

    A Detailed Study of the Distributed Rough Set Based Locality Sensitive Hashing Feature Selection Technique

    Get PDF
    International audienceIn the context of big data, granular computing has recently been implemented by some mathematical tools, especially Rough Set Theory (RST). As a key topic of rough set theory, feature selection has been investigated to adapt the related granular concepts of RST to deal with large amounts of data, leading to the development of the distributed RST version. However, despite of its scalability, the distributed RST version faces a key challenge tied to the partitioning of the feature search space in the distributed environment while guaranteeing data dependency. Therefore, in this manuscript, we propose a new distributed RST version based on Locality Sensitive Hashing (LSH), named LSH-dRST, for big data feature selection. LSH-dRST uses LSH to match similar features into the same bucket and maps the generated buckets into partitions to enable the splitting of the universe in a more efficient way. More precisely, in this paper, we perform a detailed analysis of the performance of LSH-dRST by comparing it to the standard distributed RST version, which is based on a random partitioning of the universe. We demonstrate that our LSH-dRST is scalable when dealing with large amounts of data. We also demonstrate * This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 702527. 2 Z. Chelly Dagdia, C. Zarges / LSH-RST for an Efficient Big Data Pre-processing that LSH-dRST ensures the partitioning of the high dimensional feature search space in a more reliable way; hence better preserving data dependency in the distributed environment and ensuring a lower computational cost

    Parametric matroid of rough set

    Full text link
    Rough set is mainly concerned with the approximations of objects through an equivalence relation on a universe. Matroid is a combinatorial generalization of linear independence in vector spaces. In this paper, we define a parametric set family, with any subset of a universe as its parameter, to connect rough sets and matroids. On the one hand, for a universe and an equivalence relation on the universe, a parametric set family is defined through the lower approximation operator. This parametric set family is proved to satisfy the independent set axiom of matroids, therefore it can generate a matroid, called a parametric matroid of the rough set. Three equivalent representations of the parametric set family are obtained. Moreover, the parametric matroid of the rough set is proved to be the direct sum of a partition-circuit matroid and a free matroid. On the other hand, since partition-circuit matroids were well studied through the lower approximation number, we use it to investigate the parametric matroid of the rough set. Several characteristics of the parametric matroid of the rough set, such as independent sets, bases, circuits, the rank function and the closure operator, are expressed by the lower approximation number.Comment: 15 page
    corecore