
    Two kinds of average approximation accuracy

    Rough set theory places great importance on approximation accuracy, which is used to gauge how well a rough set model describes a target concept. However, traditional approximation accuracy has limitations: it varies with changes in the target concept and cannot evaluate the overall descriptive ability of a rough set model. To overcome this, two types of average approximation accuracy are proposed that objectively assess a rough set model's ability to approximate all information granules. The first is the relative average approximation accuracy, which is based on all sets in the universe and satisfies several basic properties. The second is the absolute average approximation accuracy, which is based on undefinable sets and yields significant conclusions. We also explore the relationship between these two types of average approximation accuracy. Finally, the average approximation accuracy has practical applications in addressing missing attribute values in incomplete information tables.
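    To make the measure concrete, the sketch below computes classical lower and upper approximations, their accuracy ratio, and a naive average of that accuracy over all non-empty subsets of a toy universe; the partition, the universe, and the averaging scheme are illustrative assumptions rather than the paper's exact definitions of relative and absolute average accuracy.

```python
# Minimal sketch: classical rough-set approximation accuracy and a naive
# "average over all subsets" variant on a toy universe (illustrative only).
from itertools import chain, combinations

def approximations(partition, target):
    """Lower and upper approximations of `target` w.r.t. a partition of the universe."""
    lower = set().union(*[b for b in partition if b <= target])
    upper = set().union(*[b for b in partition if b & target])
    return lower, upper

def accuracy(partition, target):
    """Classical approximation accuracy: |lower| / |upper| (taken as 1.0 for the empty set)."""
    lower, upper = approximations(partition, target)
    return len(lower) / len(upper) if upper else 1.0

def average_accuracy(partition, universe):
    """Naive average of the accuracy over all non-empty subsets of a small universe."""
    subsets = chain.from_iterable(combinations(sorted(universe), r) for r in range(1, len(universe) + 1))
    scores = [accuracy(partition, set(s)) for s in subsets]
    return sum(scores) / len(scores)

universe = {1, 2, 3, 4}
partition = [{1, 2}, {3}, {4}]          # equivalence classes of the indiscernibility relation
print(accuracy(partition, {1, 2, 3}))   # 1.0: {1, 2, 3} is definable
print(accuracy(partition, {1, 3}))      # ~0.33: {1, 3} is rough
print(average_accuracy(partition, universe))
```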

    Computing fuzzy rough approximations in large scale information systems

    Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e., objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62 GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on the Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem.
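    As a point of reference, the following non-distributed sketch computes standard (min/max-based) fuzzy rough lower and upper approximations from a gradual indiscernibility relation built on normalized attribute differences; the similarity measure and the random data are illustrative assumptions, not the paper's MPI implementation.

```python
# Minimal, single-machine sketch of fuzzy rough approximations (illustrative only).
import numpy as np

def similarity(X):
    """Gradual indiscernibility: mean over attributes of 1 - |x_a - y_a| on [0, 1]-scaled data."""
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    diff = np.abs(X[:, None, :] - X[None, :, :])   # pairwise attribute differences
    return 1.0 - diff.mean(axis=2)                 # n x n similarity matrix

def fuzzy_approximations(R, concept):
    """Min/max-based fuzzy rough lower and upper approximations of a fuzzy concept."""
    lower = np.min(np.maximum(1.0 - R, concept[None, :]), axis=1)  # inf_y max(1 - R(x, y), A(y))
    upper = np.max(np.minimum(R, concept[None, :]), axis=1)        # sup_y min(R(x, y), A(y))
    return lower, upper

rng = np.random.default_rng(0)
X = rng.random((100, 10))                 # 100 objects, 10 continuous attributes
concept = (X[:, 0] > 0.5).astype(float)   # crisp decision treated as a fuzzy set
low, up = fuzzy_approximations(similarity(X), concept)
print(low[:5], up[:5])
```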

    Granular Partition and Concept Lattice Division Based on Quotient Space

    In this paper, we investigate the relationship between the concept lattice and the quotient space from the standpoint of granularity. A new framework of knowledge representation - the granular quotient space - is constructed, and it demonstrates that concept lattice classification is linked to the quotient space. A covering of the formal context is first given based on this granulation; the granular concept lattice model and its construction are then discussed on the sub-context formed by the granular classification set. We analyze knowledge reduction and describe granular entropy techniques, including some novel formulas. Lastly, a concept lattice construction algorithm based on multi-granular feature selection in quotient space is proposed. Examples and experiments show that the algorithm can obtain a minimal reduct and is much more efficient than classical incremental concept formation methods.
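    For orientation, the toy sketch below enumerates the formal concepts of a small binary context via the usual derivation operators; the context and the brute-force closure enumeration are illustrative assumptions and do not reproduce the granular quotient-space construction or the reduction algorithm described above.

```python
# Toy formal concept enumeration for a 3x3 context (illustrative only).
from itertools import combinations

objects = ["g1", "g2", "g3"]
attributes = ["a", "b", "c"]
incidence = {("g1", "a"), ("g1", "b"), ("g2", "b"), ("g2", "c"), ("g3", "c")}

def common_attributes(objs):
    """Derivation operator: attributes shared by all objects in `objs`."""
    return {m for m in attributes if all((g, m) in incidence for g in objs)}

def common_objects(attrs):
    """Derivation operator: objects possessing all attributes in `attrs`."""
    return {g for g in objects if all((g, m) in incidence for m in attrs)}

def formal_concepts():
    """Brute-force enumeration of (extent, intent) pairs closed under both operators."""
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            intent = common_attributes(set(objs))
            extent = common_objects(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts

for extent, intent in sorted(formal_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```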

    On partial defaults in portfolio credit risk: Comparing economic and regulatory view

    Most credit portfolio models calculate the loss distribution of a portfolio consisting solely of performing counterparts. We develop two models that account for defaulted counterparts in the calculation of the economic capital. First, we model the portfolio of non-performing counterparts standalone. The second approach derives the integrated loss distribution for the non-performing and the performing portfolio. Both calculations are supplemented by formulae for the contributions of single counterparts to the economic capital. Calibrating the models allows for an impact study and a comparison with Basel II.
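    A rough illustration of the integrated view is sketched below: a one-factor Monte Carlo loss distribution for the performing book combined with stochastic recoveries on already-defaulted counterparts, with economic capital taken as a high quantile minus the expected loss. The factor model, the LGD distributions, and all parameters are illustrative assumptions, not the paper's two models or its capital-contribution formulae.

```python
# Illustrative one-factor Monte Carlo for a mixed performing / non-performing book.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n_perf, n_def, n_sims = 200, 20, 50_000
ead = rng.uniform(0.5, 2.0, n_perf)              # exposures of performing counterparts
pd_one_year = rng.uniform(0.005, 0.05, n_perf)   # one-year default probabilities
lgd = 0.45                                       # deterministic LGD for the performing book
rho = 0.15                                       # asset correlation

# Performing book: defaults driven by one systematic factor plus idiosyncratic noise.
z = rng.standard_normal((n_sims, 1))
eps = rng.standard_normal((n_sims, n_perf))
assets = np.sqrt(rho) * z + np.sqrt(1 - rho) * eps
defaults = (assets < norm.ppf(pd_one_year)).astype(float)
loss_perf = defaults @ (ead * lgd)

# Non-performing book: default has already happened, only the recovery is uncertain.
ead_def = rng.uniform(0.5, 2.0, n_def)
lgd_def = rng.beta(4, 5, (n_sims, n_def))        # stochastic realized LGD
loss_def = lgd_def @ ead_def

loss_total = loss_perf + loss_def
economic_capital = np.quantile(loss_total, 0.999) - loss_total.mean()
print(f"Expected loss: {loss_total.mean():.2f}  99.9% economic capital: {economic_capital:.2f}")
```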

    A GIS-based multi-criteria evaluation framework for uncertainty reduction in earthquake disaster management using granular computing

    One of the most important steps in earthquake disaster management is the prediction of probable damages, which is called earthquake vulnerability assessment. Earthquake vulnerability assessment is a multi-criteria problem, and a number of multi-criteria decision making models have been proposed for it. Two main sources of uncertainty exist in the earthquake vulnerability assessment problem: uncertainty associated with the experts' points of view and uncertainty associated with attribute values. If the uncertainty from these two sources is not handled properly, the resulting seismic vulnerability map will be unreliable. The main objective of this research is to propose a reliable model for earthquake vulnerability assessment that is able to manage the uncertainty associated with the experts' opinions. Granular Computing (GrC) is able to extract a set of if-then rules with minimum incompatibility from an information table. An integration of Dempster-Shafer Theory (DST) and GrC is applied in the current research to minimize the entropy in experts' opinions. The accuracy of the model based on the integration of DST and GrC is 83%, while the accuracy of the single-expert model is 62%, which indicates the importance of uncertainty management in the seismic vulnerability assessment problem. Due to limited accessibility to current data, only six criteria are used in this model. However, the model is able to take into account both qualitative and quantitative criteria.
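    The evidence-combination step can be illustrated with Dempster's rule applied to two experts' basic probability assignments over vulnerability classes; the frame of discernment and the mass values below are illustrative assumptions, not the paper's GrC/DST integration.

```python
# Dempster's rule of combination for two illustrative expert mass functions.
from itertools import product

def combine(m1, m2):
    """Combine two mass functions given as {frozenset_of_hypotheses: mass}."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict: the mass functions cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

LOW, MED, HIGH = "low", "medium", "high"
expert1 = {frozenset({HIGH}): 0.6, frozenset({MED, HIGH}): 0.3, frozenset({LOW, MED, HIGH}): 0.1}
expert2 = {frozenset({MED}): 0.5, frozenset({MED, HIGH}): 0.4, frozenset({LOW, MED, HIGH}): 0.1}
for focal, mass in combine(expert1, expert2).items():
    print(sorted(focal), round(mass, 3))
```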

    Concept learning consistency under three‑way decision paradigm

    Concept Mining is one of the main challenges both in Cognitive Computing and in Machine Learning. The ongoing improvement of solutions to address this issue raises the need to analyze whether the consistency of the learning process is preserved. This paper addresses a particular problem, namely, how the concept mining capability changes under the reconsideration of the hypothesis class. The issue is raised from the point of view of the so-called Three-Way Decision (3WD) paradigm. The paradigm provides a sound framework to reconsider decision-making processes, including those assisted by Machine Learning. Thus, the paper aims to analyze the influence of 3WD techniques on the Concept Learning Process itself. For this purpose, we introduce new versions of the Vapnik-Chervonenkis dimension. Likewise, to illustrate how the formal approach can be instantiated in a particular model, the case of concept learning in (Fuzzy) Formal Concept Analysis is considered. This work is supported by the State Investigation Agency (Agencia Estatal de Investigación), project PID2019-109152GB-100/AEI/10.13039/501100011033. We acknowledge the reviewers for their suggestions and guidance on additional references that have enriched our paper. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
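    For readers unfamiliar with the paradigm, the minimal sketch below shows the basic three-way decision rule that maps an estimated membership probability to an accept, reject, or defer (boundary) action; the thresholds are illustrative assumptions, and the sketch does not touch the paper's Vapnik-Chervonenkis-style analysis.

```python
# Basic three-way decision rule with illustrative thresholds (alpha, beta).
def three_way_decide(prob, alpha=0.75, beta=0.30):
    """Map an estimated membership probability to one of the three decision regions."""
    if prob >= alpha:
        return "accept"   # positive region
    if prob <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region: postpone the decision, gather more evidence

for p in (0.9, 0.5, 0.1):
    print(p, "->", three_way_decide(p))
```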

    Positive region: An enhancement of partitioning attribute based rough set for categorical data

    Datasets containing multi-value attributes arise in several domains, such as pattern recognition, machine learning, and data mining, and data partitioning is required in such cases. Selecting a partitioning attribute is the clustering step that specifies how the whole dataset is split for further processing. Prominent rough set-based approaches already exist for grouping objects and handling uncertain data; they use the indiscernibility relation and the mean roughness measure to perform attribute partitioning. Nevertheless, most partitioning attribute selection algorithms for categorical data in clustering datasets are incapable of optimal partitioning. These indiscernibility and mean roughness measures, moreover, require the calculation of the lower approximation, which is less accurate and expensive to compute; it also restricts the growth of the set of attributes and neglects the data found within the boundary region. This paper presents a new concept called "Positive Region Based Mean Dependency (PRD)" that calculates attribute dependency. PRD defines a method, based on a positive region-based mean dependency measure, for determining the mean dependency of attributes that is suitable for categorical datasets. By avoiding the lower approximation, PRD is an optimal substitute for the conventional dependency measure in partitioning attribute selection. In contrast to traditional RST partitioning methods, the proposed method can be employed as a measure of data output uncertainty and as a tailback for larger and multiple data clustering. The performance of the proposed method is evaluated and compared with the Information-Theoretical Dependence Roughness (ITDR) and Maximum Indiscernible Attribute (MIA) algorithms.
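    For contrast with PRD, the sketch below computes the conventional positive-region dependency gamma(C, D) = |POS_C(D)| / |U| on a toy categorical table; the table is an illustrative assumption, and the code shows the classical measure the paper positions itself against, not PRD itself.

```python
# Classical positive-region dependency on a toy categorical decision table.
from collections import defaultdict

def dependency(rows, cond_idx, dec_idx):
    """gamma(C, D): fraction of objects whose condition class is consistent on the decision."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[i] for i in cond_idx)].append(row[dec_idx])
    positive = sum(len(vals) for vals in classes.values() if len(set(vals)) == 1)
    return positive / len(rows)

table = [
    ("red",  "small", "yes"),
    ("red",  "small", "yes"),
    ("red",  "large", "no"),
    ("blue", "small", "yes"),
    ("blue", "small", "no"),   # inconsistent with the row above
]
print(dependency(table, cond_idx=(0, 1), dec_idx=2))   # 3/5 = 0.6
```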

    Rough set and rule-based multicriteria decision aiding

    The aim of multicriteria decision aiding is to give the decision maker a recommendation concerning a set of objects evaluated from multiple points of view called criteria. Since a rational decision maker acts with respect to his/her value system, one must identify the decision maker's preferences in order to recommend the most-preferred decision. In this paper, we focus on preference discovery from data concerning some past decisions of the decision maker. We consider a preference model in the form of a set of "if..., then..." decision rules discovered from the data by inductive learning. To structure the data prior to the induction of rules, we use the Dominance-based Rough Set Approach (DRSA). DRSA is a methodology for reasoning about data which handles ordinal evaluations of objects on the considered criteria and monotonic relationships between these evaluations and the decision. We review applications of DRSA to a large variety of multicriteria decision problems.
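    The dominance structure underlying DRSA can be sketched as follows: an object belongs to the lower approximation of an upward union of decision classes when every object dominating it is assigned to that union. The evaluations and class labels below are illustrative assumptions, not an application from the review.

```python
# Toy dominance cones and a lower approximation of an upward union of classes.
def dominates(x, y):
    """x dominates y if x is at least as good as y on every (gain-type) criterion."""
    return all(a >= b for a, b in zip(x, y))

def lower_upward_union(objects, decisions, t):
    """Objects certainly in 'class >= t': every object dominating them is in class >= t."""
    result = []
    for i, x in enumerate(objects):
        dominating = [j for j, y in enumerate(objects) if dominates(y, x)]
        if all(decisions[j] >= t for j in dominating):
            result.append(i)
    return result

objects = [(3, 2), (2, 2), (1, 1), (3, 1)]   # evaluations on two gain-type criteria
decisions = [3, 2, 1, 2]                     # ordered decision classes
print(lower_upward_union(objects, decisions, t=2))   # [0, 1, 3]
```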