Determinants of Long-term Economic Development: An Empirical Cross-country Study Involving Rough Sets Theory and Rule Induction
Empirical findings on the determinants of long-term economic growth are numerous, sometimes inconsistent, often exciting, and still incomplete. The empirical analysis has been carried out almost exclusively with standard econometrics. This study compares results from cross-country regressions as reported in the literature with those obtained by rough set theory and rule induction. The main advantages of rough sets are the ability to form classes and to discretize: we do not have to impose distributional, independence, (log-)linearity, or many other assumptions, but can keep the data as they are. The main difference between the regression results and rough sets is that most education and human capital indicators can be labeled as robust attributes. In addition, we find that political indicators enter in a non-linear fashion with respect to growth.
Keywords: economic growth, rough sets, rule induction
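The rough-set machinery this abstract relies on (grouping objects that are indiscernible on the chosen attributes, then approximating a concept from below and above) can be sketched as follows. This is a minimal illustration; the attribute names and records are hypothetical, not taken from the study.

```python
from collections import defaultdict

def indiscernibility_classes(rows, attrs):
    """Group object indices that share values on the given condition attributes."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())

def lower_upper(rows, attrs, concept):
    """Classical rough lower/upper approximation of a set of object indices."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(rows, attrs):
        if set(block) <= concept:   # block entirely inside the concept
            lower |= set(block)
        if set(block) & concept:    # block overlaps the concept
            upper |= set(block)
    return lower, upper

# Hypothetical country records (illustrative only)
rows = [
    {"edu": "high", "pol": "stable"},
    {"edu": "high", "pol": "stable"},
    {"edu": "low",  "pol": "unstable"},
]
concept = {0, 2}  # e.g. indices of "high growth" countries
low, up = lower_upper(rows, ["edu", "pol"], concept)
```

Objects 0 and 1 are indiscernible but disagree on membership in the concept, so they fall only in the upper approximation; induced rules would then be labeled certain (lower) or possible (upper).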
Computing fuzzy rough approximations in large scale information systems
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e., objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on the Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem.
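The fuzzy rough lower and upper approximations the abstract refers to can be sketched, for crisp decision classes, as below. The choice of similarity relation (per-attribute 1 - |difference| on attributes scaled to [0, 1], aggregated by minimum) and of the Lukasiewicz implicator and t-norm is one common convention, not necessarily the one used in the paper.

```python
import numpy as np

def fuzzy_rough_approximations(X, y, target_class):
    """Fuzzy rough lower/upper approximation memberships for one decision class.

    X: (n, m) array with attributes scaled to [0, 1]; y: (n,) decision labels.
    Uses the Lukasiewicz implicator I(a, b) = min(1, 1 - a + b) and
    t-norm T(a, b) = max(0, a + b - 1), a common (but not unique) choice.
    """
    n = len(X)
    A = (y == target_class).astype(float)            # crisp concept membership
    lower = np.empty(n)
    upper = np.empty(n)
    for i in range(n):
        # gradual indiscernibility of object i with every object
        R = np.min(1.0 - np.abs(X - X[i]), axis=1)
        lower[i] = np.min(np.minimum(1.0, 1.0 - R + A))   # inf of I(R, A)
        upper[i] = np.max(np.maximum(0.0, R + A - 1.0))   # sup of T(R, A)
    return lower, upper
```

The loop over all pairs of objects makes the cost quadratic in the number of rows, which is exactly the runtime and memory bottleneck that motivates the distributed MPI solution described in the abstract.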
The Development of the Generalization Algorithm Based on the Rough Set Theory
This paper considers the problem of concept generalization in decision-making systems, taking into account such features of real-world databases as large size, incompleteness, and inconsistency of the stored information. Methods from rough set theory (such as lower and upper approximations, positive regions, and reducts) are used to solve this problem. A new discretization algorithm for continuous attributes is proposed. It substantially increases the overall performance of generalization algorithms and can be applied to the processing of real-valued attributes in large data tables. A search algorithm for significant attributes, combined with the discretization stage, is also developed. It avoids splitting the continuous domains of insignificant attributes into intervals.
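The abstract does not give the algorithm itself. One common way to place cut points only where they matter, so that insignificant attributes are never split into intervals, is boundary-point discretization; the sketch below is a generic illustration of that idea, not the paper's specific algorithm.

```python
def boundary_cuts(values, labels):
    """Candidate cut points: midpoints between consecutive sorted values
    whose decision labels differ (boundary points). An attribute that
    yields no cuts never needs to be split into intervals."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if v1 != v2 and l1 != l2:
            cuts.append((v1 + v2) / 2.0)
    return cuts

def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls in."""
    return sum(value > c for c in cuts)
```

An attribute whose sorted values never change the decision label produces an empty cut list, matching the abstract's point about not splitting the domains of insignificant attributes.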
The reduction subset based on rough sets applied to texture classification
Rough set theory is a new mathematical approach to imprecision, vagueness, and uncertainty. The concept of reduction of the decision table based on rough sets is very useful for feature selection. The paper describes an application of rough set methods to feature selection and reduction in texture image recognition. The methods applied include continuous data discretization based on fuzzy c-means and a rough set method for feature selection and reduction. They were applied to tree extraction in aerial images. The experiments show that the methods presented in this paper are practical and effective.
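A common way to realize reduct-based feature selection is a greedy search that grows an attribute subset until its positive region (the objects classified consistently) matches that of the full attribute set. The sketch below illustrates this generic idea; it is not necessarily the exact reduction method used in the paper, and the data are hypothetical.

```python
from collections import defaultdict

def positive_region_size(rows, attrs, decisions):
    """Number of objects whose indiscernibility class is decision-consistent."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return sum(len(idx) for idx in groups.values()
               if len({decisions[i] for i in idx}) == 1)

def greedy_reduct(rows, all_attrs, decisions):
    """Greedily add the attribute that most enlarges the positive region
    until it matches that of the full attribute set (a superreduct)."""
    target = positive_region_size(rows, all_attrs, decisions)
    chosen = []
    while positive_region_size(rows, chosen, decisions) < target:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: positive_region_size(rows, chosen + [a], decisions))
        chosen.append(best)
    return chosen
```

Greedy search returns a superreduct (a sufficient subset that may not be minimal); exact reduct computation via the discernibility matrix is NP-hard in general, which is why heuristics like this are common in practice.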
Improved Heterogeneous Distance Functions
Instance-based learning techniques typically handle continuous and linear
input values well, but often do not handle nominal input attributes
appropriately. The Value Difference Metric (VDM) was designed to find
reasonable distance values between nominal attribute values, but it largely
ignores continuous attributes, requiring discretization to map continuous
values into nominal values. This paper proposes three new heterogeneous
distance functions, called the Heterogeneous Value Difference Metric (HVDM),
the Interpolated Value Difference Metric (IVDM), and the Windowed Value
Difference Metric (WVDM). These new distance functions are designed to handle
applications with nominal attributes, continuous attributes, or both. In
experiments on 48 applications the new distance metrics achieve higher
classification accuracy on average than three previous distance functions on
those datasets that have both nominal and continuous attributes.
Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
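The HVDM combines a value-difference component for nominal attributes with a standard-deviation-scaled absolute difference for continuous ones, and aggregates the per-attribute distances Euclidean-style. The sketch below follows the usual formulation (normalization choices vary across variants of the metric) and estimates class-conditional frequencies from training data; the example data are made up.

```python
import math
from collections import Counter, defaultdict

def hvdm_builder(rows, labels, nominal):
    """Build an HVDM distance function from training data.

    rows: list of attribute-value lists; nominal: set of attribute indices
    treated as nominal. Continuous differences are scaled by 4 standard
    deviations; nominal differences use a normalized value difference
    metric over class-conditional value frequencies.
    """
    n_attrs = len(rows[0])
    classes = sorted(set(labels))
    # class-conditional frequencies for nominal attribute values
    freq = {a: defaultdict(Counter) for a in nominal}
    for row, lab in zip(rows, labels):
        for a in nominal:
            freq[a][row[a]][lab] += 1
    # standard deviations for continuous attributes
    sd = {}
    for a in range(n_attrs):
        if a in nominal:
            continue
        vals = [r[a] for r in rows]
        mean = sum(vals) / len(vals)
        sd[a] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0

    def vdm(a, u, v):
        cu, cv = freq[a][u], freq[a][v]
        nu, nv = sum(cu.values()) or 1, sum(cv.values()) or 1
        return math.sqrt(sum((cu[c] / nu - cv[c] / nv) ** 2 for c in classes))

    def dist(x, y):
        total = 0.0
        for a in range(n_attrs):
            d = vdm(a, x[a], y[a]) if a in nominal else abs(x[a] - y[a]) / (4 * sd[a])
            total += d * d
        return math.sqrt(total)

    return dist
```

Two nominal values are close under the VDM component when they induce similar class distributions, which is what lets a single distance function treat nominal and continuous attributes on a comparable scale without prior discretization.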