4,804 research outputs found

    Attribute Reduction for Credit Evaluation using Rough Set Approach

    Generating an integrated model is an important technique in this research area and a powerful way to improve the accuracy of classifiers. The approach has been applied to different types of real-time data. Unprocessed data can cause some machine learning techniques to give wrong results. Generating an integrated model therefore requires attribute reduction and re-sampling. For attribute reduction, the rough set approach is the best choice, as it offers low execution time, high interpretability, a high reduction rate, and high accuracy

    New Learning Models for Generating Classification Rules Based on Rough Set Approach

    Data sets, static or dynamic, are very important and useful for presenting real-life features in different aspects of industry, medicine, economy, and others. Recently, different models have been used to generate knowledge from vague and uncertain data sets, such as decision tree induction, neural networks, fuzzy logic, genetic algorithms, rough set theory, and others. All of these models take a long time to learn from a huge and dynamic data set. Thus, the challenge is how to develop an efficient model that can decrease the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets usually have some missing values due to unavailable data, and these affect the quality of the generated classification rules. Missing values make it difficult to extract useful information from the data set. Another challenge is how to solve the problem of missing data. Rough set theory is a new mathematical tool for dealing with vagueness and uncertainty. It is a useful approach for uncovering classificatory knowledge and building classification rules. So, the application of the theory as part of the learning models was proposed in this thesis. Two different models for learning in data sets were proposed, based on two different reduction algorithms. The split-condition-merge-reduct algorithm (SCMR) was performed on three different modules: partitioning the data set vertically into subsets, applying rough set concepts of reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced-split-condition-merge-reduct algorithm (ESCMR) was performed on the above three modules followed by another module that applies the rough set reduction concept again to the reduct generated by SCMR in order to generate the best reduct, which plays the same role as if all attributes in this subset existed. Classification rules were generated based on the best reduct.
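The three SCMR modules and the extra ESCMR pass described above can be sketched roughly as follows. This is only an illustration, not the thesis's implementation: the abstract does not say which reduction procedure is used, so the sketch assumes a simple backward elimination that keeps the rough set dependency degree unchanged. The names `dependency`, `reduce_attrs`, `split_merge_reduct`, `n_parts`, and the toy rows are all hypothetical.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Fraction of rows whose values on `attrs` determine `decision` uniquely."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)

def reduce_attrs(rows, attrs, decision):
    """Assumed reduction step: drop attributes whose removal keeps the dependency."""
    target = dependency(rows, attrs, decision)
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if trial and dependency(rows, trial, decision) == target:
            kept = trial
    return kept

def split_merge_reduct(rows, attrs, decision, n_parts=2, enhanced=False):
    # Module 1: partition the condition attributes vertically into subsets.
    size = -(-len(attrs) // n_parts)  # ceiling division
    subsets = [attrs[i:i + size] for i in range(0, len(attrs), size)]
    # Modules 2 and 3: reduce each subset, then merge the partial reducts.
    merged = [a for s in subsets for a in reduce_attrs(rows, s, decision)]
    # Extra module (enhanced variant): reduce the merged result once more.
    if enhanced:
        merged = reduce_attrs(rows, merged, decision)
    return merged

# Toy data: "b" duplicates "a", and "c"/"d" carry no information about "y".
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0, "y": "no"},
    {"a": 0, "b": 0, "c": 1, "d": 1, "y": "no"},
    {"a": 1, "b": 1, "c": 0, "d": 0, "y": "yes"},
    {"a": 1, "b": 1, "c": 1, "d": 1, "y": "yes"},
]
attrs = ["a", "b", "c", "d"]
plain = split_merge_reduct(rows, attrs, "y")
enhanced = split_merge_reduct(rows, attrs, "y", enhanced=True)
print(plain, enhanced)
```

On this toy table the merged result still contains an attribute that is redundant once the subsets are combined, and the second reduction pass removes it, which is the motivation the abstract gives for the enhanced variant.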
For the problem of missing data, a new approach was proposed based on data partitioning and the mode function. In this new approach, the data set was partitioned horizontally into different subsets. All objects in each subset of data were described by only one classification value. The mode function was applied to each subset of data that has missing values in order to find the most frequently occurring value in each attribute. Missing values in that attribute were replaced by the mode value. The proposed approach for missing values produced better results compared to other approaches. Also, the proposed models for learning in data sets generated the classification rules faster than other methods. The accuracy of the classification rules generated by the proposed models was high compared to other models
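The class-wise mode imputation described above can be illustrated with a minimal sketch. It assumes that missing values are marked with `None` and that rows are plain dictionaries; the function name `impute_by_class_mode` and the toy weather-style data are hypothetical and do not come from the thesis.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing value

def impute_by_class_mode(rows, attrs, decision):
    # Horizontal partition: one subset per classification (decision) value.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[decision]].append(row)
    # Mode of each attribute within each subset, ignoring missing entries.
    modes = {}
    for label, subset in by_class.items():
        for a in attrs:
            observed = [r[a] for r in subset if r[a] is not MISSING]
            if observed:
                modes[(label, a)] = Counter(observed).most_common(1)[0][0]
    # Replace each missing value by the mode of its own class subset.
    filled = []
    for row in rows:
        new = dict(row)
        for a in attrs:
            if new[a] is MISSING:
                new[a] = modes.get((row[decision], a), MISSING)
        filled.append(new)
    return filled

data = [
    {"temp": "high", "wind": "weak", "play": "no"},
    {"temp": "high", "wind": None, "play": "no"},
    {"temp": None, "wind": "weak", "play": "no"},
    {"temp": "low", "wind": "strong", "play": "yes"},
    {"temp": None, "wind": "strong", "play": "yes"},
]
filled = impute_by_class_mode(data, ["temp", "wind"], "play")
print(filled)
```

A value stays missing in this sketch if its class subset has no observed value for that attribute; how the thesis handles that case is not stated in the abstract.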

    A Scalable and Effective Rough Set Theory based Approach for Big Data Pre-processing

    A big challenge in the knowledge discovery process is to perform data pre-processing, specifically feature selection, on a large amount of data with a high-dimensional attribute set. A variety of techniques have been proposed in the literature to deal with this challenge, with different degrees of success, as most of these techniques need further information about the given input data for thresholding, need noise levels to be specified, or use some feature ranking procedure. To overcome these limitations, rough set theory (RST) can be used to discover the dependency within the data and reduce the number of attributes in an input data set while using the data alone and requiring no supplementary information. However, when it comes to massive data sets, RST reaches its limits, as it is highly computationally expensive. In this paper, we propose a scalable and effective rough set theory-based approach for large-scale data pre-processing, specifically for feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes have been considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance, thus making it relevant to big data
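To make the idea of a dependency computed "from the data alone" concrete, here is a small sketch of the rough set dependency degree written in a partition-wise, map-then-merge style. It is plain Python that only imitates how such a computation could be split across data partitions; it is not the paper's Spark implementation, and `summarize`, `dependency_from_partitions`, and the toy partitions are hypothetical names and data.

```python
from collections import Counter

def summarize(partition, attrs, decision):
    """Map step: count each (condition values, decision) pair in one partition."""
    return Counter((tuple(r[a] for a in attrs), r[decision]) for r in partition)

def dependency_from_partitions(partitions, attrs, decision):
    # Merge step: add up the per-partition counts.
    total = Counter()
    for part in partitions:
        total.update(summarize(part, attrs, decision))
    labels = {}        # condition key -> set of decision values seen
    sizes = Counter()  # condition key -> number of rows with that key
    for (key, label), n in total.items():
        labels.setdefault(key, set()).add(label)
        sizes[key] += n
    # Rows in the positive region: their condition key maps to one decision.
    positive = sum(n for key, n in sizes.items() if len(labels[key]) == 1)
    return positive / sum(sizes.values())

partitions = [
    [{"a": 0, "b": 0, "y": "n"}, {"a": 0, "b": 1, "y": "n"}],
    [{"a": 1, "b": 0, "y": "p"}, {"a": 1, "b": 1, "y": "n"}],
]
dep_a = dependency_from_partitions(partitions, ["a"], "y")
dep_ab = dependency_from_partitions(partitions, ["a", "b"], "y")
print(dep_a, dep_ab)
```

No threshold or noise level is supplied anywhere: the score depends only on how consistently the chosen attributes separate the decision values, which is the property the abstract highlights.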

    Interval-Valued Neutrosophic Soft Rough Sets

    Cross-sectoral resource management: How forest management alternatives affect the provision of biomass and other ecosystem services

    Integrated forest management is faced with the challenge that the contribution of forests to economic and ecological planning targets must be assessed in a socio-ecological system context. This paper introduces a way to model spatio-temporal dynamics of biomass production at a regional scale in order to derive land use strategies that enhance biomass provision and avoid trade-offs for other ecosystem services. The software platform GISCAME was employed to bridge the gap between local land management decisions and regional planning by linking growth and yield models with an integrative mesoscale modeling and assessment approach. The model region is located in Saxony, Germany. Five scenarios were simulated, which aimed at testing different alternatives for adapted land use in the context of climate change and increasing biomass demand. The results showed, for example, that forest conversion towards climate-change-adapted forest types had positive effects on ecological integrity and landscape aesthetics. In contrast, negative impacts on landscape aesthetics must be expected if agricultural sites are converted into short rotation coppices. Uncertainties which stem from assumptions regarding growth and yield models were discussed. Future developmental steps which consider, for example, accessibility of the resources were identified

    Feature Grouping-based Feature Selection

    Mobile analytics database summarization using rough set

    A mobile device supports mobile activities and is highly portable. However, mobile devices have limited resources and storage capacity. This limitation should be taken into account in order to maximize the functionality of the device. Hence, this study provides a data management formulation to support the storage of large-scale data, using the Rough Set approach to select data with relevant and useful information. Additionally, the features are combined with an analytics method to complete the analysis of the data storage processing, making it easier for users to read the analysis results. Testing is done using Malaysia’s Open Government Data on the Air Pollutant Index (API), which indicates how the air pollution level affects the health and safety of the population. The testing successfully created a summary of the API data, using the Rough Set approach to select significant data from the main database based on generated rules. The analysis results for the selected API data are stored as a mobile database and presented in charts, intended to make the data meaningful and the analysis results of API conditions easier to understand on the mobile device