
    Getting the knowledge to the agent: the Rough Sets approach

    For a query in a Current Research Information System (CRIS) to return adequate results, the system must be able to “understand” the intention of the enquiring agent. One possible approach to guarantee the success of this communication is to create an intermediate module, responsible for the knowledge discovery processes, that can define concepts translatable into the languages used by the different agents involved in the use of a CRIS, enhance queries, derive new information from the available information, and construct knowledge about the knowledge held in the CRIS and about its use. Rough Set theory is a powerful tool that can set a path towards this goal. This paper describes how this is achievable, while describing the approach followed by the Portuguese CRIS, DeGóis.
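
    The abstract stays at the level of goals; for readers new to the theory, the following is a minimal Python sketch of the core machinery it relies on, Pawlak-style lower and upper approximations (the toy universe and attribute function are illustrative, not from the paper).

        def partition(universe, key):
            # Group objects into indiscernibility (equivalence) classes by a key function.
            classes = {}
            for obj in universe:
                classes.setdefault(key(obj), []).append(obj)
            return list(classes.values())

        def approximations(universe, key, target):
            # Pawlak lower/upper approximations of `target` w.r.t. the partition induced by `key`.
            target = set(target)
            lower, upper = set(), set()
            for cls in partition(universe, key):
                cls_set = set(cls)
                if cls_set <= target:      # class entirely inside the target concept
                    lower |= cls_set
                if cls_set & target:       # class overlaps the target concept
                    upper |= cls_set
            return lower, upper

        # Toy example: six objects, one attribute, target concept {1, 2, 3}.
        objs = [1, 2, 3, 4, 5, 6]
        attr = {1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'c', 6: 'c'}
        lower, upper = approximations(objs, attr.get, {1, 2, 3})
        print(sorted(lower), sorted(upper))   # [1, 2] [1, 2, 3, 4]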

    Obtaining Approximation with Data Cube using Map-Reduce

    Data mining makes an important contribution to data analysis, the discovery of new meaningful knowledge, and autonomous decision making, and rough set theory offers a viable approach for extracting decision rules from data. In this work, data are organised multidimensionally in a data cube and accessed via MapReduce. This addresses the common situation where data are abundant but information is poor and powerful data analysis tools are needed. The proposed algorithm for computing approximations for decision rules is compared with other rough set approximation approaches and is shown to be more efficient. DOI: 10.17762/ijritcc2321-8169.150710
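
    The abstract does not spell out the algorithm; the sketch below is only one plausible reading of computing rough approximations with a map step and a reduce step over data partitions, in plain Python (the function names and toy rows are assumptions, and the approximations are expressed over condition classes rather than individual objects).

        from collections import defaultdict
        from functools import reduce

        def map_partition(rows):
            # Map step: within one data partition, count decision labels per condition class.
            counts = defaultdict(lambda: defaultdict(int))
            for row in rows:
                cond = tuple(row[:-1])      # condition attributes form the class key
                counts[cond][row[-1]] += 1
            return counts

        def merge(a, b):
            # Reduce step: merge the per-partition counts.
            for cond, labels in b.items():
                for label, n in labels.items():
                    a[cond][label] += n
            return a

        def approximations(partitions, target):
            # Classes labelled only `target` form the lower approximation;
            # classes containing `target` at all form the upper approximation.
            merged = reduce(merge, (map_partition(p) for p in partitions))
            lower = {c for c, labels in merged.items() if set(labels) == {target}}
            upper = {c for c, labels in merged.items() if target in labels}
            return lower, upper

        p1 = [(1, 'x', 'yes'), (1, 'x', 'yes'), (2, 'y', 'no')]   # rows on "node" 1
        p2 = [(2, 'y', 'yes'), (3, 'z', 'no')]                    # rows on "node" 2
        print(approximations([p1, p2], 'yes'))
        # lower = {(1, 'x')}, upper = {(1, 'x'), (2, 'y')}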

    A rough set approach for the discovery of classification rules in interval-valued information systems

    A novel rough set approach is proposed in this paper to discover classification rules through a process of knowledge induction which selects optimal decision rules with a minimal set of features necessary and sufficient for the classification of real-valued data. A rough set knowledge discovery framework is formulated for the analysis of interval-valued information systems converted from real-valued raw decision tables. The optimal feature selection method for information systems with interval-valued features obtains all classification rules hidden in a system through a knowledge induction process. Numerical examples are employed to substantiate the conceptual arguments.
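
    No formalism is given in the abstract; below is a minimal sketch of how indiscernibility might be checked in an interval-valued information system, assuming real values are widened into intervals and two objects are indiscernible when their intervals overlap on every attribute (the width parameter is invented for illustration).

        def to_interval(value, width):
            # Real value -> interval [v - w, v + w]; the width is an assumed parameter.
            return (value - width, value + width)

        def overlap(p, q):
            # Two objects are indiscernible on an attribute if their intervals intersect.
            return p[0] <= q[1] and q[0] <= p[1]

        def indiscernible(x, y, widths):
            # Objects are indiscernible overall if their intervals overlap on every attribute.
            return all(overlap(to_interval(a, w), to_interval(b, w))
                       for a, b, w in zip(x, y, widths))

        x, y = (1.0, 2.0, 3.0), (1.4, 3.2, 3.1)
        print(indiscernible(x, y, widths=(0.5, 0.5, 0.5)))
        # False: on the second attribute |2.0 - 3.2| > 2 * 0.5, so the intervals are disjoint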

    Exploring the Boundary Region of Tolerance Rough Sets for Feature Selection

    Of all the challenges facing the effective application of computational intelligence technologies for pattern recognition, dataset dimensionality is undoubtedly one of the primary impediments. For pattern classifiers to be efficient, a dimensionality reduction stage is usually performed prior to classification. Much use has been made of Rough Set Theory for this purpose, as it is completely data-driven and requires no other information; most other methods require some additional knowledge. However, traditional rough set-based methods in the literature are restricted by the requirement that all data must be discrete, so real-valued or noisy data cannot be considered directly. This is usually addressed by employing a discretisation method, which can result in information loss. This paper proposes a new approach based on the tolerance rough set model, which has the ability to deal with real-valued data whilst simultaneously retaining dataset semantics. More significantly, this paper describes the underlying mechanism by which this new approach utilises the information contained within the boundary region, or region of uncertainty. The use of this information can result in the discovery of more compact feature subsets and improved classification accuracy. These results are supported by an experimental evaluation which compares the proposed approach with a number of existing feature selection techniques. Keywords: feature selection, attribute reduction, rough sets, classification.
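
    As a rough illustration of the tolerance-based idea (not the paper's algorithm), the sketch below uses a simple similarity threshold as the tolerance relation and a QuickReduct-style greedy search driven by the dependency degree; it assumes attribute values normalised to [0, 1], and the threshold tau is an invented parameter.

        def tolerant(x, y, attrs, tau=0.9):
            # Tolerance relation: mean per-attribute similarity must reach threshold tau.
            sim = sum(1.0 - abs(x[a] - y[a]) for a in attrs) / len(attrs)
            return sim >= tau

        def dependency(data, labels, attrs, tau=0.9):
            # Fraction of objects whose tolerance class is pure in the decision
            # (i.e. the relative size of the positive region).
            n = len(data)
            pure = 0
            for i in range(n):
                cls = [j for j in range(n) if tolerant(data[i], data[j], attrs, tau)]
                if len({labels[j] for j in cls}) == 1:
                    pure += 1
            return pure / n

        def quickreduct(data, labels, n_attrs, tau=0.9):
            # Greedy forward selection: add the attribute that most increases dependency.
            selected, best = [], 0.0
            full = dependency(data, labels, list(range(n_attrs)), tau)
            while best < full:
                gains = [(dependency(data, labels, selected + [a], tau), a)
                         for a in range(n_attrs) if a not in selected]
                score, attr = max(gains)
                if score <= best:
                    break               # no remaining attribute improves the dependency
                selected.append(attr)
                best = score
            return selected

        # Toy data: attribute 0 is informative, attribute 1 is noise.
        data = [(0.0, 0.3), (0.1, 0.9), (0.9, 0.2), (1.0, 0.8)]
        labels = ['no', 'no', 'yes', 'yes']
        print(quickreduct(data, labels, n_attrs=2))   # [0]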

    A Rough Set Approach to Spatio-temporal Outlier Detection

    Detecting outliers which are grossly different from or inconsistent with the rest of a spatio-temporal dataset is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we deal with the outlier detection problem in spatio-temporal data and describe a rough set approach that finds the top outliers in an unlabeled spatio-temporal dataset. The proposed method, called Rough Outlier Set Extraction (ROSE), relies on a rough set theoretic representation of the outlier set using the rough set approximations, i.e. the lower and upper approximations. A new set, called the kernel set, is also introduced: a representative subset of the original dataset that is significant for outlier detection. Experimental results on real-world datasets demonstrate the method's superiority over results obtained by various clustering algorithms. It is also shown that the kernel set detects the same outlier set but with considerably less computational time.
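
    ROSE itself is not specified in the abstract; the sketch below shows only the general shape of a rough set representation of an outlier set: space is granulated into grid cells, a distance-based candidate set is formed, and cells wholly or partly inside it yield the lower approximation and the boundary region (the grid size and distance threshold are illustrative assumptions).

        import math

        def grid_cell(point, size=1.0):
            # Granulate space: points in the same grid cell are indiscernible.
            return tuple(math.floor(c / size) for c in point)

        def rough_outliers(points, threshold, size=1.0):
            # Represent a distance-based candidate outlier set by its rough approximations.
            centroid = [sum(c) / len(points) for c in zip(*points)]
            is_candidate = {p: math.dist(p, centroid) > threshold for p in points}
            cells = {}
            for p in points:
                cells.setdefault(grid_cell(p, size), []).append(p)
            lower, boundary = [], []
            for members in cells.values():
                flags = [is_candidate[p] for p in members]
                if all(flags):
                    lower.extend(members)      # certainly outliers
                elif any(flags):
                    boundary.extend(members)   # possibly outliers (region of uncertainty)
            return lower, boundary

        pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
        print(rough_outliers(pts, threshold=3.0))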

    Predicting Accuracy of Income a Year Using Rough Set Theory

    The main objective of the experiments is to predict, for the Adult dataset, whether income exceeds $50K per year or is at most $50K. Specifically, the objectives are to determine the best discretization method, split factor, reduction method and classifier, and to build the classification model. In the experiments, the prediction accuracy on the Adult dataset is obtained using rough set theory and the Rosetta software, with Knowledge Discovery in Databases (KDD) as the methodology. The Adult dataset comprises 48,842 instances, of which only 24,999 were used in the experiments. The data were randomly split into training and testing data using nine split factors: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. The results showed that the best discretization method is the Naive Algorithm, the best split factor is 0.6, the best reduction method is Johnson's Algorithm and the best classifier is Standard Voting. The highest accuracy achieved by the classification model developed using rough set theory is 87.12%. The experiments showed that rough set theory is a useful approach for analyzing the Adult dataset, because the accuracy achieved exceeds that of methods used previously.
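
    Rosetta drives the actual experiments; purely to make the split-factor sweep concrete, here is a hedged re-creation in plain Python, with a majority-class baseline standing in for the discretization/reduction/voting pipeline (all names and the toy data are assumptions).

        import random

        def split(data, factor, seed=42):
            # Randomly split data into training/testing; `factor` is the training fraction.
            rows = data[:]
            random.Random(seed).shuffle(rows)
            cut = int(len(rows) * factor)
            return rows[:cut], rows[cut:]

        def evaluate(train, test):
            # Placeholder classifier: majority-class baseline (stands in for Rosetta's pipeline).
            labels = [r[-1] for r in train]
            majority = max(set(labels), key=labels.count)
            return sum(r[-1] == majority for r in test) / len(test)

        # Toy stand-in for the Adult rows: two binary attributes and an income label.
        data = [(a, b, '>50K' if a + b > 1 else '<=50K')
                for a in (0, 1) for b in (0, 1) for _ in range(25)]

        for factor in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
            train, test = split(data, factor)
            print(f"split factor {factor}: accuracy {evaluate(train, test):.2f}")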

    A Scalable and Effective Rough Set Theory based Approach for Big Data Pre-processing

    A big challenge in the knowledge discovery process is to perform data pre-processing, specifically feature selection, on a large amount of data with a high-dimensional attribute set. A variety of techniques have been proposed in the literature to deal with this challenge, with different degrees of success, as most of these techniques need further information about the given input data for thresholding, need noise levels to be specified, or use feature ranking procedures. To overcome these limitations, rough set theory (RST) can be used to discover the dependency within the data and reduce the number of attributes enclosed in an input data set, using the data alone and requiring no supplementary information. However, when it comes to massive data sets, RST reaches its limits as it is highly computationally expensive. In this paper, we propose a scalable and effective rough set theory-based approach for large-scale data pre-processing, specifically for feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes have been considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance, making it relevant to big data.
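
    The paper's algorithm is not reproduced in the abstract; the sketch below shows one way an RST dependency degree could be computed on a Spark RDD with PySpark (it requires a Spark installation, and the toy rows and function names are assumptions, not the authors' implementation).

        from pyspark import SparkContext

        def dependency(rdd, attr_idx, total):
            # Degree of dependency of the decision on a candidate attribute subset:
            # count objects in equivalence classes (keyed by the chosen attributes)
            # whose decision label is unique, i.e. the size of the positive region.
            pos = (rdd
                   .map(lambda row: (tuple(row[i] for i in attr_idx),
                                     ({row[-1]}, 1)))            # (labels seen, class size)
                   .reduceByKey(lambda a, b: (a[0] | b[0], a[1] + b[1]))
                   .filter(lambda kv: len(kv[1][0]) == 1)        # keep pure classes only
                   .map(lambda kv: kv[1][1])
                   .sum())
            return pos / total

        if __name__ == "__main__":
            sc = SparkContext(appName="rst-dependency-sketch")
            rows = [(1, 'x', 'yes'), (1, 'x', 'yes'), (2, 'y', 'no'), (2, 'y', 'yes')]
            rdd = sc.parallelize(rows)
            print(dependency(rdd, attr_idx=[0, 1], total=len(rows)))   # 0.5
            sc.stop()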

    Geographic Information Systems and Decision Processes for Urban Planning: A Case Study of Rough Set Analysis on the Residential Areas of the City of Cagliari, Italy

    In Italy, urban planning is based on the city Masterplan. This plan identifies the future urban organization and a system of zoning rules, on which land-use policies are based. The zoning rules should synthesize environmental and spatial knowledge and policy decisions concerning possible futures, with reference to the different urban functions. In this essay, a procedure for the analysis of the city Masterplan of Cagliari, the regional capital of Sardinia (Italy), is discussed and applied to the city's residential areas. The procedure tries to explain the urban organization of the housing areas using a system of variables based on the integration of different branches of knowledge concerning the urban environment. The decisions on urban futures that the zoning rules entail are critically analyzed in terms of their consistency with this knowledge system. The procedure consists of two phases. In the first phase, the urban environment is analyzed and described by defining and developing a geographic information system, which uses a spatial analysis approach to assess the integration of the residential areas into the urban fabric. The second phase is inferential: based on the geographic information system developed in the first phase, a knowledge discovery in databases (KDD) technique, rough set analysis (RSA), is applied. This technique makes it possible to recognize the connection patterns between the urban knowledge system and the city planning decisions. The patterns, i.e. the decision rules, which come from the RSA implementation are important starting points for further investigation into the development of decision models for urban planning.
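
    The abstract does not show what the RSA decision rules look like; as an illustration only, the sketch below extracts certain rules (condition classes with a single decision value) from a hypothetical zoning attribute table (all attribute names and values are invented, not taken from the Cagliari study).

        from collections import defaultdict

        def decision_rules(table, cond_names, dec_name):
            # Certain decision rules: condition classes with a single decision value
            # (i.e. rules supported by the lower approximation).
            classes = defaultdict(set)
            for row in table:
                cond = tuple((c, row[c]) for c in cond_names)
                classes[cond].add(row[dec_name])
            rules = []
            for cond, decisions in classes.items():
                if len(decisions) == 1:
                    rules.append((dict(cond), decisions.pop()))
            return rules

        # Hypothetical rows from a GIS attribute table of residential zones.
        zones = [
            {"density": "high", "access": "good", "zoning": "B"},
            {"density": "high", "access": "good", "zoning": "B"},
            {"density": "low",  "access": "poor", "zoning": "C"},
            {"density": "low",  "access": "good", "zoning": "B"},
            {"density": "low",  "access": "good", "zoning": "C"},  # conflicts with the row above
        ]
        for cond, dec in decision_rules(zones, ["density", "access"], "zoning"):
            print(f"IF {cond} THEN zoning = {dec}")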

    Novel Rule Base Development from IED-Resident Big Data for Protective Relay Analysis Expert System

    Many Expert Systems for intelligent electronic device (IED) performance analysis, such as those for protective relays, have been developed to ascertain operations, maximize availability, and subsequently minimize misoperation risks. However, manual handling of the overwhelming volume of relay-resident big data and heavy dependence on protection experts' contrasting knowledge and inundating relay manuals have hindered the maintenance of such Expert Systems. The objective of this chapter is therefore to study the design of an Expert System called the Protective Relay Analysis System (PRAY), which is embedded with a rule base construction module. This module provides the facility of intelligently maintaining the knowledge base of PRAY through the prior discovery of relay operation (association) rules using a novel integrated data mining approach that combines Rough-Set-Genetic-Algorithm-based rule discovery with a Rule Quality Measure. The developed PRAY runs its relay analysis by first validating whether a protective relay under test operates correctly as expected, by way of comparison between hypothesized and actual relay behavior. In the case of relay maloperations or misoperations, it diagnoses the presented symptoms by identifying their causes. This study illustrates how, with this hybrid-data-mining-based knowledge base maintenance, regular and rigorous analyses of protective relay performance carried out by power utility entities can be conveniently achieved.
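
    Neither the rule discovery nor the quality measure is detailed in the chapter abstract; the sketch below illustrates only the general idea of scoring candidate rules with a support-times-confidence quality measure inside a mutation-driven search (all record fields, values, and the search scheme are assumptions, not PRAY's method).

        import random

        def rule_quality(rule, data):
            # Fitness: support x confidence of a rule (a dict of attribute -> value conditions).
            matches = [r for r in data if all(r[a] == v for a, v in rule["if"].items())]
            if not matches:
                return 0.0
            correct = sum(r["trip"] == rule["then"] for r in matches)
            return (len(matches) / len(data)) * (correct / len(matches))

        def mutate(rule, attrs, values):
            # Simplified GA mutation: set one random condition to a random value.
            child = {"if": dict(rule["if"]), "then": rule["then"]}
            a = random.choice(attrs)
            child["if"][a] = random.choice(values[a])
            return child

        # Hypothetical relay event records: observed symptoms and an operation outcome.
        data = [
            {"pickup": "over",   "timing": "slow", "trip": "misoperation"},
            {"pickup": "over",   "timing": "slow", "trip": "misoperation"},
            {"pickup": "normal", "timing": "fast", "trip": "correct"},
            {"pickup": "normal", "timing": "slow", "trip": "correct"},
        ]
        attrs = ["pickup", "timing"]
        values = {"pickup": ["over", "normal"], "timing": ["slow", "fast"]}

        # (1 + 1)-style search: keep the best rule found across random mutations.
        best = {"if": {"pickup": "over"}, "then": "misoperation"}
        for _ in range(100):
            cand = mutate(best, attrs, values)
            if rule_quality(cand, data) > rule_quality(best, data):
                best = cand
        print(best, rule_quality(best, data))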