
    Attribute Set Weighting and Decomposition Approaches for Reduct Computation

    This research concerns Rough Set theory based knowledge reduction for data classification within the data mining framework. To facilitate Rough Set based classification, two main knowledge reduction models are proposed. The first is an approximate approach for computing object reducts, intended specifically for data classification. It assigns a weight to each attribute in the attribute set; the weights indicate how important an attribute is for inclusion in the reduct. This approach is named Object Reduct by Attribute Weighting (ORAW); a variation that computes a full reduct is named Full Reduct by Attribute Weighting (FRAW). The second approach targets large datasets, particularly those with many attributes. It applies the principle of incremental attribute set decomposition to generate an approximate reduct that represents the entire dataset, and is termed Reduct by Attribute Set Decomposition (RASD). The proposed reduct computation approaches are extensively experimented with and evaluated, in two respects: first, as Rough Set based classifiers, with classification accuracy as the evaluation measure, estimated by the well-known 10-fold cross-validation method; second, as knowledge reduction methods, with reduct size as the reduction measure. The approaches are compared with other reduct computation methods and with non-Rough-Set-based classification methods, on various standard datasets from the UCI repository. The experiments show very good performance of the proposed approaches both as classification methods and as knowledge reduction methods.
ORAW outperformed the Johnson approach in accuracy over all the datasets, and produced better accuracy than the Exhaustive and Standard Integer Programming (SIP) approaches on the majority of the datasets used in the experiments. RASD, compared against other classification methods, showed very competitive results in terms of classification accuracy and reduct size. In conclusion, the proposed approaches achieved competitive, and often better, accuracy in most tested domains. The experimental results indicate that, as Rough Set classifiers, the proposed approaches perform well across different classification problems and are promising methods for solving them. Moreover, the experiments demonstrated that the incremental vertical decomposition framework is an appealing method for knowledge reduction over large datasets within the framework of Rough Set based classification.
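The abstract does not give ORAW's exact weighting formula, but the general idea of weighting-based approximate reduct computation can be sketched as follows. This is a minimal illustration, assuming (as a stand-in for the paper's scheme) that each attribute's weight is its individual rough-set dependency degree, and that attributes are admitted in weight order until the subset reaches the dependency of the full attribute set:

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """gamma_B(D): fraction of objects whose equivalence class under
    the attribute subset `attrs` is pure w.r.t. the decision attribute."""
    if not attrs:
        return 0.0
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    pos = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return pos / len(rows)

def weighted_reduct(rows, attrs, decision):
    """Greedy approximate reduct: rank attributes by a weight (here,
    single-attribute dependency -- an assumed proxy for the paper's
    weighting) and add them in weight order until the subset matches
    the dependency of the full attribute set."""
    full = dependency(rows, attrs, decision)
    ranked = sorted(attrs, reverse=True,
                    key=lambda a: dependency(rows, [a], decision))
    reduct = []
    for a in ranked:
        reduct.append(a)
        if dependency(rows, reduct, decision) >= full:
            break
    return reduct
```

On a toy decision table where the decision is fully determined by one attribute, the sketch keeps only that attribute, discarding the redundant one.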

    Identifying Effective Features and Classifiers for Short Term Rainfall Forecast Using Rough Sets Maximum Frequency Weighted Feature Reduction Technique

    Precise rainfall forecasting is a common challenge in meteorological prediction across the globe. Because rainfall forecasting involves rather complex dynamic parameters, demand for novel approaches that improve forecasting accuracy has grown. Recently, Rough Set Theory (RST) has attracted a wide variety of scientific applications and is extensively adopted in decision support systems. Although several weather prediction techniques exist in the literature, identifying significant inputs for modelling effective rainfall prediction is not addressed by present mechanisms. This investigation therefore examines the feasibility of using rough set based feature selection together with data mining methods, namely Naïve Bayes (NB), Bayesian Logistic Regression (BLR), Multi-Layer Perceptron (MLP), J48, Classification and Regression Tree (CART), Random Forest (RF), and Support Vector Machine (SVM), to forecast rainfall. Feature selection, or reduction, is the process of identifying a significant feature subset that characterizes the information system as completely as the full feature set does. This paper introduces a novel rough set based Maximum Frequency Weighted (MFW) feature reduction technique for finding an effective feature subset for modelling an efficient rainfall forecast system. The experimental analysis indicates substantial improvements in the prediction models when trained on the selected feature subset: the CART and J48 classifiers achieved improved accuracies of 83.42% and 89.72%, respectively. From the experimental study, relative humidity2 (a4) and solar radiation (a6) were identified as the effective parameters for modelling rainfall prediction.
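The abstract does not spell out how MFW weighting works. One plausible reading, sketched here purely as an assumption and not as the paper's method, is that each attribute is weighted by how frequently it appears across a set of candidate reducts, and the highest-frequency attributes form the selected subset:

```python
from collections import Counter

def frequency_weighted_subset(reducts, top_k):
    """Weight each attribute by its frequency across candidate reducts
    (a hypothetical reading of 'maximum frequency weighting') and keep
    the top_k most frequent attributes."""
    freq = Counter(a for r in reducts for a in r)
    return [a for a, _ in freq.most_common(top_k)]

# Illustrative candidate reducts over weather attributes a1..a6
# (invented example data, not the paper's reducts):
candidates = [["a4", "a6"], ["a4", "a2"], ["a6", "a4"], ["a1", "a6"]]
frequency_weighted_subset(candidates, 2)  # a4 and a6 dominate
```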

    Fuzzy rough granular neural networks, fuzzy granules, and classification

    We introduce a fuzzy rough granular neural network (FRGNN) model based on the multilayer perceptron, trained with a back-propagation algorithm, for the fuzzy classification of patterns. We provide the development strategy of the network, based mainly on the input vector, initial connection weights determined by fuzzy rough set theoretic concepts, and the target vector. While the input vector is described in terms of fuzzy granules, the target vector is defined in terms of fuzzy class membership values and zeros. Crude domain knowledge about the initial data is represented as a decision table, which is divided into subtables corresponding to the different classes. The data in each decision table is converted into granular form. The syntax of these decision tables automatically determines the appropriate number of hidden nodes, while the dependency factors from all the decision tables are used as initial weights. The dependency factor of each attribute, and the average degree of dependency of all the attributes with respect to the decision classes, serve as the initial connection weights between the input and hidden layers and between the hidden and output layers, respectively. The effectiveness of the proposed FRGNN is demonstrated on several real-life data sets.
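Fuzzy granulation of numeric inputs in models of this kind is commonly done with Zadeh's π membership function, mapping each feature to three granules (low, medium, high). A minimal sketch under that assumption, with granule centers and radii derived from the feature's range (a simplifying choice for illustration, not necessarily the paper's exact parameterization):

```python
def pi_membership(x, c, lam):
    """Zadeh's pi function: 1 at the center c, decreasing smoothly
    to 0 at |x - c| = lam."""
    d = abs(x - c)
    if d <= lam / 2.0:
        return 1.0 - 2.0 * (d / lam) ** 2
    if d <= lam:
        return 2.0 * (1.0 - d / lam) ** 2
    return 0.0

def granulate(x, lo, hi):
    """Map a feature value to (low, medium, high) fuzzy granules, with
    centers at the ends and midpoint of the feature's observed range."""
    mid = (lo + hi) / 2.0
    lam = (hi - lo) / 2.0
    return (pi_membership(x, lo, lam),
            pi_membership(x, mid, lam),
            pi_membership(x, hi, lam))
```

A value at the midpoint of the range belongs fully to the "medium" granule and not at all to "low" or "high"; values in between get graded memberships.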

    A combined data mining approach using rough set theory and case-based reasoning in medical datasets

    Case-based reasoning (CBR) solves new cases by retrieving the most relevant ones from an existing knowledge base. Since irrelevant or redundant features not only remarkably increase memory requirements but also increase the time complexity of case retrieval, reducing the number of dimensions is worth considering. This paper uses rough set theory (RST) to reduce the number of dimensions in a CBR classifier, with the aim of increasing accuracy and efficiency. The CBR component measures case similarity with a co-occurrence-based distance for categorical data, derived from the proportional distribution of the different categorical values of the features; the weight of a feature is the average of the co-occurrence values of the features. The combination of RST and CBR is applied to the real categorical datasets Wisconsin Breast Cancer, Lymphography, and Primary cancer. The 5-fold cross-validation method is used to evaluate the performance of the proposed approach. The results show that the combined approach lowers computational cost and improves performance metrics, including accuracy and interpretability, compared with other approaches in the literature.
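The co-occurrence distance described resembles the Value Difference Metric: two categorical values are close when they co-occur with the decision classes in similar proportions. A minimal sketch under that assumption (the paper's exact formulation may differ):

```python
from collections import Counter, defaultdict

def vdm_distance(rows, feature, decision, v1, v2):
    """Distance between two values of a categorical feature, as the
    total variation between their conditional class distributions --
    a VDM-style reading of 'proportional distribution of different
    categorical values', assumed here for illustration."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[feature]][row[decision]] += 1
    c1, c2 = counts[v1], counts[v2]
    n1, n2 = sum(c1.values()), sum(c2.values())
    classes = set(c1) | set(c2)
    return sum(abs(c1[c] / n1 - c2[c] / n2) for c in classes)
```

Two feature values that always co-occur with the same class distribution get distance 0; values associated with disjoint classes get the maximum distance of 2.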

    Adaptive quick reduct for feature drift detection

    Data streams are ubiquitous, driven by the proliferation of low-cost mobile devices, sensors, wireless networks, and the Internet of Things. While it is well known that complex phenomena are not stationary and exhibit concept drift when observed for a sufficiently long time, relatively few studies have addressed the related problem of feature drift. In this paper, a variation of the QuickReduct algorithm suited to processing data streams is proposed and tested: it builds an evolving reduct that dynamically selects the relevant features in the stream, removing redundant features and adding newly relevant ones as soon as they appear. Tests on five publicly available datasets with an artificially injected drift confirm the effectiveness of the proposed method.
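A minimal sketch of the underlying idea, assuming a sliding window over the stream: the classic QuickReduct greedily adds the attribute that most increases the rough-set dependency degree, and the streaming wrapper keeps the current reduct while its dependency on the window still matches that of the full attribute set, rebuilding it (a feature drift signal) when it no longer does. The window policy and function names are illustrative, not the paper's exact algorithm:

```python
from collections import defaultdict, deque

def dependency(rows, attrs, decision):
    """Rough-set dependency gamma_B(D) over the given rows."""
    if not attrs or not rows:
        return 0.0
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    pos = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return pos / len(rows)

def quickreduct(rows, attrs, decision):
    """Classic greedy QuickReduct: repeatedly add the attribute that
    most increases dependency, until it matches the full set's."""
    full = dependency(rows, attrs, decision)
    reduct, best = [], 0.0
    while best < full:
        scored = {a: dependency(rows, reduct + [a], decision)
                  for a in attrs if a not in reduct}
        a, g = max(scored.items(), key=lambda kv: kv[1])
        if g <= best:  # no attribute improves dependency further
            break
        reduct.append(a)
        best = g
    return reduct

def stream_reducts(stream, attrs, decision, window=100):
    """Re-evaluate the reduct on each full window; rebuild on drift."""
    buf, reduct = deque(maxlen=window), []
    for row in stream:
        buf.append(row)
        rows = list(buf)
        if len(rows) == window and \
                dependency(rows, reduct, decision) < dependency(rows, attrs, decision):
            reduct = quickreduct(rows, attrs, decision)  # drift detected
        yield list(reduct)
```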