8,813 research outputs found

    A heuristic method for discovering biomarker candidates based on rough set theory

    We apply a combined method of heuristic attribute reduction and evaluation of relative reducts in rough set theory to gene expression data analysis. Our method extracts as many relative reducts as possible from the gene expression data and selects the best relative reduct from the viewpoint of constructing useful decision rules. Using a breast cancer dataset and a leukemia dataset, we evaluated the classification accuracy on the test samples and the biological meaning of the rules. Our method achieved classification accuracy comparable to existing salient classifiers. Moreover, it extracted interesting rules, including a novel biomarker gene identified in recent studies. These results indicate that our method can serve as a useful tool for gene expression data analysis.
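    The relative-reduct idea used above can be illustrated with a small sketch: a relative reduct is a minimal set of condition attributes that still determines the decision attribute. The Python below is a minimal, exhaustive illustration of that criterion, not the authors' heuristic (which avoids exhaustive search); the toy table and function names are assumptions.

```python
from itertools import combinations

def partition(rows, attrs):
    """Group row indices by their values on the given attributes."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return blocks.values()

def preserves_decision(rows, decisions, attrs):
    """True if every indiscernibility block under `attrs` is pure w.r.t. the decision."""
    return all(len({decisions[i] for i in block}) == 1
               for block in partition(rows, attrs))

def relative_reducts(rows, decisions, all_attrs):
    """Enumerate minimal attribute subsets that still determine the decision
    (exhaustive, so only feasible for tiny tables; gene-expression data needs a heuristic)."""
    reducts = []
    for k in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, k):
            if preserves_decision(rows, decisions, subset):
                # keep only subsets that do not contain an already-found reduct
                if not any(set(r) <= set(subset) for r in reducts):
                    reducts.append(subset)
    return reducts

# Toy decision table: three "genes" as condition attributes, binary decision.
rows = [{"g1": 1, "g2": 0, "g3": 1}, {"g1": 1, "g2": 1, "g3": 0},
        {"g1": 0, "g2": 1, "g3": 1}, {"g1": 0, "g2": 0, "g3": 0}]
decisions = [1, 1, 0, 0]
print(relative_reducts(rows, decisions, ["g1", "g2", "g3"]))  # [('g1',), ('g2', 'g3')]
```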

    A breast cancer diagnosis system: a combined approach using rough sets and probabilistic neural networks

    In this paper, we present a medical decision support system based on a hybrid approach utilising rough sets and a probabilistic neural network. We utilised the ability of rough sets to perform dimensionality reduction, eliminating redundant attributes from a biomedical dataset, and then used a probabilistic neural network to perform supervised classification. Our results indicate that the rough set step reduced the number of attributes in the dataset by 67% without sacrificing classification accuracy, and the resulting classification accuracy was on the order of 93%.
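    The second stage described above, supervised classification with a probabilistic neural network, amounts to Parzen-window density estimation per class: the query is assigned to the class whose training patterns give the highest average Gaussian kernel response. A minimal sketch over an already-reduced attribute set follows; the smoothing parameter and the toy data are assumptions, not values from the paper.

```python
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=0.5):
    """Probabilistic neural network: score each class by the average Gaussian
    kernel between the query x and that class's training patterns."""
    scores = {}
    for cls in np.unique(y_train):
        patterns = X_train[y_train == cls]
        d2 = np.sum((patterns - x) ** 2, axis=1)               # squared distances
        scores[cls] = np.mean(np.exp(-d2 / (2 * sigma ** 2)))  # Parzen estimate
    return max(scores, key=scores.get)

# Toy example: samples already reduced to two attributes by the rough-set step.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])
print(pnn_predict(X, y, np.array([0.85, 0.75])))  # -> 1
```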

    Twofold Integer Programming Model for Improving Rough Set Classification Accuracy in Data Mining.

    The fast-growing size of databases has resulted in a great demand for tools capable of analyzing data with the aim of discovering new knowledge and patterns. These tools, collectively known as Data Mining (DM), will hopefully close the gap between the steady growth of information and the escalating demand to understand and exploit the value of such knowledge. One aim of DM is to discover decision rules for extracting meaningful knowledge. These rules consist of conditions over attribute-value pairs, called descriptions, and decision attributes. Generating a good decision or classification model is therefore a major component of much data mining research. The classification approach produces a function that maps a data item into one of several predefined classes by taking a training dataset as input and building a model of the class attribute from the remaining attributes.

    This research undertakes three main tasks. The first task is to introduce a new rough set model for minimum reduct selection and default rule generation, known as Twofold Integer Programming (TIP). The second task is to enhance rule accuracy based on the first task, while the third is to classify new objects or cases. The TIP model is based on translating the discernibility relation of a Decision System (DS) into an Integer Programming (IP) model, which is solved with the branch-and-bound search method in order to generate the full reduct of the DS. The TIP model is then applied to the reduct to generate the default rules, which in turn are used to classify unseen objects with satisfactory accuracy. Apart from introducing the TIP model, this research also addresses the issues of missing values, discretization and the extraction of minimum rules. The treatment of missing values and discretization is carried out during the preprocessing stage, while minimum-rule extraction is conducted after the default rules have been generated in order to obtain the most useful discovered rules.

    Eight datasets from machine learning repositories and domain theories were tested with the TIP model, and the total number of rules, the rule length and the rule accuracy of the generated rules were recorded. The rule and classification accuracy obtained with the TIP method were compared with other methods such as Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP) from rough set theory, a Genetic Algorithm (GA), the Johnson reducer, Holte's 1R method, Multiple Regression (MR), a Neural Network (NN), the decision tree induction algorithm ID3 and the C4.5 learning algorithm, all classifiers that are commonly used in classification tasks. Based on the experimental results, the classification method using the TIP approach successfully performed the rule generation and classification tasks required during a classification operation. The considerably good accuracy obtained is mainly due to the right selection of relevant attributes. This research has shown that the TIP method is able to cater for different kinds of datasets and yields a good rough classification model with promising results compared with other commonly used classifiers. It also opens a wide range of future work, including applying the proposed method in other areas such as web mining, text mining or multimedia mining, and extending the proposed approach to work in parallel computing for data mining.
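    The discernibility-to-IP translation at the heart of the TIP model is, in essence, a minimum set-cover program: for every pair of objects with different decisions, at least one attribute that discerns them must be selected, and the number of selected attributes is minimised. The sketch below builds that formulation and solves it with a plain branch-and-bound search; it illustrates only the underlying reduct computation, not the full TIP model (which additionally generates default rules), and the toy data are assumptions.

```python
from itertools import combinations

def discernibility_sets(rows, decisions, attrs):
    """For each pair of objects with different decisions, the attributes that
    discern them.  The IP reads: minimise sum_a x_a subject to
    sum_{a in D(i,j)} x_a >= 1 for every such pair, with x_a in {0, 1}."""
    sets = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            d = {a for a in attrs if rows[i][a] != rows[j][a]}
            if d:
                sets.append(d)
    return sets

def branch_and_bound(sets, attrs):
    """Smallest attribute subset covering every discernibility set."""
    best = list(attrs)                        # trivial upper bound: all attributes
    def recurse(chosen, remaining, candidates):
        nonlocal best
        if len(chosen) >= len(best):          # bound: cannot improve on incumbent
            return
        if not remaining:                     # all covering constraints satisfied
            best = list(chosen)
            return
        target = min(remaining, key=len)      # branch on a tightest constraint
        for a in target:
            if a in candidates:
                recurse(chosen + [a],
                        [s for s in remaining if a not in s],
                        candidates - {a})
    recurse([], sets, set(attrs))
    return best

rows = [{"a": 1, "b": 0, "c": 1}, {"a": 1, "b": 1, "c": 0},
        {"a": 0, "b": 1, "c": 1}, {"a": 0, "b": 0, "c": 0}]
decisions = [1, 1, 0, 0]
sets = discernibility_sets(rows, decisions, ["a", "b", "c"])
print(branch_and_bound(sets, ["a", "b", "c"]))  # ['a']
```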

    P2DM-RGCD: PPDM Centric Classification Rule Generation Scheme

    In present-day applications, data mining and the associated privacy preservation play a significant role in ensuring optimal mining function. Privacy preserving data mining (PPDM) emphasizes securing the private information of the participants. At the same time, the majority of present mining applications employ vertically partitioned data for mining utilities. In such a scenario, when the overall rule set is divided among participants, some of the parties are left with fewer rules, and the classification accuracy they can achieve remains questionable. On the other hand, drawing on the private information associated with any party would violate the PPDM approach. Therefore, in order to eliminate such situations and to provide a facility for rule regeneration, this paper proposes a highly robust and efficient rule regeneration scheme that ensures optimal classification accuracy without using any critical user information for rule generation. The proposed system develops a rule generation function based on a cumulative dot product, referred to as the P2DM-RGCD rule regeneration scheme. The developed algorithm generates two possible optimal rule generation and update functions based on cumulative updates and the dot product. The proposed system exhibits optimal response in terms of higher classification accuracy, minimal information loss and optimal training efficiency.
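    The abstract does not spell out the P2DM-RGCD update functions, so the following is only one possible reading rather than the authors' algorithm: each party contributes per-rule support statistics computed on its own vertical partition, and a coordinator combines them with a cumulative, dot-product-style weighting to rank candidate rules without ever exchanging raw records. All names, weights and data below are assumptions.

```python
import numpy as np

def combine_rule_scores(party_supports, party_weights):
    """Cumulative, dot-product-style aggregation of per-rule support vectors
    contributed by each party over its vertical partition.  Only aggregate
    statistics are exchanged, never the underlying records."""
    supports = np.array(party_supports, dtype=float)   # shape: (parties, rules)
    weights = np.array(party_weights, dtype=float)     # one weight per party
    return weights @ supports                          # dot product across parties

# Three parties, four candidate rules; weights reflect each party's sample size.
supports = [[0.8, 0.1, 0.6, 0.3],
            [0.7, 0.2, 0.5, 0.4],
            [0.9, 0.1, 0.7, 0.2]]
weights = [0.2, 0.3, 0.5]
scores = combine_rule_scores(supports, weights)
print(scores)                      # combined score per rule
print(int(np.argmax(scores)))      # index of the top-ranked rule -> 0
```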