4,581 research outputs found

    A Novel Variable Precision Reduction Approach to Comprehensive Knowledge Systems

    Get PDF

    Diagnosis of Coronary Artery Disease Using Artificial Intelligence Based Decision Support System

    Get PDF
    Heart disease is any disease that affects the normal condition and functionality of the heart. Coronary Artery Disease (CAD) is the most common form. It is caused by the accumulation of plaques within the walls of the coronary arteries that supply blood to the heart muscles, and may lead to repeated temporary oxygen deprivation that damages the heart muscle. CAD causes more than 7,000,000 deaths every year worldwide. It is the second leading cause of death in Malaysia and the major cause of death in the world. To diagnose CAD, cardiologists usually perform many diagnostic steps. Unfortunately, the results of the diagnostic tests are difficult to interpret and do not always provide a definite answer, which may lead to differing opinions. To help cardiologists provide a correct diagnosis of CAD in a less expensive and non-invasive manner, many researchers have developed decision support systems to diagnose CAD. A fuzzy decision support system for the diagnosis of coronary artery disease based on rough set theory is proposed in this thesis. The objective is to develop an evidence-based fuzzy decision support system for the diagnosis of coronary artery disease. The proposed system is based on evidence, i.e. raw medical data sets taken from the University of California Irvine (UCI) database, and is designed to handle the uncertainty, incompleteness and heterogeneity of the data sets. Artificial Neural Network with Rough Set Theory attribute reduction (ANNRST) is proposed as the imputation method to solve the incompleteness of the data sets. Evaluations of ANNRST based on classifier performance and rule filtering are proposed, comparing ANNRST with other methods using classifiers and during the rule filtering process. RST rule induction is applied to the ANNRST-imputed data sets. Numerical values are discretized using the Boolean reasoning method. Rule selection based on quality and importance is proposed. 
The RST rule importance measure is used to select the most important high-quality rules. The selected rules are used to build fuzzy decision support systems. Fuzzification based on discretization cuts and fuzzy rule weighting based on rule quality are proposed. The Mamdani inference method is used to provide the decision, with centroid defuzzification giving numerical results that represent the possibility of blocking in the coronary arteries. The results show that the proposed ANNRST has similar performance to ANN and outperforms k-Nearest Neighbour (k-NN) and Concept Most Common attribute value Filling (CMCF). ANNRST is simpler than ANN because it has fewer input attributes, which makes it more suitable for the missing data imputation problem, and it also preserves a strong relationship between the original and imputed data sets. It is shown that ANNRST provides a better RST rule-based classifier than CMCF and k-NN during the rule filtering process. The proposed RST-based rule selection also performs better than other filtering methods. The developed Fuzzy Decision Support System (FDSS) provides better performance than multi-layer perceptron ANN, k-NN, the rule induction method C4.5, and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) applied to the UCI CAD data sets and Ipoh Specialist Hospital patient data. The FDSS offers transparent knowledge representation and can handle heterogeneous and incomplete input data. It is able to give the approximate percentage of blocking of a coronary artery based on 13 standard attributes drawn from patient history, simple blood tests, ECG data, etc., where coronary angiography or a cardiologist cannot give such a percentage. The results of the FDSS were evaluated by three local cardiologists and considered efficient and useful
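The Mamdani inference with centroid defuzzification described above can be sketched numerically. This is a minimal illustration, not the thesis's system: the output universe ("% blocking"), the two fuzzy sets, and the rule weights are all invented for the example.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical output universe: percentage of coronary artery blocking.
y = np.linspace(0.0, 100.0, 1001)

# Hypothetical output fuzzy sets (feet nudged past the ends so the
# boundary points keep full membership).
low_block = tri(y, -1.0, 0.0, 50.0)
high_block = tri(y, 50.0, 100.0, 101.0)

def mamdani_centroid(fire_low, fire_high, w_low=1.0, w_high=1.0):
    """Mamdani inference: clip each output set at its rule's (weighted)
    firing strength, aggregate with max, defuzzify by centroid."""
    clipped_low = np.minimum(low_block, fire_low * w_low)
    clipped_high = np.minimum(high_block, fire_high * w_high)
    agg = np.maximum(clipped_low, clipped_high)
    return float((y * agg).sum() / agg.sum())  # crisp % blocking
```

The rule weights `w_low`/`w_high` stand in for the rule-quality weighting mentioned in the abstract: a low-quality rule contributes a smaller clipped area and so pulls the centroid less.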

    Fuzzy Logic - Retrieval of Data from Database

    Get PDF
    There has been a complicated relationship between fuzzy logic and probability theory. Techniques in fuzzy logic draw on possibility theory and probability theory, which measure two different kinds of uncertainty. In classical probability theory, a probability measure is a number between 0 and 1. A fuzzy rule-based system consists of a set of fuzzy rules with partially overlapping conditions. This paper demonstrates new methodologies for predicting an output when a particular input triggers multiple fuzzy rules. It analyzes the behaviour of the width of an interval used to represent the imprecision of probability estimates. We also propose new applications of possibility theory and probability theory as they apply to fuzzy logic
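The interval view of imprecise probability mentioned above has a standard form: for an event A, the necessity and possibility measures bracket any consistent probability, N(A) ≤ P(A) ≤ Π(A). A small sketch with invented numbers (the universe, distributions, and event are all illustrative):

```python
# Discrete universe with a normalised possibility distribution
# (max = 1) and a probability distribution (sums to 1).
universe = ["cold", "mild", "warm", "hot"]
poss = {"cold": 0.2, "mild": 1.0, "warm": 0.7, "hot": 0.1}
prob = {"cold": 0.1, "mild": 0.5, "warm": 0.3, "hot": 0.1}

def possibility(event):
    """Pi(A): the max of the possibility distribution over A."""
    return max(poss[x] for x in event)

def necessity(event):
    """N(A) = 1 - Pi(complement of A): how certain A is."""
    comp = [x for x in universe if x not in event]
    return 1.0 - possibility(comp) if comp else 1.0

def probability(event):
    """P(A): additive measure, the sum over A."""
    return sum(prob[x] for x in event)

A = ["mild", "warm"]
interval = (necessity(A), possibility(A))  # brackets probability(A)
```

The width of the interval `possibility(A) - necessity(A)` is one way to quantify the imprecision the abstract refers to.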

    Dynamic Rule Covering Classification in Data Mining with Cyber Security Phishing Application

    Get PDF
    Data mining is the process of discovering useful patterns from datasets using intelligent techniques to help users make certain decisions. A typical data mining task is classification, which involves predicting a target variable known as the class in previously unseen data based on models learnt from an input dataset. Covering is a well-known classification approach that derives models with If-Then rules. Covering methods, such as PRISM, have a predictive performance competitive with other classical classification techniques such as greedy, decision tree and associative classification. Covering models are therefore appropriate decision-making tools, and users favour them when carrying out decisions. Despite the use of the Covering approach in data processing for different classification applications, it is also acknowledged that this approach suffers from the noticeable drawback of inducing massive numbers of rules, making the resulting model large and unmanageable for users. This issue is attributed to the way Covering techniques induce rules: they keep adding items to the rule's body, despite the limited data coverage (number of training instances that the rule classifies), until the rule reaches zero error. This excessive learning overfits the training dataset and also limits the applicability of Covering models in decision making, because managers normally prefer a summarised set of knowledge that they can control and comprehend rather than high-maintenance models. In practice, there should be a trade-off between the number of rules offered by a classification model and its predictive performance. Another issue associated with Covering models is the overlapping of training data among the rules, which happens when a rule's classified data are discarded during the rule discovery phase. Unfortunately, the impact of a rule's removed data on other potential rules is not considered by this approach. 
When removing the training data linked with a rule, both the frequency and the rank of other rules' items that appeared in the removed data should be updated. The impacted rules should maintain their true rank and frequency in a dynamic manner during the rule discovery phase, rather than just keeping the frequency initially computed from the original input dataset. In response to the aforementioned issues, a new dynamic learning technique based on Covering and rule induction, which we call Enhanced Dynamic Rule Induction (eDRI), is developed. eDRI has been implemented in Java and embedded in the WEKA machine learning tool. The developed algorithm incrementally discovers the rules using primarily frequency and rule strength thresholds. In practice these thresholds limit the search space for both items and potential rules by discarding any with insufficient data representation as early as possible, resulting in an efficient training phase. More importantly, eDRI substantially cuts down the number of training example scans by continuously updating potential rules' frequency and strength parameters in a dynamic manner whenever a rule is inserted into the classifier. In particular, for each derived rule, eDRI adjusts on the fly the remaining potential rules' item frequencies and ranks, specifically for those that appeared within the deleted training instances of the derived rule. This gives a more realistic model with minimal rule redundancy, and makes the process of rule induction efficient and dynamic rather than static. Moreover, the proposed technique minimises the classifier's number of rules at preliminary stages by stopping learning when no rule meets the rule strength threshold, thereby minimising overfitting and ensuring a manageable classifier. 
Lastly, eDRI's prediction procedure not only prioritises the best-ranked rule for class forecasting of test data but also restricts the use of the default class rule, thus reducing the number of misclassifications. The aforementioned improvements guarantee classification models of smaller size that do not overfit the training dataset, while maintaining their predictive performance. The eDRI-derived models particularly benefit users taking key business decisions, since they can provide a rich knowledge base to support decision making. This is because these models have high predictive accuracy, are easy to understand, and are controllable as well as robust, i.e. flexible enough to be amended without drastic change. eDRI's applicability has been evaluated on the hard problem of phishing detection. Phishing normally involves creating a fake, well-designed website that closely imitates the website of an existing trusted business, aiming to trick users and illegally obtain their credentials, such as login information, in order to access their financial assets. The experimental results against large phishing datasets revealed that eDRI is highly useful as an anti-phishing tool, since it derived models of manageable size compared with other traditional techniques without hindering classification performance. Further evaluation results using several other classification datasets from different domains, obtained from the University of California Data Repository, have corroborated eDRI's competitive performance with respect to accuracy, size of the knowledge representation, training time and item space reduction. This makes the proposed technique not only efficient in inducing rules but also effective
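The separate-and-conquer loop described above can be sketched compactly. This is a simplified illustration in the spirit of PRISM-style covering with frequency and strength thresholds, not the published eDRI algorithm; the thresholds, the phishing-flavoured attribute names, and the stopping rule are all simplifications.

```python
from collections import Counter

def covering_rules(rows, target, min_freq=2, min_strength=1.0):
    """Covering sketch: grow a rule by adding the item with the best
    precision for `target`, prune items below a frequency threshold,
    stop the rule once it meets the strength (precision) threshold,
    then discard the rows it covers and repeat."""
    rules, data = [], list(rows)
    while any(r["class"] == target for r in data):
        covered, conds, best_p = list(data), {}, 0.0
        while True:
            # Candidate items and their frequency in the remaining rows.
            cands = Counter((a, v) for r in covered
                            for a, v in r.items() if a != "class")
            best, best_p = None, 0.0
            for (a, v), n in cands.items():
                if a in conds or n < min_freq:
                    continue  # already used, or insufficient support
                hit = [r for r in covered if r[a] == v]
                p = sum(r["class"] == target for r in hit) / len(hit)
                if p > best_p:
                    best, best_p = (a, v), p
            if best is None:
                break
            conds[best[0]] = best[1]
            covered = [r for r in covered if r[best[0]] == best[1]]
            if best_p >= min_strength:
                break
        if not conds or best_p < min_strength:
            break  # no rule meets the strength threshold: stop learning
        rules.append((dict(conds), target))
        # Separate: drop the training rows the new rule covers.
        data = [r for r in data if any(r[a] != v for a, v in conds.items())]
    return rules
```

Stopping as soon as no candidate rule reaches `min_strength`, instead of driving every rule to zero error, is what keeps the rule set small, the trade-off the abstract argues for.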

    Attribute extraction and classification using rough sets on a lymphoma dataset

    Get PDF

    Data mining a prostate cancer dataset using rough sets

    Get PDF
    Prostate cancer remains one of the leading causes of cancer death worldwide, with a reported incidence of 650,000 cases per annum. The causal factors of prostate cancer still remain to be determined. In this paper, we investigate a medical dataset containing clinical information on 502 prostate cancer patients using the machine learning technique of rough sets. Our preliminary results yield a classification accuracy of 90%, with high sensitivity and specificity (both at approximately 91%). Our results yield a positive predictive value (PPV) of 81% and a negative predictive value (NPV) of 95%. In addition to the high classification accuracy of our system, the rough set approach also provides a rule-based inference mechanism for information extraction that is suitable for integration into a rule-based system. The generated rules relate directly to the attributes and their values and provide a direct mapping between them
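The four metrics quoted above all derive from one 2x2 confusion matrix. A short sketch (the counts below are illustrative, not the paper's actual confusion matrix) shows how PPV can sit well below sensitivity once false positives accumulate:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-test metrics from a 2x2 confusion matrix
    (counts of true/false positives and negatives)."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv":         tp / (tp + fp),   # positive predictive value
        "npv":         tn / (tn + fn),   # negative predictive value
    }

# Illustrative counts: ~91% sensitivity but only ~81% PPV, because the
# 12 false positives dilute the positive predictions.
m = diagnostic_metrics(tp=50, fp=12, tn=100, fn=5)
```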

    Deposit subscribe Prediction using Data Mining Techniques based Real Marketing Dataset

    Full text link
    Recently, the economic depression that swept across the world has affected business organizations and banking sectors. This economic pressure causes severe attrition for banks, and customer retention becomes almost impossible. Accordingly, marketing managers need to intensify marketing campaigns, while organizations seek to avoid both expense and over-expansion. To solve this problem, data mining techniques are used as key tools in data analysis, data summarization, hidden pattern discovery, and data interpretation. In this paper, rough set theory and decision tree mining techniques have been implemented, using real marketing data obtained from a Portuguese marketing campaign related to bank deposit subscription [Moro et al., 2011]. The paper aims to improve the efficiency of marketing campaigns and to help decision makers by reducing the number of features that describe the dataset, focusing on the most significant ones, and predicting deposit customer retention criteria based on potential predictive rules
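The feature-reduction step mentioned above corresponds, in rough set terms, to finding a reduct: a smallest attribute subset that determines the decision as well as the full attribute set. A brute-force sketch (the bank-marketing-flavoured attribute names and rows are invented for illustration; real reduct computation uses discernibility matrices or heuristics):

```python
from itertools import combinations

def is_consistent(rows, attrs):
    """True if rows that agree on `attrs` always share the same class,
    i.e. the attribute subset still determines the decision."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if seen.setdefault(key, r["class"]) != r["class"]:
            return False
    return True

def minimal_reduct(rows):
    """Brute-force search (fine for small attribute counts) for a
    smallest attribute subset that classifies a consistent dataset as
    well as the full attribute set: a rough set 'reduct'."""
    attrs = [a for a in rows[0] if a != "class"]
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            if is_consistent(rows, subset):
                return list(subset)
    return attrs
```

Rules induced over the reduct's attributes are shorter and easier for decision makers to read, which is the point the abstract makes about spotting the most significant features.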