Diagnosis of Coronary Artery Disease Using Artificial Intelligence Based Decision Support System
Heart disease is any disease that affects the normal condition and functionality of the heart.
Coronary Artery Disease (CAD) is the most common form. It is caused by the accumulation of
plaques within the walls of the coronary arteries that supply blood to the heart muscles. It
may lead to continued temporary oxygen deprivation that results in damage to the
heart muscles. CAD causes more than 7,000,000 deaths worldwide every year. It is
the second leading cause of death in Malaysia and the leading cause of death in the world. To
diagnose CAD, cardiologists usually perform many diagnostic steps. Unfortunately, the
results of the diagnostic tests are difficult to interpret and do not always provide a
definite answer, and may lead to differing opinions. To help cardiologists provide a correct
diagnosis of CAD in a less expensive and non-invasive manner, many researchers have
developed decision support systems to diagnose CAD.
A fuzzy decision support system for the diagnosis of coronary artery disease based on
rough set theory is proposed in this thesis. The objective is to develop an evidence based
fuzzy decision support system for the diagnosis of coronary artery disease. The proposed
system is based on evidence, namely raw medical data sets taken from the University of
California, Irvine (UCI) database. The proposed system is designed to handle
the uncertainty, incompleteness and heterogeneity of data sets. Artificial Neural Network
with Rough Set Theory attribute reduction (ANNRST) is proposed as the imputation
method to address the incompleteness of data sets. ANNRST is evaluated against other
imputation methods in terms of classifier performance and behaviour during the rule
filtering process. RST rule induction is applied
to ANNRST imputed data sets. Numerical values are discretized using the Boolean reasoning
method. Rule selection based on quality and importance is proposed. The RST rule
importance measure is used to select the most important high quality rules. The selected
rules are used to build fuzzy decision support systems. Fuzzification based on
discretization cuts and fuzzy rule weighting based on rule quality are proposed. The Mamdani
inference method is used to provide the decision, with centroid defuzzification to give
numerical results, which represent the possibility of blockage in the coronary arteries.
The results show that the proposed ANNRST has similar performance to ANN and
outperforms k-Nearest Neighbour (k-NN) and Concept Most Common attribute value
Filling (CMCF). ANNRST is simpler than ANN because it has fewer input attributes and
is more suitable for the missing data imputation problem. ANNRST also provides a
strong relationship between original and imputed data sets. It is shown that ANNRST
provides a better RST rule based classifier than CMCF and k-NN during the rule filtering
process. The proposed RST based rule selection also performs better than other filtering
methods. The developed Fuzzy Decision Support System (FDSS) provides better
performance than a multi layer perceptron ANN, k-NN, and the rule induction methods
C4.5 and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) when
applied to UCI CAD data sets and Ipoh Specialist Hospital's patients. FDSS has a
transparent knowledge representation and can handle heterogeneous and incomplete input data.
FDSS is able to give the approximate percentage of blockage of the coronary
artery based on 13 standard attributes drawn from patient history, simple blood tests, ECG
data, etc., whereas coronary angiography or a cardiologist cannot give the percentage. The
results of FDSS were evaluated by three local cardiologists and considered to be efficient
and useful.
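The final inference step described in this abstract (weighted fuzzy rules, Mamdani inference, centroid defuzzification) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two inputs, the membership cut points, the four rules and their weights are hypothetical and are not taken from the thesis.

```python
# Minimal Mamdani sketch. All cut points, rules and weights are hypothetical.

def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], flat on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Fuzzy sets for two illustrative inputs and one output (blockage, 0..100 %).
AGE = {"young": lambda x: trap(x, 0, 0, 40, 55),
       "old": lambda x: trap(x, 40, 55, 120, 120)}
CHOL = {"normal": lambda x: trap(x, 0, 0, 200, 240),
        "high": lambda x: trap(x, 200, 240, 600, 600)}
BLOCK = {"low": lambda y: trap(y, 0, 0, 20, 50),
         "high": lambda y: trap(y, 50, 80, 100, 100)}

# (age label, cholesterol label, output label, rule weight)
RULES = [
    ("old", "high", "high", 1.0),
    ("old", "normal", "low", 0.6),
    ("young", "high", "high", 0.7),
    ("young", "normal", "low", 0.9),
]

def mamdani(age, chol, samples=1001):
    """Return the centroid of the aggregated output set, or None if no rule fires."""
    num = den = 0.0
    for i in range(samples):
        y = 100.0 * i / (samples - 1)
        mu = 0.0
        for a_lbl, c_lbl, out_lbl, weight in RULES:
            # AND is min; the rule weight scales the firing strength.
            strength = weight * min(AGE[a_lbl](age), CHOL[c_lbl](chol))
            # Clip the consequent at the firing strength, aggregate with max.
            mu = max(mu, min(strength, BLOCK[out_lbl](y)))
        num += y * mu
        den += mu
    return num / den if den > 0 else None
```

Calling `mamdani(70, 300)` fires only the first rule and returns a value in the upper part of the output range, while `mamdani(30, 180)` returns a low value. Scaling the firing strength by the rule weight is one common way to apply rule weights; the thesis may use a different scheme.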
Fuzzy Logic - Retrieval of Data from Database
The relationship between fuzzy logic and probability theory has long been complicated. All the techniques in fuzzy logic involve possibility theory and probability theory, which measure two kinds of uncertainty. In classical probability theory, a probability measure is a number between 0 and 1. A fuzzy rule-based system consists of a set of fuzzy rules with partially overlapping conditions. This paper demonstrates new methodologies for predicting an output when a particular input triggers multiple fuzzy rules. It also analyzes the behavior of the width of an interval used to represent the imprecision of the probability estimates. We also propose new applications of possibility theory and probability theory as they can be applied to fuzzy logic.
Dynamic Rule Covering Classification in Data Mining with Cyber Security Phishing Application
Data mining is the process of discovering useful patterns from datasets using intelligent techniques to help users make certain decisions. A typical data mining task is classification, which involves predicting a target variable known as the class in previously unseen data based on models learnt from an input dataset. Covering is a well-known classification approach that derives models with If-Then rules. Covering methods, such as PRISM, have predictive performance that is competitive with other classical classification techniques such as greedy, decision tree and associative classification. Therefore, Covering models are appropriate decision-making tools and users favour them when making decisions.
Despite the use of the Covering approach in data processing for different classification applications, it is acknowledged that this approach suffers from the noticeable drawback of inducing massive numbers of rules, making the resulting model large and unmanageable by users. This issue is attributed to the way Covering techniques induce rules: they keep adding items to the rule’s body, despite the limited data coverage (number of training instances that the rule classifies), until the rule has zero error. This excessive learning overfits the training dataset and also limits the applicability of Covering models in decision making, because managers normally prefer a summarised set of knowledge that they are able to control and comprehend rather than high-maintenance models. In practice, there should be a trade-off between the number of rules offered by a classification model and its predictive performance. Another issue associated with Covering models is the overlapping of training data among the rules, which arises when a rule’s classified data are discarded during the rule discovery phase. Unfortunately, the impact of a rule’s removed data on other potential rules is not considered by this approach. However, when training data linked with a rule are removed, both the frequency and rank of other rules’ items that appeared in the removed data change. The impacted rules should maintain their true rank and frequency in a dynamic manner during the rule discovery phase rather than just keeping the frequency initially computed from the original input dataset.
In response to the aforementioned issues, a new dynamic learning technique based on Covering and rule induction, which we call Enhanced Dynamic Rule Induction (eDRI), is developed. eDRI has been implemented in Java and embedded in the WEKA machine learning tool. The developed algorithm incrementally discovers rules using primarily frequency and rule strength thresholds. In practice, these thresholds limit the search space for both items and potential rules by discarding any with insufficient data representation as early as possible, resulting in an efficient training phase. More importantly, eDRI substantially cuts down the number of scans of the training examples by continuously updating potential rules’ frequency and strength parameters in a dynamic manner whenever a rule is inserted into the classifier. In particular, for each derived rule, eDRI adjusts on the fly the frequencies and ranks of the remaining potential rules’ items, specifically for those that appeared within the deleted training instances of the derived rule. This gives a more realistic model with minimal rule redundancy, and makes the process of rule induction efficient and dynamic rather than static. Moreover, the proposed technique minimises the classifier’s number of rules at preliminary stages by stopping learning when any rule does not meet the rule strength threshold, thereby minimising overfitting and ensuring a manageable classifier. Lastly, the eDRI prediction procedure not only prioritises using the best ranked rule for class forecasting of test data but also restricts the use of the default class rule, thus reducing the number of misclassifications.
The aforementioned improvements guarantee classification models with smaller size that do not overfit the training dataset, while maintaining their predictive performance. The eDRI derived models particularly benefit users taking key business decisions, since they can provide a rich knowledge base to support their decision making. This is because these models have high predictive accuracy and are easy to understand, controllable and robust, i.e. flexible enough to be amended without drastic change. eDRI’s applicability has been evaluated on the hard problem of phishing detection. Phishing normally involves creating a fake, well-designed website that closely resembles an existing trusted business website, aiming to trick users and illegally obtain their credentials, such as login information, in order to access their financial assets. The experimental results against large phishing datasets revealed that eDRI is highly useful as an anti-phishing tool, since it derived models of manageable size when compared with other traditional techniques without hindering classification performance. Further evaluation results using several other classification datasets from different domains, obtained from the University of California Data Repository, have corroborated eDRI’s competitive performance with respect to accuracy, size of the knowledge representation, training time and items space reduction. This makes the proposed technique not only efficient in inducing rules but also effective
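The idea of covering with dynamically maintained item frequencies can be sketched as follows. This is a deliberately simplified illustration and not the eDRI algorithm itself: it only builds single-condition rules, the function names, thresholds and toy phishing-style data are hypothetical, and the real system is implemented in Java inside WEKA.

```python
from collections import Counter

def induce_rules(data, min_freq=2, min_strength=0.6):
    """Greedy covering with item counts updated as covered rows are removed.

    data: list of (attribute dict, class label). Assumes min_freq >= 1.
    Returns (rules, default_label); each rule is (attr, value, label, strength).
    """
    remaining = list(data)
    item_label = Counter()  # (attr, value, label) -> frequency
    item_total = Counter()  # (attr, value) -> frequency
    for attrs, label in remaining:  # the only full scan for counting
        for a, v in attrs.items():
            item_label[(a, v, label)] += 1
            item_total[(a, v)] += 1

    rules = []
    while remaining:
        best = None
        for (a, v, label), n in item_label.items():
            if n < min_freq:
                continue  # discard items with insufficient data representation
            strength = n / item_total[(a, v)]
            if strength >= min_strength and (best is None or (strength, n) > best[0]):
                best = ((strength, n), a, v, label)
        if best is None:
            break  # no candidate meets both thresholds: stop learning
        (strength, _), a, v, label = best
        rules.append((a, v, label, strength))

        covered = [row for row in remaining if row[0].get(a) == v]
        remaining = [row for row in remaining if row[0].get(a) != v]
        # Dynamic update: decrement counts of every item in the removed rows
        # instead of recounting from the original dataset.
        for attrs, lab in covered:
            for ca, cv in attrs.items():
                item_label[(ca, cv, lab)] -= 1
                item_total[(ca, cv)] -= 1

    default = Counter(lab for _, lab in data).most_common(1)[0][0]
    return rules, default

def predict(rules, default, attrs):
    """Use the first (earliest discovered) matching rule, else the default class."""
    for a, v, label, _ in rules:
        if attrs.get(a) == v:
            return label
    return default

# Hypothetical toy data, for illustration only.
DATA = [
    ({"has_ip": "yes", "https": "no"}, "phish"),
    ({"has_ip": "yes", "https": "no"}, "phish"),
    ({"has_ip": "yes", "https": "yes"}, "phish"),
    ({"has_ip": "no", "https": "yes"}, "legit"),
    ({"has_ip": "no", "https": "yes"}, "legit"),
    ({"has_ip": "no", "https": "no"}, "legit"),
]
```

On this toy data, `induce_rules(DATA)` yields two rules, one per value of `has_ip`. The sketch removes every row matched by a rule's condition, which is one possible reading of the covering step described above.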
Data mining a prostate cancer dataset using rough sets
Prostate cancer remains one of the leading causes of cancer death worldwide, with a reported incidence rate of 650,000 cases per annum. The causal factors of prostate cancer remain to be determined. In this paper, we investigate a medical dataset containing clinical information on 502 prostate cancer patients using the machine learning technique of rough sets. Our preliminary results yield a classification accuracy of 90%, with high sensitivity and specificity (both at approximately 91%), a positive predictive value (PPV) of 81% and a negative predictive value (NPV) of 95%. In addition to the high classification accuracy of our system, the rough set approach also provides a rule-based inference mechanism for information extraction that is suitable for integration into a rule-based system. The generated rules relate directly to the attributes and their values and provide a direct mapping between them.
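For readers unfamiliar with the reported measures, the following sketch shows how sensitivity, specificity, PPV and NPV are computed from a binary confusion matrix. The function name and the counts in the usage note are made up for illustration and are not the paper's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic measures from confusion-matrix counts.

    Assumes every denominator is non-zero (each class and each
    prediction outcome occurs at least once).
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For example, `diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)` gives a sensitivity of 0.9 and a specificity of 0.8. PPV and NPV depend on class prevalence in the dataset, which is why they can differ noticeably from sensitivity and specificity.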
Deposit subscribe Prediction using Data Mining Techniques based Real Marketing Dataset
Recently, the economic depression that swept the world has affected
business organizations and the banking sector. This economic climate causes severe
attrition for banks, and customer retention becomes impossible. Accordingly,
marketing managers need to increase marketing campaigns, whereas
organizations avoid both expenses and business expansion. To resolve
this dilemma, data mining techniques are used as a key tool in data
analysis, data summarization, hidden pattern discovery, and data
interpretation. In this paper, rough set theory and decision tree mining
techniques are applied to real marketing data obtained from a
Portuguese marketing campaign related to bank deposit subscription [Moro et
al., 2011]. The paper aims to improve the efficiency of marketing campaigns
and help decision makers by reducing the number of features that
describe the dataset, identifying the most significant ones, and predicting
the deposit customer retention criteria based on potential predictive rules
- …