5 research outputs found

    Feature Analysis and Classification of Inflammatory Bowel Disease and Hidradenitis Suppurativa Using Data Mining

    Inflammatory Bowel Disease (IBD) refers to a group of conditions that primarily affect the gut and cause inflammation. In contrast, Hidradenitis Suppurativa (HS) is a chronic immune-mediated condition characterized by boils in a person's underarms, groin, and/or under their breasts. In recent years, research on HS has attracted growing interest, as reliable recognition of these two diseases (i.e., IBD and HS) has become crucial in clinical settings. This study investigates multiple machine learning and data mining algorithms, including Decision Tree, Random Forest, Naive Bayes, and k-Nearest Neighbor, to shed light on the distinction between HS and IBD. These candidate solutions for recognizing the HS-IBD boundary are used to classify IBD and HS based on multiple features such as age, illness history, and clinical observations. The thesis conducts a comparative study of the classification strategies that machine learning makes available for recognizing these two diseases. The methods were applied to the IBD/HS dataset collected by medical professionals at the Mayo Clinic, Rochester, MN, USA. The dataset consists of 198 records and 52 attributes; however, a data cleaning process was necessary before applying machine learning. During the evaluation, the performance of the approaches was compared with respect to accuracy, the most commonly used metric. The comparison showed that the random forest approach performed best, achieving an accuracy of 93.8% on a reduced dataset containing 20 features per patient. The detailed analysis of the results is supported by several visualization techniques such as t-SNE. In addition, the thesis attempts to determine a precise set of criteria and to identify the features that are most significant in separating these two diseases from one another. The results of this study give medical professionals the opportunity to investigate aspects that were previously assumed not to play a significant role in clinical practice. To the best of the author's knowledge, this is the first applied study to use machine learning and data mining techniques for IBD and HS classification.

    From 'tree' based Bayesian networks to mutual information classifiers : deriving a singly connected network classifier using an information theory based technique

    For reasoning under uncertainty the Bayesian network has become the representation of choice. However, except where models are considered 'simple', the tasks of construction and inference are provably NP-hard. For modelling larger 'real' world problems this computational complexity has been addressed by methods that approximate the model. The Naive Bayes classifier, which makes strong assumptions of independence among features, is a common approach, whilst the class of trees is another, less extreme example. In this thesis we propose the use of an information theory based technique as a mechanism for inference in Singly Connected Networks. We call this a Mutual Information Measure classifier, as it corresponds to the restricted class of trees built from mutual information. We show that the new approach provides an efficient and localised method of classification, with accuracy comparable to that of the less restricted general Bayesian networks. To improve the performance of the classifier, we additionally investigate the possibility of expanding the class Markov blanket by use of a Wrapper approach, and further show that performance can be improved by focusing on the class Markov blanket and that the improvement is not at the expense of increased complexity. Finally, the two methods are applied to the task of diagnosis in a 'real' world medical domain, Acute Abdominal Pain. As this is known to be both a different and challenging domain to classify, the objective was to investigate the optimality claims that some researchers have made for the Naive Bayes classifier in this domain. Despite some loss of representation capabilities, we show that the Mutual Information Measure classifier can be effectively applied to the domain and also provides a recognisable qualitative structure without violating 'real' world assertions. In respect of its 'selective' variant we further show that the improvement achieves a predictive accuracy comparable to that of the Naive Bayes classifier, and that the Naive Bayes classifier's 'overall' performance is largely due to the contribution of the majority group Non-Specific Abdominal Pain, a group of exclusion.
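The abstract above relies on mutual information between discrete variables as the quantity from which the tree is built. As a rough illustration only, and not the thesis's own implementation, the empirical mutual information of two discrete sequences can be sketched as follows. The function name `mutual_information` and the toy inputs in the usage note are assumptions introduced here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences.

    Hypothetical helper for illustration; probabilities are plain
    relative frequencies with no smoothing.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # Adds p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with counts.
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

For example, `mutual_information([0, 0, 1, 1], [0, 0, 1, 1])` gives 1 bit (each variable fully determines the other), while `mutual_information([0, 0, 1, 1], [0, 1, 0, 1])` gives 0 bits (the two are empirically independent). A tree-building procedure would typically compute this score for pairs of variables and keep the strongest links; how the thesis does so is not stated in the abstract.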

    Identifying Markov blankets with decision tree induction

    The Markov Blanket of a target variable is the minimum conditioning set of variables that makes the target independent of all other variables. Markov Blankets inform feature selection, aid in causal discovery and serve as a basis for scalable methods of constructing Bayesian networks. This paper applies decision tree induction to the task of Markov Blanket identification. Notably, we compare (a) C5.0, a widely used algorithm for decision rule induction, (b) C5C, which postprocesses C5.0’s rule set to retain the most frequently referenced variables and (c) PC, a standard method for Bayesian Network induction. C5C performs as well as or better than C5.0 and PC across a number of data sets. Our modest variation of an inexpensive, accurate, off-the-shelf induction engine mitigates the need for specialized procedures, and establishes baseline performance against which specialized algorithms can be compared.
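To make the definition of a Markov Blanket concrete: when the network structure is already known and given as a directed acyclic graph, the blanket of a node consists of its parents, its children, and the other parents of its children. The sketch below assumes exactly that setting, which is different from the paper's task of identifying the blanket from data with C5.0, C5C or PC. The function name `markov_blanket` and the example graph are hypothetical.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG that is already known.

    `parents` maps each node to the set of its parent nodes.
    This is an illustrative helper, not the C5C procedure from the paper.
    """
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Co-parents ("spouses") are the other parents of those children.
    spouses = {p for child in children for p in parents[child]}
    blanket = set(parents.get(target, ())) | children | spouses
    blanket.discard(target)  # the target is never part of its own blanket
    return blanket


# Hypothetical example graph: A -> T, T -> C, B -> C, C -> D.
example_dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}
```

Here `markov_blanket(example_dag, "T")` returns the set containing `A` (parent), `C` (child) and `B` (the child's other parent); `D` is excluded because it is separated from `T` once `C` is known.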