4 research outputs found

    Information Gain Based Dimensionality Selection for Classifying Text Documents

    Selecting the optimal dimensions for knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are used in classification applications to increase classification accuracy and reduce computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that uses information gain. The methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The method was tested on a specific text classification problem and compared with conventional genetic-algorithm-based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.
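
The abstract does not say how information gain is mapped to a mutation probability, so the following is only a minimal sketch under stated assumptions. It computes the information gain of binary term-presence features and then biases a bit-flip mutation so that high-gain dimensions are less likely to be dropped and more likely to be added. The scaling rule in `mutation_probs`, the `base_rate` parameter, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """Information gain of one binary feature (term present/absent)."""
    n = len(labels)
    present = [y for x, y in zip(feature, labels) if x]
    absent = [y for x, y in zip(feature, labels) if not x]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, entropy(labels) - conditional)


def mutation_probs(chromosome, gains, base_rate=0.05):
    """Assumed rule: scale the base mutation rate by normalised gain.

    A selected dimension (bit 1) with high gain is rarely switched off;
    an unselected dimension (bit 0) with high gain is switched on more often.
    """
    top = max(gains) or 1.0  # avoid dividing by zero when all gains are 0
    probs = []
    for bit, gain in zip(chromosome, gains):
        w = gain / top  # normalised gain in [0, 1]
        probs.append(base_rate * (1 - w) if bit else base_rate * (1 + w))
    return probs


def mutate(chromosome, gains, base_rate=0.05, rng=random):
    """Flip each bit independently with its gain-dependent probability."""
    probs = mutation_probs(chromosome, gains, base_rate)
    return [1 - b if rng.random() < p else b
            for b, p in zip(chromosome, probs)]
```

Because the gains depend only on the dataset, they can be computed once before the genetic algorithm starts, which is consistent with the abstract's remark that the information gain is calculated a priori.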

    Information gain based dimensionality selection for classifying text documents

    Selecting the optimal dimensions for knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are used in classification applications to increase classification accuracy and reduce computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic-algorithm-based methodology for dimensionality selection in text mining applications that uses information gain. The methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The method was tested on a specific text classification problem and compared with conventional genetic-algorithm-based dimensionality selection. The results show an improvement of 3% in true positives and 1.6% in true negatives over conventional dimensionality selection methods.

    IMPROVING UNDERSTANDABILITY AND UNCERTAINTY MODELING OF DATA USING FUZZY LOGIC SYSTEMS

    The need for automation, optimality and efficiency has made modern control and monitoring systems extremely complex and data abundant. However, the complexity of the systems and the abundance of raw data have reduced the understandability and interpretability of data, which results in reduced state awareness of the system. Furthermore, different levels of uncertainty introduced by sensors and actuators make interpreting and accurately manipulating systems difficult. Classical mathematical methods lack the capability to capture human knowledge and increase understandability while modeling such uncertainty. Fuzzy Logic has been shown to alleviate both these problems by introducing logic based on vague, human-understandable terms. The use of linguistic terms and simple consequential rules increases the understandability of system behavior as well as data. The use of vague terms and the modeling of data from non-discrete prototypes enable modeling of uncertainty. However, due to recent trends, primary research in fuzzy logic has diverged from the basic concept of understandability. Furthermore, the high computational cost of robust uncertainty modeling has restricted the use of such fuzzy systems in real-world applications. Thus, the goal of this dissertation is to present algorithms and techniques that improve understandability and uncertainty modeling using Fuzzy Logic Systems. To achieve this goal, this dissertation presents the following major contributions: 1) a novel methodology for generating Fuzzy Membership Functions based on understandability, 2) Linguistic Summarization of data using if-then type consequential rules, and 3) novel Shadowed Type-2 Fuzzy Logic Systems for uncertainty modeling. Finally, the presented techniques are applied to real-world systems and data to exemplify their relevance and usage.
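
To make the idea of linguistic terms and if-then rules concrete, here is a generic textbook sketch of a type-1 fuzzy system fragment. It is not the dissertation's membership-function generation method and does not cover Shadowed Type-2 systems. The temperature variable, the term boundaries in `TERMS`, and the fan rule are all hypothetical values chosen for illustration.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set.

    a and c are the feet of the triangle, b is its peak.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical linguistic terms for a temperature reading in degrees Celsius.
TERMS = {
    "low": (-10.0, 0.0, 20.0),
    "medium": (10.0, 25.0, 40.0),
    "high": (30.0, 50.0, 70.0),
}


def fuzzify(x):
    """Map a crisp reading to a membership degree for every term."""
    return {name: triangular(x, *abc) for name, abc in TERMS.items()}


def describe(x):
    """Return the linguistic term that best describes the reading."""
    degrees = fuzzify(x)
    return max(degrees, key=degrees.get)


def rule_high_temp_fan_fast(temp):
    """IF temperature is high THEN fan is fast.

    The firing strength of the rule is the membership of the antecedent.
    """
    return fuzzify(temp)["high"]
```

A reading of 15 degrees belongs partly to "low" and partly to "medium", which is how vague terms let a single value carry graded, overlapping descriptions instead of a hard category.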

    Intelligent Systems Approach for Classification and Management of Patients with Headache

    Primary headache disorders are the most common complaints worldwide. The socioeconomic and personal impact of headache disorders is enormous, as they are the leading cause of workplace absence. Consultations for headache are increasing as the population grows, people live longer, and many people have multiple conditions. However, access to specialist services across the UK is currently inequitable because the number of trained consultant neurologists in the UK is 10 times lower than in other European countries. Additionally, more than two thirds of headache cases presented to primary care were labelled as unspecified headache. Therefore, an alternative pathway to diagnose and manage patients with primary headache could be crucial to reducing the need for specialist assessment and increasing capacity within the current service model. Several recent studies have targeted this issue through the development of clinical decision support systems, which can help non-specialist doctors and general practitioners to diagnose patients with primary headache disorders in primary clinics. However, the majority of these studies followed a rule-based approach, in which the rules were summarised and expressed by a computer engineer. This approach has many downsides, which are discussed later in this dissertation. This study adopts a completely different approach. Machine learning is used for the classification of primary headache disorders, based on a dataset of 832 records of patients with primary headaches originating from three medical centres located in Turkey.
    Three main types of primary headache were derived from the dataset: Tension Type Headache in both episodic and chronic forms; Migraine with and without Aura; and Trigeminal Autonomic Cephalalgia, which is further subdivided into cluster headache, paroxysmal hemicrania, and short-lasting unilateral neuralgiform headache attacks with conjunctival injection and tearing. Six popular machine-learning-based classifiers, including linear and non-linear ensemble learning, in addition to one regression-based procedure, were evaluated for the classification of primary headaches in a supervised learning setting, achieving highest aggregate performance outcomes of AUC 0.923, sensitivity 0.897, and overall classification accuracy 0.843. This study also introduces the proposed HydroApp system, an M-health-based personalised application for the follow-up of patients with long-term conditions such as chronic headache and hydrocephalus. The system was developed under the supervision of headache specialists at Ashford hospital, London, and neurology experts at Walton Centre and Alder Hey hospital, Liverpool. Acceptance of such an M-health-based system was investigated via an online questionnaire, in which 86% of paediatric patients and 60% of adult patients were interested in using the HydroApp system to manage their conditions. Features and functions offered by the HydroApp system, such as recording a headache score, recording general health and well-being, and alerting the treating team, were perceived as very or extremely important from the patients’ point of view. The study concludes that advances in intelligent systems and M-health applications offer a promising environment in which to identify alternative solutions, which in turn increase capacity in the current service model and improve diagnostic capability in the primary headache domain and beyond.
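
For readers unfamiliar with the reported metrics, the following sketch shows how sensitivity, accuracy, and AUC can be computed for a single binary decision. It is an illustration only: the study is a multi-class problem and the abstract does not state how the aggregate figures were obtained, so the binary framing, the function names, and the pairwise AUC formulation are assumptions.

```python
def sensitivity(y_true, y_pred):
    """True positive rate: share of actual positives predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all records whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive record is
    scored higher than a randomly chosen negative one (ties count half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels and classifier scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```

AUC is computed from the raw scores rather than thresholded labels, which is why it can differ noticeably from accuracy on the same predictions.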