Search CORE

4 research outputs found

Information Gain Based Dimensionality Selection for Classifying Text Documents

Author: Dumidu Wijayasekara
Miles Mcqueen
Milos Manic
Publication venue
Publication date: 03/04/2020
Field of study

Abstract-Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods

CiteSeerX

Information gain based dimensionality selection for classifying text documents

Author: Manic Milos
McQueen Miles
Wijayasekara Dumidu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2013
Field of study

Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel, genetic algorithm based methodology, for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods

Crossref

UNT Digital Library

IMPROVING UNDERSTANDABILITY AND UNCERTAINTY MODELING OF DATA USING FUZZY LOGIC SYSTEMS

Author: Wijayasekara Dumidu S
Publication venue: VCU Scholars Compass
Publication date: 01/01/2016
Field of study

The need for automation, optimality and efficiency has made modern day control and monitoring systems extremely complex and data abundant. However, the complexity of the systems and the abundance of raw data has reduced the understandability and interpretability of data which results in a reduced state awareness of the system. Furthermore, different levels of uncertainty introduced by sensors and actuators make interpreting and accurately manipulating systems difficult. Classical mathematical methods lack the capability to capture human knowledge and increase understandability while modeling such uncertainty. Fuzzy Logic has been shown to alleviate both these problems by introducing logic based on vague terms that rely on human understandable terms. The use of linguistic terms and simple consequential rules increase the understandability of system behavior as well as data. Use of vague terms and modeling data from non-discrete prototypes enables modeling of uncertainty. However, due to recent trends, the primary research of fuzzy logic have been diverged from the basic concept of understandability. Furthermore, high computational costs to achieve robust uncertainty modeling have led to restricted use of such fuzzy systems in real-world applications. Thus, the goal of this dissertation is to present algorithms and techniques that improve understandability and uncertainty modeling using Fuzzy Logic Systems. In order to achieve this goal, this dissertation presents the following major contributions: 1) a novel methodology for generating Fuzzy Membership Functions based on understandability, 2) Linguistic Summarization of data using if-then type consequential rules, and 3) novel Shadowed Type-2 Fuzzy Logic Systems for uncertainty modeling. Finally, these presented techniques are applied to real world systems and data to exemplify their relevance and usage

VCU Scholars Compass

Intelligent Systems Approach for Classification and Management of Patients with Headache

Author: Kaky AJM
Publication venue
Publication date
Field of study

Primary headache disorders are the most common complaints worldwide. The socioeconomic and personal impact of headache disorders is enormous, as it is the leading cause of workplace absence. Headache patients’ consultations are increasing as the population has increased in size, live longer and many people have multiple conditions, however, access to specialist services across the UK is currently inequitable because the numbers of trained consultant neurologists in the UK are 10 times lower than other European countries. Additionally, more than two third of headache cases presented to primary care were labelled with unspecified headache. Therefore, an alternative pathway to diagnose and manage patients with primary headache could be crucial to reducing the need for specialist assessment and increase capacity within the current service model. Several recent studies have targeted this issue through the development of clinical decision support systems, which can help non-specialist doctors and general practitioners to diagnose patients with primary headache disorders in primary clinics. However, the majority of these studies were following a rule-based system style, in which the rules were summarised and expressed by a computer engineer. This style carries many downsides, and we will discuss them later on in this dissertation. In this study, we are adopting a completely different approach. The use of machine learning is recruited for the classification of primary headache disorders, for which a dataset of 832 records of patients with primary headaches was considered, originating from three medical centres located in Turkey. Three main types of primary headaches were derived from the data set including Tension Type Headache in both episodic and chronic forms, Migraine with and without Aura, followed by Trigeminal Autonomic Cephalalgia that further subdivided into Cluster headache, paroxysmal hemicrania and short-lasting unilateral neuralgiform headache attacks with conjunctival injection and tearing. Six popular machine-learning based classifiers, including linear and non-linear ensemble learning, in addition to one regression based procedure, have been evaluated for the classification of primary headaches within a supervised learning setting, achieving highest aggregate performance outcomes of AUC 0.923, sensitivity 0.897, and overall classification accuracy of 0.843. This study also introduces the proposed HydroApp system, which is an M-health based personalised application for the follow-up of patients with long-term conditions such as chronic headache and hydrocephalus. We managed to develop this system with the supervision of headache specialists at Ashford hospital, London, and neurology experts at Walton Centre and Alder Hey hospital Liverpool. We have successfully investigated the acceptance of using such an M-health based system via an online questionnaire, where 86% of paediatric patients and 60% of adult patients were interested in using HydroApp system to manage their conditions. Features and functions offered by HydroApp system such as recording headache score, recording of general health and well-being as well as alerting the treating team, have been perceived as very or extremely important aspects from patients’ point of view. The study concludes that the advances in intelligent systems and M-health applications represent a promising atmosphere through which to identify alternative solutions, which in turn increases the capacity in the current service model and improves diagnostic capability in the primary headache domain and beyond

LJMU Research Online (Liverpool John Moores University)