5 research outputs found

    A modified whale optimization algorithm for enhancing the features selection process in machine learning

    In recent years, with an abundance of large datasets in many fields, the feature selection problem has become critical for researchers. Real-world applications rely on large datasets with hundreds of instances and attributes, so finding a better way to select an optimal feature subset could significantly improve machine learning predictions. Metaheuristics have recently gained considerable popularity for solving the feature selection problem, and the Whale Optimization Algorithm (WOA) in particular has attracted significant attention from researchers. However, WOA still has an exploration problem that remains to be researched, as various parameters within the algorithm have been overlooked and not introduced into machine learning models. This paper proposes a new, improved version of the algorithm, the Modified Whale Optimization Algorithm (MWOA), which is hybridized with machine learning models such as logistic regression, decision tree, random forest, K-nearest neighbour, support vector machine, and naïve Bayes. To test the new approach and its performance, MWOA was evaluated on breast cancer datasets. The results showed the superiority of this model compared with the results obtained by the machine learning models alone.
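    The abstract does not describe MWOA's modifications or which parameters it adds, so the following is only a minimal sketch of the standard binary WOA wrapper loop such work builds on: the encircling, random-search and spiral position updates, a sigmoid transfer function that turns continuous positions into feature masks, and a fitness of the common form alpha * error + (1 - alpha) * |S| / N. The toy fitness, the `RELEVANT` set, the position bound and all function names are hypothetical; in a real wrapper, `error` would come from training one of the listed classifiers on the selected features.

```python
import math
import random

POS_LIMIT = 6.0  # assumption: bound continuous positions so the sigmoid transfer stays stable


def sigmoid(x):
    # S-shaped transfer function: continuous position -> probability of selecting a feature.
    return 1.0 / (1.0 + math.exp(-x))


def clamp(x):
    return max(-POS_LIMIT, min(POS_LIMIT, x))


def binarize(position, rng):
    # Feature j is selected (1) with probability sigmoid(position[j]).
    return [1 if rng.random() < sigmoid(p) else 0 for p in position]


def woa_feature_selection(fitness, n_features, n_whales=10, n_iter=30, b=1.0, seed=0):
    """Standard binary WOA sketch (not the paper's MWOA). fitness(mask) -> float, lower is better."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-4, 4) for _ in range(n_features)] for _ in range(n_whales)]
    masks = [binarize(p, rng) for p in positions]
    scores = [fitness(m) for m in masks]
    i0 = min(range(n_whales), key=scores.__getitem__)
    best_pos, best_mask, best_fit = positions[i0][:], masks[i0], scores[i0]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            l = rng.uniform(-1.0, 1.0)
            x = positions[i]
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # |A| >= 1: move relative to a random whale (exploration).
                ref = best_pos if abs(A) < 1 else positions[rng.randrange(n_whales)]
                new = [clamp(ref[j] - A * abs(C * ref[j] - x[j])) for j in range(n_features)]
            else:
                # Bubble-net spiral update around the best whale.
                spiral = math.exp(b * l) * math.cos(2 * math.pi * l)
                new = [clamp(abs(best_pos[j] - x[j]) * spiral + best_pos[j]) for j in range(n_features)]
            positions[i] = new
            mask = binarize(new, rng)
            score = fitness(mask)
            if score < best_fit:
                best_pos, best_mask, best_fit = new[:], mask, score
    return best_mask, best_fit


# Hypothetical wrapper-style objective: in practice `error` would be a classifier's validation error.
RELEVANT = {0, 3}  # assumption: indices of the informative features in a toy problem


def toy_fitness(mask, alpha=0.9):
    if sum(mask) == 0:
        return 1.0  # penalize selecting no features
    error = 1.0 - sum(mask[j] for j in RELEVANT) / len(RELEVANT)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


best_mask, best_fit = woa_feature_selection(toy_fitness, n_features=8)
print(best_mask, round(best_fit, 3))
```

    Setting `alpha` closer to 1 weights accuracy over subset size; the spiral constant `b`, population size and iteration count here are arbitrary choices, not values taken from the paper.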

    Effective Features and Machine Learning Methods for Document Classification

    Document classification has been used in a variety of applications, such as phishing and fraud detection, news categorisation, and information retrieval. This thesis aims to provide novel solutions to several important problems in document classification. First, an improved Principal Components Analysis (PCA), based on similarity and correlation criteria instead of covariance, is proposed to capture a low-dimensional feature subset that improves performance in text classification. The experimental results demonstrate the advantages of the proposed method for text classification in high-dimensional feature spaces in terms of the number of features required to achieve the best classification accuracy. Second, two hybrid feature-subset selection methods are proposed: each combines (via either union or intersection) the results of filter approaches, supervised in one method and unsupervised in the other, before a wrapper is applied. This yields a low-dimensional feature subset that achieves both high classification accuracy and good interpretability while requiring less processing time than most current methods. The experimental results demonstrate the effectiveness of the proposed methods for feature subset selection in high-dimensional feature spaces in terms of the number of selected features and the processing time needed to achieve the best classification accuracy. Third, a class-specific (supervised) pre-trained approach based on a sparse autoencoder is proposed for acquiring a low-dimensional, informative structure of relevant features that can be used for high-performance document classification. The experimental results demonstrate the merit of this method for document classification in high-dimensional feature spaces, in terms of the small number of features required to achieve good classification accuracy.
    Finally, deep classifier structures associated with a stacked autoencoder (SAE) for higher-level feature extraction are investigated, aiming to overcome difficulties in training deep neural networks with limited training data in high-dimensional feature spaces, such as overfitting and vanishing/exploding gradients. This investigation resulted in a three-stage learning algorithm for training deep neural networks. Compared with support vector machines (SVMs) combined with SAE and with a Deep Multilayer Perceptron (DMLP) using random weight initialisation, the experimental results show the advantages and effectiveness of the proposed three-stage learning algorithm.
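    The union/intersection step of the second contribution can be illustrated with a small sketch. It assumes the supervised variant with two hypothetical filters (absolute Pearson correlation with a binary label, and the absolute gap between class means), a fixed top-k cut-off per filter, and dense numeric features; the thesis's actual filters, cut-offs and wrapper are not given in this abstract, so none of the names below come from it.

```python
import statistics


def abs_pearson(xs, ys):
    # |Pearson correlation| between a feature column and 0/1 labels (0.0 for a constant column).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def mean_gap(xs, ys):
    # Absolute difference between the class-1 and class-0 means of a feature (scale-dependent).
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return abs(statistics.fmean(pos) - statistics.fmean(neg))


def top_k(X, y, score, k):
    # Rank feature columns by a filter score and keep the indices of the k best.
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: score(cols[j], y), reverse=True)
    return set(ranked[:k])


def hybrid_filter(X, y, k, mode="union", filters=(abs_pearson, mean_gap)):
    # Combine the filters' top-k subsets; the result would then be handed to a wrapper.
    subsets = [top_k(X, y, f, k) for f in filters]
    if mode == "union":
        return sorted(set.union(*subsets))
    if mode == "intersection":
        return sorted(set.intersection(*subsets))
    raise ValueError("mode must be 'union' or 'intersection'")


X = [[0.0, 0, 5], [0.1, 100, 5], [0.9, 50, 5], [1.0, 150, 5]]  # rows = documents, columns = features
y = [0, 0, 1, 1]
print(hybrid_filter(X, y, k=1, mode="union"))         # [0, 1]
print(hybrid_filter(X, y, k=1, mode="intersection"))  # []
```

    The toy data show why the combination rule matters: Pearson correlation is scale-invariant and ranks feature 0 first, whereas the mean gap is scale-dependent and ranks feature 1 first, so with k=1 the union keeps both features and the intersection is empty. Intersection gives smaller, more conservative subsets; union leaves more candidates for the wrapper to search.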

    A Novel Hybrid Algorithm for Feature Selection Based on Whale Optimization Algorithm

    Bajwa Hospital Eye Diseases

    This dataset accompanies the paper 'A Novel Hybrid Algorithm for Feature Selection Based On Optimization'. It was collected at Bajwa Hospital, Dinanagar, on 20.04.2022 and contains eye images in four categories: Normal, Cataract, Glaucoma, and Retina disease. There are 300 Normal images and 100 images each for Cataract, Glaucoma, and Retina disease. The data are anonymized, and no personal information is shared. In this research, we evaluated the idea that using a hybridization technique to construct feature selection algorithms is superior to using a single optimization algorithm. We apply the same idea to eye disease images to obtain better optimization results and, in turn, better classification of images by disease type. This dataset is archived at DANS/EASY but is not accessible here; to view the list of files and access them, use the dataset's DOI link.

    Bajwa Hospital (Multi Eye Disease Dataset)

    This multi-disease dataset was created for the study entitled "A Novel Hybrid Algorithm for Feature Selection Based On Optimization." It was collected and compiled on April 20, 2022, at Bajwa Hospital, Dina Nagar, Gurdaspur, India, using image assets from three different eye modalities. The eye images are divided into four categories: normal, cataract, glaucoma, and retinal disease. The dataset consists of 100 fundus images each of cataract, glaucoma, and retinal disease, plus 300 healthy fundus images. Data fusion was used to construct the full dataset. The image data are anonymized, and no individual's private information is distributed. This dataset is archived at DANS/EASY but is not accessible here; to view the list of files and access them, use the dataset's DOI link.