4 research outputs found

    ACO-based feature selection algorithm for classification

    A dataset with a small number of records but a large number of attributes exhibits a phenomenon called the “curse of dimensionality”. Classifying this type of dataset requires Feature Selection (FS) methods to extract useful information. The modified graph clustering ant colony optimisation (MGCACO) algorithm is an effective FS method that was developed based on grouping highly correlated features. However, the MGCACO algorithm has three main drawbacks in producing a feature subset, which stem from its clustering method, its parameter sensitivity, and its final subset determination. An enhanced graph clustering ant colony optimisation (EGCACO) algorithm is proposed to solve these three (3) MGCACO problems. The proposed improvements include: (i) an ACO feature clustering method to obtain clusters of highly correlated features; (ii) an adaptive selection technique for subset construction from the clusters of features; and (iii) a genetic-based method for producing the final subset of features. The ACO feature clustering method uses mechanisms such as intensification and diversification for local and global optimisation to provide highly correlated features. The adaptive technique for ant selection enables the parameter to change adaptively based on feedback from the search space. The genetic method determines the final subset automatically, based on crossover and subset quality calculation. The performance of the proposed algorithm was evaluated against 15 benchmark metaheuristic algorithms on 18 benchmark datasets from the University of California, Irvine (UCI) repository and nine (9) deoxyribonucleic acid (DNA) microarray datasets.
The experimental results show that the EGCACO algorithm outperforms the other benchmark optimisation algorithms in terms of the number of selected features on 16 of the 18 UCI datasets (88.89%) and achieves the best classification accuracy on eight (8) of them (44.44%). Further, experiments on the nine (9) DNA microarray datasets showed that the EGCACO algorithm is superior to the benchmark algorithms in terms of classification accuracy (first rank) on seven (7) datasets (77.78%) and selects the lowest number of features on six (6) datasets (66.67%). The proposed EGCACO algorithm can be utilised for FS in DNA microarray classification tasks that involve large datasets in various application domains.
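The abstract does not specify EGCACO's genetic operator or its subset quality function. As a rough illustration only, the sketch below evolves binary feature masks with uniform crossover and keeps the highest-quality mask. `select_final_subset`, `uniform_crossover` and the toy `toy_quality` score are hypothetical names, and the steady-state replacement scheme is an assumption, not the published procedure.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def uniform_crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """Take each gene from parent a or parent b with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def select_final_subset(candidates: Sequence[Mask],
                        quality: Callable[[Mask], float],
                        generations: int = 20,
                        seed: int = 0) -> Mask:
    """Steady-state GA (assumed scheme): needs at least two candidate masks."""
    rng = random.Random(seed)
    population = [list(c) for c in candidates]
    best = max(population, key=quality)
    for _ in range(generations):
        a, b = rng.sample(population, 2)
        child = uniform_crossover(a, b, rng)
        # Replace the worst member only if the child scores higher.
        worst = min(range(len(population)), key=lambda i: quality(population[i]))
        if quality(child) > quality(population[worst]):
            population[worst] = child
        if quality(child) > quality(best):
            best = child
    return best


# Toy usage: features 0 and 2 are assumed informative; larger subsets are penalised.
candidates = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
useful = {0, 2}


def toy_quality(mask: Mask) -> float:
    return sum(mask[i] for i in useful) - 0.1 * sum(mask)


final = select_final_subset(candidates, toy_quality)
```

Because the best mask is only ever replaced by a strictly better child, the returned subset scores at least as well as the best starting candidate.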

    An enhanced binary bat and Markov clustering algorithms to improve event detection for heterogeneous news text documents

    Event Detection (ED) aims to identify events from various types of data. Building an ED model for news text documents greatly helps decision-makers in various disciplines improve their strategies. However, identifying and summarizing events from such data is a non-trivial task due to the large volume of published heterogeneous news text documents. Such documents create a high-dimensional feature space that degrades the overall performance of the baseline methods in the ED model. To address this problem, this research presents an enhanced ED model that includes improved methods for its crucial phases: Feature Selection (FS), ED, and summarization. This work focuses on the FS problem, automatically detecting events through a novel wrapper FS method based on an Adapted Binary Bat Algorithm (ABBA) and an Adapted Markov Clustering Algorithm (AMCL), termed ABBA-AMCL. These adaptive techniques were developed to overcome premature convergence in BBA and the fast convergence rate of MCL. Furthermore, this study proposes four summarization methods to generate informative summaries. The enhanced ED model was tested on 10 benchmark datasets and 2 Facebook news datasets. The effectiveness of ABBA-AMCL was compared with 8 FS methods based on meta-heuristic algorithms and 6 graph-based ED methods. The empirical and statistical results show that ABBA-AMCL surpassed the other methods on most datasets. The key representative features demonstrate that ABBA-AMCL successfully detects real-world events from the Facebook news datasets, with 0.96 Precision and 1 Recall for dataset 11, and 1 Precision and 0.76 Recall for dataset 12. To conclude, the novel ABBA-AMCL method presented in this research bridges the identified research gap and addresses the curse of dimensionality in the feature space of heterogeneous news text documents.
Hence, the enhanced ED model can organize news documents into distinct events and provide policymakers with valuable information for decision-making.
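ABBA-AMCL builds on the standard Markov Clustering (MCL) algorithm, which alternates expansion (raising a column-stochastic matrix to a power) and inflation (raising each entry to a power and renormalising the columns) until the matrix stops changing. The sketch below implements plain MCL in pure Python, assuming a symmetric adjacency matrix. The adaptations that AMCL uses to slow MCL's convergence are not described in the abstract and are not reproduced here, and the function names are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (columns summing to 0 stay 0)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> List[Set[int]]:
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):  # expanded = m ** expansion
            expanded = matmul(expanded, m)
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; their non-zero columns form a cluster.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two disconnected triangles of (e.g.) co-occurring terms.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(adjacency)
```

Higher `inflation` values sharpen the contrast between strong and weak flows and generally yield more, smaller clusters.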

    Feature extraction and selection algorithm based on self-adaptive ant colony system for sky image classification

    Sky image classification is crucial in meteorology for forecasting weather and climatic conditions. The fine-grained cloud detection and recognition (FGCDR) algorithm is used to extract colour, inside texture and neighbour texture features from multiview superpixels of sky images. However, FGCDR produces a substantial number of redundant and insignificant features. The ant colony optimisation (ACO) algorithm has been used to select a feature subset; however, ACO suffers from premature convergence, which leads to a poor feature subset. Therefore, an improved feature extraction and selection for sky image classification (FESSIC) algorithm is proposed. This algorithm consists of (i) a Gaussian smoothness standard deviation method that formulates informative features within sky images; (ii) a nearest-threshold-based technique that converts the feature map into a weighted directed graph representing the relationships between features; and (iii) an ant colony system with a self-adaptive parameter technique for the local pheromone update. The performance of FESSIC was evaluated against ten benchmark image classification algorithms and six classifiers on four ground-based sky image datasets. The Friedman test is used to rank the performance of six benchmark feature selection algorithms and the FESSIC algorithm. The Mann-Whitney U test is then performed to evaluate the statistical significance of the difference between FESSIC and the second-ranked algorithm. The experimental results show that the proposed algorithm outperforms the benchmark image classification algorithms in terms of similarity value on the Kiel, SWIMCAT and MGCD datasets. FESSIC outperforms the other algorithms in average classification accuracy for the KSVM, MLP, RF and DT classifiers. The Friedman test shows that FESSIC ranks first for all classifiers. Furthermore, the Mann-Whitney U test indicates that FESSIC is significantly better than the second-ranked benchmark algorithm for all classifiers.
In conclusion, FESSIC can be utilised for image classification in various applications such as disaster management, medical diagnosis, industrial inspection, sports management, and content-based image retrieval.
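In a standard ant colony system (ACS), an ant that traverses edge (i, j) applies the local pheromone update tau_ij <- (1 - xi) * tau_ij + xi * tau_0, which pulls pheromone back toward its initial level and discourages other ants from following the same path. The sketch below shows that update together with the ACS pseudo-random proportional choice rule over a directed feature graph. FESSIC's self-adaptive parameter technique is not specified in the abstract, so `adapt_xi` is a hypothetical stand-in that raises xi when the best solution stagnates and lowers it when the solution improves. All names and default values are assumptions.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # directed edge between two feature nodes


def local_pheromone_update(tau: Dict[Edge, float], edge: Edge,
                           xi: float, tau0: float) -> None:
    """Standard ACS local update, in place: tau <- (1 - xi) * tau + xi * tau0."""
    tau[edge] = (1.0 - xi) * tau[edge] + xi * tau0


def adapt_xi(xi: float, improved: bool, step: float = 0.05,
             xi_min: float = 0.05, xi_max: float = 0.5) -> float:
    """HYPOTHETICAL self-adaptive rule (not FESSIC's published one):
    lower xi after an improvement, raise it on stagnation, then clamp."""
    xi = xi - step if improved else xi + step
    return min(xi_max, max(xi_min, xi))


def choose_next(current: int, candidates: List[int], tau: Dict[Edge, float],
                eta: Dict[int, float], beta: float, q0: float,
                rng: random.Random) -> int:
    """ACS pseudo-random proportional rule over the candidate feature nodes."""
    scores = [tau[(current, j)] * eta[j] ** beta for j in candidates]
    if rng.random() < q0:
        # Exploitation: take the edge with the highest tau * eta**beta.
        return candidates[scores.index(max(scores))]
    # Biased exploration: roulette-wheel choice proportional to the scores.
    return rng.choices(candidates, weights=scores, k=1)[0]


tau = {(0, 1): 0.5, (0, 2): 0.5}
local_pheromone_update(tau, (0, 1), xi=0.1, tau0=0.1)  # tau[(0, 1)] is now about 0.46
eta = {1: 1.0, 2: 2.0}  # toy heuristic desirability of each feature node
# q0=1.0 forces exploitation, so node 2 (score 1.0 vs 0.46) is chosen.
nxt = choose_next(0, [1, 2], tau, eta, beta=1.0, q0=1.0, rng=random.Random(0))
```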

    Enhanced Harris's Hawk algorithm for continuous multi-objective optimization problems

    Multi-objective swarm intelligence-based (MOSI-based) metaheuristics have been proposed to solve multi-objective optimization problems (MOPs) with conflicting objectives. The Harris’s hawk multi-objective optimizer (HHMO) is a MOSI-based algorithm developed on the reference point approach, in which the decision maker sets a reference point to guide the search towards a particular region of the true Pareto front. However, HHMO produces a poor approximation of the Pareto front because of a lack of information sharing in its population update strategy, the equal division of its convergence parameter, and its randomly generated initial population. A two-step enhanced non-dominated sorting HHMO (2S-ENDSHHMO) algorithm is proposed to solve this problem. The algorithm includes (i) a population update strategy that improves the movement of hawks in the search space, (ii) a parameter adjusting strategy that controls the transition between exploration and exploitation, and (iii) a population generating method for producing the initial candidate solutions. The population update strategy calculates new hawk positions based on the flush-and-ambush technique of Harris’s hawks and selects the best hawks using the non-dominated sorting approach. The adjustment strategy enables the parameter to change adaptively based on the state of the search space. The initial population is produced by generating quasi-random numbers with the R-sequence, followed by adapting the partial opposition-based learning concept to improve the diversity of the worst half of the hawk population. The performance of 2S-ENDSHHMO was evaluated on 12 MOPs and three engineering MOPs, and the results were compared with those of eight state-of-the-art multi-objective optimization algorithms.
The 2S-ENDSHHMO algorithm generated non-dominated solutions with better convergence and diversity on most MOPs and showed a strong ability to escape local optima, which indicates its capability to explore the search space. The 2S-ENDSHHMO algorithm can be used to improve the search process of other MOSI-based algorithms and can be applied to solve MOPs in applications such as structural design and signal processing.
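The initialisation step of 2S-ENDSHHMO can be sketched as follows, under stated assumptions. The R-sequence (Roberts' additive R_d sequence) places point n at frac(0.5 + n * alpha), where alpha_j = phi_d^(-j) and phi_d is the positive root of x^(d+1) = x + 1. Opposition-based learning maps a coordinate x in [lb, ub] to lb + ub - x. The abstract does not say how the opposition is made "partial", so here each coordinate of a worst-half hawk is flipped with probability `p`. The ranking `scores` (lower is better, for example a non-dominated front index) is supplied by the caller, and all function names are hypothetical.

```python
import random
from typing import List, Sequence


def generalized_golden_ratio(d: int, iters: int = 50) -> float:
    """Positive root of x**(d + 1) = x + 1, found by fixed-point iteration."""
    x = 2.0
    for _ in range(iters):
        x = (1.0 + x) ** (1.0 / (d + 1))
    return x


def r_sequence(n: int, d: int) -> List[List[float]]:
    """First n points of the additive R_d low-discrepancy sequence in [0, 1)^d."""
    phi = generalized_golden_ratio(d)
    alpha = [(1.0 / phi) ** (j + 1) for j in range(d)]
    return [[(0.5 + (i + 1) * a) % 1.0 for a in alpha] for i in range(n)]


def init_population(n: int, lb: Sequence[float], ub: Sequence[float]) -> List[List[float]]:
    """Scale R_d points into the box [lb, ub]."""
    d = len(lb)
    return [[lb[k] + z[k] * (ub[k] - lb[k]) for k in range(d)] for z in r_sequence(n, d)]


def partial_opposition(pop: List[List[float]], scores: Sequence[float],
                       lb: Sequence[float], ub: Sequence[float],
                       p: float = 0.5, seed: int = 0) -> List[List[float]]:
    """ASSUMED partial OBL: in the worst half (largest scores), replace each
    coordinate x by its opposite lb + ub - x with probability p."""
    rng = random.Random(seed)
    order = sorted(range(len(pop)), key=lambda i: scores[i])
    new_pop = [list(x) for x in pop]
    for i in order[len(pop) // 2:]:
        for k in range(len(lb)):
            if rng.random() < p:
                new_pop[i][k] = lb[k] + ub[k] - new_pop[i][k]
    return new_pop


lb, ub = [0.0, 0.0], [10.0, 10.0]
pop = init_population(4, lb, ub)
# Hawks 2 and 3 are ranked worst; p=1.0 flips every one of their coordinates.
new_pop = partial_opposition(pop, [0, 1, 2, 3], lb, ub, p=1.0)
```

The best half is left untouched, so opposition only adds diversity where the population is weakest.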