
    Short Text Classification Using An Enhanced Term Weighting Scheme And Filter-Wrapper Feature Selection

    The everyday use of social networks has caused an explosion in the number of short electronic documents. Social networks such as Twitter are common mechanisms through which people share information, and the use of data available through social media in many applications is steadily increasing. Redundancy and noise are common problems in short texts from social media and other applications, while their shortness and high sparsity lead to poor classification performance. A powerful short-text classification method therefore significantly improves the efficiency of many applications. This research investigates and develops solutions for feature discrimination and feature selection in short-text classification. For feature discrimination, we introduce a term weighting approach, simple supervised weight (SW), which accounts for the special nature of short text in terms of term strength and distribution. To address the drawbacks of applying existing feature selection methods to short text, this thesis proposes a filter-wrapper feature selection approach. In the first stage, we propose an adaptive filter-based feature selection method derived from the odds ratio method, used to reduce the dimensionality of the feature space. In the second stage, the grey wolf optimization (GWO) algorithm, a recent heuristic search algorithm, uses support vector machine (SVM) accuracy as a fitness function to find the optimal feature subset.
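    The abstract describes the wrapper stage only at a high level. Below is a minimal sketch, assuming a standard binary-mask formulation, of how a GWO wrapper with SVM cross-validation accuracy as the fitness function could be wired together; the 0.5 binarization threshold, the LinearSVC evaluator, and the population settings are illustrative assumptions rather than the thesis's actual implementation.

```python
# Hypothetical sketch of a GWO wrapper for feature selection with SVM accuracy as fitness.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Fitness of a binary feature mask = mean CV accuracy of a linear SVM."""
    if mask.sum() == 0:                      # avoid empty feature subsets
        return 0.0
    return cross_val_score(LinearSVC(), X[:, mask], y, cv=5).mean()

def gwo_feature_selection(X, y, n_wolves=10, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    positions = rng.random((n_wolves, n_features))   # continuous positions in [0, 1]
    scores = np.array([fitness(p > 0.5, X, y) for p in positions])
    for t in range(n_iters):
        a = 2 - 2 * t / n_iters                      # exploration factor decays from 2 to 0
        alpha, beta, delta = positions[np.argsort(scores)[::-1][:3]]  # three best wolves lead
        for i in range(n_wolves):
            new_pos = np.zeros(n_features)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(n_features), rng.random(n_features)
                A, C = 2 * a * r1 - a, 2 * r2
                new_pos += leader - A * np.abs(C * leader - positions[i])
            positions[i] = np.clip(new_pos / 3, 0.0, 1.0)
            scores[i] = fitness(positions[i] > 0.5, X, y)
    best_mask = positions[scores.argmax()] > 0.5
    return best_mask, scores.max()
```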

    Sentiment Analysis in Social Networks Using Social Spider Optimization Algorithm

    In this study, a new swarm intelligence-based algorithm called the Social Spider Algorithm (SSA), which simulates the collaborative behaviour of spiders, was adapted for the first time for sentiment analysis (SA) of data obtained from Twitter. The SA problem was modelled as a search problem, with the datasets treated as the search space and SSA as the search strategy, by determining an appropriate encoding scheme and objective function. The performance of SSA was compared with different Machine Learning (ML) algorithms on the same real datasets using several metrics. Although this is the first use of SSA for the SA problem and no parameter optimization was performed for it, the attained results were promising and could give new direction to related research on the use of different optimized artificial intelligence search algorithms for these types of online social network analysis problems. This study also introduces a new application domain for optimization algorithms.
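    The abstract does not spell out the encoding scheme, so the following is only a generic, hypothetical illustration of how sentiment classification can be cast as a search problem: each candidate solution is a weight vector over bag-of-words features, and the objective function is accuracy on labelled tweets; the SSA update rules themselves are omitted.

```python
# Generic illustration (not the paper's actual encoding): a candidate solution is a
# weight vector over bag-of-words features, and the objective function scores it by
# classification accuracy on labelled tweets. A swarm algorithm such as SSA would
# then move a population of such vectors through this search space.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def make_objective(texts, labels, max_features=2000):
    X = CountVectorizer(max_features=max_features).fit_transform(texts).toarray()
    y = np.asarray(labels)                     # 0 = negative, 1 = positive
    def objective(weights):
        preds = (X @ weights > 0).astype(int)  # simple linear decision rule
        return (preds == y).mean()             # accuracy to be maximized
    return objective, X.shape[1]               # objective and dimensionality of the search space
```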

    Improved feature selection using a hybrid side-blotched lizard algorithm and genetic algorithm approach

    Feature selection entails choosing, from a wide collection of original features, the significant features that are essential for predicting test data with a classifier. Feature selection is commonly used in applications such as bioinformatics, data mining, and the analysis of written texts, where the dataset contains tens or hundreds of thousands of features, making such a large feature set difficult to analyze. Removing irrelevant features improves predictor performance, making it more accurate and cost-effective. In this research, a novel hybrid technique for feature selection is presented that aims to enhance classification accuracy. A hybrid binary version of the side-blotched lizard algorithm (SBLA) with the genetic algorithm (GA), named SBLAGA, which combines the strengths of both algorithms, is proposed. A sigmoid function is used to map continuous variable values to binary ones, and the proposed algorithm is evaluated on twenty-three standard benchmark datasets. Average classification accuracy, average number of selected features, and average fitness value were the evaluation criteria. According to the experimental results, SBLAGA demonstrated superior performance compared to SBLA and GA with regard to these criteria. We further compare SBLAGA with four wrapper feature selection methods that are widely used in the literature and find it to be more efficient.
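    A minimal sketch of the two pieces named in the abstract, the sigmoid binarization and a wrapper fitness function, is given below. The 0.5-style stochastic threshold, the KNN evaluator, and the alpha weighting between error rate and subset size are common conventions assumed here for illustration, not the paper's exact setup.

```python
# Sketch: sigmoid transfer of continuous positions to binary masks, plus a fitness
# that trades off classification error against the number of selected features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def to_binary(position, rng):
    """Map a continuous position vector to a binary feature mask via a sigmoid."""
    probs = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < probs).astype(int)

def fitness(mask, X, y, alpha=0.99):
    """Lower is better: weighted error rate plus selected-feature ratio."""
    if mask.sum() == 0:                      # penalize empty subsets
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=5).mean()
    return alpha * (1 - acc) + (1 - alpha) * mask.sum() / mask.size
```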

    Statistical Validation of ACO-KNN Algorithm for Sentiment Analysis

    This research paper proposes a hybrid of the ant colony optimization (ACO) and k-nearest neighbour (KNN) algorithms as a feature selection method for choosing relevant features from customer review datasets. Information gain (IG), the genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. The paper also discusses the significance test used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. A dependency relation algorithm was used to identify the actual features commented on by customers by linking product features and sentiment words in customers' sentences. The performance of the ACO-KNN algorithm was evaluated using precision, recall, and F-score, and validated using parametric statistical significance tests. The evaluation showed that the ACO-KNN algorithm is a statistically significant improvement over the baseline algorithms. In addition, the experimental results show that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual content of customer review data.
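    As an illustration of the kind of parametric significance test mentioned above, a paired t-test on per-dataset F-scores is sketched below. The paired t-test is an assumed choice and the scores are placeholders, not the paper's reported results.

```python
# Hypothetical paired t-test comparing ACO-KNN against one baseline on per-dataset F-scores.
from scipy import stats

aco_knn_f1  = [0.85, 0.82, 0.88, 0.80, 0.86]   # placeholder per-dataset F-scores
baseline_f1 = [0.79, 0.77, 0.84, 0.75, 0.81]   # placeholder baseline F-scores

t_stat, p_value = stats.ttest_rel(aco_knn_f1, baseline_f1)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
```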

    Automated Optimization Deep Learning Model for Assessment and Guidance System Through Natural Language Processing with Reduction of Anxiety Among Students

    The Assisted Assessment and Guidance System serves as a valuable tool in supporting individuals' learning, growth, and development. Combined with Natural Language Processing (NLP), it is an innovative software application designed to provide personalized and intelligent support for assessment and guidance processes in various domains. NLP techniques are employed to analyze and understand human language, allowing the system to extract valuable insights from text-based data and provide tailored feedback and guidance. This paper proposes an Integrated Optimization Directional Clustering Classification (IODCc) approach for assessing foreign language anxiety. The approach incorporates two optimization models, namely Black Widow Optimization (BWO) and Seahorse Optimization (SHO), metaheuristic algorithms that simulate the behaviors of black widow spiders and seahorses, respectively, to improve the accuracy of the assessment process. The integration of these optimization models within the IODCc approach aims to enhance the accuracy and effectiveness of foreign language anxiety assessment. Simulation analysis is performed on data collected from 1000 foreign language students. The experimental analysis shows that the proposed IODCc model achieves a classification accuracy of 99%. The findings suggest that pre-training in the languages can reduce students' anxiety.

    Migrating Birds Optimization-Based Feature Selection for Text Classification

    This research introduces a novel approach, MBO-NB, that leverages Migrating Birds Optimization (MBO) coupled with Naive Bayes as an internal classifier to address feature selection challenges in text classification with large numbers of features. Focusing on computational efficiency, we preprocess raw data using the Information Gain algorithm, strategically reducing the feature count from an average of 62221 to 2089. Our experiments demonstrate MBO-NB's superior effectiveness in feature reduction compared to other existing techniques, with increased classification accuracy. The successful integration of Naive Bayes within MBO presents a well-rounded solution. In individual comparisons with Particle Swarm Optimization (PSO), MBO-NB consistently outperforms it by an average of 6.9% across four setups. This research offers valuable insights into enhancing feature selection methods, providing a scalable and effective solution for text classification.
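    A minimal sketch of the preprocessing plus internal-classifier pipeline described above follows. The MBO search itself is omitted; mutual_info_classif is used as a stand-in for the Information Gain filter, and the TF-IDF vectorizer and cross-validation setup are assumptions for illustration.

```python
# Sketch: information-gain style filtering down to ~2089 features (the average reported
# in the abstract), with Naive Bayes scoring the reduced feature set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

def preprocess_and_score(texts, labels, k=2089):
    X = TfidfVectorizer().fit_transform(texts)
    k = min(k, X.shape[1])                               # guard against small vocabularies
    X_reduced = SelectKBest(mutual_info_classif, k=k).fit_transform(X, labels)
    # Naive Bayes acts as the internal classifier evaluating the reduced feature set
    return cross_val_score(MultinomialNB(), X_reduced, labels, cv=5).mean()
```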

    A systematic literature review on meta-heuristic based feature selection techniques for text classification

    Feature selection (FS) is a critical step in many data science applications, especially text classification, as it involves selecting relevant and important features from an original feature set. This process can improve learning accuracy, shorten learning time, and simplify outcomes. In text classification there are often many excessive and unrelated features that degrade the performance of the applied classifiers, and various techniques have been suggested to tackle this problem, categorized as traditional techniques and meta-heuristic (MH) techniques. To discover the optimal subset of features, FS processes require a search strategy, and MH techniques use various strategies to strike a balance between exploration and exploitation. The goal of this research article is to systematically analyze the MH techniques used for FS between 2015 and 2022, focusing on 108 primary studies from three databases, namely Scopus, Science Direct, and Google Scholar, to identify the techniques used as well as their strengths and weaknesses. The findings indicate that MH techniques are efficient and outperform traditional techniques, with potential for further exploration of MH techniques such as Ringed Seal Search (RSS) to improve FS in several applications.

    Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification

    The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text-document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis using data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights. Feature selection is a technique for reducing dimensionality in order to prune the feature space, thereby lowering the computational cost and enhancing classification accuracy. This work presents a hybrid filter-wrapper method that uses Principal Component Analysis (PCA) as a filter approach to select an appropriate and informative subset of features and the Grey Wolf Optimizer (GWO) as a wrapper approach (PCA-GWO) to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that the proposed PCA-GWO method outperforms the baseline classifiers with/without feature selection and other feature selection approaches in terms of classification accuracy.
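    A brief sketch of the PCA-GWO pipeline shape follows: PCA acts as the filter stage and Logistic Regression evaluates candidate subsets in the wrapper stage, which a GWO loop like the one sketched earlier could search over. The number of retained components (500) and the cross-validation setup are assumptions, not values from the paper.

```python
# Sketch: PCA as the filter stage, Logistic Regression as the wrapper-stage evaluator.
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pca_filter(X_dense, n_components=500):
    """Filter stage: project dense document vectors onto leading principal components."""
    return PCA(n_components=n_components).fit_transform(X_dense)

def lr_fitness(mask, X_pca, y):
    """Wrapper-stage fitness: LR cross-validation accuracy on a candidate component subset."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X_pca[:, mask.astype(bool)], y, cv=5).mean()
```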

    Evolving CNN-LSTM Models for Time Series Prediction Using Enhanced Grey Wolf Optimizer

    In this research, we propose an enhanced Grey Wolf Optimizer (GWO) for designing evolving Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) networks for time series analysis. To overcome the classical GWO algorithm's tendency to stagnate at local optima and its slow convergence, the newly proposed variant incorporates four distinctive search mechanisms: a nonlinear exploration scheme for dynamic search-territory adjustment, a chaotic leadership dispatching strategy among the dominant wolves, a rectified spiral local exploitation action, and probability distribution-based leader enhancement. The evolving CNN-LSTM models are subsequently devised using the proposed GWO variant, where the network topology and learning hyperparameters are optimized for time series prediction and classification tasks. Evaluated on a number of benchmark problems, the proposed GWO-optimized CNN-LSTM models produce statistically significant improvements over several classical search methods and advanced GWO and Particle Swarm Optimization variants. Compared with the baseline methods, the CNN-LSTM networks devised by the proposed GWO variant offer better representational capacity, capturing vital feature interactions as well as the sophisticated dependencies in complex temporal contexts for time-series tasks.
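    To make the topology/hyperparameter optimization concrete, the sketch below shows one plausible way a wolf's position vector could be decoded into a CNN-LSTM model. The chosen genes (filters, kernel size, LSTM units, learning rate), their ranges, and the Keras layer stack are illustrative assumptions; the paper's actual encoding and the enhanced GWO search operators are not shown.

```python
# Hypothetical decoding of a GWO position vector into CNN-LSTM hyperparameters.
from tensorflow import keras
from tensorflow.keras import layers

def decode_and_build(position, input_shape, n_classes):
    """position: 4 genes in [0, 1] -> (filters, kernel_size, lstm_units, learning rate)."""
    filters     = int(16 + position[0] * (128 - 16))
    kernel_size = int(2 + position[1] * (8 - 2))
    lstm_units  = int(16 + position[2] * (256 - 16))
    lr          = 10 ** (-4 + position[3] * 3)          # learning rate in [1e-4, 1e-1]
    model = keras.Sequential([
        layers.Input(shape=input_shape),                 # (timesteps, features)
        layers.Conv1D(filters, kernel_size, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(lstm_units),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# A wolf's fitness would then be the validation score of its decoded model after a short training run.
```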