An Unsupervised Approach for Sentiment Analysis on Social Media Short Text Classification in Roman Urdu
During the last two decades, sentiment analysis, also known as opinion mining, has become one of the most explored research areas in Natural Language Processing (NLP) and data mining. Sentiment analysis focuses on the sentiments or opinions that consumers express on social media and other websites. Because of the wide availability of such opinions on the Internet, sentiment analysis has attracted vast numbers of researchers around the globe. A large amount of research has been conducted on English, Chinese, and other widely used languages. However, Roman Urdu has been neglected despite being the third most used language for communication in the world, with millions of users around the globe. Although some techniques have been proposed for sentiment analysis in Roman Urdu, they are either limited to a specific domain or developed incorrectly because of the lack of language resources for Roman Urdu. Therefore, in this article, we propose an unsupervised approach for sentiment analysis in Roman Urdu. First, the proposed model normalizes the text to overcome spelling variations of different words. After normalization, Roman Urdu and English opinion lexicons are used to correctly identify users' opinions in the text. We also incorporate negation terms and stemming to assign a polarity to each extracted opinion. The model then assigns a score to each sentence based on the polarities of the extracted opinions and classifies the sentence as positive, negative, or neutral. To verify our approach, we conducted experiments on two publicly available Roman Urdu datasets and compared it with the existing model. The results demonstrate that our approach outperforms existing models for sentiment analysis in Roman Urdu. Furthermore, our approach does not suffer from domain dependency.
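The normalize, lexicon lookup, negation, and sentence-scoring pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' model: the word lists in `POSITIVE`, `NEGATIVE`, `NEGATIONS`, and `VARIANTS` are hypothetical stand-ins for the real Roman Urdu and English lexicons, the "flip polarity when a negation word is adjacent" rule is an assumed simplification, and stemming is omitted.

```python
# Hypothetical mini-lexicons for illustration only; they are NOT the
# Roman Urdu / English opinion lexicons used in the article.
POSITIVE = {"acha", "zabardast", "good", "great"}
NEGATIVE = {"bura", "bekar", "bad", "poor"}
NEGATIONS = {"nahi", "na", "not"}

# Hypothetical normalization table mapping spelling variants to one form.
VARIANTS = {"achha": "acha", "achaa": "acha", "nai": "nahi", "nhi": "nahi",
            "nahin": "nahi", "bekaar": "bekar"}


def normalize(token: str) -> str:
    """Lower-case, strip punctuation, and map spelling variants to a canonical form."""
    token = token.lower().strip(".,!?")
    return VARIANTS.get(token, token)


def sentence_score(sentence: str) -> int:
    """Sum of +1/-1 opinion polarities, flipped when a negation word is adjacent."""
    tokens = [normalize(t) for t in sentence.split()]
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity == 0:
            continue
        # Roman Urdu negation often follows the opinion word ("acha nahi"),
        # English negation precedes it ("not good"), so check both neighbours.
        neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        if any(n in NEGATIONS for n in neighbours):
            polarity = -polarity
        score += polarity
    return score


def classify(sentence: str) -> str:
    """Map the sentence score to positive / negative / neutral."""
    score = sentence_score(sentence)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

For example, `classify("khana acha nahi tha")` normalizes the tokens, finds the positive word "acha", sees the following "nahi", and flips it to a negative score.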
Enhancing Rice Leaf Disease Classification: A Customized Convolutional Neural Network Approach
In modern agriculture, correctly identifying rice leaf diseases is crucial for maintaining crop health and promoting sustainable food production. This study presents a detailed methodology for improving the accuracy of rice leaf disease classification using a Convolutional Neural Network (CNN) model designed specifically for rice leaf images. The proposed method achieved an accuracy of 0.914 during the final epoch, with low loss and minimal overfitting, which is highly competitive with other models. In a comparison with transfer-learning models based on Inception-v3 and EfficientNet-B2, the proposed method achieved higher accuracy and better overall performance. With the increasing demand for precision agriculture, models like the proposed one show great potential for accurately detecting and managing diseases, ultimately leading to improved crop yields and ecological sustainability.
Dynamic generalized normal distribution optimization for feature selection
High data dimensionality is a major problem that degrades classification accuracy, and it mainly results from the presence of irrelevant features. Feature selection addresses this problem by selecting the most informative features and discarding the irrelevant ones. Generalized normal distribution optimization (GNDO) is a recently developed optimization algorithm that has outperformed well-known optimization algorithms on parameter extraction for photovoltaic models. However, GNDO's performance degrades on high-dimensional problems. Its main weaknesses are poor exploitation, because it tends to become trapped in local optima, and low solution diversity on high-dimensional data. To alleviate these drawbacks and solve feature selection problems, a local search algorithm (LSA) is incorporated into GNDO. The resulting algorithm, called dynamic generalized normal distribution optimization (DGNDO), improves on GNDO in three ways: it refines the best solution to escape local optima, it improves solution diversity by refining a randomly selected solution, and it improves exploration and exploitation together. To confirm the performance and efficiency of DGNDO, it is applied to 20 benchmark datasets from the UCI repository. DGNDO is also compared with seven well-known optimization algorithms using several evaluation metrics: classification accuracy, fitness, the number of selected features, statistical results from the Wilcoxon test, and convergence curves. The results show that DGNDO outperforms all competing algorithms.
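Wrapper feature selection of this kind scores a binary feature mask with a fitness function and refines the best mask with local search. The sketch below assumes the weighted fitness `alpha * error + (1 - alpha) * selected_ratio` with `alpha = 0.99`, a common choice in the wrapper feature-selection literature but not confirmed by the abstract, and uses a generic bit-flip hill climber as a stand-in for DGNDO's LSA. GNDO's own update equations are not reproduced, and `toy_error` with its `RELEVANT` set is a hypothetical placeholder for a real classifier's error rate.

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature discarded


def wrapper_fitness(mask: Mask, error_rate: Callable[[Mask], float],
                    alpha: float = 0.99) -> float:
    """Lower is better: weighted classification error plus fraction of features kept."""
    return alpha * error_rate(mask) + (1 - alpha) * sum(mask) / len(mask)


def bit_flip_local_search(start: Mask, fitness: Callable[[Mask], float],
                          max_iters: int = 100, seed: int = 0) -> Mask:
    """Hill climbing: flip one random bit per step and keep it only if fitness improves."""
    rng = random.Random(seed)
    current, current_fit = start[:], fitness(start)
    for _ in range(max_iters):
        candidate = current[:]
        j = rng.randrange(len(candidate))
        candidate[j] = 1 - candidate[j]
        candidate_fit = fitness(candidate)
        if candidate_fit < current_fit:
            current, current_fit = candidate, candidate_fit
    return current


# Hypothetical stand-in for a classifier: error grows when any of three
# "relevant" features is missing (in practice, e.g. a k-NN error rate).
RELEVANT = {0, 2, 5}


def toy_error(mask: Mask) -> float:
    missing = sum(1 for j in RELEVANT if mask[j] == 0)
    return 0.1 + 0.3 * missing / len(RELEVANT)


def fit(mask: Mask) -> float:
    return wrapper_fitness(mask, toy_error)


refined = bit_flip_local_search([1] * 10, fit)
```

Because only improving flips are accepted, the refined mask never scores worse than the starting mask, and dropping a relevant feature is never accepted here because its error penalty outweighs the small reward for using fewer features.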
Improved Reptile Search Optimization Algorithm using Chaotic map and Simulated Annealing for Feature Selection in Medical Field
The increasing volume of medical datasets has produced high-dimensional feature spaces, which negatively affect machine learning (ML) classifiers. In ML, feature selection is fundamental for selecting the most relevant features and removing redundant and irrelevant ones, and optimization algorithms have demonstrated their ability to solve feature selection problems. The Reptile Search Algorithm (RSA) is a new nature-inspired optimization algorithm that simulates the encircling and hunting behavior of crocodiles. RSA's distinctive search strategy obtains promising results compared with other optimization algorithms. However, when applied to high-dimensional feature selection problems, RSA suffers from low population diversity and entrapment in local optima. An improved metaheuristic optimizer, the Improved Reptile Search Algorithm (IRSA), is proposed to overcome these limitations and adapt RSA to the feature selection problem. IRSA adds two main improvements to the standard RSA. The first applies chaos theory in the initialization phase of RSA to enhance its exploration of the search space. The second combines the Simulated Annealing (SA) algorithm with the exploitation search to avoid local optima. IRSA was evaluated on 20 medical benchmark datasets from the UCI machine learning repository and compared with the standard RSA and state-of-the-art optimization algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization Algorithm (GOA), and Slime Mould Optimization (SMO). The evaluation metrics include the number of selected features, classification accuracy, fitness value, the Wilcoxon statistical test (p-value), and convergence curves. Based on the results, IRSA outperformed the original RSA and the other optimization algorithms on the majority of the medical datasets.
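The two IRSA ingredients, chaotic initialization and a simulated-annealing acceptance step, can be illustrated independently of RSA itself. The abstract does not name the chaotic map, so the logistic map `x <- 4x(1 - x)` used here is an assumption, as are the helper names and the 0.5 threshold that turns chaotic values into a binary feature mask. `sa_accept` is the standard Metropolis criterion; RSA's encircling and hunting updates are not shown.

```python
import math
import random
from typing import List


def logistic_map_population(n_agents: int, dim: int, x0: float = 0.7,
                            r: float = 4.0) -> List[List[int]]:
    """Binary initial population driven by the logistic chaotic map instead of uniform noise."""
    x = x0
    population = []
    for _ in range(n_agents):
        agent = []
        for _ in range(dim):
            x = r * x * (1 - x)                 # chaotic value in [0, 1] for r = 4
            agent.append(1 if x > 0.5 else 0)   # 1 = feature selected
        population.append(agent)
    return population


def sa_accept(current_fit: float, candidate_fit: float, temperature: float,
              rng: random.Random) -> bool:
    """Metropolis rule (minimization): accept improvements; accept worse moves with prob. exp(-delta/T)."""
    if candidate_fit <= current_fit:
        return True
    return rng.random() < math.exp(-(candidate_fit - current_fit) / temperature)
```

In an SA-based exploitation step the temperature is usually lowered each iteration (for example geometrically), so worse moves are accepted often at first and almost never near the end; the exact schedule IRSA uses is not given in the abstract.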
Hybrid feature selection based on principal component analysis and grey wolf optimizer algorithm for Arabic news article classification
The expansion and development of internet technologies has led to rapid growth in electronic documents. Text document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that needs further analysis with data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights. Feature selection reduces dimensionality by pruning the feature space, which lowers computational cost and improves classification accuracy. This work presents a hybrid filter-wrapper method (PCA-GWO) that uses Principal Component Analysis (PCA) as a filter to select an appropriate and informative subset of features and the Grey Wolf Optimizer (GWO) as a wrapper to select further informative features. Logistic Regression (LR) is used as an evaluator to test the classification accuracy of the candidate feature subsets produced by GWO. Three Arabic datasets, namely Alkhaleej, Akhbarona, and Arabiya, are used to assess the efficiency of the proposed method. The experimental results confirm that PCA-GWO outperforms the baseline classifiers with and without feature selection, as well as other feature selection approaches, in terms of classification accuracy.
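The wrapper half of PCA-GWO relies on the standard Grey Wolf Optimizer update, in which every wolf moves towards an average of positions guided by the three best solutions (alpha, beta, delta) while the coefficient `a` shrinks from 2 to 0. The sketch below is a minimal continuous GWO under stated assumptions: it minimizes a placeholder `sphere` function rather than an LR classification error, the PCA filter step is not shown, and the function and parameter names are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def gwo_minimize(f: Callable[[Vector], float], dim: int, lb: float, ub: float,
                 n_wolves: int = 10, iters: int = 100, seed: int = 1) -> Vector:
    """Minimal continuous Grey Wolf Optimizer (minimization)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    leaders: List[Vector] = []
    for t in range(iters):
        # Alpha, beta, delta = the three best positions found so far.
        leaders = sorted(leaders + [w[:] for w in wolves], key=f)[:3]
        a = 2 - 2 * t / iters  # decreases linearly from 2 towards 0
        for i, wolf in enumerate(wolves):
            new_pos = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    total += leader[d] - A * D
                # Average of the alpha-, beta- and delta-guided moves, clipped to bounds.
                new_pos.append(min(ub, max(lb, total / 3)))
            wolves[i] = new_pos
    return min(leaders + wolves, key=f)


def sphere(x: Vector) -> float:
    """Placeholder objective; a wrapper would use the evaluator's error instead."""
    return sum(v * v for v in x)


best = gwo_minimize(sphere, dim=5, lb=-10.0, ub=10.0)
```

For feature selection, each continuous position would typically be mapped to a 0/1 mask (for instance by thresholding a sigmoid) and scored by the evaluator's accuracy on the selected features; that mapping is an assumption here, since the abstract does not specify it.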
Improved sine cosine algorithm with simulated annealing and singer chaotic map for Hadith classification
Feature selection (FS) is an important task in classification, and Hadith classification is one problem to which FS can be applied. Hadiths are the second major source of Islam after the Quran. Thousands of Hadiths are available, and they are grouped into a number of classes; many studies in the literature address Hadith classification. The Sine Cosine Algorithm (SCA) is a new metaheuristic optimization algorithm that explores the search space using sine- and cosine-based formulas to find the optimal solution. However, SCA, like other optimization algorithms (OAs), suffers from local optima entrapment and poor solution diversity. In this paper, to overcome these problems and apply SCA to FS, two major improvements are introduced to the standard SCA. The first uses the Singer chaotic map within SCA to improve solution diversity. The second uses the Simulated Annealing (SA) algorithm as a local search operator within SCA to improve its exploitation. In addition, the Gini Index (GI) is used to filter the features, reducing the number of features to be explored by SCA. Furthermore, three new Hadith datasets were created and used to evaluate the proposed Improved SCA (ISCA). To confirm the generality of ISCA, we also applied it to 14 benchmark datasets from the UCI repository. The ISCA results were compared with the original SCA and state-of-the-art algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Grasshopper Optimization Algorithm (GOA), and the more recent Harris Hawks Optimizer (HHO). The results show that ISCA clearly outperforms the other optimization algorithms and the Hadith classification baseline works.
These results indicate that ISCA can improve classification accuracy while simultaneously selecting the most informative features.
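The core SCA move updates each coordinate by `r1 * sin(r2) * |r3 * P - X|` or the cosine variant, where `P` is the best solution and `r1` shrinks over the iterations. The sketch below plugs a Singer chaotic sequence into the sine/cosine switch. The map is written in its commonly cited form, `x <- mu(7.86x - 23.31x^2 + 28.75x^3 - 13.302875x^4)` with `mu` in [0.9, 1.08], but *where* ISCA injects the chaotic values is not stated in the abstract, so that placement is an assumption. The SA local search and the Gini Index filter are not shown, and all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def singer_map(x: float, mu: float = 1.07) -> float:
    """One iteration of the Singer chaotic map (commonly used with mu in [0.9, 1.08])."""
    return mu * (7.86 * x - 23.31 * x ** 2 + 28.75 * x ** 3 - 13.302875 * x ** 4)


def singer_sequence(x0: float, n: int, mu: float = 1.07) -> List[float]:
    """First n values of the Singer map starting from x0."""
    values, x = [], x0
    for _ in range(n):
        x = singer_map(x, mu)
        values.append(x)
    return values


def chaotic_sca(f: Callable[[Vector], float], dim: int, lb: float, ub: float,
                n_agents: int = 10, iters: int = 100, a: float = 2.0,
                seed: int = 3) -> Tuple[Vector, List[float]]:
    """Sine Cosine Algorithm whose sine/cosine switch is driven by the Singer map."""
    rng = random.Random(seed)
    agents = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=f)[:]
    history = [f(best)]
    chaos = 0.7  # seed value of the chaotic sequence
    for t in range(iters):
        r1 = a - t * a / iters  # step amplitude shrinks linearly from a to 0
        for agent in agents:
            for d in range(dim):
                chaos = singer_map(chaos)
                r2 = 2 * math.pi * rng.random()
                r3 = 2 * rng.random()
                wave = math.sin(r2) if chaos < 0.5 else math.cos(r2)
                step = r1 * wave * abs(r3 * best[d] - agent[d])
                agent[d] = min(ub, max(lb, agent[d] + step))
            if f(agent) < f(best):
                best = agent[:]
        history.append(f(best))  # best-so-far fitness per iteration
    return best, history


best, history = chaotic_sca(lambda x: sum(v * v for v in x), dim=5, lb=-10.0, ub=10.0)
```

Since `best` is replaced only by strictly better agents, the recorded `history` is a non-increasing convergence curve of the kind such papers plot.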
Improving explicit aspects extraction in sentiment analysis using optimized ruleset / Mohammad Ahmad Jomah Tubishat
Aspect extraction, also known as opinion target extraction, is the fine-grained identification of users' opinion targets, such as the extraction of opinionated product aspects from customer reviews. It is considered the core task in aspect-based sentiment analysis and related applications. Many studies have used dependency relation rules, with promising results. However, dependency-based extraction approaches perform better on formal text, because their accuracy depends on the dependency parser, which produces correct results only when the text follows English grammar. Other studies have used sequential syntactic patterns, which mimic the ways users express their opinions without regard to grammatical rules and therefore give better results on informal text. However, customer reviews typically mix formal and informal text. In addition, extraction rules, whether pattern-based or dependency-based, must be selected carefully to remove irrelevant rules and minimize extraction errors. Thus, in this study, to select the most effective extraction rules, an improved version of the Whale Optimization Algorithm (IWOA) is developed and applied to a full set of rules. This set combines newly created extraction rules with dependency-based and pattern-based rules from previous studies. IWOA uses Cauchy mutation and a local search algorithm to overcome WOA's local optima problem and improve population diversity. The algorithm was applied to the full set of 126 rules. Finally, after the aspect list was obtained from the selected rules, a pruning algorithm (PA) was developed to remove incorrect aspects and retain correct ones.
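Cauchy mutation helps an optimizer escape local optima because the Cauchy distribution's heavy tails occasionally produce long jumps. The sketch below draws standard Cauchy samples by the inverse-CDF method, `tan(pi * (u - 0.5))`, and applies them greedily to the best position. How IWOA schedules the mutation (which whales, what scale) and how a whale encodes a subset of the 126 rules are not given in the abstract, so `mutate_best`, `scale`, and the greedy acceptance are assumptions; the WOA update equations and the pruning algorithm are not shown.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def cauchy_mutation(position: Vector, rng: random.Random, scale: float = 1.0) -> Vector:
    """Add a standard Cauchy sample (inverse-CDF method) to every coordinate."""
    return [x + scale * math.tan(math.pi * (rng.random() - 0.5)) for x in position]


def mutate_best(best: Vector, f: Callable[[Vector], float], trials: int = 20,
                scale: float = 1.0, seed: int = 0) -> Vector:
    """Greedy use of Cauchy mutation: keep a mutant only if it improves fitness."""
    rng = random.Random(seed)
    best_fit = f(best)
    for _ in range(trials):
        mutant = cauchy_mutation(best, rng, scale)
        mutant_fit = f(mutant)
        if mutant_fit < best_fit:
            best, best_fit = mutant, mutant_fit
    return best


def sphere(x: Vector) -> float:
    """Placeholder objective standing in for a rule-subset extraction error."""
    return sum(v * v for v in x)


improved = mutate_best([3.0, -2.0], sphere)
```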
The experimental results show that the proposed algorithm outperforms state-of-the-art aspect extraction algorithms and optimization algorithms. IWOA outperforms other optimization algorithms, including the native WOA, PSO, MFO, FFA, GWO, MVO, SSA, and SCA, and achieved 86% precision, 94% recall, and a 90% F-measure. IWOA's superiority stems from its ability to escape local optima and to balance exploitation and exploration. In addition, with PA applied, IWOA+PA outperforms other state-of-the-art aspect extraction works and achieved 92% precision, 93% recall, and a 92% F-measure.
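The reported F-measures can be checked against the reported precision and recall, assuming the balanced F1 score (the harmonic mean of precision and recall); the abstract does not say which F-beta variant was used.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


iwoa = f_measure(0.86, 0.94)     # IWOA alone
iwoa_pa = f_measure(0.92, 0.93)  # IWOA followed by the pruning algorithm
```

These evaluate to about 0.898 and 0.925, which round to the reported 90% and 92%.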
Implicit aspect extraction in sentiment analysis: Review, taxonomy, opportunities, and open challenges
Sentiment analysis is a branch of text classification, defined as the process of extracting sentiment terms (i.e., features/aspects or opinions) and determining their semantic orientation. At the aspect level, aspect extraction is the core task of sentiment analysis, and aspects can be either implicit or explicit. The growth of sentiment analysis has led to various techniques for both explicit and implicit aspect extraction. However, the majority of research has targeted explicit aspect extraction, leaving implicit aspect extraction comparatively under-researched. This research reviews implicit aspect/feature extraction techniques from two perspectives. The first is a comparative analysis of the techniques available for implicit term extraction, with a brief summary of each technique. The second classifies and compares the performance, datasets, languages, and shortcomings of the available techniques. Over 50 articles were reviewed, of which 45 articles on implicit aspect extraction, spanning 2005 to 2016, were analyzed and discussed. Most researchers working on implicit aspect extraction rely heavily on unsupervised methods (about 64% of the 45 articles), followed by supervised methods (about 27%) and semi-supervised methods (9%). In addition, 25 articles used product reviews only and 5 used product reviews together with other types of data, making product reviews the most frequently used data type. Furthermore, research on implicit aspect extraction has focused on English and Chinese more than on other languages. Finally, this review provides recommendations for future research directions and open problems.
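The method shares are reported as rounded percentages of 45 articles. The article counts behind them are not stated, but they can be back-calculated: for each percentage, only one integer count out of 45 rounds to it. This is a derived consistency check, not a figure from the review.

```python
from typing import List


def consistent_counts(total: int, percent: int) -> List[int]:
    """Article counts k for which 100 * k / total rounds to the reported percentage."""
    return [k for k in range(total + 1) if round(100 * k / total) == percent]


reported = {"unsupervised": 64, "supervised": 27, "semi-supervised": 9}
counts = {method: consistent_counts(45, pct) for method, pct in reported.items()}
```

The unique solutions are roughly 29, 12, and 4 articles, which sum to 45, so the three reported shares are mutually consistent.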
Sentiment Analysis of Using ChatGPT in Education
This paper presents a study on the use of the Chat Generative Pre-trained Transformer (ChatGPT) in education. We propose a sentiment analysis model for tweets related to the use of ChatGPT in education. The purpose of this research is to identify common sentiments, topics, and perspectives expressed towards ChatGPT in education, based on data collected from Twitter. A total of 11,830 tweets about the use of ChatGPT in education were collected from Twitter. Topics and emotions expressed in the tweets were extracted using NLP algorithms and organized into distinct groups, and the most frequent words in positive and negative opinions were identified. The findings indicate that most tweets about ChatGPT are either positive or neutral, with a small percentage expressing negative sentiments. The study also classifies the sentiment of the tweets using four classifiers: Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF). According to the results, the SVM classifier achieves the highest accuracy, 81.4 percent.
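Of the four classifiers compared, Naive Bayes is the simplest to show end to end. The sketch below is a minimal pure-Python multinomial Naive Bayes with add-one (Laplace) smoothing. The training tweets are invented for illustration and are not from the study's dataset, and the study's actual preprocessing and feature extraction are not described in the abstract. A production pipeline would more likely use a library implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[str, str]]) -> Model:
    """Count class frequencies and per-class word frequencies."""
    class_counts: Counter = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model: Model, text: str) -> str:
    """Return the label with the highest log posterior (Laplace-smoothed)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, count in class_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for tok in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the likelihood.
            score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented example tweets, not taken from the study's 11,830-tweet dataset.
TRAIN = [
    ("chatgpt helps me learn faster", "positive"),
    ("great tool for students", "positive"),
    ("chatgpt makes cheating easy", "negative"),
    ("students stop thinking with chatgpt", "negative"),
]
model = train_nb(TRAIN)
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.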
Improved Salp Swarm Algorithm based on opposition based learning and novel local search algorithm for feature selection
Fields such as data science and data mining suffer from the rapid growth of data volume and high data dimensionality. The main problems these fields face are high computational and memory costs and low accuracy. These problems arise because these fields rely mainly on machine learning classifiers: machine learning accuracy is degraded by noisy and irrelevant features, and the computational and memory costs of machine learning depend mainly on the size of the dataset. To solve these problems, feature selection can be used to select an optimal subset of features and reduce data dimensionality. Feature selection is an important preprocessing step in many intelligent and expert systems, such as intrusion detection, disease prediction, and sentiment analysis. An improved version of the Salp Swarm Algorithm (ISSA) is proposed in this study to solve feature selection problems and select the optimal subset of features in wrapper mode. Two main improvements were made to the original SSA to alleviate its drawbacks and adapt it to feature selection. The first uses Opposition Based Learning (OBL) in the initialization phase of SSA to improve population diversity in the search space. The second develops a new local search algorithm and combines it with SSA to improve its exploitation. To validate the performance of ISSA, it was applied to 18 datasets from the UCI repository and compared with four well-known optimization algorithms: Genetic Algorithm, Particle Swarm Optimization, Grasshopper Optimization Algorithm, and Ant Lion Optimizer. Four different assessment criteria were used in these experiments.
The results demonstrate that ISSA outperforms all baseline algorithms in terms of fitness values, accuracy, convergence curves, and feature reduction on most of the datasets used. The results across different types of datasets also confirm that the wrapper feature selection mode can be used in different application areas of expert and intelligent systems. © 2019 Elsevier Ltd.
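Opposition Based Learning at initialization evaluates each random candidate `x` together with its opposite `lb + ub - x` and keeps the fitter half, so the starting population covers the search space more evenly. The sketch below shows only this OBL step under stated assumptions: `shifted_sphere` is a hypothetical placeholder for ISSA's wrapper fitness, and the salp leader/follower updates and the paper's new local search algorithm are not reproduced.

```python
import random
from typing import Callable, List

Vector = List[float]


def obl_initialize(f: Callable[[Vector], float], n_agents: int, dim: int,
                   lb: float, ub: float, seed: int = 0) -> List[Vector]:
    """Random population plus its opposites; keep the n_agents fittest (minimization)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    # The opposite of x within [lb, ub] is lb + ub - x.
    opposites = [[lb + ub - x for x in agent] for agent in population]
    return sorted(population + opposites, key=f)[:n_agents]


def shifted_sphere(x: Vector) -> float:
    """Hypothetical test objective with its optimum at (3, 3, ..., 3)."""
    return sum((v - 3.0) ** 2 for v in x)


init = obl_initialize(shifted_sphere, n_agents=6, dim=4, lb=-10.0, ub=10.0)
```

Because the original random agents are part of the pool, the OBL population's best member can never be worse than the best purely random agent; at worst the extra cost is one additional fitness evaluation per agent.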