Evolutionary Multiobjective Feature Selection for Sentiment Analysis
Sentiment analysis is one of the prominent research areas in data mining and knowledge discovery and has proven to be an effective technique for monitoring public opinion. The big data era, with a high volume of data generated by a variety of sources, has provided enhanced opportunities for utilizing sentiment analysis in various domains. To take best advantage of this volume of data for accurate sentiment analysis, it is essential to clean the data before the analysis, as irrelevant or redundant data will hinder extracting valuable information. In this paper, we propose a hybrid feature selection algorithm to improve the performance of sentiment analysis tasks. Our proposed sentiment analysis approach builds a binary classification model based on two feature selection techniques: an entropy-based metric and an evolutionary algorithm. We have performed comprehensive experiments in two different domains using a benchmark dataset, Stanford Sentiment Treebank, and a real-world dataset we have created based on World Health Organization (WHO) public speeches regarding COVID-19. The proposed feature selection model is shown to achieve significant performance improvements on both datasets, increasing classification accuracy for all utilized machine learning and text representation technique combinations. Moreover, it achieves over 70% reduction in feature size, which provides efficiency in computation time and space.
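The abstract above does not say which entropy-based metric is used. As a minimal sketch, assuming information gain over a binary "term present / term absent" feature, such a score could be computed as follows. The function names and the toy data are illustrative and are not taken from the paper.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_present, labels):
    """Entropy reduction obtained by splitting the documents on one binary feature."""
    with_f = [y for f, y in zip(feature_present, labels) if f]
    without_f = [y for f, y in zip(feature_present, labels) if not f]
    n = len(labels)
    remainder = (len(with_f) / n) * entropy(with_f) \
        + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


# Toy data (hypothetical): four documents, two positive and two negative.
labels = [1, 1, 0, 0]
perfect = [1, 1, 0, 0]   # term appears only in positive documents
useless = [1, 0, 1, 0]   # term is spread evenly over both classes

gain_perfect = information_gain(perfect, labels)
gain_useless = information_gain(useless, labels)
```

A filter step would rank all features by this score and pass only the top-ranked ones to the evolutionary search, which is one plausible way to combine the two techniques.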
An Intelligent Hybrid Sentiment Analyzer for Personal Protective Medical Equipments Based on Word Embedding Technique: The COVID-19 Era
Due to the accelerated growth of symmetrical sentiment data across different platforms,
experimenting with different sentiment analysis (SA) techniques allows for better decision-making
and strategic planning in different sectors. Specifically, the emergence of COVID-19 has enriched
the data on people’s opinions and feelings about medical products. In this paper, we analyze people’s
sentiments about the products of a well-known e-commerce website named Alibaba.com. We analyze
these sentiments with a novel evolutionary approach that applies advanced
pre-trained word embeddings for word representations and combines them with an evolutionary
feature selection mechanism to classify the opinions into different rating levels. The proposed
approach is based on the harmony search algorithm and different classification techniques, including
random forest, k-nearest neighbor, AdaBoost, bagging, SVM, and REPtree, to achieve competitive
results with the fewest possible features. The experiments are conducted on five different datasets:
medical gloves, hand sanitizer, medical oxygen, face masks, and a combination of all these
datasets. The results show that the harmony search algorithm successfully reduced the number of
features by 94.25%, 89.5%, 89.25%, 92.5%, and 84.25% for the medical gloves, hand sanitizer, medical
oxygen, face masks, and combined datasets, respectively, while keeping a competitive performance in
terms of accuracy and root mean square error (RMSE) for the classification techniques and decreasing
the computational time required for classification.
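The abstract above names harmony search but gives no parameter settings or fitness function. The following is a minimal sketch of a binary harmony search for feature selection under stated assumptions: pitch adjustment is modelled as a bit flip (a common binary adaptation), and `toy_fitness`, `RELEVANT`, and all parameter defaults are hypothetical stand-ins for a classifier-based score.

```python
import random


def harmony_search(fitness, n_features, memory_size=10, iterations=200,
                   hmcr=0.9, par=0.3, seed=0):
    """Maximise `fitness` over binary feature masks of length `n_features`."""
    rng = random.Random(seed)
    # Harmony memory: a pool of candidate feature masks and their scores.
    memory = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(memory_size)]
    scores = [fitness(h) for h in memory]
    for _ in range(iterations):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment (assumed: bit flip)
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)
        new_score = fitness(new)
        # Replace the worst harmony in memory if the new one is better.
        worst = min(range(memory_size), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


# Hypothetical fitness: reward three "relevant" features, penalise mask size.
RELEVANT = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits - 0.1 * sum(mask)


best_mask, best_score = harmony_search(toy_fitness, n_features=10)
```

In a wrapper setting, `toy_fitness` would be replaced by a function that trains one of the listed classifiers on the selected columns and returns its validation accuracy, possibly with a penalty on the number of features kept.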
Feature Selection for Document Classification : Case Study of Meta-heuristic Intelligence and Traditional Approaches
Doctor of Philosophy (Computer Engineering), 2020
Nowadays, the way people access news around the world has changed from paper to electronic format, and the rate of publication of newspapers and magazines on websites has increased dramatically. Meanwhile, text feature selection for automatic document classification (ADC) is becoming a big challenge because of the unstructured nature of text features, known as the “multi-dimension feature problem”. Various powerful schemes for text feature selection are being developed continuously, but a research gap remains for the “optimization of feature selection problem (OFSP)”, that is, the search for globally optimal features. The capacity of meta-heuristic intelligence in the knowledge discovery process (KDP) also plays a critical role in overcoming the NP-hard OFSP by providing effective performance and efficient computation time. Therefore, this research proposes a meta-heuristic based approach to feature selection optimization that searches for the globally optimal features for ADC.
In this thesis, a case study of meta-heuristic intelligence and traditional approaches for the feature selection optimization process in document classification is carried out. It includes eleven meta-heuristic algorithms for searching the optimal feature subset for document classification: Ant Colony search, Artificial Bee Colony search, Bat search, Cuckoo search, Evolutionary search, Elephant search, Firefly search, Flower search, Genetic search, Rhinoceros search, and Wolf search. The results of the proposed model are then compared with three traditional search algorithms: Best First search (BFS), Greedy Stepwise (GS), and Ranker search (RS). In addition, a data mining framework is applied. It involves data preprocessing, feature engineering, building the learning model, and evaluating the performance of the proposed meta-heuristic intelligence-based feature selection using various performance and computational complexity evaluation schemes. In data preprocessing, tokenization, stop-word handling, stemming and lemmatizing, and normalization are applied. In the feature engineering process, n-gram TF-IDF feature extraction is used to build the feature vector, and both filter and wrapper approaches are applied to observe different cases. Three different classifiers, J48, Naïve Bayes, and Support Vector Machine, are used to build the document classification model. According to the results, the proposed system can dramatically reduce the number of selected features by discarding those that can deteriorate learning model performance. In addition, the selected global feature subset can yield better performance than traditional search according to the single objective function of the proposed model.
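The n-gram TF-IDF extraction mentioned above can be sketched in a few lines. This is an illustration only: the thesis does not state which TF-IDF variant it uses, so the weighting below (term frequency normalised by document length, unsmoothed logarithmic inverse document frequency) and the whitespace tokenizer are assumptions, and the two toy documents are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=2):
    """Return one {term: weight} dict per document, using 1..n_max-grams."""
    doc_terms = []
    for doc in docs:
        tokens = doc.lower().split()          # assumed: whitespace tokenization
        terms = []
        for n in range(1, n_max + 1):
            terms.extend(ngrams(tokens, n))
        doc_terms.append(Counter(terms))
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for counts in doc_terms:
        df.update(counts.keys())
    n_docs = len(docs)
    vectors = []
    for counts in doc_terms:
        total = sum(counts.values())
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


docs = ["stock market news", "sports news today"]
vectors = tfidf(docs)
```

A term that occurs in every document, such as "news" here, receives weight zero under this variant, which is why TF-IDF already acts as a mild filter before any search-based feature selection is applied.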
A conditional opposition-based particle swarm optimization for feature selection
© 2021 The Authors. Published by Taylor & Francis. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1080/09540091.2021.2002266
Because of the existence of irrelevant, redundant, and noisy attributes in large datasets, the accuracy of classification models can degrade. Hence, feature selection is a necessary pre-processing stage to select the important features that may considerably increase the efficiency of the underlying classification algorithms. As a popular metaheuristic algorithm, particle swarm optimization has been successfully applied to various feature selection approaches. Nevertheless, particle swarm optimization tends to suffer from premature convergence and a low convergence rate. Besides, the imbalance between exploration and exploitation is another key issue that can significantly affect the performance of particle swarm optimization. In this paper, a conditional opposition-based particle swarm optimization is proposed and used to develop a wrapper feature selection method. Two schemes, namely opposition-based learning and a conditional strategy, are introduced to enhance the performance of particle swarm optimization. Twenty-four benchmark datasets are used to validate the performance of the proposed approach. Furthermore, nine metaheuristics are chosen for performance verification. The findings show the superiority of the proposed approach not only in obtaining high prediction accuracy but also in yielding small feature sizes.
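The abstract above does not describe how opposition-based learning is wired into the swarm, nor what the conditional strategy is. As a minimal sketch of the general idea only, the standard opposite point of a position x in [lower, upper] is lower + upper - x, and a swarm can be initialised by keeping the better of each random particle and its opposite. The function names, the 0.5 threshold for turning a continuous position into a feature mask, and the toy fitness are assumptions, not the paper's method.

```python
import random


def opposite(position, lower, upper):
    """Opposite point of a position inside the box [lower, upper]^d."""
    return [lower + upper - x for x in position]


def opposition_init(fitness, n_particles, dim, lower=0.0, upper=1.0, seed=0):
    """Initialise a swarm, keeping the fitter of each particle and its opposite."""
    rng = random.Random(seed)
    swarm = []
    for _ in range(n_particles):
        x = [rng.uniform(lower, upper) for _ in range(dim)]
        x_opp = opposite(x, lower, upper)
        swarm.append(x if fitness(x) >= fitness(x_opp) else x_opp)
    return swarm


def to_mask(position, threshold=0.5):
    """Assumed decoding: a feature is selected when its coordinate exceeds the threshold."""
    return [1 if x > threshold else 0 for x in position]


# Hypothetical fitness for demonstration: prefer larger coordinates.
swarm = opposition_init(sum, n_particles=5, dim=3)
example_opposite = opposite([0.2, 0.9], 0.0, 1.0)
example_mask = to_mask([0.2, 0.9])
```

Because a particle and its opposite sum to `dim * (lower + upper)` under this toy fitness, the kept particle always scores at least half of that total, which illustrates why evaluating opposites can give a search a better starting population at the cost of extra fitness evaluations.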
Discriminatory Expressions to Produce Interpretable Models in Short Documents
Social Networking Sites (SNS) are one of the most important ways of
communication. In particular, microblogging sites are being used as analysis
avenues due to their peculiarities (promptness, short texts...). Countless
studies use SNS in novel ways, but machine learning has
focused mainly on classification performance rather than interpretability
and/or other goodness metrics. Thus, state-of-the-art models are black boxes
that should not be used to solve problems that may have a social impact. When
the problem requires transparency, it is necessary to build interpretable
pipelines. Although the classifier may be interpretable, resulting models are
too complex to be considered comprehensible, making it impossible for humans to
understand the actual decisions. This paper presents a feature selection
mechanism that is able to improve comprehensibility by using fewer but more
meaningful features while achieving good performance in microblogging contexts
where interpretability is mandatory. Moreover, we present a ranking method to
evaluate features in terms of statistical relevance and bias. We conducted
exhaustive tests with five different datasets in order to evaluate
classification performance, generalisation capacity and complexity of the
model. Results show that our proposal performs better and is the most stable one in
terms of accuracy, generalisation and comprehensibility.
Advances in Artificial Intelligence: Models, Optimization, and Machine Learning
The present book contains all the articles accepted and published in the Special Issue “Advances in Artificial Intelligence: Models, Optimization, and Machine Learning” of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, and intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone willing to pursue research in artificial intelligence, machine learning and their widespread applications.
Applied Metaheuristic Computing
For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning, among others. This is partly because the classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality that impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC.