43 research outputs found

    Feature selection methods in Persian sentiment analysis

    With the enormous growth of digital content on the internet, online reviews such as product and movie reviews offer a wealth of subjective information that can be very helpful to potential users. Sentiment analysis aims to use automated tools to detect subjective information in reviews. Little research has so far been conducted on feature selection in sentiment analysis, and work on Persian sentiment analysis is rarer still. This paper considers the problem of sentiment classification using different feature selection methods for online customer reviews in Persian. Three challenges of Persian text are its wide variety of declensional suffixes, inconsistent word spacing, and many informal or colloquial words. In this paper we address these challenges by proposing a model for sentiment classification of Persian review documents. The proposed model is based on stemming and feature selection and employs the Naive Bayes algorithm for classification. We evaluate the performance of the model on a collection of cellphone reviews, where the results show the effectiveness of the proposed approach.
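The classification step named in the abstract above can be illustrated with a minimal multinomial Naive Bayes sketch. This is not the paper's implementation: the add-one smoothing, the function names `train_naive_bayes` and `predict`, and the toy English reviews are all assumptions made for illustration, and the stemming and feature selection steps are left out. A feature selection stage would simply restrict the vocabulary before training.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents, labels):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(documents, labels):
        term_counts[label].update(tokens)
    vocabulary = {token for tokens in documents for token in tokens}
    return class_counts, term_counts, vocabulary

def predict(model, tokens):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore terms never seen in training
            smoothed = (term_counts[label][token] + 1) / (total_terms + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up, already tokenized reviews (hypothetical data, not from the paper).
reviews = [
    ["good", "battery"],
    ["great", "screen"],
    ["bad", "battery"],
    ["poor", "screen"],
]
sentiments = ["pos", "pos", "neg", "neg"]

model = train_naive_bayes(reviews, sentiments)
prediction = predict(model, ["great", "battery"])
```

Working in log space avoids numerical underflow when a review contains many terms.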

    Feature Selection based Sentiment Analysis on US Airline Twitter Data

    Sentiment analysis classifies the emotions expressed in review documents. Researchers grade features and remove low-graded, non-informative, and noisy attributes to improve classification accuracy. This paper uses six different NLP models to predict user sentiment from Twitter reviews about airlines, using the Twitter US Airline Sentiment dataset. The best-performing models from both machine learning (K-nearest neighbor, Random Forest, and Multinomial Naive Bayes) and deep learning (Artificial Neural Networks, LSTM, and Bidirectional LSTM with GloVe embeddings) were implemented on the Anaconda and Google Colab platforms. This paper introduces a new feature dimensionality technique termed "inquiry extension grade (IEG)," inspired by the inquiry extension term weighting technique. Additionally, we modified the traditional TF-IDF method into an "improved TF-IDF (ITFIDF)," specifically tailored for processing unbalanced text collections. To assess the effectiveness of the proposed methods, a series of simulations was conducted. The results indicate that the combination of IEG-ITFIDF vectorization and Bi-LSTM with GloVe embeddings yielded the best accuracy, 94.26%, in sentiment classification on the Twitter US Airline Sentiment dataset.
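For reference, the traditional TF-IDF weighting that the abstract above takes as its starting point can be sketched as follows. This is the textbook variant (term frequency normalized by document length, times the natural log of the inverse document frequency), not the paper's modified method, whose details the abstract does not give. The function name `tf_idf` and the example tweets are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up example tweets, already lowercased and tokenized.
tweets = [
    ["flight", "delayed", "again"],
    ["flight", "was", "great"],
    ["great", "crew", "great", "flight"],
]
weights = tf_idf(tweets)
```

A term that occurs in every document, such as "flight" here, receives a weight of zero, which is why TF-IDF tends to suppress uninformative terms.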

    A Clustering and Associativity Analysis Based Probabilistic Method for Web Page Prediction

    Today, information and resources are available online through websites and web pages. To get instant information about any product, institution, or organization, users can access the web pages available online. In this work, a three-stage model is proposed for more intelligent web page prediction. The method uses clustering and associativity analysis with rule formulation to improve the prediction results. C-Means clustering is applied in the first stage to identify the sessions with high and low usage of web pages. Once the clustering is done, a rule is defined to identify the sessions with above-average page occurrence. In the final stage, a neuro-fuzzy approach is applied to perform the web page prediction. The results show that the model provides an effective derivation of web page visits.

    Statistical Validation of ACO-KNN Algorithm for Sentiment Analysis

    This research paper proposes a hybrid of the ant colony optimization (ACO) and k-nearest neighbour (KNN) algorithms as a feature selection method for choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test that was used to evaluate the performance differences between the ACO-KNN, the IG-GA, and the IG-RSAR algorithms. The dependency relation algorithm was used to identify the actual features commented on by customers by linking the dependency relation between product features and sentiment words in customers' sentences. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using parametric statistical significance tests. The evaluation statistically shows that the ACO-KNN algorithm significantly improves on the baseline algorithms. In addition, the experimental results show that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual data in customer reviews.
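The three evaluation metrics named in the abstract above have standard definitions, sketched here for a single positive class. The function name `precision_recall_f1`, the label strings, and the toy predictions are assumptions for illustration and do not come from the paper.

```python
def precision_recall_f1(true_labels, predicted_labels, positive="pos"):
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1

# Made-up gold labels and predictions: 2 true positives, 1 false positive, 1 false negative.
true_labels = ["pos", "pos", "pos", "neg", "neg"]
predicted_labels = ["pos", "pos", "neg", "pos", "neg"]
precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels)
```

In this toy case all three values equal 2/3, since the counts of false positives and false negatives happen to match.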

    Potential of ChatGPT in predicting stock market trends based on Twitter Sentiment Analysis

    The rise of ChatGPT has brought a notable shift to the AI sector, with its exceptional conversational skills and deep grasp of language. Recognizing its value across different areas, our study investigates ChatGPT's capacity to predict stock market movements using only social media tweets and sentiment analysis. We aim to see if ChatGPT can tap into the vast sentiment data on platforms like Twitter to offer insightful predictions about stock trends. We focus on determining whether a tweet has a positive, negative, or neutral effect on the stock value of two tech giants, Microsoft and Google. Our findings highlight a positive link between ChatGPT's evaluations and the following day's stock results for both companies. This research enriches our view of ChatGPT's adaptability and emphasizes the growing importance of AI in shaping financial market forecasts. Comment: 11 pages in total including references, 4 figures, and one table

    Evolutionary Multiobjective Feature Selection for Sentiment Analysis

    Sentiment analysis is one of the prominent research areas in data mining and knowledge discovery, and it has proven to be an effective technique for monitoring public opinion. The big data era, with a high volume of data generated by a variety of sources, has provided enhanced opportunities for using sentiment analysis in various domains. To take best advantage of this volume of data for accurate sentiment analysis, it is essential to clean the data before the analysis, as irrelevant or redundant data will hinder the extraction of valuable information. In this paper, we propose a hybrid feature selection algorithm to improve the performance of sentiment analysis tasks. Our proposed sentiment analysis approach builds a binary classification model based on two feature selection techniques: an entropy-based metric and an evolutionary algorithm. We have performed comprehensive experiments in two different domains using a benchmark dataset, Stanford Sentiment Treebank, and a real-world dataset we have created based on World Health Organization (WHO) public speeches regarding COVID-19. The proposed feature selection model is shown to achieve significant performance improvements on both datasets, increasing classification accuracy for all combinations of machine learning and text representation techniques used. Moreover, it achieves over 70% reduction in feature size, which saves computation time and space.

    Feature Selection with IG-R for Improving Performance of Intrusion Detection System

    As the internet and computers have grown in popularity and become indispensable in human life, the security of computer networks has become an important issue in the computer security field. The Intrusion Detection System (IDS) is a system used in computer security to protect networks. Feature selection is considered the most critical stage of an IDS, and it is very costly in both effort and time. Many machine learning approaches have been presented to improve this stage and thereby the performance of an IDS. However, these approaches did not give desirable results with respect to detection accuracy. This paper proposes a novel technique that combines the Information Gain and Ranker (IG+R) method as the feature selection strategy with Naïve Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) as the classifiers. The performance of IG+R-NB, IG+R-SVM, and IG+R-KNN was evaluated on the NSL-KDD dataset. The experimental results of our proposed method gave high accuracy and a low false alarm rate. The results obtained were compared and benchmarked against existing works, and they outperformed the existing approaches in terms of detection accuracy.
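The idea of scoring features by information gain and then ranking them, as named in the abstract above, can be sketched for discrete features as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the function names, the two toy features, and the four made-up connection records are hypothetical and are not taken from NSL-KDD.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Label entropy minus the expected label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(columns, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy, made-up connection records.
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "protocol": ["tcp", "tcp", "udp", "udp"],  # separates the classes perfectly
    "flag": ["S0", "SF", "S0", "SF"],          # carries no class information
}
ranking = rank_features(columns, labels)
```

A selection step would then keep the top-ranked features, or those above a chosen threshold, before training the classifier.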

    Implementation of Particle Swarm Optimization on Sentiment Analysis of Cyberbullying using Random Forest

    Social media has exerted a significant influence on the lives of most people in the contemporary era. It not only enables communication among people within specific environments but also facilitates user connectivity in the virtual realm. Instagram is a social media platform that plays a pivotal role in sharing information and fostering communication among its users through photos and videos, which other users can comment on. The use of Instagram grows every year, with both positive and negative consequences. One negative consequence that frequently arises is cyberbullying. Conducting sentiment analysis on cyberbullying data can provide insights into the effectiveness of the employed methodology. This research was conducted as an experimental study, aiming to compare the performance of Random Forest with and without the Particle Swarm Optimization feature selection technique on three data split compositions: 70:30, 80:20, and 90:10. The evaluation results indicate that the highest accuracy scores were achieved with the 90:10 split. Specifically, the Random Forest model yielded an accuracy of 87.50%, while the Random Forest model with Particle Swarm Optimization feature selection achieved an accuracy of 92.19%. Therefore, Particle Swarm Optimization as a feature selection technique demonstrates the potential to enhance the accuracy of the Random Forest method.