Research Directions, Challenges and Issues in Opinion Mining
The rapid growth of the Internet and the availability of user reviews on the web for almost any product have created a need for effective systems to analyze these web reviews. Such reviews are useful, to some extent, to both customers and product manufacturers. For any popular product, the number of reviews can run into hundreds or even thousands. This makes it difficult for customers to analyze them and make important decisions, such as whether or not to purchase the product. Mining such product reviews or opinions is termed opinion mining, and the information involved is broadly classified into two main categories: facts and opinions. Although there are several approaches to opinion mining, deciding on the recommendation provided by the system remains a challenge. In this paper, we analyze the basics of opinion mining, its challenges, and the pros and cons of past opinion mining systems, and we provide some directions for future research, focusing on the challenges and issues.
Evolutionary Multiobjective Feature Selection for Sentiment Analysis
Sentiment analysis is one of the prominent research areas in data mining and knowledge discovery, and it has proven to be an effective technique for monitoring public opinion. The big data era, with a high volume of data generated by a variety of sources, has provided enhanced opportunities for utilizing sentiment analysis in various domains. To take full advantage of this volume of data for accurate sentiment analysis, it is essential to clean the data before the analysis, as irrelevant or redundant data will hinder the extraction of valuable information. In this paper, we propose a hybrid feature selection algorithm to improve the performance of sentiment analysis tasks. Our proposed sentiment analysis approach builds a binary classification model based on two feature selection techniques: an entropy-based metric and an evolutionary algorithm. We have performed comprehensive experiments in two different domains using a benchmark dataset, the Stanford Sentiment Treebank, and a real-world dataset we created from World Health Organization (WHO) public speeches regarding COVID-19. The proposed feature selection model is shown to achieve significant performance improvements on both datasets, increasing classification accuracy for all combinations of machine learning and text representation techniques used. Moreover, it achieves over 70% reduction in feature size, which saves computation time and space.
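The abstract names only "an entropy-based metric" for the first feature selection stage. As a minimal sketch, assuming that metric is information gain computed over binary term-presence features, a filter that ranks terms by how much they reduce label entropy could look like this. The function names and toy data are hypothetical, and the evolutionary second stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from splitting documents on whether they contain `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def select_top_k(docs, labels, k):
    """Rank every vocabulary term by information gain (ties broken alphabetically) and keep the k best."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus: each document is a set of terms; labels are 1 = positive, 0 = negative.
docs = [{"great", "movie"}, {"awful", "movie"}, {"great", "plot"}, {"awful", "plot"}]
labels = [1, 0, 1, 0]
print(select_top_k(docs, labels, 2))  # prints ['awful', 'great']
```

Here "great" and "awful" perfectly separate the classes, so each has an information gain of 1 bit, while "movie" and "plot" carry no label information and are discarded. A filter like this shrinks the candidate feature set cheaply before a costlier search, such as an evolutionary algorithm, refines it.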
An empirical study on the various stock market prediction methods
Investment in the stock market is one of the most popular forms of investment. However, predicting the stock market has remained a hard task because of the non-linearity it exhibits. This non-linearity is due to multiple influencing factors, such as the global economy, political situations, sector performance, economic figures, foreign institutional investment, domestic institutional investment, and so on. A proper set of such representative factors must be analyzed to build an efficient prediction model. Even a marginal improvement in prediction accuracy can be profitable for investors. This review provides a detailed analysis of research papers presenting stock market prediction techniques. These techniques are assessed in two sections: time series analysis and sentiment analysis. A detailed discussion of research gaps and issues is presented. The reviewed articles are analyzed based on their prediction techniques, optimization algorithms, feature selection methods, datasets, toolsets, evaluation metrics, and input parameters. The techniques are further investigated to analyze the relations of prediction methods with feature selection methods, datasets, and input parameters. In addition, major problems with current techniques are discussed. This survey will provide researchers with deeper insight into various aspects of current stock market prediction methods.
Sentiment Analysis of Customers' Reviews Using a Hybrid Evolutionary SVM-Based Approach in an Imbalanced Data Distribution
Online media have an increasing presence in restaurants' activities through social media websites, coinciding with an increase in customers' reviews of these restaurants. These reviews have become the main source of information for both customers and decision-makers in this field. Any customer seeking such places will check their reviews first, and these reviews usually affect the final choice. In addition, customers' experiences can be enhanced by utilizing other customers' suggestions. Consequently, customers' reviews can influence the success of a restaurant business, since they are considered the final judgment of a restaurant's overall quality. Thus, decision-makers need to analyze their customers' underlying sentiments in order to meet their expectations and improve the restaurants' services in terms of food quality, ambiance, price range, and customer service. The number of reviews available for various products and services has increased dramatically, and so has the need for automated methods to collect and analyze these reviews. Sentiment Analysis (SA) is a field of machine learning that helps analyze and predict the sentiments underlying these reviews. SA of customers' reviews usually faces the challenge of imbalanced datasets, as the majority of these sentiments tend to fall on one side, either supporters or resistors of the product or service. This work proposes a hybrid approach that combines the Support Vector Machine (SVM) algorithm with Particle Swarm Optimization (PSO) and different oversampling techniques to handle the imbalanced data problem. SVM is applied as a machine learning classification technique to predict the sentiments of reviews in the optimized dataset, which contains reviews of several restaurants in Jordan. Data were collected from Jeeran, a well-known social network for Arabic reviews. A PSO technique is used to optimize the weights of the features, and four different oversampling techniques, namely the Synthetic Minority Oversampling Technique (SMOTE), SVM-SMOTE, Adaptive Synthetic Sampling (ADASYN), and borderline-SMOTE, were examined to produce an optimized dataset and solve its imbalance problem. This study shows that the proposed PSO-SVM approach produces the best results compared with different classification techniques in terms of accuracy, F-measure, G-mean, and Area Under the Curve (AUC) for different versions of the datasets.
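The oversampling step is what rebalances the data before the SVM is trained. Plain SMOTE, the first of the four techniques listed, creates a synthetic minority sample by picking a minority point, choosing one of its k nearest minority-class neighbours, and interpolating at a random position on the segment between them. The from-scratch sketch below illustrates only that basic idea on numeric feature vectors; the function name, the choice of k, and the toy points are illustrative assumptions, not the study's configuration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=3, seed=0):
    """Basic SMOTE sketch: interpolate between a random minority sample and one of
    its k nearest minority-class neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours are searched only within the minority class, excluding `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy imbalance: 8 majority samples vs. 4 minority samples, so 4 synthetic points balance the classes.
majority = [(5.0, 5.0)] * 8
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, len(majority) - len(minority), k=2)
print(len(minority) + len(new_points), len(majority))  # prints 8 8
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region the minority class already occupies. In practice, the imbalanced-learn package provides `SMOTE`, `SVMSMOTE`, `ADASYN`, and `BorderlineSMOTE` in `imblearn.over_sampling`, each exposing `fit_resample`.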
Measuring academic influence: Not all citations are equal
The importance of a research article is routinely measured by counting how many times it has been cited. However, treating all citations with equal weight ignores the wide variety of functions that citations perform. We want to automatically identify the subset of references in a bibliography that have a central academic influence on the citing paper. For this purpose, we examine the effectiveness of a variety of features for determining the academic influence of a citation. By asking authors to identify the key references in their own work, we created a data set in which citations were labeled according to their academic influence. Using automatic feature selection with supervised machine learning, we found a model for predicting academic influence that achieves good performance on this data set using only four features. The best features, among those we evaluated, were those based on the number of times a reference is mentioned in the body of a citing paper. The performance of these features inspired us to design an influence-primed h-index (the hip-index). Unlike the conventional h-index, it weights citations by how many times a reference is mentioned. According to our experiments, the hip-index is a better indicator of researcher performance than the conventional h-index.
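The conventional h-index is the largest h such that a researcher has at least h papers with at least h citations each. The abstract does not give the exact formula for the hip-index, so the sketch below assumes a simple reading: compute the h-index as usual, but let each citing paper contribute the number of times it mentions the cited paper instead of 1. The function `mention_weighted_h_index` and the example data are hypothetical illustrations, not the authors' definition.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def mention_weighted_h_index(papers):
    """Hypothetical mention-weighted variant (an assumption, not the published hip-index
    formula). `papers` holds one list per authored paper; each entry in that list is the
    number of in-text mentions made by one citing paper."""
    weighted = [sum(mentions) for mentions in papers]
    return h_index(weighted)


# Four papers; inner lists give mentions per citing paper.
papers = [[1, 1, 1], [3, 2, 1], [1], [4, 4]]
plain = h_index([len(mentions) for mentions in papers])  # citation counts [3, 3, 1, 2]
weighted = mention_weighted_h_index(papers)              # weighted counts [3, 6, 1, 8]
print(plain, weighted)  # prints 2 3
```

In this toy case the two papers that are mentioned repeatedly in their citing papers lift the weighted score above the plain h-index, which reflects the intuition that repeated in-text mentions signal stronger influence than a single passing citation.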
- …