Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets
Sentiment analysis aims to extract people's emotions and opinions from their
comments on the web. It is widely used in businesses to detect sentiment in social
data, gauge brand reputation, and understand customers. Most articles in
this area have concentrated on the English language, whereas there are limited
resources for the Persian language. In this review paper, articles on sentiment
analysis in the Persian language published between 2018 and 2022 have been
collected, and their methods, approaches, and datasets are explained and
analyzed. Almost all the methods used to solve sentiment analysis are machine
learning and deep learning. The purpose of this paper is to examine 40
different approaches to sentiment analysis in the Persian language, analyze the
datasets along with the accuracy of the algorithms applied to them, and
review the strengths and weaknesses of each. Among all the methods, transformers
such as BERT and recurrent neural networks (RNNs) such as LSTM and Bi-LSTM have
achieved higher accuracy in sentiment analysis. In addition to the methods and
approaches, the datasets published between 2018 and 2022 are listed, and
details about each dataset are provided.
Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs
Introduction: Microblogging websites have amassed rich data sources for
sentiment analysis and opinion mining. In this regard, sentiment classification
has frequently proven inefficient because microblog posts typically lack
syntactically consistent terms and representatives since users on these social
networks do not like to write lengthy statements. Low-resource languages impose
further limitations. The Persian language has exceptional
characteristics and demands unique annotated data and models for the sentiment
analysis task, which are distinct from text features of the English
language. Method: This paper first constructs a user opinion dataset called
ITRC-Opinion in a collaborative environment and in an insourced manner. Our dataset
contains 60,000 informal and colloquial Persian texts from social microblogs
such as Twitter and Instagram. Second, this study proposes a new deep
convolutional neural network (CNN) model for more effective sentiment analysis
of colloquial text in social microblog posts. The constructed datasets are used
to evaluate the presented model. Furthermore, several models, such as LSTM,
CNN-RNN, BiLSTM, and BiGRU, with different word embeddings, including fastText,
GloVe, and Word2vec, were applied to our dataset and their results evaluated.
Results: The results demonstrate the benefit of our dataset and the proposed
model (72% accuracy), displaying meaningful improvement in sentiment
classification performance.
PersoNER: Persian named-entity recognition
© 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
A FRAMEWORK FOR ARABIC SENTIMENT ANALYSIS USING MACHINE LEARNING CLASSIFIERS
In recent years, the use of the Internet and of online comments, expressed in natural language text, has increased significantly. However, it is difficult for humans to read all these comments and classify them appropriately. Consequently, an automatic approach is required to classify the unstructured data. In this paper, we propose a framework for the Arabic language comprising three steps: pre-processing, feature extraction, and machine learning classification. The main aim of the proposed framework is to exploit the combination of different Arabic linguistic features. We evaluate the framework using two benchmark Arabic tweet datasets (ASTD, ATA), which enable sentiment polarity detection in general Arabic and Jordanian dialects. Comparative simulation results show that machine learning classifiers such as Support Vector Machine (SVM), Naive Bayes, MultiLayer Perceptron (MLP), and Logistic Regression produce the best performance when using a combination of n-gram features from the Arabic tweet datasets. Finally, we evaluate the performance of our proposed framework using an ensemble classifier approach, with promising results.
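The three steps named in this abstract (pre-processing, n-gram feature extraction, classification) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the tokenizer, the unigram-plus-bigram feature set, the Laplace-smoothed Naive Bayes classifier, and the toy English training sentences are all assumptions chosen for illustration, and the names `preprocess`, `ngram_features`, and `NaiveBayes` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Step 1 (assumed): lowercase, replace punctuation with spaces, split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def ngram_features(tokens, n_max=2):
    """Step 2 (assumed): all word n-grams for n = 1..n_max, joined by spaces."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Step 3 (assumed): multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(preprocess(text))
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = ngram_features(preprocess(text))
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / total_docs)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hypothetical training data (not drawn from ASTD or ATA).
train_texts = [
    "great service and friendly staff",
    "really great product",
    "terrible service and rude staff",
    "really terrible product",
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great staff"))
print(model.predict("terrible staff"))
```

Because `\w` matches Unicode letters in Python 3 strings, the same tokenizer would also accept Arabic text, although a realistic Arabic pipeline would likely need normalization steps that this sketch omits.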
DeepSentiPers: Novel Deep Learning Models Trained Over Proposed Augmented Persian Sentiment Corpus
This paper focuses on how to extract opinions from each Persian
sentence-level text. Deep learning models provide a new way to boost the
quality of the output. However, these architectures need to feed on big
annotated data as well as an accurate design. To the best of our knowledge, we
suffer not merely from the lack of a well-annotated Persian sentiment corpus, but
also from the lack of a novel model to classify Persian opinions in terms of both
multi-class and binary classification. So in this work, first we propose two novel
deep learning architectures comprising bidirectional LSTM and CNN. They are part
of a carefully designed deep hierarchy and are able to classify sentences in
both cases. Second, we suggest three data augmentation techniques for the
low-resource Persian sentiment corpus. Our comprehensive experiments on three
baselines and two different neural word embedding methods show that our data
augmentation methods and intended models successfully address the aims of the
research.