Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs
Introduction: Microblogging websites have amassed rich data sources for
sentiment analysis and opinion mining. In this regard, sentiment classification
has frequently proven inefficient because microblog posts typically lack
syntactically consistent terms and representatives since users on these social
networks do not like to write lengthy statements. Also, there are some
limitations for low-resource languages. The Persian language has distinctive
characteristics and demands its own annotated data and models for the sentiment
analysis task, which differ from those suited to the text features of the
English language. Method: This paper first constructs a user opinion dataset
called ITRC-Opinion, built in a collaborative environment and in an insourced way. Our dataset
contains 60,000 informal and colloquial Persian texts from social microblogs
such as Twitter and Instagram. Second, this study proposes a new deep
convolutional neural network (CNN) model for more effective sentiment analysis
of colloquial text in social microblog posts. The constructed datasets are used
to evaluate the presented model. Furthermore, other models, such as LSTM,
CNN-RNN, BiLSTM, and BiGRU, with different word embeddings, including fastText,
GloVe, and Word2vec, were also applied to our dataset and their results evaluated.
Results: The results demonstrate the benefit of our dataset and the proposed
model (72% accuracy), displaying meaningful improvement in sentiment
classification performance.
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
Every culture and language is unique. Our work expressly focuses on the
uniqueness of culture and language in relation to human affect, specifically
sentiment and emotion semantics, and how they manifest in social multimedia. We
develop sets of sentiment- and emotion-polarized visual concepts by adapting
semantic structures called adjective-noun pairs, originally introduced by Borth
et al. (2013), but in a multilingual context. We propose a new
language-dependent method for automatic discovery of these adjective-noun
constructs. We show how this pipeline can be applied on a social multimedia
platform for the creation of a large-scale multilingual visual sentiment
concept ontology (MVSO). Unlike the flat structure in Borth et al. (2013), our
unified ontology is organized hierarchically by multilingual clusters of
visually detectable nouns and subclusters of emotionally biased versions of
these nouns. In addition, we present an image-based prediction task to show how
generalizable language-specific models are in a multilingual context. A new,
publicly available dataset of >15.6K sentiment-biased visual concepts across 12
languages with language-specific detector banks, >7.36M images and their
metadata is also released. Comment: 11 pages, to appear at ACM MM'1
Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets
Sentiment analysis aims to extract people's emotions and opinions from their
comments on the web. It is widely used in business to detect sentiment in social
data, gauge brand reputation, and understand customers. Most articles in
this area have concentrated on the English language, whereas resources for the
Persian language are limited. In this review paper, articles on Persian
sentiment analysis published between 2018 and 2022 are collected, and their
methods, approaches, and datasets are explained and analyzed. Almost all the
methods used for sentiment analysis are based on machine learning and deep
learning. The purpose of this paper is to examine 40 different approaches to
sentiment analysis in the Persian language, analyze the datasets along with the
accuracy of the algorithms applied to them, and review the strengths and
weaknesses of each. Among all the methods, transformers such as BERT and
recurrent neural networks (RNNs) such as LSTM and Bi-LSTM have achieved
higher accuracy in sentiment analysis. In addition to the methods and
approaches, the datasets from 2018 to 2022 are listed, and details about
each dataset are provided.
PersoNER: Persian named-entity recognition
© 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent difficulty of training an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
The application of Deep Learning in Persian Documents Sentiment Analysis
Nowadays the amount of textual information on the web is growing rapidly. This huge volume of textual data needs more accurate classification algorithms. Sentiment analysis is a branch of text classification that is used to classify user opinions for purposes such as market decisions, product evaluations, or measuring consumer confidence. With the rising production rate of Persian text data in the commercial domain, improving the efficiency of algorithms for Persian is a must. The structure of the Persian language, such as its word and sentence structures, poses some challenges in this area. Deep learning algorithms have recently been used in NLP, and especially in sentiment text classification, for many languages, including Persian. The goal is to improve classification performance using deep learning techniques. In this work, the authors propose a hybrid method combining structural correspondence learning (SCL) and a convolutional neural network (CNN). The SCL method selects the most effective pivot features so that adaptation from one domain to a similar one does not reduce efficiency drastically. The results showed that the proposed hybrid method, learned from one domain, can act efficiently in a similar domain, and that applying a combination of SCL+CNN can improve sentiment classification results for two domains by more than 10 percent.
Sentiment Analysis Based on Deep Learning: A Comparative Study
The study of public opinion can provide us with valuable information. The
analysis of sentiment on social networks, such as Twitter or Facebook, has
become a powerful means of learning about the users' opinions and has a wide
range of applications. However, the efficiency and accuracy of sentiment
analysis are being hindered by the challenges encountered in natural language
processing (NLP). In recent years, it has been demonstrated that deep learning
models are a promising solution to the challenges of NLP. This paper reviews
the latest studies that have employed deep learning to solve sentiment analysis
problems, such as sentiment polarity. Models using term frequency-inverse
document frequency (TF-IDF) and word embedding have been applied to a series of
datasets. Finally, a comparative study has been conducted on the experimental
results obtained for the different models and input features.
A FRAMEWORK FOR ARABIC SENTIMENT ANALYSIS USING MACHINE LEARNING CLASSIFIERS
In recent years, the use of the Internet and the volume of online comments, expressed in natural language text, have increased significantly. However, it is difficult for humans to read all these comments and classify them appropriately. Consequently, an automatic approach is required to classify the unstructured data. In this paper, we propose a framework for the Arabic language comprising three steps: pre-processing, feature extraction, and machine learning classification. The main aim of the proposed framework is to exploit the combination of different Arabic linguistic features. We evaluate the framework using two benchmark Arabic tweet datasets (ASTD, ATA), which enable sentiment polarity detection in general Arabic and Jordanian dialects. Comparative simulation results show that machine learning classifiers such as Support Vector Machine (SVM), Naive Bayes, MultiLayer Perceptron (MLP), and Logistic Regression produce the best performance when using a combination of n-gram features from the Arabic tweet datasets. Finally, we evaluate the performance of our proposed framework using an ensemble classifier approach, with promising results.