140 research outputs found
Sentiment translation for low resourced languages: experiments on Irish general election Tweets
This paper presents two main methods of Sentiment Analysis
(SA) of User-Generated Content for a low-resource language: Irish. The
first method, automatic sentiment translation, applies existing English
SA resources to both manually- and automatically-translated tweets. We
obtained an accuracy of 70% using this approach. The second method involved the manual creation of an Irish-language sentiment lexicon: SentiFoclóir. This lexicon was used to build the first Irish SA system, SentiFocalTweet, which produced superior results to the first method, with
an accuracy of 76%. This demonstrates that translation from Irish to
English has a minor effect on the preservation of sentiment; it is also
shown that the SentiFocalTweet system is a successful baseline system
for Irish sentiment analysis
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
We present the results and main findings of SemEval-2020 Task 12 on
Multilingual Offensive Language Identification in Social Media (OffensEval
2020). The task involves three subtasks corresponding to the hierarchical
taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The
task featured five languages: English, Arabic, Danish, Greek, and Turkish for
Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020
was one of the most popular tasks at SemEval-2020 attracting a large number of
participants across all subtasks and also across all languages. A total of 528
teams signed up to participate in the task, 145 teams submitted systems during
the evaluation period, and 70 submitted system description papers.Comment: Proceedings of the International Workshop on Semantic Evaluation
(SemEval-2020
A semi-supervised approach for sentiment analysis of arab (ic+ izi) messages: Application to the algerian dialect
In this paper, we propose a semi-supervised approach for sentiment analysis of Arabic and its dialects. This approach is based on a sentiment corpus, constructed automatically and reviewed manually by Algerian dialect native speakers. This approach consists of constructing and applying a set of deep learning algorithms to classify the sentiment of Arabic messages as positive or negative. It was applied on Facebook messages written in Modern Standard Arabic (MSA) as well as in Algerian dialect (DALG, which is a low resourced-dialect, spoken by more than 40 million people) with both scripts Arabic and Arabizi. To handle Arabizi, we consider both options: transliteration (largely used in the research literature for handling Arabizi) and translation (never used in the research literature for handling Arabizi). For highlighting the effectiveness of a semi-supervised approach, we carried out different experiments using both corpora for the training (i.e. the corpus constructed automatically and the one that was reviewed manually). The experiments were done on many test corpora dedicated to MSA/DALG, which were proposed and evaluated in the research literature. Both classifiers are used, shallow and deep learning classifiers such as Random Forest (RF), Logistic Regression(LR) Convolutional Neural Network (CNN) and Long short-term memory (LSTM). These classifiers are combined with word embedding models such as Word2vec and fastText that were used for sentiment classification. Experimental results (F1 score up to 95% for intrinsic experiments and up to 89% for extrinsic experiments) showed that the proposed system outperforms the existing state-of-the-art methodologies (the best improvement is up to 25%)
- …