Metaphorical Expressions in Automatic Arabic Sentiment Analysis
Over recent years, Arabic language resources and NLP tools have developed rapidly. One of the important tasks in Arabic natural language processing is sentiment analysis. While significant progress has been made in this research area, existing computational models and tools still lack the capability to deal with Arabic metaphorical expressions. Metaphors play an important role in the Arabic language owing to its unique history and culture. They provide a linguistic mechanism for expressing ideas and notions whose meaning can differ from their surface form. Therefore, to identify the true sentiment of Arabic language data effectively, a computational model needs to be able to "read between the lines". In this paper, we examine the issue of metaphors in automatic Arabic sentiment analysis by carrying out an experiment in which we observe the performance of a state-of-the-art Arabic sentiment tool on metaphors and analyse the results to gain deeper insight into the issue. Our experiment shows that metaphors have a significant impact on the performance of current Arabic sentiment tools; hence, developing Arabic language resources and computational models for Arabic metaphors is an important task.
Productivity Measurement of Call Centre Agents using a Multimodal Classification Approach
Call centre channels play a cornerstone role in business communications and transactions, especially in challenging business situations. Operational efficiency, service quality, and resource productivity are core aspects of call centres' competitive advantage in a rapidly competitive market. Performance evaluation in call centres is challenging because of subjective human evaluation, the manual sorting of massive volumes of calls, and inconsistent evaluations across different raters. These challenges reduce operational efficiency and lead to frustrated customers. This study aims to automate performance evaluation in call centres using various deep learning approaches. Calls recorded in a call centre are modelled and classified into high- or low-performance evaluations, categorised as productive or nonproductive calls.
The proposed conceptual model uses a deep learning network approach to model the recorded calls as both text and speech. It is based on the following: 1) a focus on the technical part of agent performance, 2) objective evaluation of the corpus, 3) extension of features for both text and speech, and 4) combination of the best-performing text and speech models using a multimodal structure. Accordingly, a diarisation algorithm separates the parts of the call in which the agent is speaking from those in which the customer is speaking. Manual annotation is also necessary to divide the modelling corpus into productive and nonproductive calls for supervised training, and Krippendorff's alpha was applied to control for subjectivity in this annotation. An Arabic speech recognition system is then developed to transcribe the speech into text. The text features are word embeddings produced by an embedding layer. For the speech features, several experiments combine Mel Frequency Cepstral Coefficients (MFCCs) with Low-Level Descriptors (LLDs) to improve classification accuracy. The modelling architectures for speech and text are based on CNNs, BiLSTMs, and an attention layer. The multimodal approach then builds on these models to improve accuracy by concatenating the text and speech models using a joint representation.
The main contributions of this thesis are:
• Developing an Arabic speech recognition method for the automatic transcription of speech into text.
• Designing several DNN architectures to improve performance evaluation using speech features based on MFCC and LLD.
• Developing a Max Weight Similarity (MWS) function as an alternative to the SoftMax function used in the attention layer, designed to outperform it.
• Proposing a multimodal approach that combines the text and speech models for the best performance evaluation.
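The multimodal pipeline described in this entry (attention over encoder outputs, then concatenation of the text and speech representations) can be illustrated with a minimal, dependency-free Python sketch. It assumes plain dot-product attention with a SoftMax over time steps and a single logistic output unit; the MWS function is not reproduced because its definition is not given here. All function names, toy vectors, and weights are hypothetical, and in practice the encoders and weights would be learned with a deep learning framework.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable SoftMax: subtract the maximum score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Score each time step against a query vector (hypothetical; normally learned),
    turn the scores into weights with SoftMax, and return the weighted sum of states."""
    weights = softmax([dot(h, query) for h in states])
    dim = len(states[0])
    return [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]


def fuse_and_classify(text_vec: Vector, speech_vec: Vector,
                      weights: Vector, bias: float) -> float:
    """Joint representation by concatenation, then one logistic unit giving P(productive)."""
    joint = list(text_vec) + list(speech_vec)
    if len(joint) != len(weights):
        raise ValueError("weights must match the concatenated dimension")
    return 1.0 / (1.0 + math.exp(-(dot(joint, weights) + bias)))


if __name__ == "__main__":
    # Toy stand-ins: rows play the role of BiLSTM outputs over words / speech frames.
    text_states = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.8]]
    speech_states = [[1.0, 0.0], [0.0, 1.0]]
    text_vec = attention_pool(text_states, query=[1.0, 0.0])
    speech_vec = attention_pool(speech_states, query=[0.0, 1.0])
    p = fuse_and_classify(text_vec, speech_vec, weights=[0.5, -0.2, 0.3, 0.1], bias=0.0)
    print("productive" if p >= 0.5 else "nonproductive", round(p, 3))
```

Concatenation keeps each modality's pooled vector intact and lets the output layer weigh text and speech evidence jointly, which is the essence of a joint representation; a replacement for SoftMax such as MWS would slot in where `softmax` is called inside `attention_pool`.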
Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies' Customers
The flexibility of mobile communications allows customers to switch quickly from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased, which reflects considerable customer dissatisfaction and consequent financial losses for the companies. Many studies correlate customer satisfaction with customer churn. Telecom companies have relied on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market.
This research was conducted to understand the relationship between customer satisfaction and customer churn, and to determine how social media mining can be used to measure customer satisfaction and predict customer churn. A systematic review was carried out to identify the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues of existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and its lack of resources.
As a result, I constructed the first gold-standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. The corpus was extracted from a larger Twitter dataset that I collected to capture the characteristics of social media text, and it was used to generate a dialect sentiment lexicon. I developed a new ASA prediction model that fills the gaps identified in the ASA literature and is tailored to the telecommunication field. The proposed model proved effective for both Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Because the proposed model is based on text mining, it would be interesting to apply it to other fields, such as education, that have different features.
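Measuring satisfaction from tweets can be illustrated with a small lexicon-based sketch in Python. It is not the thesis's ASA model, whose details are not given here: the toy lexicon below (a handful of illustrative Arabic words, not the thesis lexicon), the whitespace tokenisation, and the use of a per-company positive share as a satisfaction signal are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical toy lexicon: word -> polarity weight (illustrative only).
LEXICON: Dict[str, int] = {
    "ممتاز": 1,   # "excellent"
    "سريع": 1,    # "fast"
    "زين": 1,     # "good" (Gulf colloquial)
    "سيء": -1,    # "bad"
    "بطيء": -1,   # "slow"
}


def tweet_polarity(text: str, lexicon: Dict[str, int] = LEXICON) -> int:
    """Sum lexicon weights over whitespace tokens; return +1, -1, or 0 (no polar signal)."""
    score = sum(lexicon.get(token, 0) for token in text.split())
    return (score > 0) - (score < 0)


def satisfaction_by_company(tweets: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """For each company, the share of polar tweets that are positive.
    Companies with no polar tweets are omitted rather than given a score."""
    positive: Dict[str, int] = defaultdict(int)
    polar: Dict[str, int] = defaultdict(int)
    for company, text in tweets:
        p = tweet_polarity(text)
        if p != 0:
            polar[company] += 1
            if p > 0:
                positive[company] += 1
    return {company: positive[company] / polar[company] for company in polar}


if __name__ == "__main__":
    # Hypothetical (company, tweet) pairs.
    sample = [("A", "الخدمة ممتاز"), ("A", "النت بطيء"), ("B", "التغطية زين")]
    print(satisfaction_by_company(sample))
```

A falling positive share for a company over time could then serve as one input feature to a churn predictor; that link is an assumption of this sketch rather than a result reported in the abstract.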
Sentiment and behaviour annotation in a corpus of dialogue summaries
This paper proposes a scheme for sentiment annotation. We show how the task can be made tractable by focusing on one of the many aspects of sentiment: sentiment as it is recorded in behaviour reports of people and their interactions. Together with a number of measures that support the reliable application of the scheme, this allows us to obtain sufficient to good agreement scores (in terms of Krippendorff's alpha) on three key dimensions: polarity, evaluated party, and type of clause. The scheme is evaluated through the annotation of an existing corpus of dialogue summaries (in English and Portuguese) by nine annotators. Our contribution to the field is twofold: (i) a reliable multi-dimensional annotation scheme for sentiment in behaviour reports; and (ii) an annotated corpus that was used to test the reliability of the scheme and is made available to the research community.
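Agreement in terms of Krippendorff's alpha, alpha = 1 - D_o / D_e (observed over expected disagreement), can be computed for a nominal dimension such as polarity with a short Python sketch. It follows the standard coincidence-matrix formulation for nominal data; the input layout (one list of coder labels per unit, with None for a missing label) is an assumption of this sketch, not the paper's tooling.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Optional[Hashable]]]) -> float:
    """Nominal Krippendorff's alpha. units[u] lists the labels the coders gave unit u;
    None marks a coder who did not label that unit."""
    o: Counter = Counter()  # coincidence matrix: o[(c, k)]
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two labels has no pairable values
        # Every ordered pair of labels from two different coders adds 1 / (m - 1).
        for c, k in permutations(values, 2):
            o[(c, k)] += 1.0 / (m - 1)
    n_c: Counter = Counter()  # marginal totals per category
    for (c, _k), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    observed = sum(weight for (c, k), weight in o.items() if c != k)
    expected = n * n - sum(v * v for v in n_c.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    # With the nominal metric, alpha = 1 - (n - 1) * sum_{c!=k} o_ck / sum_{c!=k} n_c * n_k.
    return 1.0 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical polarity labels from three coders on four units.
    labels = [["pos", "pos", "pos"], ["neg", "neg", "pos"], ["neu", "neu", None], ["neg", "neg", "neg"]]
    print(round(krippendorff_alpha_nominal(labels), 3))
```

The same routine applies to the other two dimensions (evaluated party, type of clause) as long as their labels are treated as unordered categories.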
Supervised sentiment analysis in multilingual environments
© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article, Vilares, D., Alonso, M.A. and Gómez-Rodríguez, C. (2017) "Supervised sentiment analysis in multilingual environments", has been accepted for publication in Information Processing & Management, 53(3), pp. 595–607. The Version of Record is available online at https://doi.org/10.1016/j.ipm.2017.01.004.
Abstract: This article tackles the problem of performing multilingual polarity classification on Twitter, comparing three techniques: (1) a multilingual model trained on a multilingual dataset, obtained by fusing existing monolingual resources, that does not need any language recognition step, (2) a dual monolingual model with perfect language detection on monolingual texts, and (3) a monolingual model that acts based on the decision provided by a language identification tool. The techniques were evaluated on monolingual, synthetic multilingual, and code-switching corpora of English and Spanish tweets. In the latter case, we introduce the first code-switching Twitter corpus with sentiment labels. The samples are labelled according to two well-known criteria used for this purpose: the SentiStrength scale and a trinary scale (positive, neutral, and negative categories). The experimental results show the robustness of the multilingual approach (1) and also that it outperforms the monolingual models on some monolingual datasets.
This research was supported by the Ministerio de Economía y Competitividad (FFI2014-51978-C2) and Xunta de Galicia (R2014/034). David Vilares is funded by the Ministerio de Educación, Cultura y Deporte (FPU13/01180). Carlos Gómez-Rodríguez is funded by an Oportunius program grant (Xunta de Galicia).
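Technique (3), a monolingual model selected by a language identification step, amounts to a routing pipeline, which can be contrasted with technique (1), a single multilingual model. The Python sketch below uses toy keyword-based stand-ins for the language identifier and the classifiers; every name and word list is hypothetical and not taken from the article.

```python
from typing import Callable, Dict, Set

Classifier = Callable[[str], str]  # text -> "positive" | "neutral" | "negative"


def keyword_classifier(positive: Set[str], negative: Set[str]) -> Classifier:
    """Toy trinary polarity classifier based on keyword presence."""
    def classify(text: str) -> str:
        tokens = set(text.lower().split())
        if tokens & positive and not tokens & negative:
            return "positive"
        if tokens & negative and not tokens & positive:
            return "negative"
        return "neutral"
    return classify


EN_HINTS = {"the", "i", "is", "it", "love", "hate"}
ES_HINTS = {"el", "la", "es", "me", "encanta", "odio"}


def toy_language_id(text: str) -> str:
    """Toy identifier: picks the language whose hint words overlap the text most (ties -> "en")."""
    tokens = set(text.lower().split())
    return "en" if len(tokens & EN_HINTS) >= len(tokens & ES_HINTS) else "es"


def routed_classifier(identify: Callable[[str], str],
                      models: Dict[str, Classifier],
                      fallback: Classifier) -> Classifier:
    """Technique (3): identify the language, then dispatch to that language's model."""
    def classify(text: str) -> str:
        return models.get(identify(text), fallback)(text)
    return classify


english = keyword_classifier({"love"}, {"hate"})
spanish = keyword_classifier({"encanta"}, {"odio"})
# Technique (1): one model over the fused vocabulary, with no language identification step.
multilingual = keyword_classifier({"love", "encanta"}, {"hate", "odio"})
routed = routed_classifier(toy_language_id, {"en": english, "es": spanish}, multilingual)

if __name__ == "__main__":
    for tweet in ["I love the phone", "odio la lluvia", "odio the traffic"]:
        print(tweet, "->", routed(tweet), "/", multilingual(tweet))
```

In this toy setting the code-switched tweet "odio the traffic" is routed to the English model and comes out neutral, while the multilingual model labels it negative. This only illustrates how an identification error propagates on code-switching text; it is not evidence for the article's results.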
HausaNLP at SemEval-2023 Task 12: Leveraging African Low Resource TweetData for Sentiment Analysis
We present the findings of SemEval-2023 Task 12, a shared task on sentiment analysis for low-resource African languages using Twitter datasets. The task featured three subtasks: subtask A is monolingual sentiment classification with 12 tracks, each covering a single language; subtask B is multilingual sentiment classification using the tracks in subtask A; and subtask C is zero-shot sentiment classification. We present the results and findings of subtasks A, B, and C, and we release our code on GitHub. Our goal is to leverage low-resource tweet data using the pre-trained Afro-xlmr-large, AfriBERTa-Large, Bert-base-arabic-camelbert-da-sentiment (Arabic-camelbert), Multilingual-BERT (mBERT), and BERT models for sentiment analysis of 14 African languages. The datasets for these subtasks consist of gold-standard, multi-class labeled Twitter data in these languages. Our results demonstrate that the Afro-xlmr-large model performed better than the other models on most of the language datasets. Similarly, the Nigerian languages Hausa, Igbo, and Yoruba achieved better performance than the other languages, which can be attributed to the higher volume of data available for these languages.
- …