3,537 research outputs found

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a natural language processing task that has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is becoming one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context, presenting the limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language.

    A hybrid approach based on personality traits for hate speech detection in Arabic social media

    In recent years, as social media has grown in popularity, people have gained the ability to freely share their views. However, this may lead to conflict and hostility among users, resulting in unattractive online environments. Hate speech is the use of expressions or phrases that are violent, offensive, or insulting toward a minority group. The number of Arab social media users is rising quickly, and this is accompanied by an increase in the frequency of cyber hate speech in the region. Therefore, the automated detection of Arabic hate speech has become a major concern for many stakeholders. The intersection of personality learning and hate speech detection is a relatively understudied niche. We propose a novel approach that extracts personality trait features and uses these features to detect Arabic hate speech. The experimental results show that the proposed approach outperforms previous work reported in the literature, achieving a macro-F1 score of 82.3%.
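For readers unfamiliar with the metric reported above, macro-F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The following is a minimal sketch of that standard definition only; the function name `macro_f1` and the example labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores (standard macro-F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one sample is required")

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
example_true = ["hate", "hate", "normal", "normal"]
example_pred = ["hate", "normal", "normal", "normal"]
example_score = macro_f1(example_true, example_pred)
```

In this toy example the "hate" class has F1 = 2/3 and the "normal" class has F1 = 0.8, so the macro average is their plain mean.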

    Machine learning for Arabic phonemes recognition using electrolarynx speech

    Automatic speech recognition is one of the essential ways of interacting with machines. Interest in speech-based intelligent systems has grown in the past few decades. Therefore, there is a need to develop more efficient methods for human speech recognition to ensure reliable communication between individuals and machines. This paper is concerned with the recognition of Arabic phonemes produced with an electrolarynx device. An electrolarynx is a device used by cancer patients whose vocal cords have been removed. The goal here is to find the machine learning model that best classifies phonemes produced by the electrolarynx device. Phoneme recognition employs different machine learning schemes, including convolutional neural network, recurrent neural network, artificial neural network (ANN), random forest, extreme gradient boosting (XGBoost), and long short-term memory. Modern Standard Arabic is used for the training and testing phases of the recognition system. The dataset covers both ordinary speech and electrolarynx speech recorded by the same person. Mel frequency cepstral coefficients are used as speech features. The results show that the ANN method outperformed the other methods, with an accuracy of 75%, a precision of 77%, and a phoneme error rate (PER) of 21.85%.
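The abstract does not say how PER is computed. A common convention, assumed here, is the Levenshtein edit distance between the reference and recognized phoneme sequences divided by the reference length. The sketch below illustrates that convention only; the function name `phoneme_error_rate` and the phoneme symbols are hypothetical and do not come from the paper.

```python
def phoneme_error_rate(reference, hypothesis):
    """Edit distance (substitutions + deletions + insertions) / len(reference).

    Assumes the conventional PER definition; the paper may use a variant.
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")

    # prev[j] holds the edit distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ph in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference phonemes
        for j, hyp_ph in enumerate(hypothesis, start=1):
            cost = 0 if ref_ph == hyp_ph else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical phoneme sequences, for illustration only.
ref_seq = ["k", "i", "t", "a", "b"]
hyp_seq = ["k", "i", "d", "a", "b"]  # one substitution: t -> d
example_per = phoneme_error_rate(ref_seq, hyp_seq)
```

Because insertions are counted, PER under this definition can exceed 100%, and it is not simply one minus the classification accuracy.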

    Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers

    The flexibility of mobile communications allows customers to quickly switch from one service provider to another, making customer churn one of the most critical challenges for the data and voice telecommunication service industry. In 2019, the percentage of post-paid telecommunication customers in Saudi Arabia decreased; this reflects considerable customer dissatisfaction and subsequent corporate fiscal losses. Many studies correlate customer satisfaction with customer churn. Telecom companies have depended on historical customer data to measure customer churn. However, historical data does not reveal current customer satisfaction or the future likelihood of switching between telecom companies. Current methods of analysing churn rates are inadequate and face several issues, particularly in the Saudi market. This research was conducted to understand the relationship between customer satisfaction and customer churn, and how to use social media mining to measure customer satisfaction and predict customer churn. It includes a systematic review addressing the problems of churn prediction models and their relation to Arabic Sentiment Analysis (ASA). The findings show that current churn models do not integrate structural data frameworks with real-time analytics to target customers in real time. In addition, the findings show that the specific issues in the existing churn prediction models in Saudi Arabia relate to the Arabic language itself, its complexity, and its lack of resources. As a result, I have constructed the first gold standard corpus of Saudi tweets related to telecom companies, comprising 20,000 manually annotated tweets. It has been generated as a dialect sentiment lexicon extracted from a larger Twitter dataset that I collected to capture text characteristics in social media. I developed a new ASA prediction model for telecommunication that fills the detected gaps in the ASA literature and fits the telecommunication field. The proposed model proved its effectiveness for Arabic sentiment analysis and churn prediction. This is the first work to use Twitter mining to predict potential customer loss (churn) in Saudi telecom companies. Other fields, such as education, have different features, which makes applying the proposed model to them interesting, since it is based on text mining.

    Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models

    Automatic symptom identification plays a crucial role in assisting doctors during the diagnosis process in telemedicine. In general, physicians spend considerable time on clinical documentation and symptom identification, which is infeasible given their full schedules. With text-based consultation services in telemedicine, identifying symptoms from a user's consultation is a complex and time-consuming process. Moreover, at Altibbi, which is an Arabic telemedicine platform and the context of this work, users consult doctors and describe their conditions in different Arabic dialects, which makes the problem more complex and challenging. Therefore, in this work, an advanced deep learning approach is developed for consultations in multiple dialects. The approach is formulated as a multi-label multi-class classification using features extracted based on AraBERT and fine-tuned on the bidirectional long short-term memory (BiLSTM) network. The fine-tuning of the BiLSTM relies on features engineered based on different variants of the bidirectional encoder representations from transformers (BERT). Evaluating the models based on precision, recall, and a customized hit rate showed successful identification of symptoms from Arabic texts with promising accuracy. Hence, this paves the way toward deploying an automated symptom identification model in production at Altibbi, which can help general practitioners in telemedicine provide more efficient and accurate consultations.