
    The Creation of an Arabic Emotion Ontology Based on E-Motive

    © 2017 The Authors. Published by Elsevier B.V. There is an increased interest in social media monitoring to analyse massive, free form, short user-generated text from multiple social media sites such as Facebook, WhatsApp and Twitter. Companies are interested in sentiment analysis to understand customers' opinions about their products/services. Governments and law enforcement agencies are interested in identifying threats to safeguard their country's national security. They are actively seeking ways to monitor and analyse the public's responses to various services, activities and events, especially since social media has become a valuable real-time resource of information. This study builds on prior work that focused on sentiment classification (i.e., positive, negative). This study primarily aims to design and develop a social sentiment-parsing algorithm for capturing and monitoring an extensive and comprehensive range of emotions from Arabic social media text. The study contributes to the field of sentiment analysis (opinion mining) and can subsequently be used for web mining, cleansing and analytics.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    #Halal Culture on Instagram

    Halal is a notion that applies to both objects and actions, and means permissible according to Islamic law. It may be most often associated with food and the rules of selecting, slaughtering, and cooking animals. In the globalized world, halal can be found in street corners of New York and beauty shops of Manila. In this study, we explore the cultural diversity of the concept, as revealed through social media, and specifically the way it is expressed by different populations around the world, and how it relates to their perception of (i) religious and (ii) governmental authority, and (iii) personal health. Here, we analyze two Instagram datasets, using Halal in Arabic (325,665 posts) and in English (1,004,445 posts), which provide a global view of major Muslim populations around the world. We find a great variety in the use of halal within Arabic, English, and Indonesian-speaking populations, with animal trade emphasized in the first (making up 61% of the language's stream), food in the second (80%), and cosmetics and supplements in the third (70%). The commercialization of the term halal is a powerful signal of its detraction from its traditional roots. We find a complex social engagement around posts mentioning religious terms, such that when a food-related post is accompanied by a religious term, it on average gets more likes in English and Indonesian, but not in Arabic, indicating a potential shift out of its traditional moral framing.

    Multilingual Large Language Models Are Not (Yet) Code-Switchers

    Multilingual Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting methods. While there have been extensive studies on their abilities in monolingual tasks, the investigation of their potential in the context of code-switching (CSW), the practice of alternating languages within an utterance, remains relatively uncharted. In this paper, we provide a comprehensive empirical analysis of various multilingual LLMs, benchmarking their performance across four tasks: sentiment analysis, machine translation, summarization and word-level language identification. Our results indicate that despite multilingual LLMs exhibiting promising outcomes in certain tasks using zero or few-shot prompting, they still underperform in comparison to fine-tuned models of much smaller scales. We argue that current "multilingualism" in LLMs does not inherently imply proficiency with code-switching texts, calling for future research to bridge this discrepancy. Comment: Accepted at EMNLP 202

    Otrouha: A Corpus of Arabic ETDs and a Framework for Automatic Subject Classification

    Although the Arabic language is spoken by more than 300 million people and is one of the six official languages of the United Nations (UN), there has been less research done on Arabic text data (compared to English) in the realm of machine learning, especially in text classification. In the past decade, Arabic data such as news, tweets, etc. have begun to receive some attention. Although automatic text classification plays an important role in improving the browsability and accessibility of data, Electronic Theses and Dissertations (ETDs) have not received their fair share of attention, in spite of the huge number of benefits they provide to students, universities, and future generations of scholars. There are two main roadblocks to performing automatic subject classification on Arabic ETDs. The first is the unavailability of a public corpus of Arabic ETDs. The second is the linguistic complexity of the Arabic language; that complexity is particularly evident in academic documents such as ETDs. To address these roadblocks, this paper presents Otrouha, a framework for automatic subject classification of Arabic ETDs, which has two main goals. The first is building a Corpus of Arabic ETDs and their key metadata such as abstracts, keywords, and title to pave the way for more exploratory research on this valuable genre of data. The second is to provide a framework for automatic subject classification of Arabic ETDs through different classification models that use classical machine learning as well as deep learning techniques. The first goal is aided by searching the AskZad Digital Library, which is part of the Saudi Digital Library (SDL). AskZad provides other key metadata of Arabic ETDs, such as abstract, title, and keywords. The current search results consist of abstracts of Arabic ETDs. This raw data then undergoes a pre-processing phase that includes stop word removal using the Natural Language Tool Kit (NLTK), and word lemmatization using the Farasa API. 
    To date, abstracts of 518 ETDs across 12 subjects have been collected. For the second goal, the preliminary results show that among the machine learning models, binary classification (one-vs.-all) performed better than multiclass classification. The maximum per-subject accuracy is 95%, with an average accuracy of 68% across all subjects. It is noteworthy that the binary classification model performed better for some categories than others. For example, Applied Science and Technology shows 95% accuracy, while the category of Administration shows 36%. Deep learning models resulted in higher accuracy but lower F-measure; their overall performance is lower than that of the machine learning models. This may be due to the small size of the dataset as well as the imbalance in the number of documents per category. Work to collect additional ETDs will be aided by collaborative contributions of data from additional sources.

    A review of sentiment analysis research in Arabic language

    Sentiment analysis is a task of natural language processing which has recently attracted increasing attention. However, sentiment analysis research has mainly been carried out for the English language. Although Arabic is ramping up as one of the most used languages on the Internet, only a few studies have focused on Arabic sentiment analysis so far. In this paper, we carry out an in-depth qualitative study of the most important research works in this context by presenting limits and strengths of existing approaches. In particular, we survey both approaches that leverage machine translation or transfer learning to adapt English resources to Arabic and approaches that stem directly from the Arabic language

    Contextualizing Palestinian Hybridity: How Pragmatic Citizenship Influences Diasporic Identities

    Palestinians are one of the largest diaspora populations in the world, with members in the Middle East, Africa, Europe, and the Americas. How are the individual diasporic experiences of nationalism similar to and different from one another? This research examines the creation and maintenance of Palestinian identity in diasporic contexts through ethnographic analysis and a series of interviews conducted in Chile, Jordan, and the United States. The results show that despite Palestinians maintaining Palestinianness as a dominant characteristic of identity in all three settings, there are contextual influences on how people integrate that identity into their lives. Within Jordan, Palestinians experience conflicting national identities and economic disparity while sharing language, culture and geographic proximity with Palestine. In the United States and Chile, Palestinians experience cultural and spatial separation from Palestine and are influenced by local political and economic situations. Evidence also shows that the identities of most of the participants in the three countries demonstrate various levels of cultural hybridity.

    Tourist Responses to Tourism Experiences in Saudi Arabia

    A decade ago, the Kingdom of Saudi Arabia (KSA) was not perceived to be a popular tourism destination except for religious purposes. In recent years, the government of KSA has been proactive in building new destinations, changing longstanding policies, focusing on tourism and hospitality education, and renovating its image to attract domestic and international tourists. Tourism contributed almost 9% of the Kingdom's GDP in 2018, around 65 billion dollars (WTTC, 2019). The purpose of this paper is to understand the sentiment that tourists have regarding the new tourism campaigns in KSA, to have transparent feedback about the experiences and services mostly adopted by tourists, and to study the feasibility of KSA Vision 2030 regarding the tourism sector. This study will perform an open data analysis by extracting and analyzing data from a well-known online source (Twitter). Results will highlight the utilization of online data tools to measure tourism trends.