The Creation of an Arabic Emotion Ontology Based on E-Motive
© 2017 The Authors. Published by Elsevier B.V. There is an increased interest in social media monitoring to analyse massive, free-form, short user-generated text from multiple social media sites such as Facebook, WhatsApp and Twitter. Companies are interested in sentiment analysis to understand customers' opinions about their products/services. Governments and law enforcement agencies are interested in identifying threats to safeguard their country's national security. They are actively seeking ways to monitor and analyse the public's responses to various services, activities and events, especially since social media has become a valuable real-time resource of information. This study builds on prior work that focused on sentiment classification (i.e., positive, negative). This study primarily aims to design and develop a social sentiment-parsing algorithm for capturing and monitoring an extensive and comprehensive range of emotions from Arabic social media text. The study contributes to the field of sentiment analysis (opinion mining) and can subsequently be used for web mining, cleansing and analytics
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
#Halal Culture on Instagram
Halal is a notion that applies to both objects and actions, and means
permissible according to Islamic law. It may be most often associated with food
and the rules of selecting, slaughtering, and cooking animals. In the
globalized world, halal can be found in street corners of New York and beauty
shops of Manila. In this study, we explore the cultural diversity of the
concept, as revealed through social media, and specifically the way it is
expressed by different populations around the world, and how it relates to
their perception of (i) religious and (ii) governmental authority, and (iii)
personal health. Here, we analyze two Instagram datasets, using Halal in Arabic
(325,665 posts) and in English (1,004,445 posts), which provide a global view
of major Muslim populations around the world. We find a great variety in the
use of halal within Arabic, English, and Indonesian-speaking populations, with
animal trade emphasized in the first (making up 61% of the language's stream), food
in the second (80%), and cosmetics and supplements in the third (70%). The
commercialization of the term halal is a powerful signal of its detraction from
its traditional roots. We find a complex social engagement around posts
mentioning religious terms, such that when a food-related post is accompanied
by a religious term, it on average gets more likes in English and Indonesian,
but not in Arabic, indicating a potential shift out of its traditional moral
framing
Multilingual Large Language Models Are Not (Yet) Code-Switchers
Multilingual Large Language Models (LLMs) have recently shown great
capabilities in a wide range of tasks, exhibiting state-of-the-art performance
through zero-shot or few-shot prompting methods. While there have been
extensive studies on their abilities in monolingual tasks, the investigation of
their potential in the context of code-switching (CSW), the practice of
alternating languages within an utterance, remains relatively uncharted. In
this paper, we provide a comprehensive empirical analysis of various
multilingual LLMs, benchmarking their performance across four tasks: sentiment
analysis, machine translation, summarization and word-level language
identification. Our results indicate that despite multilingual LLMs exhibiting
promising outcomes in certain tasks using zero or few-shot prompting, they
still underperform in comparison to fine-tuned models of much smaller scales.
We argue that current "multilingualism" in LLMs does not inherently imply
proficiency with code-switching texts, calling for future research to bridge
this discrepancy. Comment: Accepted at EMNLP 202
Otrouha: A Corpus of Arabic ETDs and a Framework for Automatic Subject Classification
Although the Arabic language is spoken by more than 300 million people and is one of the six official languages of the United Nations (UN), there has been less research done on Arabic text data (compared to English) in the realm of machine learning, especially in text classification. In the past decade, Arabic data such as news, tweets, etc. have begun to receive some attention. Although automatic text classification plays an important role in improving the browsability and accessibility of data, Electronic Theses and Dissertations (ETDs) have not received their fair share of attention, in spite of the huge number of benefits they provide to students, universities, and future generations of scholars. There are two main roadblocks to performing automatic subject classification on Arabic ETDs. The first is the unavailability of a public corpus of Arabic ETDs. The second is the linguistic complexity of the Arabic language; that complexity is particularly evident in academic documents such as ETDs. To address these roadblocks, this paper presents Otrouha, a framework for automatic subject classification of Arabic ETDs, which has two main goals. The first is building a Corpus of Arabic ETDs and their key metadata such as abstracts, keywords, and title to pave the way for more exploratory research on this valuable genre of data. The second is to provide a framework for automatic subject classification of Arabic ETDs through different classification models that use classical machine learning as well as deep learning techniques. The first goal is aided by searching the AskZad Digital Library, which is part of the Saudi Digital Library (SDL). AskZad provides other key metadata of Arabic ETDs, such as abstract, title, and keywords. The current search results consist of abstracts of Arabic ETDs. This raw data then undergoes a pre-processing phase that includes stop word removal using the Natural Language Toolkit (NLTK), and word lemmatization using the Farasa API.
To date, abstracts of 518 ETDs across 12 subjects have been collected. For the second goal, the preliminary results show that among the machine learning models, binary classification (one-vs.-all) performed better than multiclass classification. The maximum per-subject accuracy is 95%, with an average accuracy of 68% across all subjects. It is noteworthy that the binary classification model performed better for some categories than others. For example, Applied Science and Technology shows 95% accuracy, while the category of Administration shows 36%. Deep learning models resulted in higher accuracy but lower F-measure; their overall performance is lower than that of the machine learning models. This may be due to the small size of the dataset as well as the imbalance in the number of documents per category. Work to collect additional ETDs will be aided by collaborative contributions of data from additional sources
A review of sentiment analysis research in Arabic language
Sentiment analysis is a task of natural language processing which has
recently attracted increasing attention. However, sentiment analysis research
has mainly been carried out for the English language. Although Arabic is
becoming one of the most widely used languages on the Internet, only a few
studies have focused on Arabic sentiment analysis so far. In this paper, we
carry out an in-depth qualitative study of the most important research works in
this context by presenting limits and strengths of existing approaches. In
particular, we survey both approaches that leverage machine translation or
transfer learning to adapt English resources to Arabic and approaches that stem
directly from the Arabic language
Contextualizing Palestinian Hybridity: How Pragmatic Citizenship Influences Diasporic Identities
Palestinians are one of the largest diaspora populations in the world, with members in the Middle East, Africa, Europe, and the Americas. How do individual diasporic experiences of nationalism resemble and differ from one another? This research examines the creation and maintenance of Palestinian identity in diasporic contexts through ethnographic analysis and a series of interviews conducted in Chile, Jordan, and the United States. The results show that although Palestinians maintain Palestinianness as a dominant characteristic of identity in all three settings, there are contextual influences on how people integrate that identity into their lives. Within Jordan, Palestinians experience conflicting national identities and economic disparity while sharing language, culture and geographic proximity with Palestine. In the United States and Chile, Palestinians experience cultural and spatial separation from Palestine and are influenced by local political and economic situations. Evidence also shows that the identities of most of the participants in the three countries demonstrate various levels of cultural hybridity
Artificial Intelligence and Online Extremism: Challenges and Opportunities
Radicalisation is a process that historically used to be triggered mainly through social interactions in places of worship, religious schools, prisons, meeting venues, etc. Today, this process is often initiated on the Internet, where radicalisation content is easily shared, and potential candidates are reached more easily, rapidly, and at an unprecedented scale (Edwards and Gribbon, 2013; Von Behr et al., 2013).
In recent years, some terrorist organisations succeeded in leveraging the power of social media to recruit individuals to their cause and ideology (Farwell, 2014). It is often the case that such recruitment attempts are initiated on open social media platforms (e.g., Twitter, Facebook, Tumblr, YouTube) but then move onto private messages and/or encrypted platforms (e.g., WhatsApp, Telegram). Such encrypted communication channels have also been used by terrorist cells and networks to plan their operations (Gartenstein-Ross and Barr).
To counteract the activities of such organisations, and to halt the spread of radicalisation content, some governments, social media platforms, and counter-extremism agencies are investing in the creation of advanced information technologies to identify and counter extremism through the development of Artificial Intelligence (AI) solutions (Correa and Sureka, 2013; Agarwal and Sureka 2015a; Scrivens and Davies, 2018).
These solutions have three main objectives: (i) understanding the phenomena behind online extremism (the communication flow, the use of propaganda, the different stages of the radicalisation process, the variety of radicalisation channels, etc.), (ii) automatically detecting radical users and content, and (iii) predicting the adoption and spreading of extremist ideas.
Despite current advancements in the area, multiple challenges still exist, including: (i) the lack of a common definition of prohibited radical and extremist internet activity, (ii) the lack of solid verification of the datasets collected to develop detection and prediction models, (iii) the lack of cooperation across research fields, since most of the developed technological solutions are neither based on, nor do they take advantage of, existing social theories and studies of radicalisation, (iv) the constant evolution of behaviours associated with online extremism in order to avoid being detected by the developed algorithms (changes in terminology, creation of new accounts, etc.), and (v) the development of ethical guidelines and legislation to regulate the design and development of AI technology to counter radicalisation.
In this book chapter we provide an overview of the current technological advancements towards addressing the problem of online extremism (with a particular focus on Jihadism). We identify some of the limitations of current technologies, and highlight some of the potential opportunities. Our aim is to reflect on the current state of the art and to stimulate discussions on the future design and development of AI technology to target the problem of online extremism
Tourist Responses to Tourism Experiences in Saudi Arabia
A decade ago, the Kingdom of Saudi Arabia (KSA) was not perceived to be a popular tourism destination except for religious purposes. The government of KSA has been proactive in recent years, building new destinations, changing longstanding policies, focusing on tourism and hospitality education, and renovating its image to attract domestic and international tourists. Tourism contributed almost 9% of the Kingdom’s GDP in 2018, around 65 billion dollars (WTTC, 2019). The purpose of this paper is to understand the sentiment that tourists have regarding the new tourism campaigns in KSA, to obtain transparent feedback about the experiences and services most often adopted by tourists, and to study the feasibility of KSA Vision 2030 regarding the tourism sector. This study will perform an open data analysis by extracting and analyzing data from a well-known online source (Twitter). Results will highlight the utilization of online data tools to measure tourism trends