Arabia Felix 2.0: a cross-linguistic Twitter analysis of happiness patterns in the United Arab Emirates
The global popularity of social media platforms has given rise to unprecedented amounts of data, much of which reflects the thoughts, opinions and affective states of individual users. Systematic explorations of these large datasets can yield valuable information about a variety of psychological and sociocultural variables. The global nature of these platforms makes it important to extend this type of exploration across cultures and languages as each situation is likely to present unique methodological challenges and yield findings particular to the specific sociocultural context. To date, very few studies exploring large social media datasets have focused on the Arab world. This study examined social media use in Arabic and English across the United Arab Emirates (UAE), looking specifically at indicators of subjective wellbeing (happiness) across both languages. A large social media dataset, spanning 2013 to 2017, was extracted from Twitter. More than 17 million Twitter messages (tweets), written in Arabic and English and posted by users based in the UAE, were analyzed. Numerous differences were observed between individuals posting messages (tweeting) in English compared with those posting in Arabic. These differences included significant variations in the mean number of tweets posted, and the mean size of users' networks (e.g. the number of followers). Additionally, using lexicon-based sentiment analytic tools (Hedonometer and Valence Shift Word Graphs), temporal patterns of happiness (expressions of positive sentiment) were explored in both languages across all seven regions (Emirates) of the UAE. Findings indicate that 7:00 am was the happiest hour, and Friday was the happiest day for both languages (the least happy day varied by language). The happiest months differed based on language, and there were also significant variations in sentiment patterns, peaks and troughs in happiness, associated with events of sociopolitical and religio-cultural significance for the UAE.
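The lexicon-based temporal analysis described in this abstract can be illustrated with a minimal sketch. It assumes that the happiness of a set of tweets is the frequency-weighted mean of per-word scores, which is the general idea behind lexicon tools of this kind. The word list `TOY_LEXICON`, its scores, the whitespace tokenization and the function names are all hypothetical and are not taken from the study or from the Hedonometer word list.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> happiness score on a 1-9 scale.
# The values are illustrative only, not real Hedonometer scores.
TOY_LEXICON = {"happy": 8.0, "love": 8.5, "friday": 7.0, "traffic": 3.0, "sad": 2.0}


def average_happiness(texts, lexicon=TOY_LEXICON):
    """Return the frequency-weighted mean score of lexicon words in the texts."""
    counts = Counter(
        word for text in texts for word in text.lower().split() if word in lexicon
    )
    total = sum(counts.values())
    if total == 0:
        return None  # no scored words, so no estimate is possible
    return sum(lexicon[word] * n for word, n in counts.items()) / total


def happiness_by_hour(tweets, lexicon=TOY_LEXICON):
    """Group (hour, text) pairs by hour of day and score each group."""
    by_hour = defaultdict(list)
    for hour, text in tweets:
        by_hour[hour].append(text)
    return {hour: average_happiness(texts, lexicon) for hour, texts in by_hour.items()}


# Made-up sample data: (hour of day, tweet text).
sample = [(7, "love friday"), (7, "happy"), (18, "traffic sad"), (18, "happy traffic")]
scores = happiness_by_hour(sample)
```

Grouping by weekday or month instead of hour only changes the key used in `happiness_by_hour`. A real analysis would also need language-specific tokenization and a separate lexicon for Arabic, neither of which is attempted here.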
Modeling Global Syntactic Variation in English Using Dialect Classification
This paper evaluates global-scale dialect identification for 14 national
varieties of English as a means for studying syntactic variation. The paper
makes three main contributions: (i) introducing data-driven language mapping as
a method for selecting the inventory of national varieties to include in the
task; (ii) producing a large and dynamic set of syntactic features using
grammar induction rather than focusing on a few hand-selected features such as
function words; and (iii) comparing models across both web corpora and social
media corpora in order to measure the robustness of syntactic variation across
registers.
Unproceedings of the Fourth .Astronomy Conference (.Astronomy 4), Heidelberg, Germany, July 9-11 2012
The goal of the .Astronomy conference series is to bring together
astronomers, educators, developers and others interested in using the Internet
as a medium for astronomy. Attendance at the event is limited to approximately
50 participants, and days are split into mornings of scheduled talks, followed
by 'unconference' afternoons, where sessions are defined by participants during
the course of the event. Participants in unconference sessions are discouraged
from formal presentations, with discussion, workshop-style formats or informal
practical tutorials encouraged. The conference also designates one day as a
'hack day', in which attendees collaborate in groups on day-long projects for
presentation the following morning. These hacks are often a way of
concentrating effort, learning new skills, and exploring ideas in a practical
fashion. The emphasis on informal, focused interaction makes recording
proceedings more difficult than for a normal meeting. While the first
.Astronomy conference is preserved formally in a book, more recent iterations
are not documented. We therefore, in the spirit of .Astronomy, report
'unproceedings' from .Astronomy 4, which was held in Heidelberg in July 2012.
Comment: 11 pages, 1 figure, .Astronomy 4, #dotastr
Towards Real-Time, Country-Level Location Classification of Worldwide Tweets
In contrast to much previous work that has focused on location classification
of tweets restricted to a specific country, here we undertake the task in a
broader context by classifying global tweets at the country level, which is so
far unexplored in a real-time scenario. We analyse the extent to which a
tweet's country of origin can be determined by making use of eight
tweet-inherent features for classification. Furthermore, we use two datasets,
collected a year apart from each other, to analyse the extent to which a model
trained from historical tweets can still be leveraged for classification of new
tweets. With classification experiments on all 217 countries in our datasets,
as well as on the top 25 countries, we offer some insights into the best use of
tweet-inherent features for an accurate country-level classification of tweets.
We find that the use of a single feature, such as the use of tweet content
alone -- the most widely used feature in previous work -- leaves much to be
desired. Choosing an appropriate combination of both tweet content and metadata
can actually lead to substantial improvements of between 20% and 50%. We
observe that tweet content, the user's self-reported location and the user's
real name, all of which are inherent in a tweet and available in a real-time
scenario, are particularly useful to determine the country of origin. We also
experiment on the applicability of a model trained on historical tweets to
classify new tweets, finding that the choice of a particular combination of
features whose utility does not fade over time can actually lead to comparable
performance, avoiding the need to retrain. However, the difficulty of achieving
accurate classification increases slightly for countries with multiple
commonalities, especially for English and Spanish speaking countries.
Comment: Accepted for publication in IEEE Transactions on Knowledge and Data
Engineering (IEEE TKDE