
    Arabia Felix 2.0: a cross-linguistic Twitter analysis of happiness patterns in the United Arab Emirates

    © 2019, The Author(s). The global popularity of social media platforms has given rise to unprecedented amounts of data, much of which reflects the thoughts, opinions and affective states of individual users. Systematic explorations of these large datasets can yield valuable information about a variety of psychological and sociocultural variables. The global nature of these platforms makes it important to extend this type of exploration across cultures and languages, as each situation is likely to present unique methodological challenges and yield findings particular to the specific sociocultural context. To date, very few studies exploring large social media datasets have focused on the Arab world. This study examined social media use in Arabic and English across the United Arab Emirates (UAE), looking specifically at indicators of subjective wellbeing (happiness) across both languages. A large social media dataset, spanning 2013 to 2017, was extracted from Twitter. More than 17 million Twitter messages (tweets), written in Arabic and English and posted by users based in the UAE, were analyzed. Numerous differences were observed between individuals posting messages (tweeting) in English compared with those posting in Arabic. These differences included significant variations in the mean number of tweets posted and the mean size of users' networks (e.g., the number of followers). Additionally, using lexicon-based sentiment analytic tools (Hedonometer and Valence Shift Word Graphs), temporal patterns of happiness (expressions of positive sentiment) were explored in both languages across all seven regions (Emirates) of the UAE. Findings indicate that 7:00 am was the happiest hour, and Friday was the happiest day for both languages (the least happy day varied by language). The happiest months differed based on language, and there were also significant variations in sentiment patterns, peaks and troughs in happiness, associated with events of sociopolitical and religio-cultural significance for the UAE.
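As a rough illustration of the lexicon-based approach the abstract mentions, the sketch below computes a frequency-weighted average happiness score for pooled tweet text and groups it by hour of day. It follows the general Hedonometer-style recipe (word scores on a 1 to 9 scale, with near-neutral words skipped); the lexicon values, the function names and the neutral band are illustrative assumptions, not the study's actual lexicon or settings.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon on a 1 (sad) to 9 (happy) scale.
# These scores are made up for illustration; they are not real lexicon values.
TOY_LEXICON = {
    "happy": 8.0,
    "love": 8.5,
    "friday": 7.0,
    "the": 5.0,
    "traffic": 3.0,
    "sad": 2.0,
}

NEUTRAL_CENTER = 5.0
NEUTRAL_HALF_WIDTH = 1.0  # assumption: skip words scored strictly between 4 and 6


def average_happiness(tokens, lexicon=TOY_LEXICON):
    """Frequency-weighted mean score of the scored, non-neutral words.

    Returns None when no token contributes a score.
    """
    counts = Counter(token.lower() for token in tokens)
    total = 0.0
    weight = 0
    for word, freq in counts.items():
        score = lexicon.get(word)
        if score is None:
            continue  # word is not in the lexicon
        if abs(score - NEUTRAL_CENTER) < NEUTRAL_HALF_WIDTH:
            continue  # near-neutral word carries little signal
        total += score * freq
        weight += freq
    return total / weight if weight else None


def happiness_by_hour(tweets, lexicon=TOY_LEXICON):
    """Pool the words of all tweets posted in the same hour, then score each pool.

    `tweets` is an iterable of (hour, text) pairs; whitespace tokenization
    is a simplification.
    """
    pooled = defaultdict(list)
    for hour, text in tweets:
        pooled[hour].extend(text.split())
    return {hour: average_happiness(tokens, lexicon) for hour, tokens in pooled.items()}
```

Calling `happiness_by_hour` on a list of `(hour, text)` pairs returns one score per hour, so the happiest hour in a sample is simply the key with the largest value. The same pooling step could be keyed by weekday or month instead of hour.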

    DARIAH and the Benelux


    Modeling Global Syntactic Variation in English Using Dialect Classification

    This paper evaluates global-scale dialect identification for 14 national varieties of English as a means of studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

    Unproceedings of the Fourth .Astronomy Conference (.Astronomy 4), Heidelberg, Germany, July 9-11 2012

    The goal of the .Astronomy conference series is to bring together astronomers, educators, developers and others interested in using the Internet as a medium for astronomy. Attendance at the event is limited to approximately 50 participants, and days are split into mornings of scheduled talks, followed by 'unconference' afternoons, where sessions are defined by participants during the course of the event. Participants in unconference sessions are discouraged from formal presentations, with discussion, workshop-style formats or informal practical tutorials encouraged. The conference also designates one day as a 'hack day', in which attendees collaborate in groups on day-long projects for presentation the following morning. These hacks are often a way of concentrating effort, learning new skills, and exploring ideas in a practical fashion. The emphasis on informal, focused interaction makes recording proceedings more difficult than for a normal meeting. While the first .Astronomy conference is preserved formally in a book, more recent iterations are not documented. We therefore, in the spirit of .Astronomy, report 'unproceedings' from .Astronomy 4, which was held in Heidelberg in July 2012.
    Comment: 11 pages, 1 figure, .Astronomy 4, #dotastr

    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained on historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful for determining the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English and Spanish speaking countries.
    Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE