410 research outputs found

    Inferring Social Media Users’ Demographics from Profile Pictures: A Face++ Analysis on Twitter Users

    In this research, we evaluate the applicability of using facial recognition of social media account profile pictures to infer the demographic attributes of gender, race, and age of the account owners, leveraging a well-known commercial image service, Face++. Our goal is to determine the feasibility of this approach for actual system implementation. Using a dataset of approximately 10,000 Twitter profile pictures, we use Face++ to classify this set of images for gender, race, and age. We determine that about 30% of these profile pictures contain identifiable images of people using the current state-of-the-art automated means. We then employ human evaluations to manually tag both the set of images that were determined to contain faces and the set that was determined not to contain faces, comparing the results to Face++. Of the 30% that Face++ identified as containing a face, about 80% are more likely than not the account holder based on our manual classification, with a variety of issues in the remaining 20%. Of the images in which Face++ was unable to detect a face, we isolate a variety of likely issues that prevented detection when a face actually appeared in the image. Overall, we find the applicability of automatic facial recognition to infer demographics for system development to be problematic, despite the reported high accuracy achieved for image test collection.

    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as the use of tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English and Spanish speaking countries.
    Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE)

    Location Inference for Non-geotagged Tweets in User Timelines


    What demographic attributes do our digital footprints reveal? A systematic review

    To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces left by individuals as they browse and interact with others online may reveal who they are and what their interests may be. In the present paper we report a systematic review that synthesises current evidence on predicting demographic attributes from online digital traces. Studies were included if they met the following criteria: (i) they reported findings where at least one demographic attribute was predicted/inferred from at least one form of digital footprint, (ii) the method of prediction was automated, and (iii) the traces were either visible (e.g. tweets) or non-visible (e.g. clickstreams). We identified 327 studies published up until October 2018. Across these articles, 14 demographic attributes were successfully inferred from digital traces; the most studied included gender, age, location, and political orientation. For each of the demographic attributes identified, we provide a database containing the platforms and digital traces examined, sample sizes, accuracy measures and the classification methods applied. Finally, we discuss the main research trends/findings, methodological approaches and recommend directions for future research.

    Refugees Welcome? Online Hate Speech and Sentiments in Twitter in Spain during the Reception of the Boat Aquarius

    High-profile events can trigger expressions of hate speech online, which in turn modify attitudes and offline behavior towards stigmatized groups. This paper addresses the first path of this process using manual and computational methods to analyze the stream of Twitter messages in Spanish around the boat Aquarius (n = 24,254) before and after the Spanish government's announcement in June 2018 that it would welcome the boat, a milestone for the acceptance of asylum seekers in the EU and an event that was heavily covered by the media. It was observed that most of the messages were related to a few topics and had a generally positive sentiment, although a significant share of messages expressed rejection or hate, often supported by stereotypes and lies, towards refugees and migrants and towards politicians. These expressions grew after the announcement of hosting the boat, although the general sentiment of the messages became more positive. We discuss the theoretical, practical, and methodological implications of the study, and acknowledge limitations related to the examined timeframe and to the preliminary nature of the conclusions.

    Applications of new forms of data to demographics

    This thesis sets out to address limitations in conventional population data for the representation of stocks and flows of human populations. Until now, many of the data available for studying population behaviour have been static in nature, often collected on an infrequent basis or in an inconsistent manner. However, rapid expansion in the use of online technologies has led to the generation of a huge volume of data as a byproduct of individuals’ online activities. This thesis sets out to exploit just one of these new data channels: raw geographically referenced messages collected by the Twitter Online Social Network. The thesis develops a framework for the creation of functional population inventories from Twitter. Through the application of various data mining and heuristic techniques, individual Twitter users are attributed with key demographic markers including age, gender, ethnicity and place of residence. However, while these inventories possess the required data structure for analysis, little is understood about whom they represent and for what purposes they may be reliably employed. Thus a primary focus of this thesis is the assessment of Twitter-based population inventories at a range of spatial scales from the local to the global. More specifically, the assessment considers issues of age, gender, ethnicity, geographic distribution and surname composition. The value of such rich data is demonstrated in the final chapter, in which a detailed analysis of the stocks and flows of people within the four largest London airports is undertaken. The analysis demonstrates the extraction of both conventional insights, such as passenger statistics, and new insights, such as footfall and sentiment. The thesis concludes with recommendations for the ways in which social media analysis may be used in demographics to supplement the analysis of populations using conventional sources of data.
