28 research outputs found

    Illuminating an Ecosystem of Partisan Websites

    This paper aims to shed light on alternative news media ecosystems that are believed to have influenced opinions and beliefs through false and/or biased news reporting during the 2016 US Presidential Elections. We examine a large, professionally curated list of 668 hyper-partisan websites and their corresponding Facebook pages, and identify key characteristics that mediate the traffic flow within this ecosystem. We uncover a pattern of new websites being established in the run-up to the elections and abandoned afterwards. Such websites form an ecosystem by creating links from one website to another and by 'liking' each other's Facebook pages. These practices are highly effective in directing user traffic internally within the ecosystem in a highly partisan manner, with right-leaning sites linking to and liking other right-leaning sites and similarly left-leaning sites linking to other sites on the left, thus forming a filter bubble amongst news producers similar to the filter bubble which has been widely observed among consumers of partisan news. Whereas there is activity among both left- and right-leaning sites, right-leaning sites are more evolved, accounting for a disproportionate number of abandoned websites and partisan internal links. We also examine demographic characteristics of consumers of hyper-partisan news and find that some of the more populous demographic groups in the US tend to be consumers of more right-leaning sites. Comment: Published at The Web Conference 2018 (WWW 2018). Please cite the WWW version.

    Multidimensional Tie Strength and Economic Development

    The strength of social relations has been shown to affect an individual's access to opportunities. To date, however, the correspondence between tie strength and a population's economic prospects has not been quantified, largely because of the inability to operationalise strength based on Granovetter's classic theory. Our work departed from the premise that tie strength is a unidimensional construct (typically operationalised with frequency or volume of contact), and used instead a validated model of ten fundamental dimensions of social relationships grounded in the literature of social psychology. We built state-of-the-art NLP tools to infer the presence of these dimensions from textual communication, and analysed a large conversation network of 630K geo-referenced Reddit users across the entire US, connected by 12.8M social ties created over the span of 7 years. We found that unidimensional tie strength is only weakly correlated with economic opportunities ([Formula: see text]), while multidimensional constructs are highly correlated ([Formula: see text]). In particular, economic opportunities are associated with the combination of: (i) knowledge ties, which bridge geographically distant groups and facilitate knowledge dissemination across communities; and (ii) social support ties, which knit geographically close communities together and represent dependable sources of social and emotional support. These results point to the importance of developing high-quality measures of tie strength in network theory.

    A Graph-Based Context-Aware Model to Understand Online Conversations

    Online forums that allow for participatory engagement between users have been transformative for the public discussion of many important issues. However, such conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Existing approaches in natural language processing (NLP), such as deep learning models for classification tasks, take as input only a single comment or a pair of comments, depending on whether the task concerns properties of individual comments or of the replies between pairs of comments, respectively. But in online conversations, comments and replies may be based on external context beyond the immediately relevant information that is input to the model. Therefore, being aware of the conversations' surrounding contexts should improve the model's performance for the inference task at hand. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walks to incorporate the wider context of a conversation in a principled manner. Specifically, a graph walk starts from a given comment and samples "nearby" comments in the same or parallel conversation threads, which results in additional embeddings that are aggregated together with the initial comment's embedding. We then use these enriched embeddings for downstream NLP prediction tasks that are important for online conversations. We evaluate GraphNLI on two such tasks, polarity prediction and misogynistic hate speech detection, and find that our model consistently outperforms all relevant baselines for both tasks. Specifically, GraphNLI with a biased root-seeking random walk achieves macro-F1 scores 3 and 6 percentage points higher than the best-performing BERT-based baselines for the polarity prediction and hate speech detection tasks, respectively. Comment: 25 pages, 9 figures. arXiv admin note: text overlap with arXiv:2202.0817
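
The graph walk described in this abstract can be illustrated with a minimal sketch. This is not the GraphNLI implementation: the tree representation (`parent` and `children` dictionaries), the upward-move probability `p_up`, the step count, and mean aggregation are all assumptions chosen for illustration.

```python
import random


def root_seeking_walk(start, parent, children, steps=4, p_up=0.75, rng=None):
    """Sample up to `steps` comments near `start`, preferring moves toward the root.

    `parent` maps a comment to the comment it replies to; `children` maps a
    comment to its replies. Both structures and `p_up` are hypothetical.
    """
    rng = rng or random.Random(0)
    visited = []
    node = start
    for _ in range(steps):
        up = parent.get(node)
        down = children.get(node, [])
        if up is not None and (not down or rng.random() < p_up):
            node = up  # biased step toward the root of the thread
        elif down:
            node = rng.choice(down)  # occasional step into a reply thread
        else:
            break  # a root comment with no replies: nowhere to go
        visited.append(node)
    return visited


def aggregate(start, walk, embeddings):
    """Average the start comment's embedding with those of the sampled comments."""
    vectors = [embeddings[start]] + [embeddings[n] for n in walk]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

With `p_up` close to 1 the walk mostly climbs toward the original post, so the enriched embedding mainly reflects the ancestors of the comment; lower values let it drift into sibling and descendant threads.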

    Fake it till you make it: Fishing for Catfishes

    Many adult content websites incorporate social networking features. Although these are popular, they raise significant challenges, including the potential for users to "catfish", i.e., to create fake profiles to deceive other users. This paper takes an initial step towards automated catfish detection. We explore the characteristics of the different age and gender groups, identifying a number of distinctions. Building on these, we train models on user profiles and comments, using specially verified profiles as ground truth. Applying our models for age and gender estimation to unverified profiles, we find that 38% of profiles are likely lying about their age and 25% are likely lying about their gender. We find that women have a greater propensity to catfish than men. Further, whereas women who catfish select from a wide age range, men consistently lie about being younger. Our work has notable implications for operators of such online social networks, as well as for users who may worry about interacting with catfishes.

    Our Nudges, Our Selves: Tailoring Mobile User Engagement Using Personality

    To increase mobile user engagement, current apps employ a variety of behavioral nudges, but these engagement techniques are applied in a one-size-fits-all approach. Yet the very same techniques may be perceived differently by different individuals. To test this, we developed HarrySpotter, a location-based AR app that embedded six engagement techniques. We deployed it in a 2-week study involving 29 users who also took the Big-Five personality test. Preferences for specific engagement techniques are not only descriptive but also predictive of personality traits. The adjusted R^2 ranges from 0.16 for conscientious users (encouraged by competition) to 0.32 for neurotic users (self-centered and focused on their own achievements), and even up to 0.61 for extroverts (motivated by both exploration of objects and places). These findings suggest that these techniques need to be personalized in the future. Comment: 10 pages, 1 figure, 2 tables

    Characterising User Content on a Multi-lingual Social Network

    Social media has been in the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there have been only a limited number of studies of a large portion of the world, including the largest multilingual and multi-cultural democracy: India. In this paper we present our characterisation of a multilingual social network in India called ShareChat. We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate the cross-lingual dynamics by clustering visually similar images together and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images (often referred to as memes), and posts from Hindi have the largest cross-lingual diffusion across ShareChat (as well as images containing text in English). In the case of images containing text that cross language barriers, we see that language translation is used to widen the accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multi-lingual and non-textual setting. Comment: Accepted at ICWSM 2020, please cite the ICWSM version.

    On the adoption of e-moped sharing systems

    Recent years have witnessed the emergence of novel shared mobility solutions that provide diffused on-demand access to transportation. The widespread adoption of these solutions, particularly electric mopeds (e-mopeds), is expected to bring important benefits such as the reduction of noise, atmospheric pollution, and road congestion, with extensive repercussions on liveability and quality of life in urban areas. Currently, almost no effort has been devoted to exploring the adoption patterns of e-moped sharing services; as a result, optimal management and allocation of vehicles remains a problem for service managers. In this study, we test the hypothesis that the adoption of electric mopeds depends on the built environment and demographic aspects of each neighbourhood. In detail, we singled out three features concerning the area characteristics (distance from centre, walkability, concentration of places) and one about the population (education index). The results obtained on a real-world case study show the strong impact these factors have in determining the adoption of e-moped sharing services. Finally, we analysed the possible role that electric moped sharing can play in social equalization by studying the interactions between rich and poor neighbourhoods. The results indicate that communities within a city tend to aggregate by wealth and isolate themselves from one another (social isolation): very few interactions, in terms of trajectories, have been observed between the richest and poorest areas of the city under study.

    Who has the last word? Understanding How to Sample Online Discussions

    In online debates individual arguments support or attack each other, leading to some subset of arguments being considered more relevant than others. However, in large discussions readers are often forced to sample a subset of the arguments being put forth. Since such sampling is rarely done in a principled manner, users may not read all the relevant arguments to get a full picture of the debate. This paper is interested in answering the question of how users should sample online conversations to selectively favour the currently justified or accepted positions in the debate. We apply techniques from argumentation theory and complex networks to build a model that predicts the probabilities of the normatively justified arguments given their location in online discussions. Our model shows that the proportion of replies that are supportive, the number of replies that comments receive, and the locations of un-replied comments all determine the probability that a comment is a justified argument. We show that when the degree distribution of the number of replies is homogeneous along the discussion, for acrimonious discussions, the distribution of justified arguments depends on the parity of the graph level. In supportive discussions the probability of having justified comments increases as one moves away from the root. For discussion trees that have a non-homogeneous in-degree distribution, for supportive discussions we observe the same behaviour as before, while for acrimonious discussions we cannot observe the same parity-based distribution. This is verified with data obtained from the online debating platform Kialo. By predicting the locations of the justified arguments in reply trees, we can suggest which arguments readers should sample to grasp the currently accepted opinions in such discussions. Our models have important implications for the design of future online debating platforms
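
The parity effect mentioned in this abstract can be seen in a small sketch. It assumes the simplest setting from argumentation theory, where every reply attacks the comment it answers and an argument is justified exactly when none of its attackers is justified; supportive replies, which the paper also models, are left out. The function name and the `attackers` dictionary are hypothetical.

```python
def label_tree(root, attackers):
    """Label each argument in a reply tree as justified (True) or not (False).

    `attackers` maps an argument to the replies that attack it. Un-replied
    arguments have no attackers and are therefore justified.
    """
    labels = {}

    def visit(node):
        # Visit every attacker first so that all nodes receive a label.
        results = [visit(reply) for reply in attackers.get(node, [])]
        labels[node] = not any(results)
        return labels[node]

    visit(root)
    return labels


# In a single chain of attacks, justification alternates with depth.
chain = {"root": ["reply1"], "reply1": ["reply2"]}
print(label_tree("root", chain))
```

In the chain, the un-replied `reply2` is justified, it defeats `reply1`, and `root` is reinstated, which is the alternation by level that the abstract describes for acrimonious discussions. The recursion is fine for small trees; very deep threads would need an iterative traversal.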

    The Healthy States of America: Creating a Health Taxonomy with Social Media

    Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding matches of our clusters with 20 official categories, out of 22. Based on the mentions of our taxonomy's sub-categories on Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. As opposed to counts of disease mentions or counts with no knowledge of our taxonomy's structure, we found that our disease-specific health scores are causally linked with the officially reported prevalences of 18 conditions
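
A co-occurrence network of conditions, as mentioned in this abstract, can be sketched as follows. The sketch assumes that condition mentions have already been extracted per post (the paper's deep learning extractor is not reproduced here) and that an edge weight is simply the number of posts mentioning both conditions; the function name and input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts):
    """Count, for each pair of conditions, the posts that mention both.

    `posts` is an iterable of lists of condition names, one list per post.
    Returns a Counter keyed by alphabetically ordered (condition, condition) pairs.
    """
    edges = Counter()
    for conditions in posts:
        # Deduplicate within a post and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(conditions)), 2):
            edges[(a, b)] += 1
    return edges


posts = [["flu", "fever"], ["fever", "flu", "cough"], ["cough"]]
print(cooccurrence_edges(posts))
```

The resulting weighted edge list could then be passed to any community detection method to obtain clusters of related conditions; the choice of clustering algorithm is not stated in the abstract.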