Illuminating an Ecosystem of Partisan Websites
This paper aims to shed light on alternative news media ecosystems that are
believed to have influenced opinions and beliefs by false and/or biased news
reporting during the 2016 US Presidential Elections. We examine a large,
professionally curated list of 668 hyper-partisan websites and their
corresponding Facebook pages, and identify key characteristics that mediate the
traffic flow within this ecosystem. We uncover a pattern of new websites being
established in the run up to the elections, and abandoned after. Such websites
form an ecosystem, creating links from one website to another, and by `liking'
each others' Facebook pages. These practices are highly effective in directing
user traffic internally within the ecosystem in a highly partisan manner, with
right-leaning sites linking to and liking other right-leaning sites and
similarly left-leaning sites linking to other sites on the left, thus forming a
filter bubble amongst news producers similar to the filter bubble which has
been widely observed among consumers of partisan news. Whereas there is
activity along both left- and right-leaning sites, right-leaning sites are more
evolved, accounting for a disproportionate number of abandoned websites and
partisan internal links. We also examine demographic characteristics of
consumers of hyper-partisan news and find that some of the more populous
demographic groups in the US tend to be consumers of more right-leaning sites.
Comment: Published at The Web Conference 2018 (WWW 2018). Please cite the WWW version.
Multidimensional Tie Strength and Economic Development
The strength of social relations has been shown to affect an individual’s access to opportunities. To date, however, the correspondence between tie strength and a population’s economic prospects has not been quantified, largely because of the inability to operationalise strength based on Granovetter’s classic theory. Our work departed from the premise that tie strength is a unidimensional construct (typically operationalized with frequency or volume of contact), and used instead a validated model of ten fundamental dimensions of social relationships grounded in the literature of social psychology. We built state-of-the-art NLP tools to infer the presence of these dimensions from textual communication, and analyzed a large conversation network of 630K geo-referenced Reddit users across the entire US connected by 12.8M social ties created over the span of 7 years. We found that unidimensional tie strength is only weakly correlated with economic opportunities ([Formula: see text]), while multidimensional constructs are highly correlated ([Formula: see text]). In particular, economic opportunities are associated with the combination of: (i) knowledge ties, which bridge geographically distant groups, facilitating knowledge dissemination across communities; and (ii) social support ties, which knit geographically close communities together, and represent dependable sources of social and emotional support. These results point to the importance of developing high-quality measures of tie strength in network theory.
A Graph-Based Context-Aware Model to Understand Online Conversations
Online forums that allow for participatory engagement between users have been
transformative for the public discussion of many important issues. However,
such conversations can sometimes escalate into full-blown exchanges of hate and
misinformation. Existing approaches in natural language processing (NLP), such
as deep learning models for classification tasks, use as inputs only a single
comment or a pair of comments depending upon whether the task concerns the
inference of properties of the individual comments or the replies between pairs
of comments, respectively. But in online conversations, comments and replies
may be based on external context beyond the immediately relevant information
that is input to the model. Therefore, being aware of the conversations'
surrounding contexts should improve the model's performance for the inference
task at hand.
We propose GraphNLI, a novel graph-based deep learning architecture that uses
graph walks to incorporate the wider context of a conversation in a principled
manner. Specifically, a graph walk starts from a given comment and samples
"nearby" comments in the same or parallel conversation threads, which results
in additional embeddings that are aggregated together with the initial
comment's embedding. We then use these enriched embeddings for downstream NLP
prediction tasks that are important for online conversations. We evaluate
GraphNLI on two such tasks - polarity prediction and misogynistic hate speech
detection - and find that our model consistently outperforms all relevant
baselines for both tasks. Specifically, GraphNLI with a biased root-seeking
random walk achieves macro-F1 scores 3 and 6 percentage points higher than
the best-performing BERT-based baselines for the polarity prediction and
hate speech detection tasks, respectively.
Comment: 25 pages, 9 figures. arXiv admin note: text overlap with
arXiv:2202.0817
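The graph-walk idea in the GraphNLI abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `p_up` bias parameter, the dictionary-based reply tree, and the plain averaging of embeddings are all assumptions made for illustration. The walk starts at a comment, prefers moving toward the root of the thread, and the embeddings of the sampled comments are averaged with the starting comment's embedding.

```python
import random

def root_seeking_walk(start, parent, children, walk_len=4, p_up=0.75, rng=None):
    """Sample up to walk_len nearby comments, preferring moves toward the root.

    parent:   dict mapping comment id -> parent id (the root has no entry)
    children: dict mapping comment id -> list of reply ids
    p_up:     hypothetical bias; probability of stepping to the parent
    """
    rng = rng or random.Random(0)
    node, visited = start, []
    for _ in range(walk_len):
        up = parent.get(node)
        down = children.get(node, [])
        if up is not None and (not down or rng.random() < p_up):
            node = up                 # step toward the root
        elif down:
            node = rng.choice(down)   # step to a random reply
        else:
            break                     # isolated comment: nowhere to go
        visited.append(node)
    return visited

def aggregate(start, walk, embeddings):
    """Average the start comment's embedding with those of the sampled comments."""
    vectors = [embeddings[start]] + [embeddings[n] for n in walk]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy thread (hypothetical data): root <- a <- c
parent = {"a": "root", "c": "a"}
children = {"root": ["a"], "a": ["c"]}
embeddings = {"root": [2.0, 2.0], "a": [0.0, 1.0], "c": [1.0, 0.0]}

walk = root_seeking_walk("c", parent, children, walk_len=2, p_up=1.0)
enriched = aggregate("c", walk, embeddings)
```

With `p_up=1.0` the walk is deterministic and climbs straight to the root; lower values let it wander into sibling or child threads. In practice the embeddings would come from a sentence encoder and the aggregation could be weighted, which this sketch leaves out.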
Fake it till you make it: Fishing for Catfishes
Many adult content websites incorporate social networking features. Although these are popular, they raise significant challenges, including the potential for users to "catfish", i.e., to create fake profiles to deceive other users. This paper takes an initial step towards automated catfish detection. We explore the characteristics of the different age and gender groups, identifying a number of distinctions. Through this, we train models based on user profiles and comments, via the ground truth of specially verified profiles. Applying our models for age and gender estimation of unverified profiles, we identify 38% of profiles that are likely lying about their age, and 25% that are likely lying about their gender. We find that women have a greater propensity to catfish than men. Further, whereas women catfish select from a wide age range, men consistently lie about being younger. Our work has notable implications for operators of such online social networks, as well as users who may worry about interacting with catfishes.
Our Nudges, Our Selves: Tailoring Mobile User Engagement Using Personality
To increase mobile user engagement, current apps employ a variety of
behavioral nudges, but these engagement techniques are applied in a
one-size-fits-all approach. Yet the very same techniques may be perceived
differently by different individuals. To test this, we developed HarrySpotter,
a location-based AR app that embedded six engagement techniques. We deployed it
in a 2-week study involving 29 users who also took the Big-Five personality
test. Preferences for specific engagement techniques are not only descriptive
but also predictive of personality traits. The Adj. R² ranges from 0.16 for
conscientious users (encouraged by competition) to 0.32 for neurotic users
(self-centered and focused on their own achievements), and even up to 0.61 for
extroverts (motivated by both exploration of objects and places). These
findings suggest that these techniques need to be personalized in the future.
Comment: 10 pages, 1 figure, 2 tables
Characterising User Content on a Multi-lingual Social Network
Social media has been on the vanguard of political information diffusion in
the 21st century. Most studies that look into disinformation, political
influence and fake-news focus on mainstream social media platforms. This has
inevitably made English an important factor in our current understanding of
political activity on social media. As a result, there has only been a limited
number of studies into a large portion of the world, including the largest,
multilingual and multi-cultural democracy: India. In this paper we present our
characterisation of a multilingual social network in India called ShareChat. We
collect an exhaustive dataset across 72 weeks before and during the Indian
general elections of 2019, across 14 languages. We investigate the cross
lingual dynamics by clustering visually similar images together, and exploring
how they move across language barriers. We find that Telugu, Malayalam, Tamil
and Kannada languages tend to be dominant in soliciting political images (often
referred to as memes), and posts from Hindi have the largest cross-lingual
diffusion across ShareChat (as well as images containing text in English). In
the case of images containing text that cross language barriers, we see that
language translation is used to widen the accessibility. That said, we find
cases where the same image is associated with very different text (and
therefore meanings). This initial characterisation paves the way for more
advanced pipelines to understand the dynamics of fake and political content in
a multi-lingual and non-textual setting.
Comment: Accepted at ICWSM 2020, please cite the ICWSM version.
On the adoption of e-moped sharing systems
Recent years have witnessed the emergence of novel shared mobility solutions that provide diffused on-demand access to transportation. The widespread adoption of these solutions, particularly electric mopeds (e-mopeds), is expected to bring important benefits such as the reduction of noise, atmospheric pollution, and road congestion, with extensive repercussions on liveability and quality of life in urban areas. Currently, almost no effort has been devoted to exploring the adoption patterns of e-moped sharing services; therefore, optimal management and allocation of vehicles remains a problem for service managers. In this study, we tested the hypothesis that the adoption of electric mopeds depends on the built environment and demographic aspects of each neighbourhood. In detail, we singled out three features concerning the area characteristics (distance from centre, walkability, concentration of places) and one about the population (education index). The results obtained on a real-world case study show the strong impact these factors have in determining the adoption of e-moped sharing services. Finally, we analysed the possible role that electric moped sharing can play in social equalization by studying the interactions between rich and poor neighbourhoods. The results indicate that communities within a city tend to aggregate by wealth and isolate themselves from one another (social isolation): very few interactions, in terms of trajectories, were observed between the richest and poorest areas of the city under study.
Who has the last word? Understanding How to Sample Online Discussions
In online debates individual arguments support or attack each other, leading
to some subset of arguments being considered more relevant than others.
However, in large discussions readers are often forced to sample a subset of
the arguments being put forth. Since such sampling is rarely done in a
principled manner, users may not read all the relevant arguments to get a full
picture of the debate. This paper asks how users should sample online
conversations to selectively favour the
currently justified or accepted positions in the debate. We apply techniques
from argumentation theory and complex networks to build a model that predicts
the probabilities of the normatively justified arguments given their location
in online discussions. Our model shows that the proportion of replies that are
supportive, the number of replies that comments receive, and the locations of
un-replied comments all determine the probability that a comment is a justified
argument. We show that when the degree distribution of the number of replies is
homogeneous along the discussion, for acrimonious discussions, the distribution
of justified arguments depends on the parity of the graph level. In supportive
discussions the probability of having justified comments increases as one moves
away from the root. For discussion trees that have a non-homogeneous in-degree
distribution, for supportive discussions we observe the same behaviour as
before, while for acrimonious discussions we cannot observe the same
parity-based distribution. This is verified with data obtained from the online
debating platform Kialo. By predicting the locations of the justified arguments
in reply trees, we can suggest which arguments readers should sample to grasp
the currently accepted opinions in such discussions. Our models have important
implications for the design of future online debating platforms.
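The parity effect described for acrimonious discussions can be illustrated with a small sketch. It assumes the simplest possible setting, which is not necessarily the paper's model: every reply attacks its parent, and a comment counts as justified when none of its attacking replies is itself justified, so un-replied comments are always justified. The function names and the toy tree are hypothetical.

```python
def justified(comment, replies):
    """Return True when no attacking reply to `comment` is itself justified.

    replies: dict mapping comment id -> list of reply ids (all treated as attacks)
    """
    return not any(justified(reply, replies) for reply in replies.get(comment, []))

def justified_by_level(root, replies):
    """Map each tree level (0 = root) to the justified flags of its comments."""
    levels, frontier, depth = {}, [root], 0
    while frontier:
        levels[depth] = [justified(c, replies) for c in frontier]
        frontier = [r for c in frontier for r in replies.get(c, [])]
        depth += 1
    return levels

# Toy homogeneous tree (hypothetical data): every non-leaf comment has two replies.
replies = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
levels = justified_by_level("root", replies)
```

In this toy tree the leaves at level 2 are justified, their parents at level 1 are not, and the root at level 0 is justified again, so the status alternates with the parity of the level. Supportive replies and probabilistic reply counts, which the abstract also covers, are outside this sketch. The recursion is fine for small trees; very deep threads would need an iterative bottom-up pass.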
The Healthy States of America: Creating a Health Taxonomy with Social Media
Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), finding matches of our clusters with 20 official categories, out of 22. Based on the mentions of our taxonomy's sub-categories in Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. As opposed to counts of disease mentions or counts with no knowledge of our taxonomy's structure, we found that our disease-specific health scores are causally linked with the officially reported prevalences of 18 conditions.
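The co-occurrence network mentioned in this abstract can be sketched as a weighted edge list. This is an illustration under assumptions, not the authors' pipeline: it presumes the condition mentions have already been extracted per post, and the function name and toy posts are hypothetical. Two conditions are linked whenever they appear in the same post, and the edge weight counts how many posts mention both.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(posts):
    """Count, for each unordered pair of conditions, the posts mentioning both.

    posts: iterable of lists of condition names already extracted from each post
    """
    edges = Counter()
    for conditions in posts:
        # De-duplicate within a post and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(conditions)), 2):
            edges[(a, b)] += 1
    return edges

# Toy input (hypothetical data): extracted mentions for three posts.
posts = [["flu", "fever"], ["fever", "flu", "cough"], ["cough"]]
edges = cooccurrence_edges(posts)
```

The resulting weighted edges could then be passed to a community-detection method to obtain clusters of conditions; the choice of clustering algorithm is not stated in the abstract and is left open here.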