Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization
Geographically annotated social media is extremely valuable for modern
information retrieval. However, when researchers can only access
publicly-visible data, one quickly finds that social media users rarely publish
location information. In this work, we provide a method which can geolocate the
overwhelming majority of active Twitter users, independent of their location
sharing preferences, using only publicly-visible Twitter data.
Our method infers an unknown user's location by examining their friends'
locations. We frame the geotagging problem as an optimization over a social
network with a total variation-based objective and provide a scalable and
distributed algorithm for its solution. Furthermore, we show how a robust
estimate of the geographic dispersion of each user's ego network can be used as
a per-user accuracy measure which is effective at removing outlying errors.
Leave-many-out evaluation shows that our method is able to infer location for
101,846,236 Twitter users at a median error of 6.38 km, allowing us to geotag
over 80% of public tweets.
Comment: 9 pages, 8 figures, accepted to IEEE BigData 2014. Compton, Ryan,
David Jurgens, and David Allen. "Geotagging one hundred million twitter
accounts with total variation minimization." Big Data (Big Data), 2014 IEEE
International Conference on. IEEE, 201
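The idea of inferring a user's location from friends' locations, and of using the spread of the ego network as a per-user accuracy measure, can be illustrated with a minimal sketch. This is not the paper's distributed algorithm: it assumes a simplified single-user update that picks the friend location minimizing the summed great-circle distance to all other friends, and it uses the median distance to friends as a stand-in for the robust dispersion estimate. The function names `haversine_km`, `infer_location`, and `dispersion_km` are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def infer_location(friend_locations):
    """Assumed simplification: return the friend location with the smallest
    total distance to all friend locations (a medoid)."""
    if not friend_locations:
        return None
    return min(friend_locations,
               key=lambda c: sum(haversine_km(c, f) for f in friend_locations))

def dispersion_km(center, friend_locations):
    """Median distance from the inferred location to the friends; a large
    value suggests the estimate should be treated as unreliable."""
    return statistics.median(haversine_km(center, f) for f in friend_locations)

# Three friends near Los Angeles and one outlier in New York.
friends = [(34.05, -118.24), (34.06, -118.25), (34.04, -118.23), (40.71, -74.01)]
inferred = infer_location(friends)
spread = dispersion_km(inferred, friends)
```

Because the summed distance is robust to a single far-away friend, the estimate stays in the Los Angeles cluster, and the small median distance indicates a tightly clustered ego network. A filter on `spread` is one way such a dispersion value could be used to discard outlying estimates.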
Smart, Responsible, and Upper Caste Only: Measuring Caste Attitudes through Large-Scale Analysis of Matrimonial Profiles
Discriminatory caste attitudes currently stigmatize millions of Indians,
subjecting individuals to prejudice in all aspects of life. Governmental
incentives and societal movements have attempted to counter these attitudes,
yet accurate measurements of public opinions on caste are not yet available for
understanding whether progress is being made. Here, we introduce a novel
approach to measure public attitudes toward caste through an indicator variable:
openness to intercaste marriage. Using a massive dataset of over 313K profiles
from a major Indian matrimonial site, we precisely quantify public attitudes,
along with differences between generations and between Indian residents and
diaspora. We show that younger generations are more open to intercaste
marriage, yet attitudes are based on a complex function of social status beyond
their own caste. In examining the desired qualities in a spouse, we find that
individuals open to intercaste marriage are more individualistic in the
qualities they desire, rather than favoring family-related qualities, which
mirrors larger societal trends away from collectivism. Finally, we show that
attitudes in diaspora are significantly less open, suggesting a bi-cultural
model of integration. Our research provides the first empirical evidence
identifying how various intersections of identity shape attitudes toward
intercaste marriage in India and among the Indian diaspora in the US.
Comment: 12 pages; Accepted to be published at ICWSM'1
Creating Full Individual-level Location Timelines from Sparse Social Media Data
In many application domains, a continuous timeline of human locations is
critical, for example for understanding where a disease may
spread or how traffic flows. While data sources such as GPS trackers or Call
Data Records are temporally-rich, they are expensive, often not publicly
available or garnered only in select locations, restricting their wide use.
Conversely, geo-located social media data are publicly and freely available,
but present challenges especially for full timeline inference due to their
sparse nature. We propose a stochastic framework, Intermediate Location
Computing (ILC) which uses prior knowledge about human mobility patterns to
predict every missing location from an individual's social media timeline. We
compare ILC with a state-of-the-art RNN baseline as well as methods that are
optimized for next-location prediction only. For three major cities, ILC
predicts the top 1 location for all missing locations in a timeline, at 1 and
2-hour resolution, with up to 77.2% accuracy (up to 6% better accuracy than all
compared methods). ILC also outperforms the RNN in low-data settings,
both with a very small number of users (under 50) and with more users
but sparser timelines. In general, the RNN model
needs a higher number of users to achieve the same performance as ILC. Overall,
this work illustrates the tradeoff between prior knowledge of heuristics and
more data, for an important societal problem of filling in entire timelines
using freely available, but sparse social media data.
Comment: 10 pages, 8 figures, 2 tables
The structure of online social networks modulates the rate of lexical change
New words are regularly introduced to communities, yet not all of these words
persist in a community's lexicon. Among the many factors contributing to
lexical change, we focus on the understudied effect of social networks. We
conduct a large-scale analysis of over 80k neologisms in 4420 online
communities across a decade. Using Poisson regression and survival analysis,
our study demonstrates that the community's network structure plays a
significant role in lexical change. Apart from overall size, properties
including dense connections, the lack of local clusters and more external
contacts promote lexical innovation and retention. Unlike offline communities,
these topic-based communities do not experience strong lexical levelling
despite increased contact but accommodate more niche words. Our work provides
support for the sociolinguistic hypothesis that lexical change is partially
shaped by the structure of the underlying network but also uncovers findings
specific to online communities.
Comment: NAACL 202
Social Meme-ing: Measuring Linguistic Variation in Memes
Much work in the space of NLP has used computational methods to explore
sociolinguistic variation in text. In this paper, we argue that memes, as
multimodal forms of language comprised of visual templates and text, also
exhibit meaningful social variation. We construct a computational pipeline to
cluster individual instances of memes into templates and semantic variables,
taking advantage of their multimodal structure in doing so. We apply this
method to a large collection of meme images from Reddit and make available the
resulting SemanticMemes dataset of 3.8M images clustered by their
semantic function. We use these clusters to analyze linguistic variation in
memes, discovering not only that socially meaningful variation in meme usage
exists between subreddits, but that patterns of meme innovation and
acculturation within these communities align with previous findings on written
language.
Author Mentions in Science News Reveal Wide-Spread Ethnic Bias
Media outlets play a key role in spreading scientific knowledge to the
general public and raising the profile of researchers among their peers. Yet,
given time and space constraints, not all scholars can receive equal media
attention, and journalists' choices of whom to mention are poorly understood.
In this study, we use a comprehensive dataset of 232,524 news stories from 288
U.S.-based outlets covering 100,208 research papers across all sciences to
investigate the rates at which scientists of different ethnicities are
mentioned by name. We find strong evidence of ethnic biases in author mentions,
even after controlling for a wide range of possible confounds. Specifically,
authors with non-British-origin names are significantly less likely to be
mentioned or quoted than comparable British-origin named authors, even within
the stories of a particular news outlet covering a particular scientific venue
on a particular research topic. Instead, minority scholars are more likely to
have their names substituted with their role at their institution. This ethnic
bias is consistent across all types of media outlets, with even larger
disparities in General-Interest outlets that tend to publish longer stories and
have dedicated editorial teams for accurately reporting science. Our findings
reveal that the perceived ethnicity can substantially shape scientists' media
attention, and, by our estimation, this bias has affected thousands of scholars
unfairly.
Comment: 68 pages, 8 figures, 11 tables
Are All Successful Communities Alike? Characterizing and Predicting the Success of Online Communities
The proliferation of online communities has created exciting opportunities to
study the mechanisms that explain group success. While a growing body of
research investigates community success through a single measure -- typically,
the number of members -- we argue that there are multiple ways of measuring
success. Here, we present a systematic study to understand the relations
between these success definitions and test how well they can be predicted based
on community properties and behaviors from the earliest period of a community's
lifetime. We identify four success measures that are desirable for most
communities: (i) growth in the number of members; (ii) retention of members;
(iii) long term survival of the community; and (iv) volume of activities within
the community. Surprisingly, we find that our measures do not exhibit very high
correlations, suggesting that they capture different types of success.
Additionally, we find that different success measures are predicted by
different attributes of online communities, suggesting that success can be
achieved through different behaviors. Our work sheds light on the basic
understanding of what success represents in online communities and what
predicts it. Our results suggest that success is multi-faceted and cannot be
measured or predicted by a single measurement. This insight has practical
implications for the creation of new online communities and the design of
platforms that facilitate such communities.
Comment: To appear at The Web Conference 201
Analyzing the Engagement of Social Relationships During Life Event Shocks in Social Media
Individuals experiencing unexpected distressing events, shocks, often rely on
their social network for support. While prior work has shown how social
networks respond to shocks, these studies usually treat all ties equally,
despite differences in the support provided by different social relationships.
Here, we conduct a computational analysis on Twitter that examines how
responses to online shocks differ by the relationship type of a user dyad. We
introduce a new dataset of over 13K instances of individuals self-reporting
shock events on Twitter and construct networks of relationship-labeled dyadic
interactions around these events. By examining behaviors across 110K replies to
shocked users in a pseudo-causal analysis, we demonstrate relationship-specific
patterns in response levels and topic shifts. We also show that while
well-established social dimensions of closeness such as tie strength and
structural embeddedness contribute to shock responsiveness, the degree of
impact is highly dependent on relationship and shock types. Our findings
indicate that social relationships contain highly distinctive characteristics
in network interactions and that relationship-specific behaviors in online
shock responses are distinct from those in offline settings.
Comment: Accepted to ICWSM 2023. 12 pages, 5 figures, 5 tables