Semantic Stability in Social Tagging Streams
One potential disadvantage of social tagging systems is that due to the lack
of a centralized vocabulary, a crowd of users may never manage to reach a
consensus on the description of resources (e.g., books, users or songs) on the
Web. Yet, previous research has provided interesting evidence that the tag
distributions of resources may become semantically stable over time as more and
more users tag them. At the same time, previous work has raised an array of new
questions such as: (i) How can we assess the semantic stability of social
tagging systems in a robust and methodical way? (ii) Does semantic
stabilization of tags vary across different social tagging systems? And,
ultimately, (iii) what are the factors that can explain semantic stabilization
in such systems? In this work we tackle these questions by (i) presenting a
novel and robust method which overcomes a number of limitations in existing
methods, (ii) empirically investigating semantic stabilization processes in a
wide range of social tagging systems with distinct domains and properties and
(iii) detecting potential causes for semantic stabilization, specifically
imitation behavior, shared background knowledge and intrinsic properties of
natural language. Our results show that tagging streams which are generated by
a combination of imitation dynamics and shared background knowledge exhibit
faster and higher semantic stability than tagging streams which are generated
via imitation dynamics or natural language streams alone.
Confounds and Consequences in Geotagged Twitter Data
Twitter is often used in quantitative studies that identify
geographically-preferred topics, writing styles, and entities. These studies
rely on either GPS coordinates attached to individual messages, or on the
user-supplied location field in each profile. In this paper, we compare these
data acquisition techniques and quantify the biases that they introduce; we
also measure their effects on linguistic analysis and text-based geolocation.
GPS-tagging and self-reported locations yield measurably different corpora, and
these linguistic differences are partially attributable to differences in
dataset composition by age and gender. Using a latent variable model to induce
age and gender, we show how these demographic variables interact with geography
to affect language use. We also show that the accuracy of text-based
geolocation varies with population demographics, giving the best results for
men above the age of 40. Comment: final version for EMNLP 201
E-liquid-related posts to Twitter in 2018: Thematic analysis.
Introduction: E-liquid is the solution aerosolized by e-cigarette devices to produce vapor. Continuously evolving e-liquids, and corresponding devices, can affect user experiences associated with these products. Twitter conversations about e-liquids can capture salient behavioral, social, and communicative cues associated with e-liquids. We analyzed Twitter data to characterize key topics of conversation about e-liquids to inform surveillance and regulatory efforts. Methods: Twitter posts containing e-liquid-related terms ("e-liquid(s)," "e-juice(s)") were obtained from 1 January 2018 to 31 December 2018. Text classifiers were used to identify topics of the posts (n = 15,927). Results: The most prevalent topic was Promotional at 29.35%, followed by Flavors at 24.22% and Person Tagging at 21.47%. Juice Composition was next most prevalent at 17.61%, followed by Cannabis at 16.83% and Nicotine Health Risks at 6.39%. Quit Smoking was rare at 0.57%. Conclusion: These results suggest that flavors, cannabis, health risks of nicotine, and composition warrant consideration as targets in future surveillance, public policy, and interventions addressing the use of e-liquids. Twitter provides ample opportunity to influence the normalization and uptake of e-cigarette-related products among non-smokers and youth, unless regulatory restrictions and counter-messaging campaigns are developed to reduce this risk.