Two Tales of the World: Comparison of Widely Used World News Datasets GDELT and EventRegistry
In this work, we compare GDELT and Event Registry, which monitor news
articles worldwide and provide big data to researchers, in terms of scale, news
sources, and news geography. We found significant differences in scale and news
sources, but, surprisingly, we observed high similarity in news geography
between the two datasets.
Comment: To appear in ICWSM'1
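The abstract does not say how "similarity in news geography" is measured. As a hedged illustration only, one common choice is the cosine similarity between per-country article-count vectors. The country codes, counts, and the function name `geo_cosine_similarity` below are hypothetical and are not taken from the paper.

```python
import math

def geo_cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two {country: article_count} maps.

    Hypothetical helper: countries missing from one map are treated as 0.
    """
    countries = set(counts_a) | set(counts_b)
    dot = sum(counts_a.get(c, 0) * counts_b.get(c, 0) for c in countries)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for an empty distribution; report no similarity
    return dot / (norm_a * norm_b)

# Hypothetical counts: dataset B is 10x smaller but has the same country mix.
dataset_a = {"US": 500, "GB": 120, "IN": 80}
dataset_b = {"US": 50, "GB": 12, "IN": 8}
print(geo_cosine_similarity(dataset_a, dataset_b))
```

Cosine similarity ignores overall volume, so under this assumed metric two datasets can differ greatly in scale yet still show near-identical geographic distributions, which is consistent with the pattern the abstract reports.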
Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games
In this work we explore cyberbullying and other toxic behavior in team
competition online games. Using a dataset of over 10 million player reports on
1.46 million toxic players along with corresponding crowdsourced decisions, we
test several hypotheses drawn from theories explaining toxic behavior. Besides
providing a large-scale, empirically based understanding of toxic behavior, our
work can be used as a basis for building systems to detect, prevent, and
counteract toxic behavior.
Comment: CHI'1
SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment
Because word semantics can substantially change across communities and
contexts, capturing domain-specific word semantics is an important challenge.
Here, we propose SEMAXIS, a simple yet powerful framework to characterize word
semantics using many semantic axes in word-vector spaces beyond sentiment. We
demonstrate that SEMAXIS can capture nuanced semantic representations in
multiple online communities. We also show that, when the sentiment axis is
examined, SEMAXIS outperforms the state-of-the-art approaches in building
domain-specific sentiment lexicons.
Comment: Accepted at ACL 2018 as a full paper
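To make "semantic axes in word-vector spaces" concrete, here is a minimal sketch. It assumes an axis is built from two sets of opposing pole words (for example "good" vs. "bad"): the axis vector is the mean of the positive-pole vectors minus the mean of the negative-pole vectors, and a word's position on the axis is its cosine similarity with that vector. The toy 2-D embeddings and helper names are hypothetical, and details of the published framework (such as how pole word sets are chosen or expanded) are not reproduced here.

```python
import numpy as np

def build_axis(emb, pos_words, neg_words):
    """Axis vector: mean of positive-pole vectors minus mean of negative-pole vectors."""
    pos_mean = np.mean([emb[w] for w in pos_words], axis=0)
    neg_mean = np.mean([emb[w] for w in neg_words], axis=0)
    return pos_mean - neg_mean

def axis_score(emb, word, axis):
    """Cosine similarity between a word vector and the axis (>0 leans positive pole)."""
    v = emb[word]
    return float(np.dot(v, axis) / (np.linalg.norm(v) * np.linalg.norm(axis)))

# Hypothetical toy embeddings; real use would load domain-specific vectors.
emb = {
    "good": np.array([1.0, 0.0]),
    "bad": np.array([-1.0, 0.0]),
    "great": np.array([0.9, 0.1]),
    "awful": np.array([-0.8, 0.2]),
}
sentiment_axis = build_axis(emb, ["good"], ["bad"])
print(axis_score(emb, "great", sentiment_axis), axis_score(emb, "awful", sentiment_axis))
```

Because each axis is just a vector difference, many axes (beyond sentiment) can be defined from different antonym pairs and applied to embeddings trained on each community's text.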
A systematic media frame analysis of 1.5 million New York Times articles from 2000 to 2017
Defense Advanced Research Projects Agency
- …