A Dataset of State-Censored Tweets
Many governments impose traditional censorship methods on social media
platforms. Instead of removing such content completely, many social media
companies, including Twitter, only withhold it from the requesting country. This
leaves the content accessible outside the censored region, providing an
excellent setting for studying government censorship on social media. We mine
such content using the Internet Archive's Twitter Stream Grab.
We release a dataset of 583,437 tweets by 155,715 users that were censored
between 2012 and July 2020. We also release 4,301 accounts that were censored in
their entirety. Additionally, we release a set of 22,083,759 supplemental
tweets made up of all tweets by users with at least one censored tweet as well
as instances of other users retweeting censored users. We provide an
exploratory analysis of this dataset. Our dataset will not only aid in the
study of government censorship but will also aid in studying hate speech
detection and the effect of censorship on social media users. The dataset is
publicly available at https://doi.org/10.5281/zenodo.4439509
Comment: Accepted to ICWSM 202
Just Another Day on Twitter: A Complete 24 Hours of Twitter Data
At the end of October 2022, Elon Musk concluded his acquisition of Twitter.
In the weeks and months before that, several questions were publicly discussed
that were not only of interest to the platform's future buyers, but also of
high relevance to the Computational Social Science research community. For
example, how many active users does the platform have? What percentage of
accounts on the site are bots? And, what are the dominating topics and
sub-topical spheres on the platform? In a globally coordinated effort of 80
scholars to shed light on these questions, and to offer a dataset that will
equip other researchers to do the same, we have collected all 375 million
tweets published within a 24-hour time period starting on September 21, 2022.
To the best of our knowledge, this is the first complete 24-hour Twitter
dataset that is available for the research community. With it, the present work
aims to accomplish two goals. First, we seek to answer the aforementioned
questions and provide descriptive metrics about Twitter that can serve as
references for other researchers. Second, we create a baseline dataset for
future research that can be used to study the potential impact of the
platform's ownership change
Detecting and Monitoring Hate Speech in Twitter
Social media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted at a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are manifold: (1) It introduces the first intelligent system that monitors and visualizes hate speech in
social media using social network analysis techniques. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach is an LSTM+MLP neural network that takes as input the
tweet's word, emoji, and expression token embeddings enriched with tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literature
The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables not only fast sharing of information, but also
disinformation. One prominent case of an event leading to circulation of
disinformation on social media is the MH17 plane crash. Studies analysing the
spread of information about this event on Twitter have focused on small,
manually annotated datasets, or used proxies for data annotation. In this work,
we examine to what extent text classifiers can be used to label data for
subsequent content analysis; in particular, we focus on predicting pro-Russian
and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though
we find that a neural classifier improves over a hashtag-based baseline,
labeling pro-Russian and pro-Ukrainian content with high precision remains a
challenging problem. We provide an error analysis underlining the difficulty of
the task and identify factors that might help improve classification in future
work. Finally, we show how the classifier can facilitate the annotation task
for human annotators
How People Perceive The Dynamic Zero-COVID Policy: A Retrospective Analysis From The Perspective of Appraisal Theory
The Dynamic Zero-COVID Policy in China spanned three years, and diverse
emotional responses were observed at different times. In this paper, we
retrospectively analyzed public sentiments and perceptions of the policy,
especially regarding how they evolved over time, and how they related to
people's lived experiences. Through sentiment analysis of 2,358 collected Weibo
posts, we identified four representative points, i.e., policy initialization,
sharp sentiment change, lowest sentiment score, and policy termination, for an
in-depth discourse analysis through the lens of appraisal theory. In the end,
we reflected on the evolving public sentiments toward the Dynamic Zero-COVID
Policy and proposed implications for effective epidemic prevention and control
measures for future crises
Chinese collective trolling
The vast majority of research on online trolling has focused on Western cultures. Given the role context plays in shaping online interactions, it is important to take socio-cultural context into account and to investigate the role of national culture by conducting research into trolling in Eastern cultures. In this paper, we attempt to begin addressing this gap by focusing on Chinese collective trolling, looking at Sina Weibo's PG One case. Specifically, we aim to identify who the major players are, what metaphors they use, and what major trolling tactics are employed in Chinese collective trolling events. Using a mixed-method approach, we analyzed 2,004 posts and 9,967 comments on Sina Weibo's PG One case, of which 480 were sampled for thematic content analysis. The major contributions of this study include an account of collective trolling in the Chinese cultural context that is characterized by role switching between trolls, bystanders, and victims during the various stages of the event. We conclude with suggestions for future research directions
Temporal Event Modeling of Social Harm with High Dimensional and Latent Covariates
Indiana University-Purdue University Indianapolis (IUPUI)
The counting process is fundamental to many real-world problems involving event data. The Poisson process, which serves as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, can be fit to temporal event data, spatio-temporal event data, and event data with covariates. We fit a Hawkes process to heterogeneous drug overdose data via a novel semi-parametric approach. The counting process is also related to survival data, since both describe the occurrence of events over time. We fit a Cox model to temporal event data using a large corpus that is processed into high-dimensional covariates, and we study the significant features that influence the intensity of events
Advanced Location-Based Technologies and Services
Since the publication of the first edition in 2004, advances in mobile devices, positioning sensors, WiFi fingerprinting, and wireless communications, among others, have paved the way for developing new and advanced location-based services (LBSs). This second edition provides up-to-date information on LBSs, including WiFi fingerprinting, mobile computing, geospatial clouds, geospatial data mining, location privacy, and location-based social networking. It also includes new chapters on application areas such as LBSs for public health, indoor navigation, and advertising. In addition, the chapter on remote sensing has been revised to address advancements
4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022)
Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 4th International Conference on Advanced Research Methods and Analytics (CARMA) provides a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of the social sciences, and to discuss current and future challenges. Due to the COVID-19 pandemic, CARMA 2022 is planned as a hybrid conference, held virtually and face-to-face simultaneously
Doménech I De Soria, J.; Vicente Cuervo, MR. (2022). 4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2022.2022.1595
Et Cetera
Et Cetera weaves together five works that are essentially five bodies of writing as digital poetry: a poetic practice, made possible by digital media and technology, in which aesthetic possibilities are extended through the semantic impact of data, alphabets, visuals, sound, etc. Interlaced by multimedial meaning-making, Et Cetera (re)produces installations that are engineered with algorithmic materials utilizing real-time data feeds, animated letterforms, performative instructions and sensory synthesis.
Exploring different scenarios of human-machine coupling that consequently lead to multifarious illegibilities, Et Cetera amplifies the noise of information overflow in the concurrent mediascape with its rhizomatic networks largely beyond human conscious apprehension. On the B-side, Et Cetera is also involved with writing about the alphabetic writing apparatus, the role of artist as author as human-machine-centaur and networked subjectivity