277 research outputs found

A Dataset of State-Censored Tweets

Author: Aberer Karl
Elmas Tuğrulcan
Overdorf Rebekah
Publication date: 14/01/2021

Many governments impose traditional censorship methods on social media platforms. Instead of removing censored content completely, many social media companies, including Twitter, only withhold it from the requesting country. Such content therefore remains accessible outside the censored region, which provides an excellent setting in which to study government censorship on social media. We mine such content using the Internet Archive's Twitter Stream Grab. We release a dataset of 583,437 tweets by 155,715 users that were censored between 2012 and July 2020. We also release 4,301 accounts that were censored in their entirety. Additionally, we release a set of 22,083,759 supplemental tweets made up of all tweets by users with at least one censored tweet as well as instances of other users retweeting the censored user. We provide an exploratory analysis of this dataset. Our dataset will not only aid in the study of government censorship but will also aid in studying hate speech detection and the effect of censorship on social media users. The dataset is publicly available at https://doi.org/10.5281/zenodo.4439509

Comment: Accepted to ICWSM 202

arXiv.org e-Print Archive

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Association for the Advancement of Artificial Intelligence: AAAI Publications

Just Another Day on Twitter: A Complete 24 Hours of Twitter Data

Author: Assenmacher Dennis
Brantner Cornelia
Garcia David
Jaidka Kokil
Joseph Kenneth
Lasser Jana
Mashhadi Afra
Matter Daniel
Morstatter Fred
Otterbacher Jahna
Pfeffer Juergen
Romero Daniel M.
Schwemmer Carsten
Varol Onur
Wu Siqi
Yang Diyi
Publication date: 11/04/2023

At the end of October 2022, Elon Musk concluded his acquisition of Twitter. In the weeks and months before that, several questions were publicly discussed that were not only of interest to the platform's future buyers, but also of high relevance to the Computational Social Science research community. For example, how many active users does the platform have? What percentage of accounts on the site are bots? And, what are the dominating topics and sub-topical spheres on the platform? In a globally coordinated effort of 80 scholars to shed light on these questions, and to offer a dataset that will equip other researchers to do the same, we have collected all 375 million tweets published within a 24-hour time period starting on September 21, 2022. To the best of our knowledge, this is the first complete 24-hour Twitter dataset that is available for the research community. With it, the present work aims to accomplish two goals. First, we seek to answer the aforementioned questions and provide descriptive metrics about Twitter that can serve as references for other researchers. Second, we create a baseline dataset for future research that can be used to study the potential impact of the platform's ownership change

arXiv.org e-Print Archive

Detecting and Monitoring Hate Speech in Twitter

Author: Camacho-Collados Miguel
Liberatore Federico
Pereira-Kohatsu Juan Carlos
Quijano-Sánchez Lara
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Social media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted on social media is a phenomenon that nowadays raises social alarm, especially when these messages contain hate speech targeted at a specific individual or group. In this context, governments and non-governmental organizations (NGOs) are concerned about the possible negative impact that these messages can have on individuals or on society. In this paper, we present HaterNet, an intelligent system currently being used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that identifies and monitors the evolution of hate speech on Twitter. The contributions of this research are manifold: (1) It introduces the first intelligent system that monitors and visualizes, using social network analysis techniques, hate speech in social media. (2) It introduces a novel public dataset on hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification approaches based on different document representation strategies and text classification models. (4) The best approach consists of a combined LSTM+MLP neural network that takes as input the tweet’s word, emoji, and expression token embeddings enriched by tf-idf, and obtains an area under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the literature.

The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Online Research @ Cardiff

Universidad Carlos III de Madrid e-Archivo

Biblos-e Archivo

Mapping (Dis-)Information Flow about the MH17 Plane Crash

Author: Augenstein Isabelle
Golovchenko Yevgeniy
Hartmann Mareike
Publication date: 01/01/2019

Digital media enables fast sharing not only of information, but also of disinformation. One prominent case of an event leading to circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets, or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

How People Perceive The Dynamic Zero-COVID Policy: A Retrospective Analysis From The Perspective of Appraisal Theory

Author: Li Yunzhe
Yang Na
Zhou Kyrie Zhixuan
Publication date: 17/09/2023

The Dynamic Zero-COVID Policy in China spanned three years and diverse emotional responses have been observed at different times. In this paper, we retrospectively analyzed public sentiments and perceptions of the policy, especially regarding how they evolved over time, and how they related to people's lived experiences. Through sentiment analysis of 2,358 collected Weibo posts, we identified four representative points, i.e., policy initialization, sharp sentiment change, lowest sentiment score, and policy termination, for an in-depth discourse analysis through the lens of appraisal theory. In the end, we reflected on the evolving public sentiments toward the Dynamic Zero-COVID Policy and proposed implications for effective epidemic prevention and control measures for future crises

arXiv.org e-Print Archive

Chinese collective trolling

Author: Fichman P.
Sun H.
Publication venue: 'Wiley'
Publication date: 01/11/2018

The vast majority of research on online trolling has focused on Western cultures. Given the role context plays in shaping online interactions, it is important to take socio-cultural context into account and to investigate the role of national culture by conducting research into trolling in Eastern cultures. In this paper, we attempt to begin addressing this gap by focusing on Chinese collective trolling, looking at Sina Weibo's PG One case. Specifically, we aim to identify who the major players are, what metaphors they use, and what major trolling tactics are employed in a Chinese collective trolling event. Using a mixed-method approach, we analyzed 2,004 posts and 9,967 comments on Sina Weibo's PG One case, of which 480 were sampled for thematic content analysis. Major contributions of this study include an account of collective trolling in the Chinese cultural context that is characterized by role switching between trolls, bystanders, and victims during the various stages of the event. We conclude with suggestions for future research directions.

IUScholarWorks (University of Indiana)

Temporal Event Modeling of Social Harm with High Dimensional and Latent Covariates

Author: Liu Xueying
Publication date: 01/01/2022

Indiana University-Purdue University Indianapolis (IUPUI)

The counting process is fundamental to many real-world problems with event data. The Poisson process, used as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, can be fit to temporal event data, spatial-temporal event data, and event data with covariates. We study a Hawkes process fit to heterogeneous drug overdose data via a novel semi-parametric approach. The counting process is also related to survival data, since both concern the occurrence of events over time. We fit a Cox model to temporal event data with a large corpus that is processed into high-dimensional covariates. We study the significant features that influence the intensity of events.

IUPUIScholarWorks

Purdue E-Pubs

FigShare

Advanced Location-Based Technologies and Services

Publication venue: 'Informa UK Limited'

Since the publication of the first edition in 2004, advances in mobile devices, positioning sensors, WiFi fingerprinting, and wireless communications, among others, have paved the way for developing new and advanced location-based services (LBSs). This second edition provides up-to-date information on LBSs, including WiFi fingerprinting, mobile computing, geospatial clouds, geospatial data mining, location privacy, and location-based social networking. It also includes new chapters on application areas such as LBSs for public health, indoor navigation, and advertising. In addition, the chapter on remote sensing has been revised to address advancements

OAPEN Library

4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022)

Author: Doménech i de Soria Josep
Vicente Cuervo María Rosalía
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 21/09/2022

Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 4th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences as well as to discuss current and future challenges. Due to the COVID pandemic, CARMA 2022 is planned as a simultaneous virtual and face-to-face conference.

Doménech I De Soria, J.; Vicente Cuervo, MR. (2022). 4th. International Conference on Advanced Research Methods and Analytics (CARMA 2022). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2022.2022.1595

RiuNet

Et Cetera

Author: Ye Xuan
Publication date: 21/11/2018

Et Cetera is woven together with five works that are essentially five bodies of writings as digital poetry -- a poetic practice that is made possible by digital media and technology in which aesthetic possibilities are extended through the semantic impact of data, alphabets, visuals, sound, etc. Interlaced by multimedial meaning-making, Et Cetera re(produces) installations that are engineered with algorithmic materials utilizing real-time data feeds, animated letterforms, performative instructions and sensory synthesis. Exploring different scenarios of human-machine coupling that consequently lead to multifarious illegibilities, Et Cetera amplifies the noise of information overflow in the concurrent mediascape with its rhizomatic networks largely beyond human conscious apprehension. On the B-side, Et Cetera is also involved with writing about the alphabetic writing apparatus, the role of artist as author as human-machine-centaur and networked subjectivity

YorkSpace