1,396 research outputs found

    Spanish Corpora of tweets about COVID-19 vaccination for automatic stance detection

    The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition taking into account different variants of Spanish in social posts; (2) to evaluate the efficiency of semi-supervised models for extending such a corpus with unlabelled posts; and (3) to describe such short text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers as in favour (904), against (674) or neither (1,223), with a Fleiss’ kappa score of 0.725. Results show that the self-training method with an SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro-averaged F1 score of 0.94. The combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of the trustworthiness and the validation index. Agencia Estatal de Investigación | Ref. PID2020–113673RB-I00. Xunta de Galicia | Ref. ED431C2018/55. Fundação para a Ciência e a Tecnologia | Ref. UIDB/04469/2020. Open-access publication funded by: Universidade de Vigo/CISU
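
    The agreement statistic quoted above (Fleiss' kappa over three annotators and three stance categories) can be illustrated with a minimal sketch. The per-tweet vote counts below are made-up toy data, not the paper's annotations, and the function name `fleiss_kappa` is chosen here for illustration only.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table of per-item category counts.

    ratings: list of rows, one per item; row[j] is the number of
    annotators who assigned category j to that item. Every item must
    be rated by the same number of annotators (at least two).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance
    # Note: undefined (division by zero) if every vote is in one category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): columns are favour / against / neither.
toy_counts = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [0, 1, 2],
]
print(round(fleiss_kappa(toy_counts), 3))
```

    A value of 1 means perfect agreement and 0 means agreement no better than chance; the 0.725 reported for the corpus falls between the two.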

    Detecting Political Framing Shifts and the Adversarial Phrases within Rival Factions and Ranking Temporal Snapshot Contents in Social Media

    Social Computing is an area of computer science concerned with dynamics of communities and cultures, created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics are needed. In this dissertation, a framework comprising community and content-based computational methods is presented to provide insights for multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts, using network features. Next, another model is developed to detect and compare individual and organizational accounts, by using a set of domain and language-independent features. The third model exposes contentious issues, driving reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and key messages that drive dynamics. The final model presents a use case methodology for detecting and monitoring foreign influence, wherein a state actor and news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, a discussion of novel aspects and contributions of the models is presented, as well as quantitative and qualitative evaluations. An analysis of multiple conflict situations will be conducted, covering areas in the UK, Bangladesh, Libya and the Ukraine where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil wars (e.g., Libya and the Ukraine). Dissertation/Thesis. Doctoral Dissertation, Computer Science 201

    Effectiveness of dismantling strategies on moderated vs. unmoderated online social platforms

    Online social networks are the perfect test bed to better understand large-scale human behavior in interacting contexts. Although they are broadly used and studied, little is known about how their terms of service and posting rules affect the way users interact and information spreads. Acknowledging the relation between network connectivity and functionality, we compare the robustness of two different online social platforms, Twitter and Gab, with respect to dismantling strategies based on the recursive censoring of users characterized by social prominence (degree) or intensity of inflammatory content (sentiment). We find that the moderated (Twitter) vs. unmoderated (Gab) character of the network is not a discriminating factor for intervention effectiveness. We find, however, that more complex strategies based upon the combination of topological and content features may be effective for network dismantling. Our results provide useful indications to design better strategies for countervailing the production and dissemination of anti-social content in online social platforms.
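
    A degree-based dismantling strategy of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: robustness is measured here by the size of the largest connected component (an assumed metric), degrees are recomputed after every removal, and the helper names and toy edge list are hypothetical.

```python
from collections import deque


def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def dismantle_by_degree(edges, n_removals):
    """Repeatedly remove the currently highest-degree node.

    Returns the largest-component size before any removal and after
    each removal, so the list has at most n_removals + 1 entries.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    sizes = [largest_component_size(adj)]
    for _ in range(n_removals):
        if not adj:
            break
        # Degrees are recomputed each round, since removals change them.
        target = max(adj, key=lambda node: len(adj[node]))
        for neighbour in adj.pop(target):
            adj[neighbour].discard(target)
        sizes.append(largest_component_size(adj))
    return sizes


# Toy network (hypothetical): one hub account connected to five others.
star_edges = [("hub", i) for i in range(5)]
print(dismantle_by_degree(star_edges, 1))
```

    A sentiment-based variant would only change the line that picks `target`, ranking users by a content score instead of by degree.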

    Context-Aware Message-Level Rumour Detection with Weak Supervision

    Social media has become the main source of all sorts of information beyond a communication medium. Its intrinsic nature can allow a continuous and massive flow of misinformation to make a severe impact worldwide. In particular, rumours emerge unexpectedly and spread quickly. It is challenging to track down their origins and stop their propagation. An ideal solution is to identify rumour-mongering messages as early as possible, which is commonly referred to as "Early Rumour Detection (ERD)". This dissertation focuses on researching ERD on social media by exploiting weak supervision and contextual information. Weak supervision is a branch of ML where noisy and less precise sources (e.g. data patterns) are leveraged to learn limited high-quality labelled data (Ratner et al., 2017). This is intended to reduce the cost and increase the efficiency of the hand-labelling of large-scale data. This thesis aims to study whether identifying rumours before they go viral is possible and to develop an architecture for ERD at the individual post level. To this end, it first explores major bottlenecks of current ERD. It also uncovers a research gap between system design and its applications in the real world, which have received less attention from the ERD research community. One bottleneck is limited labelled data. Weakly supervised methods to augment limited labelled training data for ERD are introduced. The other bottleneck is enormous amounts of noisy data. A framework unifying burst detection based on temporal signals and burst summarisation is investigated to identify potential rumours (i.e. input to rumour detection models) by filtering out uninformative messages. Finally, a novel method which jointly learns rumour sources and their contexts (i.e. conversational threads) for ERD is proposed. An extensive evaluation setting for ERD systems is also introduced.
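
    The idea of weak supervision mentioned above, where noisy pattern-based sources stand in for hand labels, can be illustrated with a small sketch. Everything here is hypothetical: the three pattern rules, the label constants, and the majority-vote combiner are assumptions for illustration and are not taken from the dissertation.

```python
import re
from collections import Counter

RUMOUR, NON_RUMOUR, ABSTAIN = 1, 0, -1


# Each labelling function encodes one noisy heuristic and may abstain.
def lf_unverified(text):
    pattern = r"\b(unconfirmed|allegedly|rumou?r)\b"
    return RUMOUR if re.search(pattern, text, re.I) else ABSTAIN


def lf_asks_if_true(text):
    return RUMOUR if re.search(r"is (this|it) true", text, re.I) else ABSTAIN


def lf_official_source(text):
    pattern = r"\b(official statement|press release)\b"
    return NON_RUMOUR if re.search(pattern, text, re.I) else ABSTAIN


LABELLING_FUNCTIONS = [lf_unverified, lf_asks_if_true, lf_official_source]


def weak_label(text):
    """Combine the heuristics by majority vote; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return ranked[0][0]


for post in [
    "Unconfirmed reports say the bridge collapsed, is this true?",
    "Official statement from the city council on road closures",
    "Nice weather today",
]:
    print(weak_label(post), post)
```

    Posts that receive a non-abstain label could then augment a small hand-labelled training set; a majority vote is the simplest combiner, and learned label models are a common alternative.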

    Mining Public Opinion on COVID-19 Vaccines using Unstructured Social Media Data

    The emergence of the novel coronavirus (COVID-19) and the necessary separation of populations led to an unprecedented number of new social media users seeking information related to the pandemic. Nowadays, with an estimated 4.5 billion users worldwide, social media data offer an opportunity for near real-time analysis of large bodies of text related to disease outbreaks and vaccination. This study investigated and compared public discourse related to COVID-19 vaccines expressed on two popular social media platforms, Reddit and Twitter. Approximately 9.5 million Tweets and 70 thousand Reddit comments from January 1, 2020, to March 1, 2022, were analyzed through topic modeling, sentiment analysis, and semantic network analysis. Sentiment analysis through the fine-tuned DistilRoBERTa model revealed that even though Twitter content was overall more negative than content expressed on Reddit, relatively similar changes in sentiment occurred among users of both online platforms. Reversals in sentiment trends typically occurred within relative proximity to events such as vaccine development news, vaccine release, frequent discussion of side-effects, the discovery of new variants, and pandemic fatigue. Topic modeling and semantic network analysis provided insight into how public discourse related to COVID-19 and vaccinations, misinformation, and vaccine hesitancy evolved over 26 months. Though misinformation and mention of conspiracy theories were detected with the analysis, the occurrence of both was less frequent than expected. This work provides a framework that could be scaled and utilized by public health officials to monitor disease outbreaks in near real-time in large communities as well as smaller local groups. Hopefully, the results from this study will help to guide and facilitate the implementation of targeted digital interventions among vaccine-hesitant populations and provide insights to public health officials to inform decision-making and effective policy development.

    A Survey on Visual Analytics of Social Media Data

    The unprecedented availability of social media data offers substantial opportunities for data owners, system operators, solution providers, and end users to explore and understand social dynamics. However, the exponential growth in the volume, velocity, and variability of social media data prevents people from fully utilizing such data. Visual analytics, which is an emerging research direction, ha..