    Classification-aware neural topic model and its application on a new COVID-19 disinformation corpus

    The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and social impacts research. This paper presents: 1) the largest currently available manually annotated COVID-19 disinformation category dataset; and 2) a classification-aware neural topic model (CANTM) that combines classification and topic modelling under a variational autoencoder framework. We demonstrate that CANTM efficiently improves classification performance in low-resource settings and is scalable. In addition, the classification-aware topics help researchers and end-users to better understand the classification results.
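
    A minimal sketch of the kind of model CANTM describes: a variational autoencoder over bag-of-words input whose latent topic mixture feeds both a reconstruction decoder and a classification head, so the learned topics are shaped by the labels. Layer sizes, the single linear classifier and the loss weighting below are illustrative assumptions, not the authors' exact architecture.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ClassAwareTopicVAE(nn.Module):
            def __init__(self, vocab_size, n_topics, n_classes, hidden=256):
                super().__init__()
                self.enc = nn.Linear(vocab_size, hidden)
                self.mu = nn.Linear(hidden, n_topics)
                self.logvar = nn.Linear(hidden, n_topics)
                self.dec = nn.Linear(n_topics, vocab_size)  # topic-word weights
                self.clf = nn.Linear(n_topics, n_classes)   # classification head

            def forward(self, bow):
                h = F.relu(self.enc(bow))
                mu, logvar = self.mu(h), self.logvar(h)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
                theta = F.softmax(z, dim=-1)  # document-topic mixture
                return self.dec(theta), self.clf(theta), mu, logvar

        def loss_fn(bow, labels, recon, logits, mu, logvar, alpha=1.0):
            rec = -(F.log_softmax(recon, dim=-1) * bow).sum(-1).mean()  # BoW reconstruction
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
            cls = F.cross_entropy(logits, labels)  # classification term shapes the topics
            return rec + kl + alpha * cls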

    Digital Methods to Study (and Reduce) the Impact of Disinformation

    Social media have democratized communication but have also led to the explosion of the so-called "fake news" phenomenon. This problem has visible implications for global security, both political (e.g., the QAnon case) and health-related (anti-COVID-vaccination and no-vax fake news). Models that detect the problem in real time and on large amounts of data are needed. Digital methods and text classification procedures are able to do this through predictive approaches that identify a suspect message or author. This paper aims to apply a supervised model to the study of fake news on the Twittersphere to highlight its potential and preliminary limitations. The case study is the infodemic generated on social media during the first phase of the COVID-19 emergency. The application of the supervised model involved the use of a training and a testing dataset. The different preliminary steps to build the training dataset are also shown, highlighting, with a critical approach, the challenges of working with supervised algorithms. Two aspects emerge. The first is that it is important to block the sources of bad information before the information itself. The second is that algorithms can themselves be sources of bias. Social media companies need to be very careful about relying on automated classification.
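
    As an illustration of the supervised approach described above, here is a minimal scikit-learn pipeline that learns to flag suspect tweets from labelled examples. The TF-IDF features, the linear classifier and the toy data are assumptions for demonstration, not the authors' actual setup.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        # Toy labelled tweets: 1 = suspect content, 0 = reliable content.
        tweets = [
            "5g towers spread the virus, share before they delete this",
            "health agency releases updated case counts for the region",
            "miracle cure suppressed by elites, doctors hate it",
            "peer-reviewed vaccine trial results published today",
        ]
        labels = [1, 0, 1, 0]

        X_train, X_test, y_train, y_test = train_test_split(
            tweets, labels, test_size=0.5, random_state=0, stratify=labels)

        # Word and bigram TF-IDF features feeding a linear classifier.
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                              LogisticRegression(max_iter=1000))
        model.fit(X_train, y_train)
        print(classification_report(y_test, model.predict(X_test)))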

    A Gamified Synthetic Environment for Evaluation of Counter-Disinformation Solutions

    This paper presents a simulation-based approach to countering online dis/misinformation. This disruptive-technology experiment incorporated a synthetic environment (SEN) component, based on an adapted Susceptible-Infected-Resistant (SIR) epidemiological model, to evaluate and visualize the effectiveness of suggested solutions to the issue. The participants in the simulation were given a realistic scenario depicting a dis/misinformation threat and were asked to select a number of solutions, described in Ideas-of-Systems (IoS) cards. During the event, the qualitative and quantitative characteristics of the IoS cards were tested in the synthetic environment built on the SIR model. The participants, divided into teams, presented and justified their dis/misinformation strategy, which included three IoS card selections. A jury of subject-matter experts announced the winning team based on the merits of the proposed strategies and the compatibility of the different cards grouped together.
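
    A worked miniature of the SIR mechanic underlying such a synthetic environment: "infected" users believe and spread the dis/misinformation, "resistant" users have been inoculated or debunked, and a candidate solution is modelled as lowering the transmission rate or raising the recovery rate. All rate values below are illustrative assumptions.

        import numpy as np

        def peak_infected(beta, gamma, s0=0.99, i0=0.01, days=60, dt=0.1):
            """Euler-integrate SIR and return the peak share of 'infected' users."""
            s, i, r = s0, i0, 1.0 - s0 - i0
            peak = i
            for _ in np.arange(0.0, days, dt):
                new_inf = beta * s * i * dt   # susceptible users exposed to the narrative
                new_rec = gamma * i * dt      # users debunked / losing interest
                s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
                peak = max(peak, i)
            return peak

        print(peak_infected(beta=0.5, gamma=0.1))  # baseline spread
        print(peak_infected(beta=0.3, gamma=0.2))  # with a mitigating IoS solution applied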

    Sentiment Analysis for Fake News Detection

    In recent years, we have witnessed a rise in fake news, i.e., provably false pieces of information created with the intention of deception. The dissemination of this type of news poses a serious threat to cohesion and social well-being, since it fosters political polarization and people's distrust of their leaders. The huge amount of news disseminated through social media makes manual verification unfeasible, which has promoted the design and implementation of automatic systems for fake news detection. The creators of fake news use various stylistic tricks to promote the success of their creations, one of them being to excite the sentiments of the recipients. This has led to sentiment analysis, the part of text analytics in charge of determining the polarity and strength of the sentiments expressed in a text, being used in fake news detection approaches, either as the basis of the system or as a complementary element. In this article, we study the different uses of sentiment analysis in the detection of fake news, with a discussion of the most relevant elements and shortcomings, and the requirements that should be met in the near future, such as multilingualism, explainability, mitigation of biases, and treatment of multimedia elements.

    Funding: Xunta de Galicia (ED431G 2019/01; ED431C 2020/11). This work has been funded by FEDER/Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación, through the ANSWERASAP project (TIN2017-85160-C2-1-R), and by the Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as a Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia, 80% through the European Regional Development Fund (ERDF/FEDER) under the Galicia ERDF 2014-20 Operational Programme, and the remaining 20% from the Secretaría Xeral de Universidades (ref. ED431G 2019/01). David Vilares is also supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the BBVA Foundation. Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (FASTPARSE, grant No. 714150).
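
    A small sketch of the "complementary element" pattern the survey discusses: sentiment polarity and strength scores appended to lexical features before classification. VADER as the sentiment scorer and the exact feature layout are illustrative choices, not the specific systems reviewed.

        import numpy as np
        import scipy.sparse as sp
        from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download('vader_lexicon')
        from sklearn.feature_extraction.text import TfidfVectorizer

        def featurize(texts, vectorizer, fit=False):
            """Stack TF-IDF lexical features with sentiment polarity/strength columns."""
            lexical = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
            sia = SentimentIntensityAnalyzer()
            scores = [sia.polarity_scores(t) for t in texts]
            # 'compound' captures polarity; pos+neg mass approximates strength.
            senti = np.array([[s["compound"], s["pos"] + s["neg"]] for s in scores])
            return sp.hstack([lexical, sp.csr_matrix(senti)])

        vec = TfidfVectorizer()
        X_train = featurize(["shocking!!! they lied to you",
                             "committee issues routine report"], vec, fit=True)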

    Deploying Artificial Intelligence to Combat Covid-19 Misinformation on Social Media: Technological and Ethical Considerations

    This paper reports on research into online misinformation pertaining to the COVID-19 pandemic using artificial intelligence. This is part of our longer-term goal, i.e., the development of an artificial intelligence (machine learning) tool to assist social media platforms, online service providers and government agencies in identifying and responding to misinformation on social media. We report herein on the predictive accuracy accomplished by applying a combination of technologies, including a custom-designed web crawler, The Dark Crawler (TDC), and the Posit toolkit, a text-reading software solution designed by George Weir of the University of Strathclyde. Overall, we found that the performance of models based on Posit-derived textual features showed high levels of correlation with the pre-determined (manual and machine-driven) data classifications. We further argue that the harms associated with COVID-19 misinformation (e.g., the social and economic damage, and the deaths and severe illnesses) outweigh personal privacy and freedom-of-speech considerations.
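
    Posit derives part-of-speech and frequency statistics from text; since its API is not shown here, the sketch below approximates that style of textual feature with NLTK's off-the-shelf tokenizer and tagger. Everything in it is a stand-in for illustration, not the Posit toolkit itself.

        from collections import Counter
        import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data packages

        def pos_profile(text):
            """Part-of-speech frequency profile of a document: one row of features."""
            tokens = nltk.word_tokenize(text)
            tags = Counter(tag for _, tag in nltk.pos_tag(tokens))
            total = max(len(tokens), 1)
            return {
                "noun_ratio": sum(n for t, n in tags.items() if t.startswith("NN")) / total,
                "verb_ratio": sum(n for t, n in tags.items() if t.startswith("VB")) / total,
                "adj_ratio": sum(n for t, n in tags.items() if t.startswith("JJ")) / total,
                "type_token_ratio": len(set(tokens)) / total,
            }

        print(pos_profile("Miracle cure banned: officials hide the shocking truth."))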

    A semi-automatic annotation methodology that combines Summarization and Human-In-The-Loop to create disinformation detection resources

    Early detection of disinformation is one of the most challenging large-scale problems facing present-day society, which is why the application of technologies such as Artificial Intelligence and Natural Language Processing is necessary. The vast majority of Artificial Intelligence approaches require annotated data, and generating these resources is very expensive. This proposal aims to improve the efficiency of the annotation process with a two-level semi-automatic annotation methodology. The first level extracts relevant information through summarization techniques. The second applies a Human-in-the-Loop strategy whereby the labels are pre-annotated by the machine, corrected by the human and reused by the machine to retrain the automatic annotator. After evaluating the system, the average annotation time per news item is reduced by 50%. In addition, a set of experiments is performed on the resulting semi-automatically annotated dataset to demonstrate the effectiveness of the proposal. Although the dataset is annotated in terms of unreliable content, it is applied to the veracity detection task with very promising results (0.95 accuracy in reliability detection and 0.78 in veracity detection).

    Funding: This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by "ERDF A way of making Europe", by the "European Union" or by the "European Union NextGenerationEU/PRTR", through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). Also funded by the Generalitat Valenciana through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177.
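
    A minimal sketch of the Human-in-the-Loop level described above: the current model pre-annotates each batch of summarized items, a human corrects the suggestions, and the corrected labels are fed back to retrain the annotator. The classifier choice and the ask_human callback interface are illustrative assumptions, not the paper's implementation.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        def human_in_the_loop(summaries, batches, ask_human):
            """summaries: summarized news items; batches: lists of item indices;
            ask_human(item, suggestion) -> corrected label (hypothetical callback)."""
            model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
            seen, gold = [], []
            for batch in batches:
                items = [summaries[i] for i in batch]
                # Pre-annotate once the model has been trained at least once;
                # early batches should contain both label classes so the fit succeeds.
                suggestions = model.predict(items) if gold else [None] * len(items)
                labels = [ask_human(it, sg) for it, sg in zip(items, suggestions)]
                seen += items
                gold += labels
                model.fit(seen, gold)  # retrain on all human-corrected labels so far
            return model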

    NOFACE: A new framework for irrelevant content filtering in social media according to credibility and expertise

    Social networks have taken on an irreplaceable role in our lives. They are used daily by millions of people to communicate and inform themselves. This success has also led to a great deal of irrelevant content, and even misinformation, on social media. In this paper, we propose a user-centred framework to reduce the amount of irrelevant content in social networks in order to support further stages of data mining processes. The system also helps reduce misinformation in social networks, since it selects credible and reputable users; it is based on the belief that if a user is credible then their content will be credible. Our proposal uses word embeddings in a first stage to create a set of interesting users according to their expertise. In a later stage, it employs social network metrics to further narrow down the relevant users according to their credibility in the network. To validate the framework, it has been tested on two real Big Data problems on Twitter: one related to COVID-19 tweets and the other to the United States elections of 3 November. Both are problems in which finding relevant content may be difficult due to the large amount of data published in recent years. The proposed framework, called NOFACE, reduces the number of irrelevant users posting about the topic, keeping only those with higher credibility and thus surfacing interesting information about the selected topic. This entails a reduction of irrelevant information, thereby mitigating the presence of misinformation in a subsequent data mining application and improving the results obtained, as illustrated on the two topics above using clustering, association rules and LDA techniques.

    Funding: European Commission (786687); Andalusian government FEDER operative programme (P18-RT-2947, B-TIC-145-UGR18); University of Granada internal plan (PPJIB2021-04); Spanish Government (FPU18/0015).
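
    A compact sketch of the two-stage idea: first keep users whose content embeddings sit close to a topic vector (expertise), then rank the survivors with a network metric (credibility). The cosine threshold, the embedding inputs and PageRank as the credibility metric are illustrative assumptions, not necessarily the metrics NOFACE uses.

        import numpy as np
        import networkx as nx

        def credible_experts(user_vecs, topic_vec, interactions, sim_thr=0.6, top_k=100):
            """user_vecs: {user: embedding}; interactions: (source, target) edges."""
            def cos(a, b):
                return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            # Stage 1: expertise filter in embedding space.
            experts = {u for u, v in user_vecs.items() if cos(v, topic_vec) >= sim_thr}
            # Stage 2: credibility ranking on the interaction graph.
            g = nx.DiGraph((a, b) for a, b in interactions
                           if a in experts and b in experts)
            rank = nx.pagerank(g)
            return sorted(rank, key=rank.get, reverse=True)[:top_k]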

    Unmasking Medical Fake News Using Machine Learning Techniques

    Fake news has always been a critical and challenging problem in the information environment. The propagation of false news is a serious concern, especially in medical information, where it can have dangerous and potentially deadly consequences. With the tsunami of online misinformation, it is crucial to fight fake medical news. In this study, we use machine learning techniques to help detect fake news related to diseases, including COVID-19, Ebola, Zika, SARS, cancer, and polio. To facilitate research in this space, we create a new medical dataset named MedHub. MedHub combines records from two publicly available COVID datasets with manually curated facts and myths about the other diseases. In addition, we build several machine learning models trained on MedHub, including KNN, Naïve Bayes, SVM, logistic regression, and MLP classifiers, and present a proof-of-concept web application that uses these models to detect fake medical news. Our best-performing model, which we call Disease Myth Buster, is based on BERT and achieves an accuracy of 99%. In addition, we perform experiments to demonstrate that 1) our models perform well at identifying misinformation related to any disease, even one not represented in the dataset; 2) they are well optimized to identify COVID-19-specific misinformation; and 3) Disease Myth Buster can be extended to general fake news classification using transfer learning. We create two new manually curated test datasets for the first two experiments: the first has 164 records related to diabetes, and the second has 13,459 records of COVID-19 myths. We open-source all our datasets and models for future research.
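
    A sketch of the classical-model comparison described above, with TF-IDF features and scikit-learn defaults standing in for the study's actual pipeline and hyperparameters; texts and labels stand in for the MedHub records, which are not reproduced here.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        MODELS = {
            "KNN": KNeighborsClassifier(),
            "Naive Bayes": MultinomialNB(),
            "SVM": LinearSVC(),
            "Logistic regression": LogisticRegression(max_iter=1000),
            "MLP": MLPClassifier(max_iter=300),
        }

        def compare(texts, labels):
            """5-fold accuracy for each classical baseline on the same features."""
            for name, clf in MODELS.items():
                pipe = make_pipeline(TfidfVectorizer(), clf)
                print(f"{name}: {cross_val_score(pipe, texts, labels, cv=5).mean():.3f}")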