Veracity Roadmap: Is Big Data Objective, Truthful and Credible?
This paper argues that big data can possess different characteristics that affect its quality. Depending on its origin, the data-processing technologies, and the methodologies used for data collection and scientific discovery, big data can contain biases, ambiguities, and inaccuracies that must be identified and accounted for to reduce inference errors and improve the accuracy of generated insights. Big data veracity is now recognized as a necessary property for its utilization, complementing the three previously established quality dimensions (volume, variety, and velocity), but there has been little discussion of the concept of veracity thus far. This paper provides a roadmap for theoretical and empirical definitions of veracity along with its practical implications. We explore veracity across three main dimensions: 1) objectivity/subjectivity, 2) truthfulness/deception, and 3) credibility/implausibility, and propose to operationalize each of these dimensions with existing or potential computational tools, particularly those relevant to textual data analytics. We combine the measures of the veracity dimensions into one composite index: the big data veracity index. This newly developed index provides a useful way of assessing systematic variation in big data quality across datasets with textual information. The paper contributes to big data research by categorizing the range of existing tools for measuring the suggested dimensions, and to Library and Information Science (LIS) by proposing to account for the heterogeneity of diverse big data and by identifying the information quality dimensions important for each big data type.
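The abstract leaves the aggregation of the three dimension scores into the composite index to be operationalized. A minimal sketch of one plausible aggregation, assuming each dimension is already scored in [0, 1] and using equal weights (the weights and the scoring functions are illustrative assumptions, not taken from the paper):

```python
def veracity_index(objectivity, truthfulness, credibility,
                   weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three veracity-dimension scores (each in [0, 1]) into
    one composite index via a weighted mean.

    Equal weights are an illustrative assumption; the paper leaves
    the exact operationalization open per dataset and tool.
    """
    scores = (objectivity, truthfulness, credibility)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("dimension scores must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores))

# A dataset scored 0.8 objective, 0.6 truthful, 0.7 credible:
print(round(veracity_index(0.8, 0.6, 0.7), 3))  # 0.7
```

A weighted mean keeps the index interpretable on the same [0, 1] scale as its inputs, which makes comparisons across textual datasets straightforward.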
Using machine learning in social media systems
Diploma thesis, University of Macedonia, Thessaloniki, 2019. Nowadays, the widespread use of the Internet, and especially of social media, as a primary source of information about everything happening around the world has unfortunately also facilitated the spread of fake news. Anyone can alter real news and publish it on a news website or a social media account, or even invent news and promote it as real, thereby misinforming and even disorienting the public. It is therefore crucial to find ways to detect fake news as quickly as possible, since its dissemination can prove destructive, particularly for political and social issues, which have the strongest impact on people's lives. The use of classification algorithms is one way researchers have found to address this serious problem. In this thesis, we present such a solution, which applies data science and machine learning to build a classifier for fake news detection. More specifically, after studying various articles on fake news classification, we implement and evaluate our own classifier in a kernel created on the Kaggle platform.
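The kind of text classifier the thesis describes can be sketched in a few lines. Below is a toy bag-of-words Naive Bayes classifier with invented training snippets; the actual thesis works with Kaggle fake news datasets and likely richer features, so this is only an illustration of the classification approach, not the thesis's implementation:

```python
import math
from collections import Counter


class NaiveBayesFakeNewsClassifier:
    """Toy multinomial Naive Bayes over a bag-of-words representation,
    illustrating the classification approach described in the thesis.
    The training texts below are invented for demonstration."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.counts[label].update(words)
            vocab.update(words)
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.counts[c].values())
                       for c in self.classes}
        return self

    def predict(self, text):
        words = text.lower().split()

        def log_score(c):
            # Laplace-smoothed log-likelihood plus class prior.
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1)
                         / (self.totals[c] + self.vocab_size))
                for w in words)

        return max(self.classes, key=log_score)


texts = ["scientists confirm study results",
         "shocking miracle cure they hide",
         "official report released today",
         "you won't believe this secret trick"]
labels = ["real", "fake", "real", "fake"]
clf = NaiveBayesFakeNewsClassifier().fit(texts, labels)
print(clf.predict("shocking secret cure"))  # fake
```

On real data, one would replace the whitespace tokenizer with proper preprocessing and compare against stronger models, as the thesis does by evaluating several classifiers.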
Deception Detection and Rumor Debunking for Social Media
Abstract
The main premise of this chapter is that the time is ripe for more extensive research and development of social media tools that filter out intentionally deceptive information such as deceptive memes, rumors and hoaxes, fake news or other fake posts, tweets, and fraudulent profiles. Social media users' awareness of intentional manipulation of online content appears to be relatively low, while reliance on unverified information (often obtained from strangers) is at an all-time high. I argue that there is a need for content verification, systematic fact-checking, and filtering of social media streams. This literature survey provides a background for understanding current automated deception detection research, rumor debunking, and broader content verification methodologies; suggests a path towards hybrid technologies; and explains why the development and adoption of such tools might still be a significant challenge.
Evaluation of Fake News Detection with Knowledge-Enhanced Language Models
Recent advances in fake news detection have exploited the success of
large-scale pre-trained language models (PLMs). The predominant
state-of-the-art approaches are based on fine-tuning PLMs on labelled fake news
datasets. However, large-scale PLMs are generally not trained on structured
factual data and hence may not possess priors that are grounded in factually
accurate knowledge. The use of existing knowledge bases (KBs) with rich
human-curated factual information has thus the potential to make fake news
detection more effective and robust. In this paper, we investigate the impact
of knowledge integration into PLMs for fake news detection. We study several
state-of-the-art approaches for knowledge integration, mostly using Wikidata as
KB, on two popular fake news datasets - LIAR, a politics-based dataset, and
COVID-19, a dataset of messages posted on social media relating to the COVID-19
pandemic. Our experiments show that knowledge-enhanced models can significantly
improve fake news detection on LIAR where the KB is relevant and up-to-date.
The mixed results on COVID-19 highlight the reliance on stylistic features and
the importance of domain-specific and current KBs.

Comment: To appear in Proceedings of the 16th International AAAI Conference on Web and Social Media (AAAI ICWSM-2022).
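The abstract's core idea, grounding detection in curated factual knowledge rather than style alone, can be illustrated with a toy triple lookup. The facts and the lookup interface below are invented for illustration; the paper itself integrates Wikidata into pre-trained language models rather than querying a KB directly:

```python
# Toy knowledge base of (subject, relation) -> object triples.
# These facts and this interface are illustrative only; the paper
# integrates Wikidata-derived knowledge into PLMs instead.
KB = {
    ("Paris", "capital_of"): "France",
    ("Berlin", "capital_of"): "Germany",
}


def check_claim(subject, relation, claimed_object):
    """Return True/False when the KB covers the triple, or None when
    the KB has no entry (the 'KB not relevant or up-to-date' case the
    paper observes on COVID-19, where models fall back on style)."""
    fact = KB.get((subject, relation))
    if fact is None:
        return None
    return fact == claimed_object


print(check_claim("Paris", "capital_of", "France"))   # True
print(check_claim("Paris", "capital_of", "Germany"))  # False
print(check_claim("Mars", "capital_of", "Phobos"))    # None
```

The `None` branch mirrors the paper's mixed COVID-19 results: when the KB does not cover a claim's domain or time period, knowledge integration cannot help, and detection reverts to stylistic cues.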
Fact Checking in Community Forums
Community Question Answering (cQA) forums are very popular nowadays, as they
represent effective means for communities around particular topics to share
information. Unfortunately, this information is not always factual. Thus, here
we explore a new dimension in the context of cQA, which has been ignored so
far: checking the veracity of answers to particular questions in cQA forums. As
this is a new problem, we create a specialized dataset for it. We further
propose a novel multi-faceted model, which captures information from the answer
content (what is said and how), from the author profile (who says it), from the
rest of the community forum (where it is said), and from external authoritative
sources of information (external support). Evaluation results show a MAP value
of 86.54, which is 21 points absolute above the baseline.

Comment: AAAI-2018; Fact-Checking; Veracity; Community Question Answering; Neural Networks; Distributed Representation.
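The reported MAP of 86.54 is mean average precision, a standard ranking metric. A minimal sketch of how it is computed, with invented relevance judgements (the real evaluation uses the paper's cQA dataset):

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a list of 0/1 judgements
    in ranked order (1 = relevant answer)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at this hit
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(per_query_relevance):
    """MAP: mean of per-query AP values."""
    return (sum(average_precision(q) for q in per_query_relevance)
            / len(per_query_relevance))


# Two toy queries (relevance lists are invented, not from the paper):
print(round(mean_average_precision([[1, 0, 1], [0, 1]]), 4))  # 0.6667
```

MAP rewards systems that rank factually true answers near the top for each question, which matches the paper's framing of answer veracity as a ranking problem.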