Some like it Hoax: Automated fake news detection in social networks
In recent years, the reliability of information on the Internet has emerged as a crucial issue of modern society. Social network sites (SNSs) have revolutionized the way in which information is spread by allowing users to freely share content. As a consequence, SNSs are also increasingly used as vectors for the diffusion of misinformation and hoaxes. The amount of disseminated information and the rapidity of its diffusion make it practically impossible to assess reliability in a timely manner, highlighting the need for automatic online hoax detection systems. As a contribution towards this objective, we show that Facebook posts can be classified with high accuracy as hoaxes or non-hoaxes on the basis of the users who "liked" them. We present two classification techniques, one based on logistic regression, the other on a novel adaptation of Boolean crowdsourcing algorithms. On a dataset consisting of 15,500 Facebook posts and 909,236 users, we obtain classification accuracies exceeding 99% even when the training set contains less than 1% of the posts. We further show that our techniques are robust: they work even when we restrict our attention to the users who like both hoax and non-hoax posts. These results suggest that mapping the diffusion pattern of information can be a useful component of automatic hoax detection systems.
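To illustrate the first of the two techniques, here is a minimal sketch (not the authors' code; the likes, labels, and user count below are hypothetical toy data) that fits a logistic regression over a sparse post-by-user "like" indicator matrix:

```python
# Sketch: classify posts as hoax (1) vs. non-hoax (0) from which users
# liked them, via logistic regression on a sparse indicator matrix.
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

# Hypothetical data: likes[i] = set of user ids who liked post i.
likes = [{0, 1, 2}, {1, 3}, {2, 4}, {0, 4}]
labels = [1, 1, 0, 0]
n_users = 5

# Build a posts-by-users 0/1 matrix: entry (i, u) = 1 iff user u liked post i.
rows, cols = zip(*[(i, u) for i, us in enumerate(likes) for u in us])
X = csr_matrix(([1.0] * len(rows), (rows, cols)), shape=(len(likes), n_users))

clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))  # predicted hoax/non-hoax labels
```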
Fake News Detection on the Web: A Deep Learning Based Approach
The acceptance and popularity of social media platforms for the dispersion and proliferation of news articles have led to the spread of questionable and untrusted information, in part due to the ease with which misleading content can be created and shared among communities. While prior research has attempted to automatically classify news articles and tweets as credible or non-credible, this work complements such research by proposing an approach that utilizes an amalgamation of Natural Language Processing (NLP) and Deep Learning techniques such as Long Short-Term Memory (LSTM).
Moreover, in the Information Systems paradigm, design science research methodology (DSRM) has become the major stream that focuses on building and evaluating an artifact to solve emerging problems. Hence, DSRM can accommodate deep learning-based models when adequate datasets are available. Two publicly available datasets containing labeled news articles and tweets have been used to validate the proposed model's effectiveness. This work presents two distinct experiments, and the results demonstrate that the proposed model works well for both long-sequence news articles and short-sequence texts such as tweets. Finally, the findings suggest that sentiment, tagging, linguistic, syntactic, and text-embedding features have the potential to foster fake news detection when the proposed model is trained at various dimensionalities to learn the contextual meaning of the news content.
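For illustration, a minimal Keras-style sketch of such a classifier follows; the vocabulary size, sequence length, and layer sizes are assumptions, not the paper's reported architecture:

```python
# Sketch: embedding + LSTM binary classifier for news text / tweets.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000  # assumption: tokenizer vocabulary size
MAX_LEN = 200        # assumption: padded sequence length

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),      # learned text embeddings
    layers.LSTM(64),                        # handles long and short sequences
    layers.Dense(1, activation="sigmoid"),  # credible vs. non-credible
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(padded_sequences, binary_labels, ...) once the text has been
# tokenized and padded to MAX_LEN.
```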
Quality Control in Crowdsourcing: A Survey of Quality Attributes, Assessment Techniques and Assurance Actions
Crowdsourcing enables one to leverage the intelligence and wisdom of potentially large groups of individuals toward solving problems. Common problems approached with crowdsourcing are labeling images, translating or transcribing text, providing opinions or ideas, and similar - all tasks that computers are not good at or where they may even fail altogether. The introduction of humans into computations and/or everyday work, however, also poses critical, novel challenges in terms of quality control, as the crowd is typically composed of people with unknown and very diverse abilities, skills, interests, personal objectives and technological resources. This survey studies quality in the context of crowdsourcing along several dimensions, so as to define and characterize it and to understand the current state of the art. Specifically, this survey derives a quality model for crowdsourcing tasks, identifies the methods and techniques that can be used to assess the attributes of the model, and describes the actions and strategies that help prevent and mitigate quality problems. An analysis of how these features are supported by the state of the art further identifies open issues and informs an outlook on hot future research directions.
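As one concrete example of the assessment techniques such a survey covers, the sketch below scores workers against "gold" tasks with known answers and keeps only sufficiently accurate contributors; the data and the 0.7 threshold are hypothetical:

```python
# Sketch: gold-question-based worker quality assessment.
gold = {"t1": 1, "t2": 0, "t3": 1}  # task id -> known correct answer

# worker id -> {task id: submitted answer} (hypothetical submissions)
answers = {
    "w1": {"t1": 1, "t2": 0, "t3": 1},
    "w2": {"t1": 0, "t2": 1, "t3": 1},
}

def accuracy(worker_answers):
    # Fraction of gold tasks the worker answered correctly.
    hits = sum(worker_answers.get(t) == a for t, a in gold.items())
    return hits / len(gold)

trusted = {w for w, ans in answers.items() if accuracy(ans) >= 0.7}
print(trusted)  # {'w1'}
```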
Reliable Aggregation of Boolean Crowdsourced Tasks
We propose novel algorithms for the problem of crowdsourcing binary labels. Such binary labeling tasks are very common in crowdsourcing platforms, for instance, to judge the appropriateness of web content or to flag vandalism. We propose two unsupervised algorithms: one simple to implement albeit derived heuristically, and one based on iterated Bayesian parameter estimation of user reputation models. We provide mathematical insight into the benefits of the proposed algorithms over existing approaches, and we confirm these insights by showing that both algorithms offer improved performance on many occasions across both synthetic and real-world datasets obtained via Amazon Mechanical Turk.
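A minimal sketch in the spirit of this approach (not the paper's exact algorithm, and with hypothetical votes) alternates between inferring task labels from reputation-weighted votes and re-estimating worker reputations from agreement with those labels:

```python
# Sketch: iterative reputation-weighted aggregation of binary labels.
votes = {  # (worker, task) -> vote in {-1, +1}; hypothetical data
    ("w1", "t1"): +1, ("w1", "t2"): -1,
    ("w2", "t1"): +1, ("w2", "t2"): -1,
    ("w3", "t1"): -1, ("w3", "t2"): +1,
}
workers = {w for w, _ in votes}
tasks = {t for _, t in votes}
weight = {w: 1.0 for w in workers}  # uniform initial reputations

for _ in range(10):
    # Estimate each task's label from reputation-weighted votes.
    label = {}
    for t in tasks:
        s = sum(weight[w] * v for (w, tt), v in votes.items() if tt == t)
        label[t] = 1 if s >= 0 else -1
    # Re-estimate each worker's reputation as signed agreement with labels.
    for w in workers:
        own = [(t, v) for (ww, t), v in votes.items() if ww == w]
        weight[w] = sum(v * label[t] for t, v in own) / len(own)

print(label)   # consensus label per task
print(weight)  # reputations in [-1, 1]; contrarian workers go negative
```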