73 research outputs found
Fake news identification on Twitter with hybrid CNN and RNN models
The problem associated with the propagation of fake news continues to grow at an alarming scale. This trend has generated much interest from politics to academia and industry alike. We propose a framework that detects and classifies fake news messages from Twitter posts using a hybrid of convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network models. The proposed deep learning approach achieves 82% accuracy. Our approach intuitively identifies relevant features associated with fake news stories without prior knowledge of the domain.
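The hybrid architecture described above can be sketched in miniature: a convolutional stage extracts local n-gram features from token embeddings, and a recurrent stage aggregates them into a single score. The dimensions, random weights, hash-based embeddings, and the simple tanh recurrence standing in for an LSTM below are illustrative placeholders, not the authors' model.

```python
import math
import random
import zlib

random.seed(0)

D = 8   # embedding dimension
K = 3   # convolution window size (n-gram width)
F = 4   # number of convolution filters
H = 4   # recurrent hidden size

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def embed(token):
    # Deterministic hash-based vectors stand in for a learned embedding table.
    rng = random.Random(zlib.crc32(token.encode()))
    return [rng.uniform(-1, 1) for _ in range(D)]

conv_filters = [rand_matrix(K, D) for _ in range(F)]   # CNN stage
W_xh = rand_matrix(H, F)                               # recurrent input weights
W_hh = rand_matrix(H, H)                               # recurrent state weights
w_out = [random.uniform(-0.5, 0.5) for _ in range(H)]  # classifier head

def conv_features(embeddings):
    """Slide each filter over the token sequence; ReLU the responses."""
    feats = []
    for t in range(len(embeddings) - K + 1):
        window = embeddings[t:t + K]
        row = []
        for filt in conv_filters:
            s = sum(filt[i][j] * window[i][j] for i in range(K) for j in range(D))
            row.append(max(0.0, s))  # ReLU
        feats.append(row)
    return feats

def rnn_last_state(seq):
    """Simple tanh recurrence standing in for an LSTM."""
    h = [0.0] * H
    for x in seq:
        h = [math.tanh(sum(W_xh[i][j] * x[j] for j in range(F)) +
                       sum(W_hh[i][j] * h[j] for j in range(H)))
             for i in range(H)]
    return h

def fake_news_score(tweet):
    tokens = tweet.lower().split()
    h = rnn_last_state(conv_features([embed(t) for t in tokens]))
    logit = sum(w * x for w, x in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-logit))  # probability-like score in (0, 1)

print(round(fake_news_score("breaking shocking claim spreads on twitter tonight"), 3))
```

A real system would learn all of these weights jointly from labelled tweets; the sketch only shows how the CNN and recurrent stages compose.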
Perils and Challenges of Social Media and Election Manipulation Analysis: The 2018 US Midterms
One of the hallmarks of a free and fair society is the ability to conduct a
peaceful and seamless transfer of power from one leader to another.
Democratically, this is measured in a citizen population's trust in the
electoral system of choosing a representative government. In view of the well
documented issues of the 2016 US Presidential election, we conducted an
in-depth analysis of the 2018 US Midterm elections looking specifically for
voter fraud or suppression. The Midterm election occurs in the middle of a
four-year presidential term. For the 2018 midterms, 35 Senate seats and all
435 seats in the House of Representatives were up for election; thus, every
congressional district and practically every state had a federal election. In
order to collect election related tweets, we analyzed Twitter during the month
prior to, and the two weeks following, the November 6, 2018 election day. In a
targeted analysis to detect statistical anomalies or election interference, we
identified several biases that can lead to wrong conclusions. Specifically, we
looked for divergence between actual voting outcomes and instances of the
#ivoted hashtag on the election day. This analysis highlighted three states of
concern: New York, California, and Texas. We repeated our analysis discarding
malicious accounts, such as social bots. Upon further inspection and against a
backdrop of collected general election-related tweets, we identified some
confounding factors, such as population bias, or bot and political ideology
inference, that can lead to false conclusions. We conclude by providing an
in-depth discussion of the perils and challenges of using social media data to
explore questions about election manipulation.
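The divergence check described above can be sketched with a toy computation: compare each state's share of the national vote to its share of geolocated #ivoted tweets and flag large gaps. The per-state figures and the threshold below are made-up illustrations, not the paper's data or findings.

```python
# Hypothetical per-state totals: actual votes cast and geolocated #ivoted tweets.
votes  = {"NY": 6.2e6, "CA": 12.7e6, "TX": 8.3e6, "FL": 8.2e6, "OH": 4.4e6}
tweets = {"NY": 41000,  "CA": 95000,  "TX": 52000, "FL": 30000, "OH": 16000}

def shares(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

vote_share, tweet_share = shares(votes), shares(tweets)

# Flag states whose hashtag share diverges from their vote share by more than
# an (arbitrary, illustrative) threshold; positive means overrepresented on
# Twitter, negative means underrepresented.
THRESHOLD = 0.03
flagged = {s: round(tweet_share[s] - vote_share[s], 3)
           for s in votes
           if abs(tweet_share[s] - vote_share[s]) > THRESHOLD}
print(flagged)
```

As the paper cautions, a gap like this alone is not evidence of interference: population bias, bot activity, and ideology-dependent hashtag use are all confounders.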
Tokenization of the Common: An Economic Model of Multidimensional Incentives
The concept of the tragedy of the commons, originally rooted in economics, describes the depletion of shared resources through the self-interested actions of individuals. This work proposes a novel solution to this economic challenge by leveraging tokens to capture its multidimensional nature. By utilising blockchain and distributed ledger technologies (DLTs), this decentralised approach aims to achieve a social optimum while promoting self-regulation. The paper presents a mathematical treatment of the tragedy of the commons, incorporating multi-dimensional tokens and exploring the divergence from the classic optimal solution, highlighting the potential of tokenisation in shaping a sustainable and efficient economy.
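For context, the classic single-resource version of this model (the textbook baseline from which the paper's optimal solution diverges, not its multi-dimensional token mechanism) can be worked through numerically:

```python
# Classic single-resource tragedy of the commons: n users each choose a usage
# level g; the resource's per-unit value falls linearly, v(G) = a - G, where
# G is total usage, and each unit extracted costs c.
a, c, n = 10.0, 2.0, 4

# Symmetric Nash equilibrium: each user maximises g * (a - G) - c * g taking
# the others' usage as given; the first-order conditions give
# g* = (a - c) / (n + 1), so total usage is:
G_nash = n * (a - c) / (n + 1)

# Social optimum: maximise aggregate welfare G * (a - G) - c * G,
# which gives G = (a - c) / 2.
G_opt = (a - c) / 2

def welfare(G):
    return G * (a - G) - c * G

# With n > 1, Nash usage exceeds the optimum and welfare is lost.
print(round(G_nash, 2), round(G_opt, 2))
print(round(welfare(G_nash), 2), round(welfare(G_opt), 2))
```

Token-based incentives, in the paper's framing, aim to shift individual payoffs so that self-interested choices move total usage from the Nash outcome toward the social optimum.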
Scholarly event characteristics in four fields of science: a metrics-based analysis
One of the key channels of scholarly knowledge exchange is scholarly events such as conferences, workshops, and symposia; such events are especially important and popular in Computer Science, Engineering, and the Natural Sciences. However, scholars encounter problems in finding relevant information about upcoming events and statistics on their historical evolution. In order to obtain a better understanding of scholarly event characteristics, we analyzed the metadata of scholarly events in four major fields of science, namely Computer Science, Physics, Engineering, and Mathematics, using the Scholarly Events Quality Assessment suite, a suite of ten metrics. In particular, we analyzed renowned scholarly events belonging to five sub-fields within Computer Science, namely World Wide Web, Computer Vision, Software Engineering, Data Management, and Security and Privacy. This analysis is based on a systematic approach using descriptive statistics as well as exploratory data analysis. The findings are, on the one hand, interesting for observing the general evolution and success factors of scholarly events; on the other hand, they allow (prospective) event organizers, publishers, and committee members to assess the progress of their event over time and compare it to other events in the same field; finally, they help researchers make more informed decisions when selecting suitable venues for presenting their work. Based on these findings, a set of recommendations is provided for different stakeholders, including event organizers, potential authors, proceedings publishers, and sponsors. Our comprehensive dataset of scholarly events in the aforementioned fields is openly available in a semantic format and maintained collaboratively at OpenResearch.org. © 2020, The Author(s)
Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus
In Web search, entity-seeking queries often trigger a special Question
Answering (QA) system. It may use a parser to interpret the question to a
structured query, execute that on a knowledge graph (KG), and return direct
entity responses. QA systems based on precise parsing tend to be brittle: minor
syntax variations may dramatically change the response. Moreover, KG coverage
is patchy. At the other extreme, a large corpus may provide broader coverage,
but in an unstructured, unreliable form. We present AQQUCN, a QA system that
gracefully combines KG and corpus evidence. AQQUCN accepts a broad spectrum of
query syntax, from well-formed questions to short `telegraphic' keyword
sequences. In the face of inherent query ambiguities, AQQUCN aggregates signals
from KGs and large corpora to directly rank KG entities, rather than commit to
one semantic interpretation of the query. AQQUCN models the ideal
interpretation as an unobservable or latent variable. Interpretations and
candidate entity responses are scored as pairs, by combining signals from
multiple convolutional networks that operate collectively on the query, KG and
corpus. On four public query workloads, amounting to over 8,000 queries with
diverse query syntax, we see 5--16% absolute improvement in mean average
precision (MAP), compared to the entity ranking performance of recent systems.
Our system is also competitive at entity set retrieval, almost doubling F1
scores for challenging short queries. Comment: Accepted to Information Retrieval Journal.
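The aggregation idea can be illustrated with a toy scorer (hypothetical scores and combination weights, not AQQUCN's learned convolutional model): each (interpretation, entity) pair receives a KG score and a corpus score, and entities are ranked by aggregating over latent interpretations rather than committing to a single parse.

```python
# (interpretation, entity) -> (kg_score, corpus_score), all values invented
# for illustration; a real system would produce these with trained networks.
pair_scores = {
    ("capital_of_country", "Paris"):     (0.9, 0.7),
    ("capital_of_country", "Lyon"):      (0.2, 0.3),
    ("largest_city",       "Paris"):     (0.8, 0.6),
    ("largest_city",       "Marseille"): (0.4, 0.5),
}

W_KG, W_CORPUS = 0.6, 0.4  # arbitrary evidence-combination weights

def entity_ranking(pairs):
    best = {}
    for (_, entity), (kg, corpus) in pairs.items():
        combined = W_KG * kg + W_CORPUS * corpus
        # Aggregate over latent interpretations with a max, so an entity
        # supported under any plausible parse can still rank highly.
        best[entity] = max(best.get(entity, 0.0), combined)
    return sorted(best, key=best.get, reverse=True)

print(entity_ranking(pair_scores))
```

Because the ranking marginalises over parses instead of trusting one, a minor syntax variation that flips the preferred interpretation need not change which entity wins.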
Hate Speech and Offensive Language Detection in Bengali
Social media often serves as a breeding ground for various hateful and
offensive content. Identifying such content on social media is crucial due to
its impact on race, gender, and religion in an unprejudiced society.
However, while there is extensive research in hate speech detection in English,
there is a gap in hateful content detection in low-resource languages like
Bengali. Besides, a current trend on social media is the use of Romanized
Bengali for regular interactions. To overcome the existing research's
limitations, in this study, we develop an annotated dataset of 10K Bengali
posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement
several baseline models for the classification of such hateful posts. We
further explore the interlingual transfer mechanism to boost classification
performance. Finally, we perform an in-depth error analysis by looking into the
misclassified posts by the models. While training actual and Romanized datasets
separately, we observe that XLM-RoBERTa performs best. Further, we witness
that on joint training and few-shot training, MuRIL outperforms other models by
interpreting the semantic expressions better. We make our code and dataset
public for others. Comment: Accepted at AACL-IJCNLP 202
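As an illustration of the kind of baseline classifier such a study might start from (a stdlib Naive Bayes over bag-of-words counts, using placeholder English tokens rather than the paper's transformer models or its Bengali data):

```python
import math
from collections import Counter

# Tiny synthetic training set; labels and texts are invented for illustration.
train = [
    ("you are wonderful and kind", "normal"),
    ("what a lovely helpful person", "normal"),
    ("you are a worthless idiot", "hate"),
    ("idiot people like you are trash", "hate"),
]

labels = {lab for _, lab in train}
word_counts = {lab: Counter() for lab in labels}
doc_counts = Counter(lab for _, lab in train)
for text, lab in train:
    word_counts[lab].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Multinomial Naive Bayes with Laplace smoothing over the shared vocab."""
    best_lab, best_lp = None, float("-inf")
    for lab in labels:
        lp = math.log(doc_counts[lab] / len(train))  # class prior
        total = sum(word_counts[lab].values())
        for w in text.split():
            lp += math.log((word_counts[lab][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best_lab, best_lp = lab, lp
    return best_lab

print(predict("you idiot"))
```

A baseline like this gives a floor against which fine-tuned multilingual models such as XLM-RoBERTa or MuRIL can be compared, particularly on Romanized text where word-level features are noisy.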