Collective Classification for Social Media Credibility Estimation
We introduce a novel extension of the iterative classification algorithm to heterogeneous graphs and apply it to estimate credibility in social media. Given a heterogeneous graph of events, users, and websites derived from social media posts, and given prior knowledge of the credibility of a subset of graph nodes, the approach iteratively converges to a set of classifiers that estimate the credibility of the remaining nodes. To measure the performance of this approach, we train on a set of manually labeled events extracted from a corpus of Twitter data and calculate the resulting receiver operating characteristic (ROC) curves. We show that collective classification outperforms independent classification approaches, implying that graph dependencies are crucial to estimating credibility in social media.
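The iterative scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy graph, the neighbour-aggregation features, and the logistic-regression base learner (one per node type) are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def relational_features(node, graph, estimates):
    """Aggregate neighbours' current credibility estimates (mean + degree)."""
    nbrs = [estimates[v] for v in graph[node]]
    mean = sum(nbrs) / len(nbrs) if nbrs else 0.5
    return np.array([mean, float(len(nbrs))])

def iterative_classification(graph, node_type, base_feats, labels, n_iters=5):
    """Collective classification on a heterogeneous graph.

    graph:      dict node -> list of neighbour nodes (undirected)
    node_type:  dict node -> "event" / "user" / "website"
    base_feats: dict node -> base feature vector
    labels:     dict node -> 0/1 credibility for the labelled subset
    Returns a credibility estimate in [0, 1] for every node.
    """
    # Labelled nodes keep their labels; unknown nodes start at 0.5.
    estimates = {v: float(labels.get(v, 0.5)) for v in graph}
    for _ in range(n_iters):
        clf = {}
        # Train one classifier per node type on that type's labelled nodes.
        for t in set(node_type.values()):
            train = [v for v in graph if v in labels and node_type[v] == t]
            if len({labels[v] for v in train}) < 2:
                continue  # no two-class training data for this type
            X = np.array([np.concatenate([base_feats[v],
                                          relational_features(v, graph, estimates)])
                          for v in train])
            y = np.array([labels[v] for v in train])
            clf[t] = LogisticRegression().fit(X, y)
        # Re-estimate the unlabelled nodes with the current classifiers.
        for v in graph:
            if v in labels or node_type[v] not in clf:
                continue
            x = np.concatenate([base_feats[v],
                                relational_features(v, graph, estimates)])
            estimates[v] = float(clf[node_type[v]].predict_proba(
                x.reshape(1, -1))[0, 1])
    return estimates

# Toy heterogeneous graph: three events, three users, one website.
graph = {
    "e1": ["u1", "w1"], "e2": ["u2"], "e3": ["u1", "u3"],
    "u1": ["e1", "e3"], "u2": ["e2"], "u3": ["e3"], "w1": ["e1"],
}
node_type = {"e1": "event", "e2": "event", "e3": "event",
             "u1": "user", "u2": "user", "u3": "user", "w1": "website"}
rng = np.random.default_rng(0)
base_feats = {v: rng.normal(size=2) for v in graph}
labels = {"e1": 1, "e2": 0, "u1": 1, "u2": 0}  # prior knowledge subset

est = iterative_classification(graph, node_type, base_feats, labels)
```

Each round retrains the per-type classifiers on features that include the neighbours' current estimates, so label information propagates along graph edges; that coupling is what an independent (non-collective) classifier cannot exploit.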
Combining Text Classification and Fact Checking to Detect Fake News
Because fake news is widespread in news and social media, its detection has become an emerging research topic. In these media, information spreads at high speed but often without accuracy, and it can have a negative impact on individuals and society; detection mechanisms should therefore be able to classify news quickly enough to combat its spread. Detecting fake news is thus both important and technically challenging. The challenge addressed here is to use text classification to combat fake news: determining appropriate text classification methods and evaluating how well these methods distinguish between fake and non-fake news. Machine learning is helpful for building artificial intelligence systems based on tacit knowledge because it can help solve complex problems using real-world data. For this reason, I propose that integrating text classification with fact checking of check-worthy statements can help detect fake news.

I used text processing and three classifiers, Passive Aggressive, Naïve Bayes, and Support Vector Machine, to classify the news data. Text classification mainly focuses on extracting various features from texts and then incorporating these features into the classification. A major challenge in this area is the lack of corpora, and hence the lack of an efficient method for distinguishing between fake and non-fake news. I applied the three machine learning classifiers to two publicly available datasets, and experimental analysis on these datasets shows encouraging, improved performance. Classification alone, however, is not accurate enough for detecting fake news, because generic classification methods are not specialized for it; I therefore added a system that checks the news in depth, sentence by sentence.

Fact checking is a multi-step process that begins with the extraction of check-worthy statements. Identifying check-worthy statements is a subtask of the fact checking process whose automation would reduce the time and effort required to fact check a statement. In this thesis I propose an approach that classifies statements as check-worthy or not check-worthy while also taking into account the context around each statement. This work shows that including context contributes significantly to the classification while still using fairly general features to capture information from sentences. The aim is to propose an approach that automatically identifies check-worthy statements for fact checking, including the context around a statement. The results are analyzed by examining which features contribute most to the classification, and also how well the approach performs. For this work, a dataset was created by consulting different fact checking organizations; it contains debates and speeches in the domain of politics, and the capability of the approach is evaluated in this domain. The approach first extracts sentence and context features from the sentences and then classifies the sentences based on these features. The feature set and the context features were selected after several experiments, based on how well they differentiate check-worthy statements.

Fact checking has received increasing attention since the 2016 United States presidential election, and many efforts have been made to develop a viable automated fact checking system. I introduce a web-based approach for fact checking that compares the full news text and headline with known facts such as names, locations, and places. The challenge is to develop an automated application that takes claims directly from mainstream news media websites and fact checks the news after applying the classification and fact checking components. For fact checking, a dataset was constructed that contains 2146 news articles labelled fake, non-fake, or unverified. I include forty mainstream news media sources to compare the results, as well as Wikipedia for double verification. This work shows that combining text classification and fact checking contributes considerably to the detection of fake news, while also using more general features to capture information from sentences.
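As an illustrative sketch of the text-classification stage, the three classifiers named in the abstract can be compared over TF-IDF features with scikit-learn. The toy corpus and labels below are invented stand-ins, not the thesis datasets or code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented corpus, a stand-in for the two public datasets.
train_texts = [
    "shocking miracle cure doctors hate this secret trick",
    "you will not believe what this celebrity did click now",
    "aliens secretly control the government insiders reveal",
    "parliament passes budget after lengthy committee debate",
    "central bank holds interest rates steady this quarter",
    "local council approves funding for new school building",
]
train_labels = ["fake", "fake", "fake", "non-fake", "non-fake", "non-fake"]

# The three classifiers named in the abstract.
models = {
    "Passive Aggressive": PassiveAggressiveClassifier(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Support Vector Machine": LinearSVC(),
}

predictions = {}
for name, clf in models.items():
    # TF-IDF feature extraction feeding each classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(train_texts, train_labels)
    predictions[name] = pipe.predict(
        ["miracle trick celebrity secret", "council budget debate"]
    )
```

On a real corpus the pipelines would be evaluated with held-out test splits and standard metrics (accuracy, precision/recall) before feeding the check-worthy statements on to the fact checking component.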