37,283 research outputs found
CSI: A Hybrid Deep Model for Fake News Detection
The topic of fake news has drawn attention both from the public and the
academic communities. Such misinformation has the potential of affecting public
opinion, providing an opportunity for malicious parties to manipulate the
outcomes of public events such as elections. Because such high stakes are at
play, automatically detecting fake news is an important, yet challenging
problem that is not yet well understood. Nevertheless, there are three
generally agreed upon characteristics of fake news: the text of an article, the
user response it receives, and the source users promoting it. Existing work has
largely focused on tailoring solutions to one particular characteristic which
has limited their success and generality. In this work, we propose a model that
combines all three characteristics for a more accurate and automated
prediction. Specifically, we incorporate the behavior of both parties, users
and articles, and the group behavior of users who propagate fake news.
Motivated by the three characteristics, we propose a model called CSI which is
composed of three modules: Capture, Score, and Integrate. The first module is
based on the response and text; it uses a Recurrent Neural Network to capture
the temporal pattern of user activity on a given article. The second module
learns the source characteristic based on the behavior of users, and the two
are integrated with the third module to classify an article as fake or not.
Experimental analysis on real-world data demonstrates that CSI achieves higher
accuracy than existing models, and extracts meaningful latent representations
of both users and articles.Comment: In Proceedings of the 26th ACM International Conference on
Information and Knowledge Management (CIKM) 201
Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks
Online media outlets, in a bid to expand their reach and subsequently
increase revenue through ad monetisation, have begun adopting clickbait
techniques to lure readers to click on articles. The article fails to fulfill
the promise made by the headline. Traditional methods for clickbait detection
have relied heavily on feature engineering which, in turn, is dependent on the
dataset it is built for. The application of neural networks for this task has
only been explored partially. We propose a novel approach considering all
information found in a social media post. We train a bidirectional LSTM with an
attention mechanism to learn the extent to which a word contributes to the
post's clickbait score in a differential manner. We also employ a Siamese net
to capture the similarity between source and target information. Information
gleaned from images has not been considered in previous approaches. We learn
image embeddings from large amounts of data using Convolutional Neural Networks
to add another layer of complexity to our model. Finally, we concatenate the
outputs from the three separate components, serving it as input to a fully
connected layer. We conduct experiments over a test corpus of 19538 social
media posts, attaining an F1 score of 65.37% on the dataset bettering the
previous state-of-the-art, as well as other proposed approaches, feature
engineering or otherwise.Comment: Accepted at SIGIR 2018 as Short Pape
BL-MNE: Emerging Heterogeneous Social Network Embedding through Broad Learning with Aligned Autoencoder
Network embedding aims at projecting the network data into a low-dimensional
feature space, where the nodes are represented as a unique feature vector and
network structure can be effectively preserved. In recent years, more and more
online application service sites can be represented as massive and complex
networks, which are extremely challenging for traditional machine learning
algorithms to deal with. Effective embedding of the complex network data into
low-dimension feature representation can both save data storage space and
enable traditional machine learning algorithms applicable to handle the network
data. Network embedding performance will degrade greatly if the networks are of
a sparse structure, like the emerging networks with few connections. In this
paper, we propose to learn the embedding representation for a target emerging
network based on the broad learning setting, where the emerging network is
aligned with other external mature networks at the same time. To solve the
problem, a new embedding framework, namely "Deep alIgned autoencoder based
eMbEdding" (DIME), is introduced in this paper. DIME handles the diverse link
and attribute in a unified analytic based on broad learning, and introduces the
multiple aligned attributed heterogeneous social network concept to model the
network structure. A set of meta paths are introduced in the paper, which
define various kinds of connections among users via the heterogeneous link and
attribute information. The closeness among users in the networks are defined as
the meta proximity scores, which will be fed into DIME to learn the embedding
vectors of users in the emerging network. Extensive experiments have been done
on real-world aligned social networks, which have demonstrated the
effectiveness of DIME in learning the emerging network embedding vectors.Comment: 10 pages, 9 figures, 4 tables. Full paper is accepted by ICDM 2017,
In: Proceedings of the 2017 IEEE International Conference on Data Mining
Extracting News Events from Microblogs
Twitter stream has become a large source of information for many people, but
the magnitude of tweets and the noisy nature of its content have made
harvesting the knowledge from Twitter a challenging task for researchers for a
long time. Aiming at overcoming some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network or deep
learning to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach on detecting
current (real) news events delivers a state-of-the-art performance
Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings
In this paper we present a novel interactive multimodal learning system,
which facilitates search and exploration in large networks of social multimedia
users. It allows the analyst to identify and select users of interest, and to
find similar users in an interactive learning setting. Our approach is based on
novel multimodal representations of users, words and concepts, which we
simultaneously learn by deploying a general-purpose neural embedding model. We
show these representations to be useful not only for categorizing users, but
also for automatically generating user and community profiles. Inspired by
traditional summarization approaches, we create the profiles by selecting
diverse and representative content from all available modalities, i.e. the
text, image and user modality. The usefulness of the approach is evaluated
using artificial actors, which simulate user behavior in a relevance feedback
scenario. Multiple experiments were conducted in order to evaluate the quality
of our multimodal representations, to compare different embedding strategies,
and to determine the importance of different modalities. We demonstrate the
capabilities of the proposed approach on two different multimedia collections
originating from the violent online extremism forum Stormfront and the
microblogging platform Twitter, which are particularly interesting due to the
high semantic level of the discussions they feature
- …