4,963 research outputs found
PageRank optimization applied to spam detection
We give a new link spam detection and PageRank demotion algorithm called
MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked
trusted and spam pages. We define the MaxRank of a page as the frequency of
visit of this page by a random surfer minimizing an average cost per time unit.
On a given page, the random surfer selects a set of hyperlinks and clicks with
uniform probability on any of these hyperlinks. The cost function penalizes
spam pages and hyperlink removals. The goal is to determine a hyperlink
deletion policy that minimizes this score. The MaxRank is interpreted as a
modified PageRank vector, used to sort web pages instead of the usual PageRank
vector. The bias vector of this ergodic control problem, which is unique up to
an additive constant, is a measure of the "spamicity" of each page, used to
detect spam pages. We give a scalable algorithm for MaxRank computation that
allowed us to perform experimental results on the WEBSPAM-UK2007 dataset. We
show that our algorithm outperforms both TrustRank and AntiTrustRank for spam
and nonspam page detection.Comment: 8 pages, 6 figure
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social network tend to propagate according to the interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks
Efficient Implementation of a Synchronous Parallel Push-Relabel Algorithm
Motivated by the observation that FIFO-based push-relabel algorithms are able
to outperform highest label-based variants on modern, large maximum flow
problem instances, we introduce an efficient implementation of the algorithm
that uses coarse-grained parallelism to avoid the problems of existing parallel
approaches. We demonstrate good relative and absolute speedups of our algorithm
on a set of large graph instances taken from real-world applications. On a
modern 40-core machine, our parallel implementation outperforms existing
sequential implementations by up to a factor of 12 and other parallel
implementations by factors of up to 3
- …