Search Rank Fraud De-Anonymization in Online Systems
We introduce the fraud de-anonymization problem, which goes beyond fraud
detection, to unmask the human masterminds responsible for posting search rank
fraud in online systems. We collect and study search rank fraud data from
Upwork, and survey the capabilities and behaviors of 58 search rank fraudsters
recruited from 6 crowdsourcing sites. We propose Dolos, a fraud
de-anonymization system that leverages traits and behaviors extracted from
these studies, to attribute detected fraud to crowdsourcing site fraudsters,
thus to real identities and bank accounts. We introduce MCDense, a min-cut
dense component detection algorithm to uncover groups of user accounts
controlled by different fraudsters, and leverage stylometry and deep learning
to attribute them to crowdsourcing site profiles. Dolos correctly identified
the owners of 95% of fraudster-controlled communities, and uncovered fraudsters
who promoted as many as 97.5% of fraud apps we collected from Google Play. When
evaluated on 13,087 apps (820,760 reviews), which we monitored over more than 6
months, Dolos identified 1,056 apps with suspicious reviewer groups. We report
orthogonal evidence of their fraud, including fraud duplicates and fraud
re-posts.
Comment: The 29th ACM Conference on Hypertext and Social Media, July 201
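The abstract names MCDense only as a min-cut dense component detection algorithm and gives no further detail. The following is a minimal sketch of one way such an idea could work, not the authors' algorithm: it recursively splits a graph of accounts along a global minimum cut (Stoer-Wagner) until every remaining component is dense. The function names, the density threshold `tau`, and the `min_size` cutoff are all assumptions made for illustration.

```python
def stoer_wagner(graph):
    """Global min cut of an undirected weighted graph.
    graph: dict node -> dict neighbor -> weight.
    Returns (cut_weight, set of original nodes on one side)."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # working copy
    merged = {u: [u] for u in g}  # supernode -> original nodes it contains
    best_w, best_side = float("inf"), None
    while len(g) > 1:
        # One "minimum cut phase": grow a set from an arbitrary start node,
        # always adding the node most tightly connected to the set.
        start = next(iter(g))
        added = [start]
        weights = {v: g[start].get(v, 0) for v in g if v != start}
        while weights:
            nxt = max(weights, key=weights.get)
            cut_of_phase = weights.pop(nxt)
            added.append(nxt)
            for v, w in g[nxt].items():
                if v in weights:
                    weights[v] += w
        s, t = added[-2], added[-1]
        if cut_of_phase < best_w:
            best_w, best_side = cut_of_phase, list(merged[t])
        # Merge the last node t into s before the next phase.
        merged[s].extend(merged.pop(t))
        for v, w in g.pop(t).items():
            del g[v][t]
            if v != s:
                g[s][v] = g[s].get(v, 0) + w
                g[v][s] = g[s][v]
    return best_w, set(best_side)


def density(graph, nodes):
    """Fraction of possible edges present among `nodes`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u in nodes for v in graph[u] if v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def dense_components(graph, nodes, tau=0.8, min_size=3):
    """Hypothetical recursion: keep a component if it is dense enough,
    otherwise split it along its min cut and recurse on both sides."""
    if len(nodes) < min_size:
        return []
    if density(graph, nodes) >= tau:
        return [set(nodes)]
    sub = {u: {v: w for v, w in graph[u].items() if v in nodes} for u in nodes}
    _, side = stoer_wagner(sub)
    return (dense_components(graph, side, tau, min_size)
            + dense_components(graph, nodes - side, tau, min_size))


def build_toy_graph():
    """Toy data: two 4-cliques of accounts joined by a single edge."""
    graph = {i: {} for i in range(8)}
    cliques = [range(0, 4), range(4, 8)]
    pairs = [(u, v) for c in cliques for u in c for v in c if u < v] + [(3, 4)]
    for u, v in pairs:
        graph[u][v] = graph[v][u] = 1
    return graph
```

On the toy graph, the whole graph is too sparse to pass the threshold, so the single bridging edge is cut and each clique is returned as its own component.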
Where Graph Topology Matters: The Robust Subgraph Problem
Robustness is a critical measure of the resilience of large networked
systems, such as transportation and communication networks. Most prior works
focus on the global robustness of a given graph at large, e.g., by measuring
its overall vulnerability to external attacks or random failures. In this
paper, we turn attention to local robustness and pose a novel problem along the
lines of subgraph mining: given a large graph, how can we find its most robust
local subgraph (RLS)?
We define a robust subgraph as a subset of nodes with high communicability
among them, and formulate the RLS-PROBLEM of finding a subgraph of given size
with maximum robustness in the host graph. Our formulation is related to the
recently proposed general framework for the densest subgraph problem, but
differs from it substantially: besides the number of edges in the subgraph,
robustness also depends on the placement of edges, i.e., the subgraph
topology. We show that the RLS-PROBLEM is NP-hard and propose two
heuristic algorithms based on top-down and bottom-up search strategies.
Further, we present modifications of our algorithms to handle three practical
variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs
demonstrate that we find subgraphs with larger robustness than the densest
subgraphs even at lower densities, suggesting that the existing approaches are
not suitable for the new problem setting.
Comment: 13 pages, 10 figures, 3 tables, to appear at SDM 2015 (9 pages only
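To make the notion of a robust subgraph concrete, here is a minimal sketch under stated assumptions. The abstract says only that robustness reflects communicability among the nodes; the sketch assumes one common communicability-based measure, natural connectivity, ln(trace(exp(A)) / n), where A is the adjacency matrix of the induced subgraph. The greedy `top_down` routine is a simple illustration of a top-down search strategy, not the authors' heuristic, and all function names are hypothetical.

```python
import math


def natural_connectivity(adj, nodes, terms=30):
    """ln(trace(exp(A_S)) / |S|) for the subgraph induced by `nodes`.
    adj: dict node -> set of neighbors (undirected, unweighted).
    exp(A) is approximated by a truncated Taylor series, which is
    adequate for the small graphs used here."""
    idx = sorted(nodes)
    n = len(idx)
    a = [[1.0 if v in adj[u] else 0.0 for v in idx] for u in idx]
    # term holds A^k / k!, starting from the identity (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    trace = float(n)
    for k in range(1, terms + 1):
        term = [[sum(term[i][m] * a[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        trace += sum(term[i][i] for i in range(n))
    return math.log(trace / n)


def top_down(adj, size):
    """Greedy top-down search: start from all nodes and repeatedly drop
    the node whose removal leaves the most robust remaining subgraph."""
    current = set(adj)
    while len(current) > size:
        drop = max(current,
                   key=lambda v: natural_connectivity(adj, current - {v}))
        current.remove(drop)
    return current


# Toy graph: a triangle {0, 1, 2} with a path 2-3-4 hanging off it.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
best = top_down(toy, 3)
```

On this toy graph the greedy search keeps the triangle, whose edges form a cycle, rather than any 3-node path. This reflects the point that edge placement, and not only edge count, affects robustness.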
Hyperparameter Optimization for Unsupervised Outlier Detection
Given an unsupervised outlier detection (OD) algorithm, how can we optimize
its hyperparameter(s) (HP) on a new dataset, without any labels? In this work,
we address this challenging problem of hyperparameter optimization for
unsupervised OD, and propose HPOD, the first systematic approach, which is based on
meta-learning. HPOD capitalizes on the prior performance of a large collection
of HPs on existing OD benchmark datasets, and transfers this information to
enable HP evaluation on a new dataset without labels. Moreover, HPOD adapts a
prominent sampling paradigm to identify promising HPs efficiently. Extensive
experiments show that HPOD works with both deep (e.g., Robust AutoEncoder) and
shallow (e.g., Local Outlier Factor (LOF) and Isolation Forest (iForest)) OD
algorithms on discrete and continuous HP spaces, and outperforms a wide range
of baselines, with average performance improvements of 58% and 66% over the
default HPs of LOF and iForest, respectively.
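The idea of transferring prior HP performance to an unlabeled dataset can be illustrated with a deliberately simplified sketch. It is not HPOD: it assumes each dataset is summarized by a meta-feature vector and scores each HP by its average performance on the k most similar historical datasets. The function name, the meta-features, and all numbers below are hypothetical toy values.

```python
import math


def pick_hp(performance, meta_features, new_features, k=2):
    """Choose an HP for an unlabeled dataset from historical results.
    performance:   dict dataset -> dict hp -> score on that dataset
    meta_features: dict dataset -> tuple of meta-feature values
    new_features:  meta-feature tuple of the new, unlabeled dataset"""
    # Rank historical datasets by distance to the new one in meta-feature space.
    nearest = sorted(
        meta_features,
        key=lambda d: math.dist(meta_features[d], new_features),
    )[:k]
    # Average each HP's known score over the k nearest datasets.
    hps = performance[nearest[0]].keys()
    avg = {hp: sum(performance[d][hp] for d in nearest) / len(nearest)
           for hp in hps}
    return max(avg, key=avg.get)


# Toy values (invented): the HP could be, e.g., a neighbor count for LOF.
meta_features = {"A": (0.0, 0.0), "B": (0.1, 0.0), "C": (5.0, 5.0)}
performance = {
    "A": {10: 0.60, 20: 0.80},
    "B": {10: 0.50, 20: 0.90},
    "C": {10: 0.95, 20: 0.30},
}
chosen = pick_hp(performance, meta_features, (0.05, 0.0), k=2)
```

Here the new dataset resembles A and B, so the HP that did well on those two is chosen even though a different HP was best on the dissimilar dataset C.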