Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation
Many security and privacy problems can be modeled as a graph classification
problem, where nodes in the graph are classified by collective classification
simultaneously. State-of-the-art collective classification methods for such
graph-based security and privacy analytics share a common paradigm:
assign weights to edges of the graph, iteratively propagate reputation scores
of nodes among the weighted graph, and use the final reputation scores to
classify nodes in the graph. The key challenge is to assign edge weights such
that an edge has a large weight if the two corresponding nodes have the same
label, and a small weight otherwise. Although collective classification has
been studied and applied for security and privacy problems for more than a
decade, how to address this challenge is still an open question. In this work,
we propose a novel collective classification framework to address this
long-standing challenge. We first formulate learning edge weights as an
optimization problem, which quantifies the goals about the final reputation
scores that we aim to achieve. However, it is computationally hard to solve the
optimization problem because the final reputation scores depend on the edge
weights in a very complex way. To address the computational challenge, we
propose to jointly learn the edge weights and propagate the reputation scores,
which is essentially an approximate solution to the optimization problem. We
compare our framework with state-of-the-art methods for graph-based security
and privacy analytics using four large-scale real-world datasets from various
application scenarios such as Sybil detection in social networks, fake review
detection in Yelp, and attribute inference attacks. Our results demonstrate
that our framework achieves higher accuracies than state-of-the-art methods
with an acceptable computational overhead.
Comment: Network and Distributed System Security Symposium (NDSS), 2019.
Dataset link: http://gonglab.pratt.duke.edu/code-dat
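The paradigm described in the abstract above (weight the edges, propagate reputation scores, classify by the final scores) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `joint_learn_and_propagate`, the averaging propagation rule, the weight update `w += lr * s_u * s_v`, and all parameter defaults are hypothetical choices made only to show how weight learning and propagation can alternate.

```python
def joint_learn_and_propagate(num_nodes, edges, prior, rounds=20, lr=0.5,
                              w_init=0.5, w_min=0.01, w_max=1.0):
    """Alternate one propagation step with one edge-weight update.

    Scores lie in [-1, 1]; here a positive score is read as "benign" and a
    negative score as "malicious" (an assumption of this sketch).
    """
    weights = {edge: w_init for edge in edges}
    neighbors = [[] for _ in range(num_nodes)]
    for (u, v) in edges:
        neighbors[u].append((v, (u, v)))
        neighbors[v].append((u, (u, v)))

    scores = list(prior)
    for _ in range(rounds):
        # Propagation step: mix each node's prior with the weighted
        # average of its neighbors' current scores.
        new_scores = []
        for u in range(num_nodes):
            total_w = sum(weights[e] for _, e in neighbors[u])
            if total_w > 0:
                avg = sum(weights[e] * scores[v]
                          for v, e in neighbors[u]) / total_w
            else:
                avg = 0.0
            new_scores.append(0.5 * prior[u] + 0.5 * avg)
        scores = new_scores

        # Weight step (hypothetical rule): raise the weight when both
        # endpoints currently agree in sign, lower it when they disagree.
        for (u, v) in edges:
            w = weights[(u, v)] + lr * scores[u] * scores[v]
            weights[(u, v)] = min(w_max, max(w_min, w))
    return scores, weights


# Two triangles joined by one "attack edge" (2, 3); node 0 is labeled
# benign and node 5 is labeled malicious, all other priors are neutral.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
prior = [1.0, 0.0, 0.0, 0.0, 0.0, -1.0]
scores, weights = joint_learn_and_propagate(6, edges, prior)
predicted = ["benign" if s > 0 else "malicious" for s in scores]
```

On this toy graph the edge joining the two triangles ends with a smaller weight than the edges inside either triangle, which is the property the abstract names as the key challenge.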
Factorized Graph Representations for Semi-Supervised Learning from Sparse Data
Node classification is an important problem in graph data management. It is
commonly solved by various label propagation methods that work iteratively
starting from a few labeled seed nodes. For graphs with arbitrary
compatibilities between classes, these methods crucially depend on knowing the
compatibility matrix that must be provided by either domain experts or
heuristics. Can we instead directly estimate the correct compatibilities from a
sparsely labeled graph in a principled and scalable way? We answer this
question affirmatively and suggest a method called distant compatibility
estimation that works even on extremely sparsely labeled graphs (e.g., 1 in
10,000 nodes is labeled) in a fraction of the time it later takes to label the
remaining nodes. Our approach first creates multiple factorized graph
representations (with size independent of the graph) and then performs
estimation on these smaller graph sketches. We define algebraic amplification
as the more general idea of leveraging algebraic properties of an algorithm's
update equations to amplify sparse signals. We show that our estimator is
orders of magnitude faster than an alternative approach and that the end-to-end
classification accuracy is comparable to using gold standard compatibilities.
This makes it a cheap preprocessing step for any existing label propagation
method and removes the current dependence on heuristics.
Comment: SIGMOD 2020 (Extended version
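To make the notion of a class compatibility matrix concrete, the sketch below estimates one by counting label pairs across edges whose two endpoints are both labeled. This is a naive baseline for illustration only, not the distant compatibility estimation method of the abstract, and it does not use factorized graph representations. The function name `estimate_compatibility` and the input layout are assumptions.

```python
def estimate_compatibility(edges, labels, num_classes):
    """Row-normalized counts of class pairs over fully labeled edges.

    edges:  iterable of (u, v) pairs of an undirected graph
    labels: dict mapping a labeled node to its class index
    Rows with no observations fall back to a uniform distribution.
    """
    counts = [[0.0] * num_classes for _ in range(num_classes)]
    for u, v in edges:
        if u in labels and v in labels:
            # Count both directions so the count matrix is symmetric.
            counts[labels[u]][labels[v]] += 1.0
            counts[labels[v]][labels[u]] += 1.0

    matrix = []
    for row in counts:
        total = sum(row)
        if total > 0:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / num_classes] * num_classes)
    return matrix


# A 4-cycle whose neighbors always have opposite classes (heterophily),
# plus one edge to an unlabeled node that is ignored by the estimator.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
labels = {0: 0, 1: 1, 2: 0, 3: 1}
H = estimate_compatibility(edges, labels, num_classes=2)
```

A baseline of this kind needs edges with both endpoints labeled, so it would likely see very few usable edges in the extremely sparse setting the abstract targets (e.g., 1 in 10,000 nodes labeled).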
Graph-based security and privacy analytics via collective classification
Graphs are a powerful tool to represent complex interactions between various entities. A particular family of graph-based machine learning techniques called collective classification has been applied to various security and privacy problems, e.g., malware detection, Sybil detection in social networks, fake review detection, malicious website detection, auction fraud detection, APT infection detection, attribute inference attacks, etc. Moreover, some collective classification methods have been deployed in industry, e.g., Symantec deployed collective classification to detect malware; Tuenti, the largest social network in Spain, deployed collective classification to detect Sybils.
In this dissertation, we aim to systematically study graph-based security and privacy problems that are modeled via collective classification. In particular, we focus on collective classification methods that leverage random walk (RW) or loopy belief propagation (LBP).
First, we propose a local rule-based framework to unify existing RW-based and LBP-based methods. Under our framework, existing methods can be viewed as iteratively applying a different local rule to every node in the graph.
Second, we design a novel local rule for undirected graphs. Based on our local rule, we propose a collective classification method that can maintain the advantages and overcome the disadvantages of state-of-the-art undirected graph-based collective classification methods for Sybil detection.
Third, many security and privacy problems are modeled using directed graphs. Directed graph-based security and privacy problems have their unique characteristics. Existing undirected graph-based collective classification methods (e.g., LBP-based methods) cannot be applied to directed graphs, and existing directed graph-based methods (e.g., RW-based methods) cannot make full use of the labeled training set. To address the issue, we develop a novel local rule for directed graph-based Sybil detection and propose a collective classification method that captures unique characteristics of directed graph-based Sybil detection.
Finally, one key issue of all collective classification methods is that they assign small weights to a large number of edges whose two corresponding nodes have the same label and/or assign large weights to a large number of edges whose two corresponding nodes have different labels. Although collective classification has been studied and applied for security and privacy problems for more than a decade, it is still challenging to assign edge weights such that an edge has a large weight if the two corresponding nodes have the same label, and a small weight otherwise. We develop a novel collective classification framework to address this long-standing challenge. Specifically, we first formulate learning edge weights as an optimization problem, which, however, is computationally challenging to solve. Then, we relax the optimization problem and design an efficient joint weight learning and propagation algorithm to solve this approximate optimization problem.
The Linearization of Belief Propagation on Pairwise Markov Random Fields
Belief Propagation (BP) is a widely used approximation for exact probabilistic inference in graphical models, such as Markov Random Fields (MRFs). In graphs with cycles, however, no exact convergence guarantees for BP are known in general. For the case when all edges in the MRF carry the same symmetric, doubly stochastic potential, recent works have proposed to approximate BP by linearizing the update equations around default values, which was shown to work well for the problem of node classification. The present paper generalizes all prior work and derives an approach that approximates loopy BP on any pairwise MRF by solving a linear equation system. This approach combines exact convergence guarantees and a fast matrix implementation with the ability to model heterogeneous networks. Experiments on synthetic graphs with planted edge potentials show that the linearization achieves labeling accuracy comparable to BP for graphs with weak potentials, while speeding up inference by orders of magnitude.
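As a rough illustration of replacing loopy BP with a linear system, the sketch below covers only the simpler special case mentioned in the abstract, where every edge carries the same potential. It is not the paper's general derivation for arbitrary pairwise MRFs. It assumes centered beliefs B, centered explicit (prior) beliefs E, an adjacency matrix A, and a centered coupling matrix H, and it iterates B ← E + A B H, omitting any echo-cancellation correction. The function name `linearized_bp` and the dense list-of-lists layout are hypothetical.

```python
def linearized_bp(adjacency, explicit, coupling, iterations=50):
    """Fixed-point iteration for B = E + A B H (all matrices as nested lists).

    The iteration converges when the spectral radius of A times the
    spectral radius of H is below 1, i.e. for sufficiently weak coupling.
    """
    n = len(adjacency)
    k = len(coupling)
    beliefs = [row[:] for row in explicit]
    for _ in range(iterations):
        # Aggregate neighbor beliefs: agg = A @ B
        agg = [[sum(adjacency[i][j] * beliefs[j][c] for j in range(n))
                for c in range(k)]
               for i in range(n)]
        # Modulate by the coupling and add the priors: B = E + agg @ H
        beliefs = [[explicit[i][c]
                    + sum(agg[i][d] * coupling[d][c] for d in range(k))
                    for c in range(k)]
                   for i in range(n)]
    return beliefs


# Path graph 0 - 1 - 2 with a weak homophily coupling; only node 0 has
# a prior, leaning toward class 0.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
E = [[0.1, -0.1], [0.0, 0.0], [0.0, 0.0]]
H = [[0.1, -0.1], [-0.1, 0.1]]
B = linearized_bp(A, E, H)
predicted = [max(range(2), key=lambda c: row[c]) for row in B]
```

Because the update is linear, the same fixed point could also be obtained with a direct sparse linear solver; the plain iteration is used here only to keep the sketch dependency-free.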