Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Oftentimes, users do not volunteer this information due to privacy
concerns, or due to a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items, and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users.
Comment: This is the extended version of a paper that appeared in ACM RecSys
201
Node Classification in Social Networks
When dealing with large graphs, such as those that arise in the context of
online social networks, a subset of nodes may be labeled. These labels can
indicate demographic values, interests, beliefs or other characteristics of the
nodes (users). A core problem is to use this information to extend the labeling
so that all nodes are assigned a label (or labels). In this chapter, we survey
classification techniques that have been proposed for this problem. We consider
two broad categories: methods based on iterative application of traditional
classifiers using graph information as features, and methods which propagate
the existing labels via random walks. We adopt a common perspective on these
methods to highlight the similarities between different approaches within and
across the two categories. We also describe some extensions and related
directions to the central problem of node classification.
Comment: To appear in Social Network Data Analytics (Springer) Ed. Charu
Aggarwal, March 201
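As an illustration of the second category named in the abstract above (propagating existing labels through the graph), the following is a minimal sketch and not a method taken from the chapter itself. It assumes an undirected graph given as an adjacency dictionary and repeatedly replaces each unlabeled node's label distribution with the average of its neighbours' distributions, which corresponds to a random walk absorbed at the labeled nodes. The function name `propagate_labels` and its parameters are hypothetical.

```python
def propagate_labels(adj, seeds, num_labels, iterations=50):
    """Assign a label to every node by averaging neighbour label distributions.

    adj: dict mapping node -> list of neighbouring nodes (undirected graph).
    seeds: dict mapping labeled node -> label index in range(num_labels).
    Both the name and the interface are illustrative assumptions.
    """
    # Labeled nodes start (and stay) one-hot; unlabeled nodes start uniform.
    dist = {}
    for node in adj:
        if node in seeds:
            dist[node] = [1.0 if k == seeds[node] else 0.0 for k in range(num_labels)]
        else:
            dist[node] = [1.0 / num_labels] * num_labels

    for _ in range(iterations):
        updated = {}
        for node, neighbours in adj.items():
            if node in seeds or not neighbours:
                # Seed labels are clamped; isolated nodes cannot receive information.
                updated[node] = dist[node]
            else:
                updated[node] = [
                    sum(dist[n][k] for n in neighbours) / len(neighbours)
                    for k in range(num_labels)
                ]
        dist = updated

    # Pick the most probable label for each node.
    return {node: max(range(num_labels), key=lambda k: dist[node][k]) for node in adj}


# Two triangles joined by the edge 2-3, with one labeled node in each triangle.
example_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
example_labels = propagate_labels(example_graph, {0: 0, 5: 1}, num_labels=2)
```

In this toy graph each unlabeled node ends up with the label of the seed in its own triangle, since the single bridging edge carries little of the walk's probability mass.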
Privacy Tradeoffs in Predictive Analytics
Online services routinely mine user data to predict user preferences, make
recommendations, and place targeted ads. Recent research has demonstrated that
several private user attributes (such as political affiliation, sexual
orientation, and gender) can be inferred from such data. Can a
privacy-conscious user benefit from personalization while simultaneously
protecting her private attributes? We study this question in the context of a
rating prediction service based on matrix factorization. We construct a
protocol of interactions between the service and users that has remarkable
optimality properties: it is privacy-preserving, in that no inference algorithm
can succeed in inferring a user's private attribute with a probability better
than random guessing; it has maximal accuracy, in that no other
privacy-preserving protocol improves rating prediction; and, finally, it
involves a minimal disclosure, as the prediction accuracy strictly decreases
when the service reveals less information. We extensively evaluate our protocol
using several rating datasets, demonstrating that it successfully blocks the
inference of gender, age and political affiliation, while incurring less than
5% decrease in the accuracy of rating prediction.
Comment: Extended version of the paper appearing in SIGMETRICS 201
CSNE: Conditional Signed Network Embedding
Signed networks are mathematical structures that encode positive and negative
relations between entities such as friend/foe or trust/distrust. Recently,
several papers studied the construction of useful low-dimensional
representations (embeddings) of these networks for the prediction of missing
relations or signs. Existing embedding methods for sign prediction generally
enforce different notions of status or balance theories in their optimization
function. These theories, however, are often inaccurate or incomplete, which
negatively impacts method performance.
In this context, we introduce conditional signed network embedding (CSNE).
Our probabilistic approach models structural information about the signs in the
network separately from fine-grained detail. Structural information is
represented in the form of a prior, while the embedding itself is used for
capturing fine-grained information. These components are then integrated in a
rigorous manner. CSNE's accuracy depends on the existence of sufficiently
powerful structural priors for modelling signed networks, currently unavailable
in the literature. Thus, as a second main contribution, which we find to be
highly valuable in its own right, we also introduce a novel approach to
construct priors based on the Maximum Entropy (MaxEnt) principle. These priors
can model the polarity of nodes (the degree to which their links are
positive) as well as signed triangle counts (a measure of the degree to which
structural balance holds in a network).
Experiments on a variety of real-world networks confirm that CSNE outperforms
the state-of-the-art on the task of sign prediction. Moreover, the MaxEnt
priors on their own, while less accurate than full CSNE, achieve accuracies
competitive with the state-of-the-art at very limited computational cost, thus
providing an excellent runtime-accuracy trade-off in resource-constrained
situations.
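The two statistics that the abstract above says the MaxEnt priors can model, node polarity and signed triangle counts, can be computed directly from a signed edge list. The sketch below only computes these raw statistics; it does not construct the MaxEnt prior or the CSNE embedding, and the function names and the edge representation (a dictionary from node pairs to +1 or -1) are assumptions made for illustration. A triangle is treated as balanced when the product of its three edge signs is positive, which is the usual structural balance convention.

```python
from itertools import combinations


def polarity(edges):
    """Fraction of each node's links that are positive.

    edges: dict mapping an undirected pair (u, v) -> +1 or -1 (assumed format).
    """
    positive, degree = {}, {}
    for (u, v), sign in edges.items():
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1
            positive[node] = positive.get(node, 0) + (1 if sign > 0 else 0)
    return {node: positive[node] / degree[node] for node in degree}


def signed_triangle_counts(edges):
    """Count triangles whose edge-sign product is positive (balanced) or negative."""
    sign = {frozenset(pair): s for pair, s in edges.items()}
    nodes = sorted({node for pair in edges for node in pair})
    counts = {"balanced": 0, "unbalanced": 0}
    # Brute-force enumeration of node triples; fine for small illustrative graphs.
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(side in sign for side in sides):
            product = sign[sides[0]] * sign[sides[1]] * sign[sides[2]]
            counts["balanced" if product > 0 else "unbalanced"] += 1
    return counts


# Toy signed network: one all-positive triangle and one triangle with a single
# negative edge.
toy_edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("c", "d"): -1, ("b", "d"): 1,
}
toy_polarity = polarity(toy_edges)
toy_triangles = signed_triangle_counts(toy_edges)
```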
Representation Learning for Attributed Multiplex Heterogeneous Network
Network embedding (or graph embedding) has been widely used in many
real-world applications. However, existing methods mainly focus on networks
with single-typed nodes/edges and cannot scale well to handle large networks.
Many real-world networks consist of billions of nodes and edges of multiple
types, and each node is associated with different attributes. In this paper, we
formalize the problem of embedding learning for the Attributed Multiplex
Heterogeneous Network and propose a unified framework to address this problem.
The framework supports both transductive and inductive learning. We also give
the theoretical analysis of the proposed framework, showing its connection with
previous works and proving its better expressiveness. We conduct systematic
evaluations for the proposed framework on four different genres of challenging
datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results
demonstrate that with the learned embeddings from the proposed framework, we
can achieve statistically significant improvements (e.g., 5.99-28.23% lift by
F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link
prediction. The framework has also been successfully deployed on the
recommendation system of a worldwide leading e-commerce company, Alibaba Group.
Results of the offline A/B tests on product recommendation further confirm the
effectiveness and efficiency of the framework in practice.
Comment: Accepted to KDD 2019. Website: https://sites.google.com/view/gatn
Unsupervised Domain Adaptive Graph Convolutional Networks
Graph convolutional networks (GCNs) have achieved impressive success in many graph-related analytics tasks. However, most GCNs only work in a single domain (graph) and are incapable of transferring knowledge from/to other domains (graphs), due to the challenges in both graph representation learning and domain adaptation over graph structures. In this paper, we present a novel approach, unsupervised domain adaptive graph convolutional networks (UDA-GCN), for domain adaptation learning for graphs. To enable effective graph representation learning, we first develop a dual graph convolutional network component, which jointly exploits local and global consistency for feature aggregation. An attention mechanism is further used to produce a unified representation for each node in different graphs. To facilitate knowledge transfer between graphs, we propose a domain adaptive learning module that jointly optimizes three different loss functions, namely source classifier loss, domain classifier loss, and target classifier loss, so that our model can differentiate class labels in the source domain, samples from different domains, and class labels in the target domain, respectively. Experimental results on real-world datasets in the node classification task validate the performance of our method, compared to state-of-the-art graph neural network algorithms.
IP SAN: Low on Fibre
The era of Fibre Channel interoperability is slowly dawning. In a world where Internet Protocol (IP) dominates local and wide area networks, and data storage requirements grow unabated, it seems inevitable that these two forces will converge. Internet Protocol Storage Area Networks (IP SANs) [4] unite storage and IP networking, enabling IP and Ethernet infrastructure to be used for expanding access to SAN storage and extending SAN connectivity across any distance. This paper provides an overview of storage networking and discusses why it is needed. It then examines storage area network technology in more depth and analyzes its two types: Fibre Channel SAN [1] (FC SAN) and IP SAN. The paper then compares the two, making a case for IP SAN and arguing that IP SANs will take centre stage, possibly replacing Fibre Channel SANs completely.