6,080 research outputs found
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale for real world information networks which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called the "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of the classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments prove the effectiveness of the LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient, which is able to learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of the LINE is available
online.Comment: WWW 201
Exploring Student Check-In Behavior for Improved Point-of-Interest Prediction
With the availability of vast amounts of user visitation history on
location-based social networks (LBSN), the problem of Point-of-Interest (POI)
prediction has been extensively studied. However, much of the research has been
conducted solely on voluntary checkin datasets collected from social apps such
as Foursquare or Yelp. While these data contain rich information about
recreational activities (e.g., restaurants, nightlife, and entertainment),
information about more prosaic aspects of people's lives is sparse. This not
only limits our understanding of users' daily routines, but more importantly
the modeling assumptions developed based on characteristics of recreation-based
data may not be suitable for richer check-in data. In this work, we present an
analysis of education "check-in" data using WiFi access logs collected at
Purdue University. We propose a heterogeneous graph-based method to encode the
correlations between users, POIs, and activities, and then jointly learn
embeddings for the vertices. We evaluate our method compared to previous
state-of-the-art POI prediction methods, and show that the assumptions made by
previous methods significantly degrade performance on our data with dense(r)
activity signals. We also show how our learned embeddings could be used to
identify similar students (e.g., for friend suggestions).Comment: published in KDD'1
Semi-supervised Embedding in Attributed Networks with Outliers
In this paper, we propose a novel framework, called Semi-supervised Embedding
in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector
representation that systematically captures the topological proximity,
attribute affinity and label similarity of vertices in a partially labeled
attributed network (PLAN). Our method is designed to work in both transductive
and inductive settings while explicitly alleviating noise effects from
outliers. Experimental results on various datasets drawn from the web, text and
image domains demonstrate the advantages of SEANO over state-of-the-art methods
in semi-supervised classification under transductive as well as inductive
settings. We also show that a subset of parameters in SEANO is interpretable as
outlier score and can significantly outperform baseline methods when applied
for detecting network outliers. Finally, we present the use of SEANO in a
challenging real-world setting -- flood mapping of satellite images and show
that it is able to outperform modern remote sensing algorithms for this task.Comment: in Proceedings of SIAM International Conference on Data Mining
(SDM'18
Neural Embeddings of Graphs in Hyperbolic Space
Neural embeddings have been used with great success in Natural Language
Processing (NLP). They provide compact representations that encapsulate word
similarity and attain state-of-the-art performance in a range of linguistic
tasks. The success of neural embeddings has prompted significant amounts of
research into applications in domains other than language. One such domain is
graph-structured data, where embeddings of vertices can be learned that
encapsulate vertex similarity and improve performance on tasks including edge
prediction and vertex labelling. For both NLP and graph based tasks, embeddings
have been learned in high-dimensional Euclidean spaces. However, recent work
has shown that the appropriate isometric space for embedding complex networks
is not the flat Euclidean space, but negatively curved, hyperbolic space. We
present a new concept that exploits these recent insights and propose learning
neural embeddings of graphs in hyperbolic space. We provide experimental
evidence that embedding graphs in their natural geometry significantly improves
performance on downstream tasks for several real-world public datasets.Comment: 7 pages, 5 figure
struc2vec: Learning Node Representations from Structural Identity
Structural identity is a concept of symmetry in which network nodes are
identified according to the network structure and their relationship to other
nodes. Structural identity has been studied in theory and practice over the
past decades, but only recently has it been addressed with representational
learning techniques. This work presents struc2vec, a novel and flexible
framework for learning latent representations for the structural identity of
nodes. struc2vec uses a hierarchy to measure node similarity at different
scales, and constructs a multilayer graph to encode structural similarities and
generate structural context for nodes. Numerical experiments indicate that
state-of-the-art techniques for learning node representations fail in capturing
stronger notions of structural identity, while struc2vec exhibits much superior
performance in this task, as it overcomes limitations of prior approaches. As a
consequence, numerical experiments indicate that struc2vec improves performance
on classification tasks that depend more on structural identity.Comment: 10 pages, KDD2017, Research Trac
- …