Conditional network embeddings
Network Embeddings (NEs) map the nodes of a given network into $d$-dimensional Euclidean space $\mathbb{R}^d$. Ideally, this mapping is such that 'similar' nodes are mapped onto nearby points, so that the NE can be used for purposes such as link prediction (if 'similar' means being 'more likely to be connected') or classification (if 'similar' means 'being more likely to have the same label'). In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes.
A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. To overcome this, we introduce a conceptual innovation to the NE literature and propose to create Conditional Network Embeddings (CNEs): embeddings that maximally add information with respect to given structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently.
We demonstrate that CNEs are superior for link prediction and multi-label classification when compared to state-of-the-art methods, without adding significant mathematical or computational complexity. Finally, we illustrate the potential of CNE for network visualization.
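The generic strategy this abstract attributes to existing NE methods (a similarity notion, an embedding-space distance, and a loss that penalizes large distances for similar nodes and small distances for dissimilar ones) can be sketched as follows. This is a minimal illustration, not the CNE model or any specific published method: it assumes that "similar" simply means "linked", uses Euclidean distance, and applies a hinge penalty with a hypothetical `margin` parameter to unlinked pairs. The function names and default values are illustrative choices.

```python
import math
import random


def embedding_loss(emb, edges, non_edges, margin=1.0):
    """Squared distance for linked pairs plus a squared hinge for unlinked pairs."""
    loss = 0.0
    for u, v in edges:
        loss += math.dist(emb[u], emb[v]) ** 2
    for u, v in non_edges:
        loss += max(0.0, margin - math.dist(emb[u], emb[v])) ** 2
    return loss


def fit(n_nodes, edges, non_edges, dim=2, lr=0.05, epochs=200, margin=1.0, seed=0):
    """Plain stochastic gradient descent on embedding_loss (illustrative only)."""
    rng = random.Random(seed)
    emb = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        # Linked pairs: the gradient of ||x_u - x_v||^2 pulls the two points together.
        for u, v in edges:
            for k in range(dim):
                g = 2.0 * (emb[u][k] - emb[v][k])
                emb[u][k] -= lr * g
                emb[v][k] += lr * g
        # Unlinked pairs: push apart only while they are closer than the margin.
        for u, v in non_edges:
            d = math.dist(emb[u], emb[v])
            if 0.0 < d < margin:
                coef = -2.0 * (margin - d) / d
                for k in range(dim):
                    g = coef * (emb[u][k] - emb[v][k])
                    emb[u][k] -= lr * g
                    emb[v][k] += lr * g
    return emb
```

On a toy graph made of two disjoint triangles, with every cross-triangle pair passed as `non_edges`, the fitted embedding should place each triangle's nodes close together and the two triangles roughly `margin` apart.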
HARP: Hierarchical Representation Learning for Networks
We present HARP, a novel method for learning low dimensional embeddings of a
graph's nodes which preserves higher-order structural features. Our proposed
method achieves this by compressing the input graph prior to embedding it,
effectively avoiding troublesome embedding configurations (i.e. local minima)
which can pose problems to non-convex optimization. HARP works by finding a
smaller graph which approximates the global structure of its input. This
simplified graph is used to learn a set of initial representations, which serve
as good initializations for learning representations in the original, detailed
graph. We inductively extend this idea by decomposing a graph into a series of
levels and then embedding the hierarchy of graphs from the coarsest one to the
original graph. HARP is a general meta-strategy to improve all of the
state-of-the-art neural algorithms for embedding graphs, including DeepWalk,
LINE, and Node2vec. Indeed, we demonstrate that applying HARP's hierarchical
paradigm yields improved implementations for all three of these methods, as
evaluated on classification tasks on real-world graphs such as DBLP,
BlogCatalog, CiteSeer, and Arxiv, where we achieve a performance gain over the
original implementations of up to 14% Macro F1.
Comment: To appear in AAAI 201
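The coarse-to-fine idea in the HARP abstract (compress the graph, embed the small graph, reuse the result as initialization for the larger one) can be sketched roughly as below. The abstract does not say how the graph is compressed, so the greedy merging of edge endpoints here is an assumption made purely for illustration. `embed_fn(n_nodes, edges, init)` is a hypothetical callback standing in for any base embedder such as DeepWalk, LINE, or Node2vec; it is expected to accept `init=None` at the coarsest level.

```python
def coarsen(n_nodes, edges):
    """Greedily merge the endpoints of disjoint edges (assumed coarsening scheme).

    Returns (parent, n_coarse, coarse_edges), where parent[u] is the coarse
    node that fine node u was merged into.
    """
    parent = [-1] * n_nodes
    n_coarse = 0
    for u, v in edges:
        if u != v and parent[u] == -1 and parent[v] == -1:
            parent[u] = parent[v] = n_coarse
            n_coarse += 1
    # Nodes that were not matched become their own coarse node.
    for u in range(n_nodes):
        if parent[u] == -1:
            parent[u] = n_coarse
            n_coarse += 1
    coarse_edges = sorted({
        (min(parent[u], parent[v]), max(parent[u], parent[v]))
        for u, v in edges
        if parent[u] != parent[v]
    })
    return parent, n_coarse, coarse_edges


def prolong(coarse_emb, parent):
    """Initialize each fine node with a copy of its coarse node's vector."""
    return [list(coarse_emb[p]) for p in parent]


def hierarchical_embed(n_nodes, edges, embed_fn, levels=2):
    """Embed from the coarsest graph back to the original one."""
    hierarchy = [(n_nodes, edges)]
    parents = []
    for _ in range(levels):
        n, e = hierarchy[-1]
        parent, n_c, e_c = coarsen(n, e)
        if n_c == n:  # nothing left to merge
            break
        parents.append(parent)
        hierarchy.append((n_c, e_c))
    n, e = hierarchy[-1]
    emb = embed_fn(n, e, None)  # coarsest level: no initialization available
    for (n, e), parent in zip(reversed(hierarchy[:-1]), reversed(parents)):
        emb = embed_fn(n, e, prolong(emb, parent))
    return emb
```

Because `embed_fn` is passed in, the same driver can wrap different base embedders, which mirrors the abstract's description of HARP as a meta-strategy rather than a standalone embedding method.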
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale to real-world information networks, which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called the "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of the classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments demonstrate the effectiveness of the LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient: it can learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of the LINE is available
online.
Comment: WWW 201
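The abstract mentions an edge-sampling algorithm without describing it. One common reading, assumed here, is that edges are drawn with probability proportional to their weight and each drawn edge is then treated as an unweighted edge in the gradient step, so that step sizes do not scale with widely varying edge weights. The sketch below shows only the sampling part and uses the standard library's `random.choices` for brevity; it makes no claim about the sampler used in the LINE source code. The function name and parameters are illustrative.

```python
import random


def sample_edges(weighted_edges, n_samples, seed=0):
    """Draw edges with probability proportional to weight.

    weighted_edges: list of (u, v, weight) triples with positive weights.
    Returns a list of (u, v) pairs; each can be used as a unit-weight edge
    in a stochastic gradient update.
    """
    rng = random.Random(seed)
    pairs = [(u, v) for u, v, _ in weighted_edges]
    weights = [w for _, _, w in weighted_edges]
    return rng.choices(pairs, weights=weights, k=n_samples)
```

With weights such as 100 and 1 on two edges, the heavier edge should dominate the sample while every individual update stays the same size.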