Improving Skip-Gram based Graph Embeddings via Centrality-Weighted Sampling
Network embedding techniques inspired by word2vec represent an effective
unsupervised relational learning model. Commonly, by means of a Skip-Gram
procedure, these techniques learn low-dimensional vector representations of the
nodes in a graph by sampling node-context examples. Although many ways of
sampling the context of a node have been proposed, the effects of the way a
node is chosen have not been analyzed in depth. To fill this gap, we have
re-implemented the four main word2vec-inspired graph embedding techniques under
the same framework and analyzed how different sampling distributions affect
embedding performance on node classification problems. We present a set of
experiments on well-known real-world data sets showing that sampling from
popular centrality distributions yields improvements, speeding up learning by
up to 2 times and increasing accuracy in all cases.
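The core idea above — biasing the choice of the start node toward central nodes when drawing node-context examples — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy graph, degree centrality as the weighting distribution, and the one-hop context sampler are all assumptions made for the example.

```python
import random

# Toy graph as an adjacency list (hypothetical example).
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c"],
}

# Degree centrality: node degree normalized by the maximum possible
# degree (n - 1). Any centrality measure could be substituted here.
n = len(graph)
centrality = {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def sample_start_node(rng=random):
    """Pick a node with probability proportional to its centrality."""
    nodes = list(centrality)
    weights = [centrality[v] for v in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

# Draw (node, context) training pairs: the node is chosen by the
# centrality-weighted distribution, the context is a random neighbor
# (a stand-in for the context produced by a random walk).
pairs = [(v := sample_start_node(), random.choice(graph[v]))
         for _ in range(5)]
```

High-degree nodes like "a" and "c" are sampled more often than "b" and "d", so they appear as the center of more training pairs — the bias the abstract reports as beneficial.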
Robust Negative Sampling for Network Embedding
Many recent network embedding algorithms use negative sampling (NS) to approximate a variant of the computationally expensive Skip-Gram neural network architecture (SGA) objective. In this paper, we provide theoretical arguments that reveal how NS can fail to properly estimate the SGA objective, and why it is not a suitable candidate for the network embedding problem as a distinct objective. We show that NS can learn undesirable embeddings as a result of the “Popular Neighbor Problem.” We use the theory to develop a new method, “R-NS,” that alleviates the problems of NS through a more intelligent negative sampling scheme and careful penalization of the embeddings. R-NS is scalable to large-scale networks, and we empirically demonstrate the superiority of R-NS over NS for multi-label classification on a variety of real-world networks, including social networks and language networks.
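For reference, the standard NS objective that this paper analyzes has a simple per-pair form: maximize the score of an observed (node, context) pair while pushing down the scores of k negatives drawn from a noise distribution. The sketch below shows that loss for one pair; the 2-d example vectors are hypothetical, and the paper's R-NS modifications (smarter negative selection and penalization) are not reproduced here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ns_loss(u_vec, v_vec, negative_vecs):
    """Negative-sampling loss for one (node, context) pair:
    -log sigma(u.v) - sum over negatives of log sigma(-u.n_k)."""
    loss = -math.log(sigmoid(dot(u_vec, v_vec)))
    for n_vec in negative_vecs:
        loss += -math.log(sigmoid(-dot(u_vec, n_vec)))
    return loss

# Hypothetical 2-d embeddings: u aligned with its true context v,
# and two noise vectors serving as negatives.
u = [0.5, -0.2]
v = [0.4, 0.1]
negs = [[-0.3, 0.8], [0.1, 0.1]]

loss_aligned = ns_loss(u, v, negs)
# Swapping in a context anti-aligned with u increases the loss.
loss_misaligned = ns_loss(u, [-0.4, 0.2], negs)
```

Minimizing this loss pulls a node toward its sampled contexts and away from the negatives; the paper's point is that when the noise distribution keeps returning a node's popular neighbors as negatives, these two forces conflict and the learned embeddings degrade.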