Improving Negative Sampling for Word Representation using Self-embedded Features
Although the word-popularity-based negative sampler has shown superb
performance in the skip-gram model, the theoretical motivation behind
oversampling popular (non-observed) words as negative samples is still not well
understood. In this paper, we start with an investigation of the gradient-vanishing
issue in the skip-gram model without a proper negative sampler. By
analyzing the model from the stochastic gradient descent (SGD)
learning perspective, we demonstrate, both theoretically and intuitively, that
negative samples with larger inner-product scores are more informative to the
SGD learner than those with lower scores, in terms of both convergence rate
and accuracy. Guided by this insight, we propose an alternative sampling algorithm
that dynamically selects informative negative samples during each SGD update.
More importantly, the proposed sampler accounts for multi-dimensional
self-embedded features during the sampling process, which essentially makes it
more effective than the original popularity-based (one-dimensional) sampler.
Empirical experiments further verify our analysis and show that our
fine-grained samplers achieve significant improvements over the existing ones
without increasing computational complexity.
Comment: Accepted in WSDM 201
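The selection step the abstract describes (preferring negative candidates with large inner-product scores) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's exact algorithm: the function name, the pre-drawn candidate pool, and the softmax temperature are all illustrative choices.

    import numpy as np

    def informative_negatives(target_vec, ctx_emb, candidate_ids,
                              n_samples=5, temperature=1.0):
        # Score each pre-drawn candidate (e.g. from a unigram table) by its
        # inner product with the target word's embedding.
        scores = ctx_emb[candidate_ids] @ target_vec
        # Softmax over scores: non-observed words the model currently ranks
        # high get sampled more often, since they carry larger gradients.
        probs = np.exp((scores - scores.max()) / temperature)
        probs /= probs.sum()
        # Requires n_samples <= len(candidate_ids).
        return np.random.choice(candidate_ids, size=n_samples,
                                replace=False, p=probs)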
Regularizing Matrix Factorization with User and Item Embeddings for Recommendation
Following recent successes in exploiting both latent factor and word
embedding models for recommendation, we propose a novel Regularized
Multi-Embedding (RME) recommendation model that simultaneously
encapsulates the following ideas via decomposition: (1) which items a user
likes, (2) which two users co-like the same items, (3) which two items users
often co-liked, and (4) which two items users often co-disliked. In
experimental validation, RME outperforms competing state-of-the-art models
on both explicit and implicit feedback datasets, significantly improving
Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition,
in the cold-start scenario for users with the lowest number of interactions,
RME improves NDCG@5 over the competing models by 20.2% and 29.4% on the
MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source
code are available at: https://github.com/thanhdtran/RME.git
Comment: CIKM 201
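One plausible reading of the four-part decomposition is a joint least-squares objective over shared embeddings. The sketch below is our own interpretation, not necessarily the authors' exact formulation (their implementation lives at the repository above); the matrix names, the shared user/item factors, and the separate disliked-item factors are assumptions.

    import numpy as np

    def rme_objective(R, M_uu, M_ii_like, M_ii_dislike, U, V, W, lam=0.1):
        # R            : user-item preference matrix            -> idea (1)
        # M_uu         : user-user co-like co-occurrence matrix -> idea (2)
        # M_ii_like    : item-item co-like matrix               -> idea (3)
        # M_ii_dislike : item-item co-dislike matrix            -> idea (4)
        # U and V are shared across terms; W embeds disliked items.
        loss  = np.sum((R - U @ V.T) ** 2)
        loss += np.sum((M_uu - U @ U.T) ** 2)
        loss += np.sum((M_ii_like - V @ V.T) ** 2)
        loss += np.sum((M_ii_dislike - W @ W.T) ** 2)
        # L2 regularization on all embedding matrices.
        loss += lam * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(W ** 2))
        return loss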
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale to real-world information networks, which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called "LINE", which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses a limitation of classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments demonstrate the effectiveness of LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient: it is able to learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of LINE is available
online.
Comment: WWW 201
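The edge-sampling idea can be made concrete with a short sketch: rather than scaling gradients by edge weights (which destabilizes SGD step sizes), edges are drawn with probability proportional to their weights, and each drawn edge is then treated as binary. A per-edge update for the second-order-proximity objective with negative sampling might look like the following; variable names and the learning-rate default are our own choices, not those of the released implementation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def line_edge_update(u, v, neg_ids, emb, ctx, lr=0.025):
        # (u, v)  : an edge drawn with probability proportional to its weight
        # emb, ctx: vertex and context embedding matrices, shape (V, d)
        # neg_ids : negative vertices drawn from the noise distribution
        grad_u = np.zeros_like(emb[u])
        # Positive pair: push sigma(emb[u] . ctx[v]) toward 1.
        g = lr * (1.0 - sigmoid(emb[u] @ ctx[v]))
        grad_u += g * ctx[v]
        ctx[v] += g * emb[u]
        # Negative pairs: push sigma(emb[u] . ctx[n]) toward 0.
        for n in neg_ids:
            g = lr * -sigmoid(emb[u] @ ctx[n])
            grad_u += g * ctx[n]
            ctx[n] += g * emb[u]
        emb[u] += grad_u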