3,221 research outputs found
Incremental Skip-gram Model with Negative Sampling
This paper explores an incremental training strategy for the skip-gram model
with negative sampling (SGNS) from both empirical and theoretical perspectives.
Existing neural word embedding methods, including SGNS, are multi-pass
algorithms and thus cannot perform incremental model updates. To address this
problem, we present a simple incremental extension of SGNS and provide a
thorough theoretical analysis to demonstrate its validity. Empirical
experiments demonstrated the correctness of the theoretical analysis as well as
the practical usefulness of the incremental algorithm.
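The incremental idea above can be illustrated with a single online SGD step: each incoming (center, context) pair updates the embeddings immediately, with negatives drawn from a noise distribution, rather than requiring another pass over the corpus. This is a minimal sketch, not the paper's implementation; the function name, learning rate, and uniform-style noise distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgns_update(W_in, W_out, center, context, noise_probs, k=5, lr=0.025):
    """One SGD step for skip-gram with negative sampling (SGNS),
    applied to a single (center, context) pair so the model can be
    updated incrementally as new text streams in.
    Hyperparameters here are illustrative assumptions."""
    v = W_in[center]
    negatives = rng.choice(len(noise_probs), size=k, p=noise_probs)
    v_grad = np.zeros_like(v)
    # The observed context word gets label 1, sampled negatives label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        score = 1.0 / (1.0 + np.exp(-np.dot(v, u)))  # sigmoid of inner product
        g = lr * (label - score)                     # gradient coefficient
        v_grad += g * u
        W_out[word] += g * v
    W_in[center] += v_grad                           # apply accumulated gradient
```

Accumulating `v_grad` and applying it once keeps the center vector fixed while all its positive and negative partners are scored, which matches how a single SGD step is normally computed.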
Distributed representation of multi-sense words: A loss-driven approach
Word2Vec's Skip-gram model is the current state-of-the-art approach for
estimating the distributed representation of words. However, it assumes a
single vector per word, which is not well-suited for representing words that
have multiple senses. This work presents LDMI, a new model for estimating
distributional representations of words. LDMI relies on the idea that, if a
word carries multiple senses, then having a different representation for each
of its senses should lead to a lower loss associated with predicting its
co-occurring words, as opposed to the case when a single vector representation
is used for all the senses. After identifying the multi-sense words, LDMI
clusters the occurrences of these words to assign a sense to each occurrence.
Experiments on the contextual word similarity task show that LDMI leads to
better performance than competing approaches.
Comment: PAKDD 2018 Best paper award runner-up
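The clustering step described above — grouping the occurrences of a multi-sense word so each group becomes one sense — can be sketched with plain k-means over the occurrences' context vectors. This is a stand-in for LDMI's loss-driven grouping, not its actual algorithm; the function name and farthest-point seeding are assumptions made for a deterministic illustration.

```python
import numpy as np

def split_senses(context_vecs, n_senses=2, n_iter=10):
    """Assign each occurrence of a word to a sense by clustering its
    context vectors (k-means with farthest-point seeding) -- an
    illustrative stand-in for LDMI's loss-driven grouping."""
    X = np.asarray(context_vecs, dtype=float)
    # Seed: first point, then the point farthest from existing centers.
    centers = [X[0]]
    for _ in range(1, n_senses):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each occurrence to the nearest sense center.
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned occurrences.
        for s in range(n_senses):
            if (labels == s).any():
                centers[s] = X[labels == s].mean(axis=0)
    return labels
```

Occurrences whose contexts land in different clusters would then be trained with separate sense vectors, which is what lets the loss drop for genuinely multi-sense words.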
Improving Negative Sampling for Word Representation using Self-embedded Features
Although the word-popularity based negative sampler has shown superb
performance in the skip-gram model, the theoretical motivation behind
oversampling popular (non-observed) words as negative samples is still not well
understood. In this paper, we start from an investigation of the gradient
vanishing issue in the skip-gram model without a proper negative sampler. By
performing an insightful analysis from the stochastic gradient descent (SGD)
learning perspective, we demonstrate that, both theoretically and intuitively,
negative samples with larger inner product scores are more informative than
those with lower scores for the SGD learner in terms of both convergence rate
and accuracy. Building on this insight, we propose an alternative sampling
algorithm
that dynamically selects informative negative samples during each SGD update.
More importantly, the proposed sampler accounts for multi-dimensional
self-embedded features during the sampling process, which essentially makes it
more effective than the original popularity-based (one-dimensional) sampler.
Empirical experiments further verify our observations, and show that our
fine-grained samplers gain significant improvement over the existing ones
without increasing computational complexity.
Comment: Accepted in WSDM 201
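The core idea — preferring negatives with larger inner product scores because they are more informative for the SGD learner — can be sketched as a two-stage sampler: draw a uniform candidate pool, score it against the center vector, keep the top scorers. This is only a sketch of score-based dynamic sampling; the function and parameter names are assumptions, not the paper's API.

```python
import numpy as np

def informative_negatives(v_center, W_out, exclude, k=5, pool=50, rng=None):
    """Sketch of score-based dynamic negative sampling: draw a uniform
    candidate pool, score each candidate by its inner product with the
    center word's vector, and keep the k highest-scoring candidates.
    Names and pool size here are illustrative assumptions."""
    rng = rng or np.random.default_rng()
    pool = min(pool, len(W_out))
    cand = rng.choice(len(W_out), size=pool, replace=False)
    cand = cand[~np.isin(cand, exclude)]   # never sample observed words
    scores = W_out[cand] @ v_center        # larger score = more informative
    return cand[np.argsort(scores)[-k:]]
```

Scoring only a small pool rather than the whole vocabulary is what keeps the per-update cost comparable to the original popularity-based sampler.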
Combining Language and Vision with a Multimodal Skip-gram Model
We extend the SKIP-GRAM model of Mikolov et al. (2013a) by taking visual
information into account. Like SKIP-GRAM, our multimodal models (MMSKIP-GRAM)
build vector-based word representations by learning to predict linguistic
contexts in text corpora. However, for a restricted set of words, the models
are also exposed to visual representations of the objects they denote
(extracted from natural images), and must predict linguistic and visual
features jointly. The MMSKIP-GRAM models achieve good performance on a variety
of semantic benchmarks. Moreover, since they propagate visual information to
all words, we use them to improve image labeling and retrieval in the zero-shot
setup, where the test concepts are never seen during model training. Finally,
the MMSKIP-GRAM models discover intriguing visual properties of abstract words,
paving the way to realistic implementations of embodied theories of meaning.
Comment: accepted at NAACL 2015, camera-ready version, 11 pages
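For the restricted set of words with images, joint prediction of linguistic and visual features can be sketched as an extra max-margin term that pulls a word vector toward the visual vector of its referent and away from a random image vector. This is in the spirit of the multimodal objective described above; the specific margin, learning rate, and function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def visual_update(w_vec, visual_vec, neg_visual, margin=0.5, lr=0.01):
    """Illustrative max-margin step: nudge a word vector toward the
    visual vector of its referent and away from a randomly drawn image
    vector. Margin and learning rate are assumed values."""
    # Hinge loss: max(0, margin - pos_score + neg_score)
    if margin - w_vec @ visual_vec + w_vec @ neg_visual > 0:
        w_vec = w_vec + lr * (visual_vec - neg_visual)
    return w_vec
```

Because the text objective still updates every word, visual information learned for imaged words propagates through shared contexts to words that were never paired with an image — the property the abstract exploits for zero-shot labeling.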