1,517 research outputs found
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Tagging news articles or blog posts with relevant tags from a collection of
predefined ones is termed document tagging in this work. Accurate tagging of
articles can benefit several downstream applications such as recommendation and
search. In this work, we propose a novel yet simple approach called DocTag2Vec
to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two
popular models for learning distributed representation of words and documents.
In DocTag2Vec, we simultaneously learn the representation of words, documents,
and tags in a joint vector space during training, and employ a simple
k-nearest neighbor search to predict tags for unseen documents. In contrast
to previous multi-label learning methods, DocTag2Vec deals directly with raw
text instead of provided feature vectors, and in addition enjoys advantages
such as learning tag representations and the ability to handle newly
created tags. To demonstrate the effectiveness of our approach, we conduct
experiments on several datasets and show promising results against
state-of-the-art methods.

Comment: 10 page
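To make the prediction step concrete, below is a minimal sketch of k-nearest-neighbor tag inference in a shared embedding space, assuming cosine similarity as the distance. Everything here (the random vectors, the tag names, the predict_tags helper) is an illustrative stand-in rather than the paper's model or API; in DocTag2Vec the document and tag vectors would come from joint training.

    import numpy as np

    # Illustrative stand-ins for jointly trained embeddings.
    rng = np.random.default_rng(0)
    tag_vecs = rng.normal(size=(50, 100))          # 50 tags in a 100-dim joint space
    tag_names = [f"tag_{i}" for i in range(50)]

    def predict_tags(doc_vec, tag_vecs, tag_names, k=5):
        # Normalize so that inner products equal cosine similarities.
        doc = doc_vec / np.linalg.norm(doc_vec)
        tags = tag_vecs / np.linalg.norm(tag_vecs, axis=1, keepdims=True)
        sims = tags @ doc                          # similarity of the doc to every tag
        top = np.argsort(-sims)[:k]                # indices of the k nearest tags
        return [tag_names[i] for i in top]

    # An unseen document would first be embedded (Doc2Vec-style inference);
    # a random vector serves as a placeholder here.
    print(predict_tags(rng.normal(size=100), tag_vecs, tag_names, k=5))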
Improving Negative Sampling for Word Representation using Self-embedded Features
Although the word-popularity-based negative sampler has shown superb
performance in the skip-gram model, the theoretical motivation behind
oversampling popular (non-observed) words as negative samples is still not
well understood. In this paper, we start from an investigation of the
gradient-vanishing issue in the skip-gram model without a proper negative
sampler. By
performing an insightful analysis from the stochastic gradient descent (SGD)
learning perspective, we demonstrate that, both theoretically and intuitively,
negative samples with larger inner product scores are more informative than
those with lower scores for the SGD learner in terms of both convergence rate
and accuracy. Building on this understanding, we propose an alternative sampling algorithm
that dynamically selects informative negative samples during each SGD update.
More importantly, the proposed sampler accounts for multi-dimensional
self-embedded features during the sampling process, which essentially makes it
more effective than the original popularity-based (one-dimensional) sampler.
Empirical experiments further verify our observations and show that our
fine-grained samplers yield significant improvements over existing ones
without increasing computational complexity.

Comment: Accepted in WSDM 201
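As a rough illustration of this idea, the sketch below draws a pool of uniformly sampled candidate words and keeps those with the largest inner product against the current center-word embedding, i.e. the informative negatives identified by the analysis above. This is one simplified reading of the abstract, not the paper's exact sampler; the vocabulary size, dimensions, and function name are made up.

    import numpy as np

    rng = np.random.default_rng(1)
    V, D = 10_000, 128                             # vocabulary size, embedding dim
    ctx_emb = rng.normal(scale=0.1, size=(V, D))   # output ("context") embeddings

    def informative_negatives(center_vec, n_neg=5, n_cand=100):
        # Draw candidates uniformly, then keep those with the largest inner
        # product scores against the center word: per the analysis above,
        # these contribute most to SGD convergence rate and accuracy.
        cand = rng.choice(V, size=n_cand, replace=False)
        scores = ctx_emb[cand] @ center_vec
        return cand[np.argsort(-scores)[:n_neg]]

    center = rng.normal(scale=0.1, size=D)         # stand-in for a center word
    print(informative_negatives(center))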
A Comparative Study on Regularization Strategies for Embedding-based Neural Networks
This paper aims to compare different regularization strategies to address a
common phenomenon, severe overfitting, in embedding-based neural networks for
NLP. We chose two widely studied neural models and tasks as our testbed. We
tried several frequently applied or newly proposed regularization strategies,
including penalizing weights (embeddings excluded), penalizing embeddings,
re-embedding words, and dropout. We also emphasized incremental
hyperparameter tuning and the combination of different regularization
strategies. The results provide a picture of hyperparameter tuning for
neural NLP models.

Comment: EMNLP '1
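For reference, here is one way two of the compared strategies (penalizing weights with embeddings excluded, and penalizing the embeddings themselves) plus dropout might be wired up in PyTorch. The tiny model and the l2_penalties helper are hypothetical stand-ins for the paper's testbed models; the re-embedding-words strategy is not reproduced here.

    import torch.nn as nn

    # A tiny embedding-based classifier standing in for the paper's testbed
    # models (sizes are arbitrary; input is assumed to be 20 token ids).
    model = nn.Sequential(
        nn.Embedding(5000, 64),    # word embeddings
        nn.Flatten(),              # (batch, 20, 64) -> (batch, 20 * 64)
        nn.Dropout(p=0.5),         # strategy: dropout
        nn.Linear(20 * 64, 2),
    )

    def l2_penalties(model):
        # Split the L2 penalty into the two variants compared above:
        # weights with embeddings excluded vs. the embeddings themselves.
        emb_ids = {id(p) for m in model.modules()
                   if isinstance(m, nn.Embedding) for p in m.parameters()}
        pen_weights = sum(p.pow(2).sum() for p in model.parameters()
                          if id(p) not in emb_ids)
        pen_embeds = sum(p.pow(2).sum() for p in model.parameters()
                         if id(p) in emb_ids)
        return pen_weights, pen_embeds

    # loss = task_loss + lambda_w * pen_weights   (or + lambda_e * pen_embeds)
    pen_weights, pen_embeds = l2_penalties(model)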
A Deep Network with Visual Text Composition Behavior
While natural languages are compositional, how state-of-the-art neural models
achieve compositionality is still unclear. We propose a deep network, which not
only achieves competitive accuracy for text classification, but also exhibits
compositional behavior. That is, while creating hierarchical representations of
a piece of text, such as a sentence, the lower layers of the network distribute
their layer-specific attention weights to individual words. In contrast, the
higher layers compose meaningful phrases and clauses, whose lengths increase
as the network gets deeper, until the full sentence is composed.

Comment: accepted to ACL201
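A toy sketch of the claimed behavior, under loose assumptions: each layer spreads layer-specific attention weights over its current units, and adjacent units are then merged so that deeper layers attend over longer spans. The pairwise averaging and random query vectors below are crude stand-ins for whatever composition and attention functions the trained network actually uses.

    import numpy as np

    rng = np.random.default_rng(2)

    def layer_attention(H, q):
        # Softmax attention of one layer over its current units H (n x d),
        # using a layer-specific query vector q (d,).
        scores = H @ q
        w = np.exp(scores - scores.max())
        return w / w.sum()

    def compose_adjacent(H):
        # Merge each pair of adjacent units, halving the sequence length so
        # the next layer's units span longer phrases (a crude stand-in).
        n = (len(H) // 2) * 2
        return 0.5 * (H[0:n:2] + H[1:n:2])

    H = rng.normal(size=(8, 16))            # 8 word embeddings, 16-dim
    for layer in range(3):
        q = rng.normal(size=16)             # layer-specific attention parameters
        w = layer_attention(H, q)
        print(f"layer {layer}: attention over {len(H)} units ->", np.round(w, 2))
        H = compose_adjacent(H)             # units now cover longer spans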