DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Tagging news articles or blog posts with relevant tags from a collection of
predefined ones is coined as document tagging in this work. Accurate tagging of
articles can benefit several downstream applications such as recommendation and
search. In this work, we propose a novel yet simple approach called DocTag2Vec
to accomplish this task. We substantially extend Word2Vec and Doc2Vec, two
popular models for learning distributed representations of words and documents.
In DocTag2Vec, we simultaneously learn the representation of words, documents,
and tags in a joint vector space during training, and employ the simple
k-nearest neighbor search to predict tags for unseen documents. In contrast
to previous multi-label learning methods, DocTag2Vec directly deals with raw
text instead of a provided feature vector and, in addition, enjoys advantages
such as learning tag representations and the ability to handle newly
created tags. To demonstrate the effectiveness of our approach, we conduct
experiments on several datasets and show promising results against
state-of-the-art methods.
Comment: 10 pages
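The prediction step described in the abstract above (nearest neighbor search over tags in a joint vector space) can be illustrated with a minimal sketch. It assumes that the document vector has already been inferred and that tags are ranked by cosine similarity; the abstract does not specify the similarity measure, so that choice, the function names, and the toy embeddings are all hypothetical and are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_tags(doc_vec, tag_vecs, k=2):
    """Return the k tags whose embeddings are most similar to doc_vec.

    Assumption: doc_vec and the tag embeddings live in the same vector space.
    """
    ranked = sorted(tag_vecs, key=lambda t: cosine(doc_vec, tag_vecs[t]), reverse=True)
    return ranked[:k]


# Toy, hand-made embeddings (hypothetical; real ones would be learned jointly).
tag_vecs = {
    "sports": [1.0, 0.0, 0.0],
    "politics": [0.0, 1.0, 0.0],
    "finance": [0.0, 0.0, 1.0],
}
doc_vec = [0.9, 0.1, 0.4]

print(predict_tags(doc_vec, tag_vecs, k=2))
```

Because tags are just points in the shared space, a newly created tag could in principle be handled by adding one more entry to `tag_vecs`, without changing the search itself.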
Structural Regularities in Text-based Entity Vector Spaces
Entity retrieval is the task of finding entities such as people or products
in response to a query, based solely on the textual documents they are
associated with. Recent semantic entity retrieval algorithms represent queries
and experts in finite-dimensional vector spaces, where both are constructed
from text sequences.
We investigate entity vector spaces and the degree to which they capture
structural regularities. Such vector spaces are constructed in an unsupervised
manner without explicit information about structural aspects. For concreteness,
we address these questions for a specific type of entity: experts in the
context of expert finding. We discover how clusterings of experts correspond to
committees in organizations, the ability of expert representations to encode
the co-author graph, and the degree to which they encode academic rank. We
compare latent, continuous representations created using methods based on
distributional semantics (LSI), topic models (LDA) and neural networks
(word2vec, doc2vec, SERT). Vector spaces created using neural methods, such as
doc2vec and SERT, systematically perform better at clustering than LSI, LDA and
word2vec. When it comes to encoding entity relations, SERT performs best.
Comment: ICTIR2017. Proceedings of the 3rd ACM International Conference on the
Theory of Information Retrieval. 2017
Off-line vs. On-line Evaluation of Recommender Systems in Small E-commerce
In this paper, we present our work towards comparing on-line and off-line
evaluation metrics in the context of small e-commerce recommender systems.
Recommending for small e-commerce enterprises is rather challenging due to the
lower volume of interactions and low user loyalty, rarely extending beyond a
single session. On the other hand, we usually have to deal with lower volumes
of objects, which are easier to discover by users through various
browsing/searching GUIs.
The main goal of this paper is to determine the applicability of off-line
evaluation metrics for estimating the true usability of recommender systems
(evaluated on-line in A/B testing). In total, 800 variants of recommending
algorithms were evaluated off-line w.r.t. 18 metrics covering rating-based,
ranking-based, novelty and diversity evaluation. The off-line results were
afterwards compared with on-line evaluation of 12 selected recommender
variants. Based on the results, we tried to learn and utilize a model that
predicts on-line results from off-line results.
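As an illustration of what ranking-based off-line metrics measure, the following minimal sketch computes two common ones, MRR and nDCG, under binary relevance. The function names and toy data are hypothetical, and the abstract does not say how the paper's 18 metrics were implemented, so this should be read as a generic textbook formulation rather than the authors' evaluation code.

```python
import math


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 if none is ranked."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


def mean_reciprocal_rank(rankings, relevants):
    """Average reciprocal rank over several (ranking, relevant-set) pairs."""
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 1)
        for pos, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data: two users, each with a recommended list and a set of relevant items.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
relevants = [{"b"}, {"x"}]

print(mean_reciprocal_rank(rankings, relevants))
print(ndcg_at_k(rankings[0], relevants[0], k=3))
```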
Off-line results showed a great variance in performance w.r.t. different
metrics, with the Pareto front covering 68% of the approaches. Furthermore, we
observed that on-line results are considerably affected by the novelty of
users. On-line metrics correlate positively with ranking-based metrics (AUC,
MRR, nDCG) for novice users, while too high values of diversity and novelty had
a negative impact on the on-line results for them. For users with more visited
items, however, diversity became more important, while the relevance of
ranking-based metrics gradually decreased.
Comment: Submitted to ACM Hypertext 2020 Conference
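The Pareto front mentioned in the abstract above is the set of variants that no other variant beats on every metric at once. A minimal sketch follows, assuming that every metric is "higher is better"; the variant names and scores are hypothetical toy values, not results from the paper.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the names of all variants not dominated by any other variant."""
    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(other, s)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Toy scores per variant: (ranking quality, diversity); higher is better for both.
scores = {
    "A": (0.8, 0.2),
    "B": (0.6, 0.6),
    "C": (0.5, 0.5),  # dominated by B on both metrics
    "D": (0.2, 0.9),
}

front = pareto_front(scores)
print(front)
print(len(front) / len(scores))  # share of variants on the front
```

With many metrics that trade off against each other, few variants are dominated, which is one way a large share of the approaches can end up on the front.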