36 research outputs found
Learning Points and Routes to Recommend Trajectories
The problem of recommending tours to travellers is an important and broadly
studied area. Suggested solutions include various approaches of
points-of-interest (POI) recommendation and route planning. We consider the
task of recommending a sequence of POIs, that simultaneously uses information
about POIs and routes. Our approach unifies the treatment of various sources of
information by representing them as features in machine learning algorithms,
enabling us to learn from past behaviour. Information about POIs are used to
learn a POI ranking model that accounts for the start and end points of tours.
Data about previous trajectories are used for learning transition patterns
between POIs that enable us to recommend probable routes. In addition, a
probabilistic model is proposed to combine the results of POI ranking and the
POI to POI transitions. We propose a new F score on pairs of POIs that
capture the order of visits. Empirical results show that our approach improves
on recent methods, and demonstrate that combining points and routes enables
better trajectory recommendations
WordRank: Learning Word Embeddings via Robust Ranking
Embedding words in a vector space has gained a lot of attention in recent
years. While state-of-the-art methods provide efficient computation of word
similarities via a low-dimensional matrix embedding, their motivation is often
left unclear. In this paper, we argue that word embedding can be naturally
viewed as a ranking problem due to the ranking nature of the evaluation
metrics. Then, based on this insight, we propose a novel framework WordRank
that efficiently estimates word representations via robust ranking, in which
the attention mechanism and robustness to noise are readily achieved via the
DCG-like ranking losses. The performance of WordRank is measured in word
similarity and word analogy benchmarks, and the results are compared to the
state-of-the-art word embedding techniques. Our algorithm is very competitive
to the state-of-the- arts on large corpora, while outperforms them by a
significant margin when the training set is limited (i.e., sparse and noisy).
With 17 million tokens, WordRank performs almost as well as existing methods
using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node
distributed implementation of WordRank is publicly available for general usage.Comment: Conference on Empirical Methods in Natural Language Processing
(EMNLP), November 1-5, 2016, Austin, Texas, US
PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation
Highlight detection models are typically trained to identify cues that make
visual content appealing or interesting for the general public, with the
objective of reducing a video to such moments. However, the "interestingness"
of a video segment or image is subjective. Thus, such highlight models provide
results of limited relevance for the individual user. On the other hand,
training one model per user is inefficient and requires large amounts of
personal information which is typically not available. To overcome these
limitations, we present a global ranking model which conditions on each
particular user's interests. Rather than training one model per user, our model
is personalized via its inputs, which allows it to effectively adapt its
predictions, given only a few user-specific examples. To train this model, we
create a large-scale dataset of users and the GIFs they created, giving us an
accurate indication of their interests. Our experiments show that using the
user history substantially improves the prediction accuracy. On our test set of
850 videos, our model improves the recall by 8% with respect to generic
highlight detectors. Furthermore, our method proves more precise than the
user-agnostic baselines even with just one person-specific example.Comment: Accepted for publication at the 2018 ACM Multimedia Conference (MM
'18
Graph-of-Entity: A Model for Combined Data Representation and Retrieval
Managing large volumes of digital documents along with the information they contain, or are associated with, can be challenging. As systems become more intelligent, it increasingly makes sense to power retrieval through all available data, where every lead makes it easier to reach relevant documents or entities. Modern search is heavily powered by structured knowledge, but users still query using keywords or, at the very best, telegraphic natural language. As search becomes increasingly dependent on the integration of text and knowledge, novel approaches for a unified representation of combined data present the opportunity to unlock new ranking strategies. We tackle entity-oriented search using graph-based approaches for representation and retrieval. In particular, we propose the graph-of-entity, a novel approach for indexing combined data, where terms, entities and their relations are jointly represented. We compare the graph-of-entity with the graph-of-word, a text-only model, verifying that, overall, it does not yet achieve a better performance, despite obtaining a higher precision. Our assessment was based on a small subset of the INEX 2009 Wikipedia Collection, created from a sample of 10 topics and respectively judged documents. The offline evaluation we do here is complementary to its counterpart from TREC 2017 OpenSearch track, where, during our participation, we had assessed graph-of-entity in an online setting, through team-draft interleaving