Adaptive, Personalized Diversity for Visual Discovery
Search queries are appropriate when users have explicit intent, but they
perform poorly when the intent is difficult to express or if the user is simply
looking to be inspired. Visual browsing systems allow e-commerce platforms to
address these scenarios while offering the user an engaging shopping
experience. Here we explore extensions in the direction of adaptive
personalization and item diversification within Stream, a new form of visual
browsing and discovery by Amazon. Our system presents the user with a diverse
set of interesting items while adapting to user interactions. Our solution
consists of three components: (1) a Bayesian regression model for scoring the
relevance of items while leveraging uncertainty, (2) a submodular
diversification framework that re-ranks the top scoring items based on
category, and (3) personalized category preferences learned from the user's
behavior. When tested on live traffic, our algorithms show a strong lift in
click-through-rate and session duration.
Comment: Best Paper Award
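The category-based re-ranking in component (2) can be illustrated with a minimal sketch. The abstract does not state the objective the authors use, so the function below assumes a common submodular choice: the sum over categories of the square root of the total score selected in that category, maximized greedily. The name `greedy_diverse_rerank` and the sample items are hypothetical.

```python
import math


def greedy_diverse_rerank(items, k):
    """Greedily pick k items under an ASSUMED submodular objective.

    items: list of (item_id, category, score) tuples with score >= 0.
    Objective (an assumption, not taken from the paper):
        f(S) = sum over categories c of sqrt(total score of S in c)
    The square root gives diminishing returns within a category, so a
    second item from an already-covered category gains less.
    """
    selected = []
    category_mass = {}  # total score already selected per category
    remaining = list(items)

    def gain(item):
        # Marginal increase of f when adding this item.
        _, category, score = item
        before = category_mass.get(category, 0.0)
        return math.sqrt(before + score) - math.sqrt(before)

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        remaining.remove(best)
        selected.append(best[0])
        category_mass[best[1]] = category_mass.get(best[1], 0.0) + best[2]
    return selected


items = [
    ("a", "shoes", 0.9),
    ("b", "shoes", 0.8),
    ("c", "hats", 0.5),
    ("d", "bags", 0.4),
]
print(greedy_diverse_rerank(items, 3))
```

Ranking purely by score would return a, b, c; under the assumed objective the second shoe item is displaced by items from categories not yet shown.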
Gamifying Video Object Segmentation
Video object segmentation can be considered as one of the most challenging
computer vision problems. Indeed, so far, no existing solution is able to
effectively deal with the peculiarities of real-world videos, especially in
cases of articulated motion and object occlusions; limitations that appear more
evident when we compare their performance with the human one. However, manually
segmenting objects in videos is largely impractical as it requires a lot of
human time and concentration. To address this problem, in this paper we propose
an interactive video object segmentation method, which exploits, on one hand,
the capability of humans to identify correctly objects in visual scenes, and on
the other hand, the collective human brainpower to solve challenging tasks. In
particular, our method relies on a web game to collect human inputs on object
locations, followed by an accurate segmentation phase achieved by optimizing an
energy function encoding spatial and temporal constraints between object
regions as well as human-provided input. Performance analysis carried out on
challenging video datasets with some users playing the game demonstrated that
our method shows a better trade-off between annotation times and segmentation
accuracy than interactive video annotation and automated video object
segmentation approaches.
Comment: Submitted to PAM
Query Representation with Global Consistency on User Click Graph
Extensive research has been conducted on query log analysis. A query log is
generally represented as a bipartite graph on a query set and a URL set. Most
of the traditional methods used the raw click frequency to weigh the link
between a query and a URL on the click graph. In order to address the
disadvantages of raw click frequency, researchers proposed the entropy-biased
model, which incorporates raw click frequency with inverse query frequency of
the URL as the weighting scheme for query representation. In this paper, we
observe that the inverse query frequency can be considered a global property of
the URL on the click graph, which is more informative than raw click frequency,
which can be considered a local property of the URL. Based on this insight, we
develop the global consistency model for query representation, which utilizes
the click frequency and the inverse query frequency of a URL in a consistent
manner. Furthermore, we propose a new scheme called inverse URL frequency as an
effective way to capture the global property of a URL. Experiments have been
conducted on the AOL search engine log data. The result shows that our global
consistency model achieved better performance than the current models.
Comment: accepted by Journal of Internet Technology on Sep. 9, 2012. To appear
in Vol. 4, September, 201
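The idea of combining raw click frequency with the inverse query frequency of a URL can be sketched as follows. The abstract gives no formula, so this assumes a tf-idf style form, weight = clicks x log(|Q| / n(u)), where |Q| is the number of distinct queries and n(u) is the number of distinct queries with a click on URL u. The function name `cf_iqf_weights` and the sample click log are hypothetical.

```python
import math
from collections import defaultdict


def cf_iqf_weights(clicks):
    """Weight each query-URL edge by click frequency times an ASSUMED
    inverse query frequency, iqf(u) = log(|Q| / n(u)).

    clicks: dict mapping (query, url) -> raw click count.
    """
    queries = {query for query, _ in clicks}
    queries_per_url = defaultdict(set)
    for query, url in clicks:
        queries_per_url[url].add(query)

    weights = {}
    for (query, url), count in clicks.items():
        iqf = math.log(len(queries) / len(queries_per_url[url]))
        weights[(query, url)] = count * iqf
    return weights


clicks = {
    ("jaguar car", "wikipedia.org"): 10,
    ("jaguar cat", "wikipedia.org"): 8,
    ("jaguar car", "jaguar.com"): 5,
}
weights = cf_iqf_weights(clicks)
```

In this toy log, wikipedia.org is clicked for every query, so its edges get weight zero despite high click counts, while jaguar.com, clicked for only one query, keeps a positive weight.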
Personalized Neural Embeddings for Collaborative Filtering with Text
Collaborative filtering (CF) is a core technique for recommender systems.
Traditional CF approaches exploit user-item relations (e.g., clicks, likes, and
views) only and hence they suffer from the data sparsity issue. Items are
usually associated with unstructured text such as article abstracts and product
reviews. We develop a Personalized Neural Embedding (PNE) framework to exploit
both interactions and words seamlessly. We learn such embeddings of users,
items, and words jointly, and predict user preferences on items based on these
learned representations. PNE estimates the probability that a user will like an
item by two terms---behavior factors and semantic factors. On two real-world
datasets, PNE shows better performance than four state-of-the-art baselines in
terms of three metrics. We also show that PNE learns meaningful word embeddings
by visualization.
Comment: NAACL 2019 short papers, oral presentation
Modeling Perceived Relevance for Tail Queries without Click-Through Data
Click-through data has been used in various ways in Web search such as
estimating relevance between documents and queries. Since only search snippets
are perceived by users before issuing any clicks, the relevance induced by
clicks is usually called perceived relevance, which has proven to be
quite useful for Web search. While there is plenty of click data for popular
queries, very little information is available for unpopular tail ones. These
tail queries take a large portion of the search volume but search accuracy for
these queries is usually unsatisfactory due to data sparseness such as limited
click information. In this paper, we study the problem of modeling perceived
relevance for queries without click-through data. Instead of relying on users'
click data, we carefully design a set of snippet features and use them to
approximately capture the perceived relevance. We study the effectiveness of
this set of snippet features in two settings: (1) predicting perceived
relevance and (2) enhancing search engine ranking. Experimental results show
that our proposed model is effective at predicting the relative perceived
relevance of Web search results. Furthermore, our proposed snippet features are
effective at improving search accuracy for longer tail queries without
click-through data.
Sherlock: Sparse Hierarchical Embeddings for Visually-aware One-class Collaborative Filtering
Building successful recommender systems requires uncovering the underlying
dimensions that describe the properties of items as well as users' preferences
toward them. In domains like clothing recommendation, explaining users'
preferences requires modeling the visual appearance of the items in question.
This makes recommendation especially challenging, due to both the complexity
and subtlety of people's 'visual preferences,' as well as the scale and
dimensionality of the data and features involved. Ultimately, a successful
model should be capable of capturing considerable variance across different
categories and styles, while still modeling the commonalities explained by
'global' structures in order to combat the sparsity (e.g. cold-start),
variability, and scale of real-world datasets. Here, we address these
challenges by building such structures to model the visual dimensions across
different product categories. With a novel hierarchical embedding architecture,
our method accounts for both high-level (colorfulness, darkness, etc.) and
subtle (e.g. casualness) visual characteristics simultaneously.
Comment: 7 pages, 3 figures
Refining Recency Search Results with User Click Feedback
Traditional machine-learned ranking systems for web search are often trained
to capture stationary relevance of documents to queries, which has limited
ability to track non-stationary user intention in a timely manner. In recency
search, for instance, the relevance of documents to a query on breaking news
often changes significantly over time, requiring effective adaptation to user
intention. In this paper, we focus on recency search and study a number of
algorithms to improve ranking results by leveraging user click feedback. Our
contributions are three-fold. First, we use real search sessions collected in a
random exploration bucket for reliable offline evaluation of these
algorithms, which provides an unbiased comparison across algorithms without
online bucket tests. Second, we propose a re-ranking approach to improve search
results for recency queries using user clicks. Third, our empirical comparison
of a dozen algorithms on real-life search data suggests the importance of a few
algorithmic choices in these applications, including generalization across
different query-document pairs, specialization to popular queries, and
real-time adaptation of user clicks.
Comment: 22 pages, 9 figures, 1 table. A preliminary and shorter version
presented at CIKM-201
Etymo: A New Discovery Engine for AI Research
We present Etymo (https://etymo.io), a discovery engine to facilitate
artificial intelligence (AI) research and development. It aims to help readers
navigate a large number of AI-related papers published every week by using a
novel form of search that finds relevant papers and displays related papers in
a graphical interface. Etymo constructs and maintains an adaptive
similarity-based network of research papers as an all-purpose knowledge graph
for ranking, recommendation, and visualisation. The network is constantly
evolving and can learn from user feedback to adjust itself.
Comment: 7 pages, 2 figures
Exploration of gaps in Bitly's spam detection and relevant counter measures
The existence of spam URLs in emails and Online Social Media (OSM) has become a
growing phenomenon. To counter the dissemination issues associated with long
complex URLs in emails and character limit imposed on various OSM (like
Twitter), the concept of URL shortening gained a lot of traction. URL
shorteners take as input a long URL and give a short URL with the same landing
page in return. Given its immense popularity, URL shortening has become a prime
target for attackers, giving them an advantage in concealing malicious content.
Bitly, a leading service in this domain, is being exploited heavily to carry out
phishing attacks, work from home scams, pornographic content propagation, etc.
This imposes additional performance pressure on Bitly and other URL shorteners
to be able to detect and take a timely action against the illegitimate content.
In this study, we analyzed a dataset marked as suspicious by Bitly in the month
of October 2013 to highlight some ground issues in their spam detection
mechanism. In addition, we identified some short URL based features and coupled
them with two domain specific features to classify a Bitly URL as malicious /
benign and achieved a maximum accuracy of 86.41%. To the best of our knowledge,
this is the first large-scale study to highlight the issues with Bitly's spam
detection policies and to propose a suitable countermeasure.
A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems
Between matrix factorization and Random Walk with Restart (RWR), which method
works better for recommender systems? Which method handles explicit or implicit
feedback data better? Does additional information help recommendation?
Recommender systems play an important role in many e-commerce services such as
Amazon and Netflix to recommend new items to a user. Among various
recommendation strategies, collaborative filtering has shown good performance
by using rating patterns of users. Matrix factorization and random walk with
restart are the most representative collaborative filtering methods. However,
it is still unclear which method provides better recommendation performance
despite their extensive utility.
In this paper, we provide a comparative study of matrix factorization and RWR
in recommender systems. We formulate the exact correspondence between the two
methods for various recommendation tasks. In particular, we devise a new
RWR method using a global bias term, which corresponds to a matrix
factorization method using biases. We describe details of the two methods in
various aspects of recommendation quality, such as how those methods handle
the cold-start problem, which typically arises in collaborative filtering. We
extensively perform experiments over real-world datasets to evaluate the
performance of each method in terms of various measures. We observe that matrix
factorization performs better with explicit feedback ratings while RWR is
better with implicit ones. We also observe that exploiting global popularities
of items improves performance and that side information produces
positive synergy with explicit feedback but has negative effects with
implicit feedback.
Comment: 10 pages, Appears in IEEE International Conference on Big Data 2017
(IEEE BigData 2017)
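For readers unfamiliar with RWR, the following is a minimal sketch of the basic technique on a small user-item graph, using plain power iteration. It is not the bias-augmented variant the abstract describes. The function name, the restart probability of 0.15, and the toy graph are assumptions made for illustration, and the code assumes every node has at least one neighbor.

```python
def random_walk_with_restart(adjacency, start, restart_prob=0.15, iterations=100):
    """Basic RWR by power iteration (illustrative sketch only).

    adjacency: dict mapping each node to a non-empty list of neighbors;
               every neighbor must itself be a key of the dict.
    start:     node the walker jumps back to with probability restart_prob.
    Returns a dict of visiting scores that sums to 1.
    """
    scores = {node: 0.0 for node in adjacency}
    scores[start] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in adjacency}
        for node, neighbors in adjacency.items():
            # With probability 1 - restart_prob, move to a uniformly
            # chosen neighbor.
            share = (1.0 - restart_prob) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        # With probability restart_prob, jump back to the start node.
        updated[start] += restart_prob
        scores = updated
    return scores


# Toy bipartite graph: users u1, u2 and items i1, i2, i3 (hypothetical data).
graph = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}
scores = random_walk_with_restart(graph, "u1")
```

Item i3 has no direct interaction with u1 but still receives a positive score through the shared item i2, which is how RWR can rank unseen items for a user.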