Adaptive, Personalized Diversity for Visual Discovery

Author: Goodman Mitchell
Hill Daniel
Mohan Vijai
Nassif Houssam
Srinavasan Sriram
Teo Choon Hui
Vishwanathan SVN
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/10/2018

Search queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or when the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios while offering the user an engaging shopping experience. Here we explore extensions in the direction of adaptive personalization and item diversification within Stream, a new form of visual browsing and discovery by Amazon. Our system presents the user with a diverse set of interesting items while adapting to user interactions. Our solution consists of three components: (1) a Bayesian regression model for scoring the relevance of items while leveraging uncertainty, (2) a submodular diversification framework that re-ranks the top-scoring items based on category, and (3) personalized category preferences learned from the user's behavior. When tested on live traffic, our algorithms show a strong lift in click-through rate and session duration.

Comment: Best Paper Awar

arXiv.org e-Print Archive
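
The category-based re-ranking step described in the abstract above can be illustrated with a minimal sketch. It assumes a concave-over-sum objective (a weighted log of the accumulated score per category), which is one common submodular choice and not necessarily the objective used in the paper. The function name `greedy_diverse_rerank`, the `category_weight` parameter, and the sample items are hypothetical.

```python
import math
from collections import defaultdict


def greedy_diverse_rerank(items, k, category_weight=None):
    """Greedily pick k items to maximize sum_c w_c * log(1 + score mass in c).

    items: list of (item_id, category, score) tuples with score >= 0.
    category_weight: optional dict of per-category weights (assumed here to
    stand in for personalized category preferences); defaults to 1.0.
    """
    category_weight = category_weight or {}
    mass = defaultdict(float)  # accumulated score already selected per category
    remaining = list(items)
    selected = []

    def gain(item):
        # Marginal gain of adding this item; it shrinks as its category fills up.
        _, cat, score = item
        w = category_weight.get(cat, 1.0)
        return w * (math.log1p(mass[cat] + score) - math.log1p(mass[cat]))

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        remaining.remove(best)
        selected.append(best)
        mass[best[1]] += best[2]
    return selected


# Hypothetical relevance scores: three shoes and one hat.
items = [("a", "shoes", 0.9), ("b", "shoes", 0.8),
         ("c", "hats", 0.5), ("d", "shoes", 0.7)]
top3 = [item_id for item_id, _, _ in greedy_diverse_rerank(items, k=3)]
```

In this sketch the hat is promoted ahead of the second shoe because the marginal gain within a category diminishes once that category is represented.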

Gamifying Video Object Segmentation

Author: Giordano Daniela
Palazzo Simone
Spampinato Concetto
Publication date: 05/01/2016

Video object segmentation can be considered one of the most challenging computer vision problems. Indeed, so far, no existing solution is able to deal effectively with the peculiarities of real-world videos, especially in cases of articulated motion and object occlusions; these limitations become more evident when the performance of such solutions is compared with that of humans. However, manually segmenting objects in videos is largely impractical, as it requires a lot of human time and concentration. To address this problem, in this paper we propose an interactive video object segmentation method, which exploits, on the one hand, the capability of humans to correctly identify objects in visual scenes and, on the other hand, the collective human brainpower to solve challenging tasks. In particular, our method relies on a web game to collect human inputs on object locations, followed by an accurate segmentation phase achieved by optimizing an energy function encoding spatial and temporal constraints between object regions as well as human-provided input. Performance analysis carried out on challenging video datasets, with some users playing the game, demonstrated that our method offers a better trade-off between annotation time and segmentation accuracy than interactive video annotation and automated video object segmentation approaches.

Comment: Submitted to PAM

arXiv.org e-Print Archive

Query Representation with Global Consistency on User Click Graph

Author: Men Shuqiqiu
Raychoudhury Vaskar
Zhang Daqiang
Zhu Rongbo
Publication date: 25/05/2013

Extensive research has been conducted on query log analysis. A query log is generally represented as a bipartite graph on a query set and a URL set. Most traditional methods use the raw click frequency to weight the link between a query and a URL on the click graph. To address the disadvantages of raw click frequency, researchers proposed the entropy-biased model, which combines raw click frequency with the inverse query frequency of the URL as the weighting scheme for query representation. In this paper, we observe that the inverse query frequency can be considered a global property of the URL on the click graph, which is more informative than raw click frequency, which can be considered a local property of the URL. Based on this insight, we develop the global consistency model for query representation, which utilizes the click frequency and the inverse query frequency of a URL in a consistent manner. Furthermore, we propose a new scheme called inverse URL frequency as an effective way to capture the global property of a URL. Experiments were conducted on the AOL search engine log data. The results show that our global consistency model achieves better performance than the current models.

Comment: accepted by Journal of Internet Technology on Sep. 9, 2012. To appear in Vol. 4, September, 201

arXiv.org e-Print Archive
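
The weighting idea mentioned in the abstract above, raw click frequency combined with the inverse query frequency of a URL, can be sketched as follows. The sketch assumes the usual TF-IDF-style definition iqf(u) = log(|Q| / n(u)), where n(u) is the number of distinct queries with a click on URL u. The abstract does not spell out this formula, and the sketch does not implement the paper's global consistency model or its inverse URL frequency. The function name `cf_iqf_weights` and the sample clicks are hypothetical.

```python
import math
from collections import defaultdict


def cf_iqf_weights(clicks):
    """Weight each query-URL edge by click frequency times inverse query frequency.

    clicks: dict mapping (query, url) -> raw click count.
    Returns a dict mapping (query, url) -> cf * log(|Q| / n(url)).
    """
    queries = {q for q, _ in clicks}
    queries_per_url = defaultdict(set)
    for (q, u), count in clicks.items():
        if count > 0:
            queries_per_url[u].add(q)

    weights = {}
    for (q, u), count in clicks.items():
        if count > 0:
            # URLs clicked for many different queries are less discriminative.
            iqf = math.log(len(queries) / len(queries_per_url[u]))
            weights[(q, u)] = count * iqf
    return weights


# Hypothetical click log with two queries and two URLs.
clicks = {
    ("jaguar", "wikipedia.org"): 10,
    ("python", "wikipedia.org"): 5,
    ("python", "python.org"): 8,
}
weights = cf_iqf_weights(clicks)
```

Here the URL clicked for every query receives weight zero despite its high raw click count, while the URL clicked for a single query keeps a positive weight.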

Personalized Neural Embeddings for Collaborative Filtering with Text

Author: Hu Guangneng
Publication date: 19/03/2019

Collaborative filtering (CF) is a core technique for recommender systems. Traditional CF approaches exploit user-item relations (e.g., clicks, likes, and views) only and hence they suffer from the data sparsity issue. Items are usually associated with unstructured text such as article abstracts and product reviews. We develop a Personalized Neural Embedding (PNE) framework to exploit both interactions and words seamlessly. We learn such embeddings of users, items, and words jointly, and predict user preferences on items based on these learned representations. PNE estimates the probability that a user will like an item by two terms: behavior factors and semantic factors. On two real-world datasets, PNE shows better performance than four state-of-the-art baselines in terms of three metrics. We also show that PNE learns meaningful word embeddings by visualization.

Comment: NAACL 2019 short papers, oral presentatio

arXiv.org e-Print Archive

Modeling Perceived Relevance for Tail Queries without Click-Through Data

Author: Chang Yi
Kang Changsung
Lin Xiaotong
Tseng Belle
Wang Xuanhui
Publication date: 05/10/2011

Click-through data has been used in various ways in Web search, such as estimating relevance between documents and queries. Since only search snippets are perceived by users before issuing any clicks, the relevance induced by clicks is usually called perceived relevance, which has proven to be quite useful for Web search. While there is plenty of click data for popular queries, very little information is available for unpopular tail ones. These tail queries make up a large portion of the search volume, but search accuracy for these queries is usually unsatisfactory due to data sparseness such as limited click information. In this paper, we study the problem of modeling perceived relevance for queries without click-through data. Instead of relying on users' click data, we carefully design a set of snippet features and use them to approximately capture the perceived relevance. We study the effectiveness of this set of snippet features in two settings: (1) predicting perceived relevance and (2) enhancing search engine ranking. Experimental results show that our proposed model is effective at predicting the relative perceived relevance of Web search results. Furthermore, our proposed snippet features are effective at improving search accuracy for longer tail queries without click-through data.

arXiv.org e-Print Archive

Sherlock: Sparse Hierarchical Embeddings for Visually-aware One-class Collaborative Filtering

Author: He Ruining
Lin Chunbin
McAuley Julian
Wang Jianguo
Publication date: 20/04/2016

Building successful recommender systems requires uncovering the underlying dimensions that describe the properties of items as well as users' preferences toward them. In domains like clothing recommendation, explaining users' preferences requires modeling the visual appearance of the items in question. This makes recommendation especially challenging, due to both the complexity and subtlety of people's 'visual preferences,' as well as the scale and dimensionality of the data and features involved. Ultimately, a successful model should be capable of capturing considerable variance across different categories and styles, while still modeling the commonalities explained by 'global' structures in order to combat the sparsity (e.g. cold-start), variability, and scale of real-world datasets. Here, we address these challenges by building such structures to model the visual dimensions across different product categories. With a novel hierarchical embedding architecture, our method accounts for both high-level (colorfulness, darkness, etc.) and subtle (e.g. casualness) visual characteristics simultaneously.

Comment: 7 pages, 3 figure

arXiv.org e-Print Archive

Refining Recency Search Results with User Click Feedback

Author: Chang Yi
Chu Wei
Li Lihong
Moon Taesup
Zheng Zhaohui
Publication date: 18/03/2011

Traditional machine-learned ranking systems for web search are often trained to capture stationary relevance of documents to queries, which limits their ability to track non-stationary user intention in a timely manner. In recency search, for instance, the relevance of documents to a query on breaking news often changes significantly over time, requiring effective adaptation to user intention. In this paper, we focus on recency search and study a number of algorithms to improve ranking results by leveraging user click feedback. Our contributions are three-fold. First, we use real search sessions collected in a random exploration bucket for reliable offline evaluation of these algorithms, which provides an unbiased comparison across algorithms without online bucket tests. Second, we propose a re-ranking approach to improve search results for recency queries using user clicks. Third, our empirical comparison of a dozen algorithms on real-life search data suggests the importance of a few algorithmic choices in these applications, including generalization across different query-document pairs, specialization to popular queries, and real-time adaptation of user clicks.

Comment: 22 pages, 9 figures, 1 table. A preliminary and shorter version presented at CIKM-201

arXiv.org e-Print Archive

Etymo: A New Discovery Engine for AI Research

Author: Deakin Jonathan
Higham Nicholas J.
Wang Shuaiqiang
Zhang Weijian
Publication date: 25/01/2018

We present Etymo (https://etymo.io), a discovery engine to facilitate artificial intelligence (AI) research and development. It aims to help readers navigate a large number of AI-related papers published every week by using a novel form of search that finds relevant papers and displays related papers in a graphical interface. Etymo constructs and maintains an adaptive similarity-based network of research papers as an all-purpose knowledge graph for ranking, recommendation, and visualisation. The network is constantly evolving and can learn from user feedback to adjust itself.

Comment: 7 pages, 2 figure

arXiv.org e-Print Archive

Exploration of gaps in Bitly's spam detection and relevant counter measures

Author: Gupta Neha
Kumaraguru Ponnurangam
Publication date: 07/05/2014

The presence of spam URLs in emails and Online Social Media (OSM) is a growing phenomenon. To counter the dissemination issues associated with long, complex URLs in emails and the character limits imposed by various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take a long URL as input and return a short URL with the same landing page. With its immense popularity over time, URL shortening has become a prime target for attackers, giving them an advantage in concealing malicious content. Bitly, a leading service in this domain, is being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This imposes additional performance pressure on Bitly and other URL shorteners to detect and take timely action against illegitimate content. In this study, we analyzed a dataset marked as suspicious by Bitly in the month of October 2013 to highlight some ground issues in their spam detection mechanism. In addition, we identified some short-URL-based features and coupled them with two domain-specific features to classify a Bitly URL as malicious or benign, achieving a maximum accuracy of 86.41%. To the best of our knowledge, this is the first large-scale study to highlight the issues with Bitly's spam detection policies and to propose a suitable countermeasure.

arXiv.org e-Print Archive

A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems

Author: Jung Jinhong
Kang U
Park Haekyu
Publication date: 05/11/2017

Between matrix factorization and Random Walk with Restart (RWR), which method works better for recommender systems? Which method handles explicit or implicit feedback data better? Does additional information help recommendation? Recommender systems play an important role in many e-commerce services such as Amazon and Netflix to recommend new items to a user. Among various recommendation strategies, collaborative filtering has shown good performance by using rating patterns of users. Matrix factorization and random walk with restart are the most representative collaborative filtering methods. However, it is still unclear which method provides better recommendation performance despite their extensive use. In this paper, we provide a comparative study of matrix factorization and RWR in recommender systems. We formulate the exact correspondence between the two methods across various recommendation tasks. In particular, we devise a new RWR method using a global bias term, which corresponds to a matrix factorization method using biases. We describe details of the two methods in various aspects of recommendation quality, such as how those methods handle the cold-start problem, which typically arises in collaborative filtering. We perform extensive experiments over real-world datasets to evaluate the performance of each method in terms of various measures. We observe that matrix factorization performs better with explicit feedback ratings while RWR is better with implicit ones. We also observe that exploiting global popularities of items improves performance, and that side information produces positive synergy with explicit feedback but has a negative effect with implicit feedback.

Comment: 10 pages, Appears in IEEE International Conference on Big Data 2017 (IEEE BigData 2017

arXiv.org e-Print Archive
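
For readers unfamiliar with Random Walk with Restart, the method named in the abstract above, the following is a minimal sketch of the plain, textbook iteration on a user-item graph. It is not the paper's bias-augmented variant. The function name `random_walk_with_restart`, the `restart` probability of 0.15, and the toy graph are assumptions made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=200, tol=1e-10):
    """Approximate RWR scores by power iteration.

    adj: dict mapping each node to a non-empty list of neighbor nodes
         (every node must appear as a key).
    seed: the node the walker jumps back to with probability `restart`.
    Returns a dict mapping node -> visiting probability.
    """
    nodes = list(adj)
    scores = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iters):
        # Restart mass goes to the seed; the rest is spread along the edges.
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical bipartite graph: u1 interacted with i1 and i2, u2 with i2 and i3.
adj = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}
scores = random_walk_with_restart(adj, seed="u1")
```

Ranking the items that u1 has not yet interacted with by their score gives a simple recommendation list. In this toy graph only i3 qualifies, reached through the shared item i2.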