Learning Tree-based Deep Model for Recommender Systems
Model-based methods for recommender systems have been studied extensively in
recent years. In systems with a large corpus, however, the calculation cost for
the learnt model to predict all user-item preferences is tremendous, which
makes full corpus retrieval extremely difficult. To overcome the calculation
barriers, models such as matrix factorization resort to the inner-product form
(i.e., modeling user-item preference as the inner product of user and item latent
factors) and indexes to facilitate efficient approximate k-nearest neighbor
searches. However, it remains challenging to incorporate more expressive
interaction forms between user and item features, e.g., interactions through
deep neural networks, because of the calculation cost.
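To make the calculation barrier concrete: under matrix factorization, a user's preference for an item is the inner product of their latent factors, so exact retrieval must score every item in the corpus. The minimal sketch below (the names `dot` and `top_k_items` are illustrative, not from the paper) performs that brute-force scan; an approximate k-nearest-neighbor index would replace the linear loop, which is possible only because the score has inner-product form.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner-product preference score between user and item latent factors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_items(user: Sequence[float], items: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Exact top-k retrieval: scores every item, so cost grows linearly with corpus size."""
    scored = ((i, dot(user, v)) for i, v in enumerate(items))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


user = [1.0, 0.5]
items = [[0.2, 0.1], [1.0, 1.0], [0.0, 2.0], [0.9, 0.0]]
print(top_k_items(user, items, 2))  # [(1, 1.5), (2, 1.0)]
```

A deep model that scores each (user, item) pair with a neural network has no such index, so this linear scan is exactly what becomes too expensive.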
In this paper, we focus on the problem of introducing arbitrary advanced
models to recommender systems with a large corpus. We propose a novel tree-based
method which can provide logarithmic complexity w.r.t. corpus size even with
more expressive models such as deep neural networks. Our main idea is to
predict user interests from coarse to fine by traversing tree nodes in a
top-down fashion and making decisions for each user-node pair. We also show
that the tree structure can be jointly learnt towards better compatibility with
users' interest distribution and hence facilitate both training and prediction.
Experimental evaluations with two large-scale real-world datasets show that the
proposed method significantly outperforms traditional methods. Online A/B test
results in Taobao display advertising platform also demonstrate the
effectiveness of the proposed method in production environments. Comment: Accepted by KDD 201
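The coarse-to-fine idea can be illustrated with a level-by-level beam search. This is a minimal sketch under stated assumptions, not the paper's implementation: we assume an implicit complete binary tree whose leaves are items, and a hypothetical `oracle_score` that rates a node by the best leaf preference beneath it stands in for the learned deep user-node model. Tree learning and model training are not shown.

```python
import heapq
from typing import Callable, List


def leaf_range(node: int, depth: int) -> range:
    """Leaf ids under `node` in an implicit complete binary tree (root 0, children 2i+1, 2i+2)."""
    level = (node + 1).bit_length() - 1
    first = (node + 1) * 2 ** (depth - level) - 1  # leftmost descendant leaf
    return range(first, first + 2 ** (depth - level))


def beam_search_tree(score: Callable[[int], float], depth: int, beam: int) -> List[int]:
    """Top-down retrieval keeping only `beam` nodes per level.

    Each level scores at most 2 * beam children, so the total number of
    score() calls is O(beam * depth), i.e. logarithmic in the 2**depth leaves.
    Returns item indices (leaf id minus the id of the first leaf).
    """
    frontier = [0]
    for _ in range(depth):
        children = [c for n in frontier for c in (2 * n + 1, 2 * n + 2)]
        frontier = heapq.nlargest(beam, children, key=score)
    offset = 2 ** depth - 1
    return [leaf - offset for leaf in frontier]


# Toy usage: 8 items (depth 3) with hypothetical user preferences.
DEPTH = 3
prefs = [0.1, 0.3, 0.2, 0.9, 0.05, 0.8, 0.4, 0.0]


def oracle_score(node: int) -> float:
    """Stand-in for the learned model: max preference of any item below `node`."""
    offset = 2 ** DEPTH - 1
    return max(prefs[leaf - offset] for leaf in leaf_range(node, DEPTH))


print(beam_search_tree(oracle_score, DEPTH, beam=2))  # [3, 5]
```

With this max-over-descendants scorer the beam search recovers the true top-2 items while scoring only 12 nodes; with a learned scorer, how closely node scores track the best items below them is what determines retrieval quality.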
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Providing systems the ability to relate linguistic and visual content is one
of the hallmarks of computer vision. Tasks such as text-based image retrieval
and image captioning were designed to test this ability but come with
evaluation measures that have a high variance or are difficult to interpret. We
study an alternative task for systems that match text and images: given a text
query, the system is asked to select the image that best matches the query from
a pair of semantically similar images. The system's accuracy on this Binary
Image SelectiON (BISON) task is interpretable, eliminates the reliability
problems of retrieval evaluations, and focuses on the system's ability to
understand fine-grained visual structure. We gather a BISON dataset that
complements the COCO dataset and use it to evaluate modern text-based image
retrieval and image captioning systems. Our results provide novel insights into
the performance of these systems. The COCO-BISON dataset and corresponding
evaluation code are publicly available from http://hexianghu.com/bison/
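The BISON metric reduces to a simple accuracy over two-way choices, which is why it is easy to interpret. The sketch below assumes a hypothetical representation in which each image is a set of content tags and uses a toy bag-of-words matcher; neither comes from the paper. Ties are counted as errors, which is also an assumption of this sketch.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical representation: each image is a set of content tags.
Image = FrozenSet[str]
Example = Tuple[str, Image, Image]  # (text query, matching image, distractor image)


def bison_accuracy(examples: List[Example], score: Callable[[str, Image], float]) -> float:
    """Fraction of examples where the matching image strictly outscores the distractor.

    Ties count as errors. Random guessing scores about 0.5 on a two-way choice.
    """
    correct = sum(1 for q, pos, neg in examples if score(q, pos) > score(q, neg))
    return correct / len(examples)


def bag_of_words_score(query: str, image: Image) -> float:
    """Toy matcher: number of query words that appear among the image's tags."""
    return float(len(set(query.split()) & image))


examples = [
    ("a dog on a red couch", frozenset({"dog", "red", "couch"}), frozenset({"dog", "blue", "couch"})),
    ("a man riding a horse", frozenset({"man", "riding", "horse"}), frozenset({"man", "standing", "horse"})),
    # Same tags, different relation (who chases whom): bag-of-words cannot separate them.
    ("a dog chasing a cat", frozenset({"dog", "chasing", "cat"}), frozenset({"dog", "chasing", "cat"})),
]
print(bison_accuracy(examples, bag_of_words_score))  # 2 of 3 correct
```

The third example shows the kind of fine-grained visual structure the task is designed to probe: a matcher that ignores relations between objects can do no better than a tie.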
Unsupervised Hashing via Similarity Distribution Calibration
Existing unsupervised hashing methods typically adopt a feature similarity
preservation paradigm. As a result, they overlook the intrinsic similarity
capacity discrepancy between the continuous feature and discrete hash code
spaces. Specifically, since the feature similarity distribution is
intrinsically biased (e.g., moderately positive similarity scores on negative
pairs), the hash code similarities of positive and negative pairs often become
inseparable (i.e., the similarity collapse problem). To solve this problem, in
this paper a novel Similarity Distribution Calibration (SDC) method is
introduced. Instead of matching individual pairwise similarity scores, SDC
aligns the hash code similarity distribution towards a calibration distribution
(e.g., beta distribution) with sufficient spread across the entire similarity
capacity/range, to alleviate the similarity collapse problem. Extensive
experiments show that our SDC outperforms the state-of-the-art alternatives on
both coarse category-level and instance-level image retrieval tasks, often by a
large margin. Code is available at https://github.com/kamwoh/sdc
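SDC itself trains a hashing network with a calibration loss; the sketch below does not reproduce that loss. It only illustrates the two quantities the abstract describes: the distribution of hash code similarities, and how far that distribution is from a Beta calibration distribution. The choice of Wasserstein-1 distance, of Beta(2, 2), and all function names are assumptions made for illustration.

```python
import random
from itertools import combinations, product
from typing import List, Sequence


def code_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity of two ±1 hash codes in [0, 1]: 1 - Hamming distance / number of bits."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1.0 - mismatches / len(a)


def pairwise_similarities(codes: List[Sequence[int]]) -> List[float]:
    """Code similarities for every unordered pair of codes."""
    return [code_similarity(a, b) for a, b in combinations(codes, 2)]


def distance_to_beta(sims: List[float], alpha: float, beta: float, seed: int = 0) -> float:
    """1-D Wasserstein-1 distance between the similarities and an equal-size Beta sample.

    For two equal-size 1-D samples, sorting both and matching by rank gives
    the exact Wasserstein-1 distance between the empirical distributions.
    """
    rng = random.Random(seed)
    reference = sorted(rng.betavariate(alpha, beta) for _ in sims)
    return sum(abs(s - r) for s, r in zip(sorted(sims), reference)) / len(sims)


# Collapsed codes: every pair is identical, so all similarities are 1.0.
collapsed = [(1, 1, 1, 1)] * 16
# Spread codes: all 16 distinct 4-bit codes cover the whole similarity range.
spread = list(product([-1, 1], repeat=4))

d_collapsed = distance_to_beta(pairwise_similarities(collapsed), 2.0, 2.0)
d_spread = distance_to_beta(pairwise_similarities(spread), 2.0, 2.0)
print(d_collapsed, d_spread)  # the collapsed set lies much farther from Beta(2, 2)
```

A calibration objective of this kind penalizes similarity collapse directly, because a distribution bunched at one value cannot match a reference that spreads across the whole [0, 1] range.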
Autoencoding beyond pixels using a learned similarity metric
We present an autoencoder that leverages learned representations to better
measure similarities in data space. By combining a variational autoencoder with
a generative adversarial network we can use learned feature representations in
the GAN discriminator as the basis for the VAE reconstruction objective. Thereby,
we replace element-wise errors with feature-wise errors to better capture the
data distribution while offering invariance towards, e.g., translation. We apply
our method to images of faces and show that it outperforms VAEs with
element-wise similarity measures in terms of visual fidelity. Moreover, we show
that the method learns an embedding in which high-level abstract visual
features (e.g., wearing glasses) can be modified using simple arithmetic
- …
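The difference between element-wise and feature-wise reconstruction errors can be seen on a toy 1-D signal. In the paper the features come from a hidden layer of the learned GAN discriminator; here a fixed, hypothetical pooling function (`toy_features`) stands in so the effect is visible without training. Global pooling makes the toy features exactly shift-invariant, which exaggerates the partial invariance a learned discriminator provides.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]


def elementwise_error(x: Signal, x_rec: Signal) -> float:
    """Pixel-wise squared error, as in a standard VAE reconstruction term."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))


def featurewise_error(x: Signal, x_rec: Signal, features: Callable[[Signal], List[float]]) -> float:
    """Squared error measured in a feature space instead of pixel space."""
    return sum((a - b) ** 2 for a, b in zip(features(x), features(x_rec)))


def toy_features(x: Signal) -> List[float]:
    """Hypothetical stand-in for a discriminator layer: global pooling statistics."""
    return [sum(x), max(x), sum(v * v for v in x)]


x = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
x_shifted = x[-1:] + x[:-1]  # circular shift right by one position
print(elementwise_error(x, x_shifted))  # 4.0
print(featurewise_error(x, x_shifted, toy_features))  # 0.0
```

The pixel loss heavily penalizes a reconstruction that is merely shifted, while the feature loss does not; this is the motivation for replacing element-wise errors with feature-wise ones.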