Scalable Semantic Matching of Queries to Ads in Sponsored Search Advertising
Sponsored search represents a major source of revenue for web search engines.
This popular advertising model brings a unique possibility for advertisers to
target users' immediate intent communicated through a search query, usually by
displaying their ads alongside organic search results for queries deemed
relevant to their products or services. However, due to a large number of
unique queries it is challenging for advertisers to identify all such relevant
queries. For this reason search engines often provide a service of advanced
matching, which automatically finds additional relevant queries for advertisers
to bid on. We present a novel advanced matching approach based on the idea of
semantic embeddings of queries and ads. The embeddings were learned using a
large data set of user search sessions, consisting of search queries, clicked
ads and search links, while utilizing contextual information such as dwell time
and skipped ads. To address the large-scale nature of our problem, both in
terms of data and vocabulary size, we propose a novel distributed algorithm for
training of the embeddings. Finally, we present an approach for overcoming a
cold-start problem associated with new ads and queries. We report results of
editorial evaluation and online tests on actual search traffic. The results
show that our approach significantly outperforms baselines in terms of
relevance, coverage, and incremental revenue. Lastly, we open-source learned
query embeddings to be used by researchers in computational advertising and
related fields.
Comment: 10 pages, 4 figures, 39th International ACM SIGIR Conference on
Research and Development in Information Retrieval, SIGIR 2016, Pisa, Ital
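As a rough illustration of matching in a shared embedding space, the sketch below ranks candidate queries for one ad by cosine similarity. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the function names, query strings, and three-dimensional vectors are all hypothetical toy values, not learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_queries(ad_vec, query_vecs, top_k=2):
    """Return the top_k query strings whose vectors are closest to ad_vec."""
    ranked = sorted(
        query_vecs.items(),
        key=lambda item: cosine(ad_vec, item[1]),
        reverse=True,
    )
    return [query for query, _ in ranked[:top_k]]


# Hypothetical toy embeddings; real ones would be learned from search sessions.
QUERY_VECS = {
    "running shoes": [0.9, 0.1, 0.0],
    "trail sneakers": [0.8, 0.3, 0.1],
    "car insurance": [0.0, 0.2, 0.9],
}
AD_VEC = [1.0, 0.2, 0.0]

matches = match_queries(AD_VEC, QUERY_VECS)
```

A production system would replace the exhaustive sort with an approximate nearest neighbor index, since the query vocabulary is far too large to scan per ad.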
Collaborative Representation based Classification for Face Recognition
By coding a query sample as a sparse linear combination of all training
samples and then classifying it by evaluating which class leads to the minimal
coding residual, sparse representation based classification (SRC) leads to
interesting results for robust face recognition. It is widely believed that the
l1-norm sparsity constraint on coding coefficients plays a key role in the
success of SRC, while its use of all training samples to collaboratively
represent the query sample is largely overlooked. In this paper we discuss how SRC
works, and show that the collaborative representation mechanism used in SRC is
much more crucial to its success in face classification. SRC is a special
case of collaborative representation based classification (CRC), which has
various instantiations by applying different norms to the coding residual and
coding coefficient. More specifically, the l1 or l2 norm characterization of
coding residual is related to the robustness of CRC to outlier facial pixels,
while the l1 or l2 norm characterization of coding coefficient is related to
the degree of discrimination of facial features. Extensive experiments were
conducted to verify the face recognition accuracy and efficiency of CRC with
different instantiations.
Comment: It is a substantial revision of a previous conference paper (L.
Zhang, M. Yang, et al. "Sparse Representation or Collaborative
Representation: Which Helps Face Recognition?" in ICCV 2011
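The idea of coding a query over all training samples and picking the class with the smallest residual can be sketched as follows. This is a minimal illustration that assumes the l2-norm instantiation (ridge-regularized least squares) for both the residual and the coefficients. The regularization weight `lam`, the helper names, and the toy vectors are assumptions, and the plain class-wise residual used here may differ from the exact decision rule in the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def crc_classify(train, labels, query, lam=0.01):
    """Classify query by the class whose samples best reconstruct it."""
    n = len(train)
    # Coefficients over ALL training samples: (X^T X + lam I) alpha = X^T y.
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, [dot(t, query) for t in train])
    best_label, best_residual = None, float("inf")
    for label in sorted(set(labels)):
        # Reconstruct the query using only this class's samples and coefficients.
        recon = [0.0] * len(query)
        for coef, sample, lab in zip(alpha, train, labels):
            if lab == label:
                recon = [r + coef * s for r, s in zip(recon, sample)]
        residual = sum((q - r) ** 2 for q, r in zip(query, recon)) ** 0.5
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Hypothetical toy data: two classes, two samples each.
TRAIN = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
LABELS = ["a", "a", "b", "b"]
```

Because the l2 variant has a closed-form solution, it avoids the iterative solver that an l1 constraint requires, which is the efficiency argument usually made for this instantiation.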
Scalable Image Retrieval by Sparse Product Quantization
Fast Approximate Nearest Neighbor (ANN) search for high-dimensional
feature indexing and retrieval is the crux of large-scale image retrieval. A
recent promising technique is Product Quantization, which attempts to index
high-dimensional image features by decomposing the feature space into a
Cartesian product of low dimensional subspaces and quantizing each of them
separately. Despite the promising results reported, its quantization approach
follows the typical hard assignment of traditional quantization methods, which
may result in large quantization errors and thus inferior search performance.
Unlike the existing approaches, in this paper we propose a novel approach
called Sparse Product Quantization (SPQ) that encodes the high-dimensional
feature vectors into sparse representations. We optimize the sparse
representations of the feature vectors by minimizing their quantization errors,
making the resulting representations essentially close to the original data
in practice. Experiments show that the proposed SPQ technique not only
compresses data but is also an effective encoding technique. We obtain
state-of-the-art results for ANN search on four public image datasets and the
promising results of content-based image retrieval further validate the
efficacy of our proposed method.
Comment: 12 page
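For context, the hard-assignment baseline that the abstract contrasts with SPQ can be sketched as below: the vector is split into equal subvectors, and each subvector is replaced by the index of its single nearest codeword. This shows standard Product Quantization only, not the sparse variant proposed in the paper. The codebooks are hand-written hypothetical values (in practice they would be learned, for example with k-means), and the function names are assumptions.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pq_encode(vec, codebooks):
    """Hard-assign each subvector to the index of its nearest codeword."""
    sub_dim = len(vec) // len(codebooks)
    codes = []
    for i, book in enumerate(codebooks):
        chunk = vec[i * sub_dim:(i + 1) * sub_dim]
        distances = [squared_distance(chunk, word) for word in book]
        codes.append(distances.index(min(distances)))
    return codes


def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen codewords."""
    approx = []
    for code, book in zip(codes, codebooks):
        approx.extend(book[code])
    return approx


# Hypothetical codebooks: 2 subspaces, 2 codewords each, 2 dimensions per subspace.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
VEC = [0.9, 0.8, 0.1, 0.7]

codes = pq_encode(VEC, CODEBOOKS)
approx = pq_decode(codes, CODEBOOKS)
error = squared_distance(VEC, approx)  # quantization error of hard assignment
```

The value of `error` is the quantity a sparse encoding would aim to reduce, by representing each subvector with a weighted combination of a few codewords instead of a single one.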