Efficient Large-Scale Visual Representation Learning
In this article, we present our approach to single-modality visual
representation learning. Understanding visual representations of product
content is vital for recommendations, search, and advertising applications in
e-commerce. We detail and contrast techniques used to fine-tune large-scale
visual representation learning models in an efficient manner under low-resource
settings, including several pretrained backbone architectures, both in the
convolutional neural network as well as the vision transformer family. We
highlight the challenges of e-commerce applications at scale and describe our
efforts to train, evaluate, and serve visual representations more efficiently.
We present ablation studies evaluating the representation offline performance
for several downstream tasks, including our visually similar ad
recommendations. To this end, we present a novel text-to-image generative
offline evaluation method for visually similar recommendation systems. Finally,
we include online results from deployed machine learning systems in production
at Etsy.
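As a minimal sketch of the embedding-based retrieval that visually similar recommendations rely on (the function and toy data here are illustrative, not the production system's):

```python
import numpy as np

def top_k_similar(query_emb, catalog_embs, k=3):
    """Return indices of the k catalog items most similar to the query,
    ranked by cosine similarity between L2-normalized embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each catalog item to the query
    return np.argsort(-scores)[:k]

# Toy catalog: item 0 points the same way as the query, item 2 is opposite.
catalog = np.array([[1.0, 0.0], [0.7, 0.7], [-1.0, 0.0]])
query = np.array([1.0, 0.1])
print(top_k_similar(query, catalog, k=2))  # item 0 ranks first, then item 1
```

Serving this at scale typically replaces the exhaustive dot product with an approximate nearest-neighbour index, but the ranking criterion is the same.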
RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation
Online recommender systems (RS) aim to match user needs with the vast amount
of resources available on various platforms. A key challenge is to model user
preferences accurately under the condition of data sparsity. To address this
challenge, some methods have leveraged external user behavior data from
multiple platforms to enrich user representation. However, all of these methods
require a consistent user ID across platforms and ignore the information from
similar users. In this study, we propose RUEL, a novel retrieval-based
sequential recommender that can effectively incorporate external anonymous user
behavior data from Edge browser logs to enhance recommendation. We first
collect and preprocess a large volume of Edge browser logs over a one-year
period and link them to target entities that correspond to candidate items in
recommendation datasets. We then design a contrastive learning framework with a
momentum encoder and a memory bank to retrieve the most relevant and diverse
browsing sequences from the full browsing log based on the semantic similarity
between user representations. After retrieval, we apply an item-level attentive
selector to filter out noisy items and generate refined sequence embeddings for
the final predictor. RUEL is the first method that connects user browsing data
with typical recommendation datasets and can be generalized to various
recommendation scenarios and datasets. We conduct extensive experiments on four
real datasets for sequential recommendation tasks and demonstrate that RUEL
significantly outperforms state-of-the-art baselines. We also conduct ablation
studies and qualitative analysis to validate the effectiveness of each
component of RUEL and provide additional insights into our method.
Comment: CIKM 2023 AD
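A rough sketch of the retrieval step described above (relevant yet diverse sequences from a memory bank), using a generic greedy MMR-style heuristic; the paper's actual momentum encoder and selection rule are more involved:

```python
import numpy as np

def retrieve_diverse(user_vec, memory_bank, k=2, lam=0.5):
    """Greedily pick k memory-bank rows that are relevant to the user
    representation yet dissimilar to each other (MMR-style heuristic)."""
    u = user_vec / np.linalg.norm(user_vec)
    m = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    relevance = m @ u
    chosen = [int(np.argmax(relevance))]
    while len(chosen) < k:
        # Penalize candidates similar to anything already chosen.
        redundancy = np.max(m @ m[chosen].T, axis=1)
        score = lam * relevance - (1 - lam) * redundancy
        score[chosen] = -np.inf
        chosen.append(int(np.argmax(score)))
    return chosen

# Toy bank: rows 0 and 1 are near-duplicates, row 2 is orthogonal.
bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(retrieve_diverse(np.array([1.0, 0.2]), bank, k=2))
# Picks the best match, then prefers the orthogonal row over the duplicate.
```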
Scaling Law for Recommendation Models: Towards General-purpose User Representations
Recent advancement of large-scale pretrained models such as BERT, GPT-3,
CLIP, and Gopher, has shown astonishing achievements across various task
domains. Unlike vision recognition and language models, studies on
general-purpose user representation at scale still remain underexplored. Here
we explore the possibility of general-purpose user representation learning by
training a universal user encoder at large scales. We demonstrate that the
scaling law is present in user representation learning areas, where the
training error scales as a power-law with the amount of computation. Our
Contrastive Learning User Encoder (CLUE) optimizes task-agnostic objectives,
and the resulting user embeddings exceed our expectations of what is possible
in various downstream tasks. CLUE also shows strong transferability to other
domains and companies, as performance in an online experiment shows
significant improvements in click-through rate (CTR). Furthermore, we also
investigate how the model performance is influenced by the scale factors, such
as training data size, model capacity, sequence length, and batch size.
Finally, we discuss the broader impacts of CLUE in general.
Comment: Accepted at AAAI 2023. This version includes the technical appendi
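The claimed power-law relation between training error and compute can be recovered from measurements with a single log-log fit; the coefficients below are made up purely for illustration:

```python
import numpy as np

# Synthetic compute/error pairs drawn from L(C) = a * C**(-b), the functional
# form the abstract describes, with invented coefficients a=4.0, b=0.25.
compute = np.array([1e3, 1e4, 1e5, 1e6])
error = 4.0 * compute ** -0.25

# A power law is linear in log-log space: log L = log a - b * log C,
# so one least-squares fit recovers both parameters.
slope, intercept = np.polyfit(np.log(compute), np.log(error), 1)
print(-slope, np.exp(intercept))  # recovers b = 0.25 and a = 4.0
```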
Multi-modal Extreme Classification
This paper develops the MUFIN technique for extreme classification (XC) tasks
with millions of labels where datapoints and labels are endowed with visual and
textual descriptors. Applications of MUFIN to product-to-product recommendation
and bid query prediction over several millions of products are presented.
Contemporary multi-modal methods frequently rely on purely embedding-based
approaches. On the other hand, XC methods utilize classifier architectures to
offer higher accuracy than embedding-only methods but mostly focus on
text-based categorization tasks. MUFIN bridges this gap by reformulating
multi-modal categorization as an XC problem with several millions of labels.
This presents the twin challenges of developing multi-modal architectures that
can offer embeddings sufficiently expressive to allow accurate categorization
over millions of labels; and training and inference routines that scale
logarithmically in the number of labels. MUFIN develops an architecture based
on cross-modal attention and trains it in a modular fashion using pre-training
and positive and negative mining. A novel product-to-product recommendation
dataset MM-AmazonTitles-300K containing over 300K products was curated from
publicly available amazon.com listings with each product endowed with a title
and multiple images. Across all datasets, MUFIN offered at least 3% higher
accuracy than leading text-based, image-based and multi-modal techniques. Code
for MUFIN is available at https://github.com/Extreme-classification/MUFI
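The sublinear-inference requirement above is commonly met by two-stage shortlisting; this is a generic sketch of that idea with toy data, not MUFIN's actual routine:

```python
import numpy as np

def shortlist_score(query, label_embs, cluster_of, centroids, top_clusters=1):
    """Two-stage inference: rank label clusters by centroid similarity, then
    score only the labels inside the winning clusters. Visiting a few
    clusters instead of every label is what keeps extreme-classification
    inference sublinear (ideally logarithmic) in the label count."""
    keep = np.argsort(-(centroids @ query))[:top_clusters]
    candidates = np.where(np.isin(cluster_of, keep))[0]
    scores = label_embs[candidates] @ query
    return candidates[np.argsort(-scores)]

# Four labels in two clusters; the query matches cluster 0, so labels 2 and 3
# are never scored at all.
labels = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
ranked = shortlist_score(np.array([1.0, 0.2]), labels,
                         np.array([0, 0, 1, 1]),
                         np.array([[1.0, 0.0], [0.0, 1.0]]))
print(ranked)  # labels from the matching cluster, best first
```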
Big Brother: A Drop-In Website Interaction Logging Service
Fine-grained logging of interactions in user studies is important for studying user behaviour, among other reasons. However, in many research scenarios, the way interactions are logged is usually tied to a monolithic system. We present a generic, application-independent service for logging interactions in web pages, specifically targeting user studies. Our service, Big Brother, can be dropped into existing user interfaces with almost no configuration required by researchers. Big Brother has already been used to record interactions in several user studies across research scenarios such as lab-based and crowdsourcing environments. We further demonstrate the ability of Big Brother to scale to very large user studies through benchmarking experiments. Big Brother also provides a number of additional tools for visualising and analysing interactions.
Big Brother significantly lowers the barrier to entry for logging user interactions by providing a minimal but powerful, no-configuration-necessary service for researchers and practitioners of user studies that can scale to thousands of concurrent sessions. We have made the source code and releases for Big Brother available for download at https://github.com/hscells/bigbro
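As a rough illustration of what such drop-in interaction logging involves, here is a minimal event buffer that batches interaction records before sending them; `InteractionLogger` is hypothetical and not Big Brother's actual client API:

```python
import json
import time

class InteractionLogger:
    """Minimal client-side buffer in the spirit of a drop-in logging service:
    collect interaction events and flush them in batches. A real deployment
    would POST each batch to the logging server; here the "send" is a stub."""

    def __init__(self, session_id, batch_size=3):
        self.session_id = session_id
        self.batch_size = batch_size
        self.buffer, self.flushed = [], []

    def log(self, event, target):
        self.buffer.append({"session": self.session_id, "event": event,
                            "target": target, "ts": time.time()})
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed.append(json.dumps(self.buffer))  # stand-in for HTTP POST
            self.buffer = []

logger = InteractionLogger("s1", batch_size=2)
logger.log("click", "#search-btn")
logger.log("hover", "#result-3")  # second event triggers an automatic flush
print(len(logger.flushed), len(logger.buffer))  # one batch sent, buffer empty
```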