192 research outputs found
Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems
Modern deep learning-based recommendation systems exploit hundreds to
thousands of different categorical features, each with millions of different
categories ranging from clicks to posts. To respect the natural diversity
within the categorical data, embeddings map each category to a unique dense
representation within an embedded space. Since each categorical feature could
take on as many as tens of millions of different possible categories, the
embedding tables form the primary memory bottleneck during both training and
inference. We propose a novel approach for reducing the embedding size in an
end-to-end fashion by exploiting complementary partitions of the category set
to produce a unique embedding vector for each category without explicit
definition. By storing multiple smaller embedding tables based on each
complementary partition and combining embeddings from each table, we define a
unique embedding for each category at smaller memory cost. This approach may be
interpreted as using a specific fixed codebook to ensure uniqueness of each
category's representation. Our experimental results demonstrate the
effectiveness of our approach over the hashing trick for reducing the size of
the embedding tables in terms of model loss and accuracy, while retaining a
similar reduction in the number of parameters.
Comment: 11 pages, 7 figures, 1 table
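To make the construction above concrete, here is a minimal PyTorch sketch of the
quotient-remainder composition, one simple instance of complementary partitions:
the quotient and remainder of an id jointly identify it uniquely, so combining
one row from each small table yields a distinct vector per category. The table
sizes, the element-wise-product combiner, and all hyperparameters are
illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QREmbedding(nn.Module):
    """Compose a unique embedding for each of `num_categories` ids
    from two small tables indexed by quotient and remainder."""

    def __init__(self, num_categories: int, dim: int, bucket: int):
        super().__init__()
        num_quotients = (num_categories + bucket - 1) // bucket  # ceil division
        self.bucket = bucket
        self.q_table = nn.Embedding(num_quotients, dim)  # indexed by id // bucket
        self.r_table = nn.Embedding(bucket, dim)         # indexed by id % bucket
        # Memory: (num_categories/bucket + bucket) * dim rows instead of
        # num_categories * dim; smallest when bucket ~ sqrt(num_categories).

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # The two partitions are complementary: the pair (quotient, remainder)
        # is unique per id, so every category gets a distinct composed vector.
        return self.q_table(ids // self.bucket) * self.r_table(ids % self.bucket)

emb = QREmbedding(num_categories=10_000_000, dim=16, bucket=4096)
vectors = emb(torch.tensor([0, 42, 9_999_999]))  # -> shape (3, 16)
```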
Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation
Latent factor models are the dominant backbones of contemporary recommender
systems (RSs) given their performance advantages, where a unique vector
embedding with a fixed dimensionality (e.g., 128) is required to represent each
entity (commonly a user/item). Due to the large number of users and items on
e-commerce sites, the embedding table is arguably the least memory-efficient
component of RSs. For any lightweight recommender that aims to efficiently
scale with the growing size of users/items or to remain applicable in
resource-constrained settings, existing solutions either reduce the number of
embeddings needed via hashing, or sparsify the full embedding table to switch
off selected embedding dimensions. However, as hash collisions arise or
embeddings become overly sparse, especially when adapting to a tighter memory
budget, those lightweight recommenders inevitably have to compromise their
accuracy. To this end, we propose a novel compact embedding framework for RSs,
namely Compositional Embedding with Regularized Pruning (CERP). Specifically,
CERP represents each entity by combining a pair of embeddings from two
independent, substantially smaller meta-embedding tables, which are then
jointly pruned via a learnable element-wise threshold. In addition, we
innovatively design a regularized pruning mechanism in CERP, such that the two
sparsified meta-embedding tables are encouraged to encode information that is
mutually complementary. Given its compatibility with arbitrary latent factor
models, we pair CERP with two popular recommendation models for extensive
experiments, where results on two real-world datasets under different memory
budgets demonstrate its superiority against state-of-the-art baselines. The
codebase of CERP is available at https://github.com/xurong-liang/CERP.
Comment: Accepted by ICDM'2
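As a rough illustration of the CERP recipe described above, the sketch below
pairs two small meta-embedding tables and sparsifies each with a learnable
element-wise soft threshold. The quotient/remainder indexing, the sum combiner,
the sigmoid-scaled threshold, and the initialization are assumptions made for
brevity; the complementarity regularizer is only noted in a comment.

```python
import torch
import torch.nn as nn

def soft_threshold(w: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    # Zero out elements whose magnitude falls below a learnable threshold,
    # keeping the pruning decision differentiable in the threshold s.
    return torch.sign(w) * torch.relu(torch.abs(w) - torch.sigmoid(s))

class PrunedMetaEmbedding(nn.Module):
    """A small meta-embedding table sparsified by element-wise soft thresholding."""

    def __init__(self, rows: int, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(rows, dim) * 0.01)
        self.threshold = nn.Parameter(torch.full((rows, dim), -4.0))  # sigmoid(-4) ~ 0.02

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        return soft_threshold(self.weight, self.threshold)[idx]

class CERPLikeEmbedding(nn.Module):
    def __init__(self, num_entities: int, dim: int, bucket: int):
        super().__init__()
        self.bucket = bucket
        self.q_table = PrunedMetaEmbedding((num_entities + bucket - 1) // bucket, dim)
        self.r_table = PrunedMetaEmbedding(bucket, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # Combine one sparsified row from each table. A regularizer (not shown)
        # would push the two tables' sparsity patterns to carry mutually
        # complementary information, as the abstract describes.
        return self.q_table(ids // self.bucket) + self.r_table(ids % self.bucket)

emb = CERPLikeEmbedding(num_entities=1_000_000, dim=64, bucket=1024)
v = emb(torch.tensor([7, 123_456]))  # -> shape (2, 64)
```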
Clustering the Sketch: A Novel Approach to Embedding Table Compression
Embedding tables are used by machine learning systems to work with
categorical features. In modern Recommendation Systems, these tables can be
very large, necessitating the development of new methods for fitting them in
memory, even during training. We suggest Clustered Compositional Embeddings
(CCE), which combines clustering-based compression, such as quantization to
codebooks, with dynamic methods like the hashing trick and Compositional
Embeddings (Shi et al., 2020). Experimentally, CCE achieves the best of both
worlds: the high compression rate of codebook-based quantization, applied
*dynamically* like hashing-based methods, so it can be used during training.
Theoretically, we prove that CCE is guaranteed to converge to the optimal
codebook and give a tight bound on the number of iterations required.
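The abstract suggests a training loop that alternates gradient steps on a
hashed table with a clustering step that re-forms the codebook. A minimal
single-table sketch of that loop structure follows; the hash stand-in, the
plain k-means step, and all sizes are illustrative assumptions rather than the
authors' algorithm.

```python
import torch

num_ids, dim, rows = 10_000, 16, 256
table = torch.randn(rows, dim, requires_grad=True)   # shared (hashed) table
h = torch.randint(0, rows, (num_ids,))               # stand-in for a hash function

def lookup(ids: torch.Tensor) -> torch.Tensor:
    return table[h[ids]]                             # hashing-trick style lookup

def recluster(k_iters: int = 10) -> None:
    """Run plain k-means over the current per-id vectors; the centroids
    become the new codebook and ids are re-pointed to their clusters."""
    global table, h
    with torch.no_grad():
        vecs = table[h]                                        # vector for every id
        centroids = vecs[torch.randperm(num_ids)[:rows]].clone()
        for _ in range(k_iters):
            assign = torch.cdist(vecs, centroids).argmin(dim=1)
            for c in range(rows):
                members = vecs[assign == c]
                if len(members) > 0:
                    centroids[c] = members.mean(dim=0)
        h = assign                                             # ids -> nearest centroid
    table = centroids.requires_grad_()                         # training continues here
```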
Modeling the Multi-mode Distribution in Self-Supervised Language Models
Self-supervised large language models (LMs) have become a highly influential and foundational tool for many NLP models. For this reason, their expressivity is an important topic of study. In near-universal practice, given the language context, the model predicts a word from the vocabulary using a single embedded vector representation of both context and dictionary entries. Note that the context sometimes implies that the distribution over predicted words should be multi-modal in embedded space. However, the context's single-vector representation provably fails to capture such a distribution. To address this limitation, we propose to represent context with multiple vector embeddings, which we term facets. This is distinct from previous work on multi-sense vocabulary embeddings, which employs multiple vectors for the dictionary entries, not the context.
In this dissertation, we first present the theoretical limitations of the single context embedding in LMs and how the theoretical analyses suggest new alternative softmax layers that encode a context as multiple embeddings. The proposed alternatives achieve better perplexity than the mixture of softmax (MoS), especially given an ambiguous context, without adding significant computational cost to LMs. Our approaches also let GPT-2 learn to properly copy the entities from the context, which increases the coherence of the generated text without requiring any labels.
In addition to predicting the next word, we also use multiple CLS embeddings to improve state-of-the-art pretraining methods for BERT on natural language understanding (NLU) benchmarks without introducing significant extra parameters or computations, especially when the training datasets are small. Furthermore, we show that our multi-facet embeddings improve sequential recommendation, scientific paper embeddings, measurement of sentence similarity, distantly supervised relation extraction, unsupervised text pattern entailment detection, and cold-start citation recommendation. Finally, we use the multiple vector embeddings to predict the future topics of a context and, building on this basis, propose a novel interactive language generation framework.
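As a sketch of the core idea, the following layer encodes a context as several
facet vectors and scores the vocabulary as a uniform mixture of per-facet
softmaxes, so the induced word distribution can be multi-modal in embedding
space. The facet count, the shared output embedding, and the uniform mixture
are illustrative assumptions, not necessarily the dissertation's exact
formulation.

```python
import torch
import torch.nn as nn

class MultiFacetSoftmax(nn.Module):
    """Predict words from several context embeddings ("facets") at once."""

    def __init__(self, hidden: int, vocab: int, n_facets: int = 3):
        super().__init__()
        self.facet_proj = nn.Linear(hidden, hidden * n_facets)  # one projection per facet
        self.out_emb = nn.Linear(hidden, vocab, bias=False)     # shared word embeddings
        self.n_facets = n_facets

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden) -> facets: (batch, n_facets, hidden)
        facets = self.facet_proj(h).view(-1, self.n_facets, h.size(-1))
        probs = torch.softmax(self.out_emb(facets), dim=-1)     # per-facet softmax
        # A uniform mixture of facet distributions can place mass on several
        # separated regions of embedding space, unlike a single context vector.
        return probs.mean(dim=1)

layer = MultiFacetSoftmax(hidden=64, vocab=1000)
p = layer(torch.randn(8, 64))  # -> (8, 1000), each row sums to 1
```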
Hessian-aware Quantized Node Embeddings for Recommendation
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in
recommender systems. Nevertheless, the process of searching and ranking from a
large item corpus usually requires high latency, which limits the widespread
deployment of GNNs in industry-scale applications. To address this issue, many
methods compress user/item representations into the binary embedding space to
reduce space requirements and accelerate inference. Also, they use the
Straight-through Estimator (STE) to prevent vanishing gradients during
back-propagation. However, the STE often causes the gradient mismatch problem,
leading to sub-optimal results.
In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an
effective solution for discrete representations of users/items that enable fast
retrieval. HQ-GNN is composed of two components: a GNN encoder for learning
continuous node embeddings and a quantized module for compressing
full-precision embeddings into low-bit ones. Consequently, HQ-GNN benefits from
both lower memory requirements and faster inference speeds compared to vanilla
GNNs. To address the gradient mismatch problem in STE, we further consider the
quantization error and its second-order derivatives for better stability. The
experimental results on several large-scale datasets show that HQ-GNN achieves
a good balance between latency and performance.
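For reference, here is a minimal sketch of the vanilla straight-through
estimator the abstract contrasts against: the forward pass emits binary codes
while the backward pass propagates gradients unchanged, which is exactly where
the gradient mismatch arises. The Hessian-aware correction of HQ-GNN is only
indicated in a comment, since the abstract does not spell out its form.

```python
import torch

def binarize_ste(x: torch.Tensor) -> torch.Tensor:
    x_q = torch.sign(x)  # forward: discrete {-1, +1} codes for fast retrieval
    # (x_q - x).detach() + x evaluates to x_q but has gradient d/dx = 1,
    # i.e. the straight-through estimator. HQ-GNN would instead weight this
    # gradient using second-order information about the quantization error,
    # rather than treating every coordinate identically.
    return (x_q - x).detach() + x

emb = torch.randn(4, 8, requires_grad=True)
loss = binarize_ste(emb).sum()
loss.backward()  # emb.grad is all ones: gradients pass straight through
```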
Continuous Input Embedding Size Search For Recommender Systems
Latent factor models are the most popular backbones for today's recommender
systems owing to their prominent performance. Latent factor models represent
users and items as real-valued embedding vectors for pairwise similarity
computation, and all embeddings are traditionally restricted to a uniform size
that is relatively large (e.g., 256-dimensional). With the exponentially
expanding user base and item catalog in contemporary e-commerce, this design is
admittedly becoming memory-inefficient. To facilitate lightweight
recommendation, reinforcement learning (RL) has recently opened up
opportunities for identifying varying embedding sizes for different
users/items. However, challenged by search efficiency and the difficulty of
learning an optimal RL policy, existing RL-based methods are restricted to
highly discrete,
predefined embedding size choices. This leads to a largely overlooked potential
of introducing finer granularity into embedding sizes to obtain better
recommendation effectiveness under a given memory budget. In this paper, we
propose continuous input embedding size search (CIESS), a novel RL-based method
that operates on a continuous search space with arbitrary embedding sizes to
choose from. In CIESS, we further present an innovative random walk-based
exploration strategy to allow the RL policy to efficiently explore more
candidate embedding sizes and converge to a better decision. CIESS is also
model-agnostic and hence generalizable to a variety of latent factor RSs,
whilst experiments on two real-world datasets have shown state-of-the-art
performance of CIESS under different memory budgets when paired with three
popular recommendation models.
Comment: To appear in SIGIR'2
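One way to picture how a continuously chosen, per-entity embedding size can be
applied is to keep a single full-width table and mask each entity's vector
beyond its assigned size, as in the sketch below. The RL policy and random
walk-based exploration that actually select the sizes in CIESS are not shown;
the `sizes` tensor is a hypothetical stand-in for the policy's output.

```python
import torch
import torch.nn as nn

num_items, max_dim = 1000, 256
table = nn.Embedding(num_items, max_dim)             # one full-width table
sizes = torch.randint(8, max_dim + 1, (num_items,))  # hypothetical policy output

def lookup(ids: torch.Tensor) -> torch.Tensor:
    vecs = table(ids)                                # (batch, max_dim)
    # Keep dimension j of entity i only if j < sizes[i], so each entity
    # effectively uses an embedding of its own (arbitrary) size.
    mask = torch.arange(max_dim) < sizes[ids].unsqueeze(1)
    return vecs * mask

v = lookup(torch.tensor([0, 1, 2]))  # (3, 256), trailing dims zeroed per entity
```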
- …