Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation
Transformer and its variants are a powerful class of architectures for
sequential recommendation, owing to their ability to capture a user's dynamic
interests from past interactions. Despite their success,
Transformer-based models often require the optimization of a large number of
parameters, making them difficult to train from sparse data in sequential
recommendation. To address the problem of data sparsity, previous studies have
utilized self-supervised learning to enhance Transformers, such as pre-training
embeddings from item attributes or contrastive data augmentations. However,
these approaches encounter several training issues, including initialization
sensitivity, manual data augmentations, and large batch-size memory
bottlenecks.
In this work, we investigate Transformers from the perspective of loss
geometry, aiming to enhance the models' data efficiency and generalization in
sequential recommendation. We observe that Transformers (e.g., SASRec) can
converge to extremely sharp local minima if not adequately regularized.
Inspired by the recent Sharpness-Aware Minimization (SAM), we propose SAMRec,
which significantly improves the accuracy and robustness of sequential
recommendation. SAMRec performs comparably to state-of-the-art self-supervised
Transformers, such as S3Rec and CL4SRec, without the need for pre-training
or strong data augmentations.
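As a rough illustration of the Sharpness-Aware Minimization step that SAMRec builds on, the sketch below shows generic two-pass SAM in PyTorch; it is not the authors' code, and the model, loss_fn, batch layout, and rho value are assumptions made for the example.

import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One SAM step (sketch): perturb the weights toward the locally
    worst-case direction within an L2 ball of radius rho, then update
    with the gradient taken at that perturbed point."""
    # First forward/backward pass: gradient at the current weights.
    loss = loss_fn(model(batch["inputs"]), batch["targets"])
    loss.backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)

    # Ascent step: move each parameter to w + e(w).
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second forward/backward pass: gradient at the perturbed weights.
    loss_fn(model(batch["inputs"]), batch["targets"]).backward()

    # Restore the original weights, then let the base optimizer update them.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()

Flatness-seeking updates of this kind are what the abstract credits for the improved data efficiency: the final weights sit in a neighborhood where the loss changes slowly rather than in a sharp minimum.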
EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding
Embedding learning transforms discrete data entities into continuous
numerical representations, encoding features/properties of the entities.
Despite the outstanding performance reported by different embedding learning
algorithms, few efforts have been devoted to structurally interpreting how features
are encoded in the learned embedding space. This work proposes EmbeddingTree, a
hierarchical embedding exploration algorithm that relates the semantics of
entity features to the less-interpretable embedding vectors. An interactive
visualization tool is also developed based on EmbeddingTree to explore
high-dimensional embeddings. The tool helps users discover nuanced features of
data entities, perform feature denoising/injecting in embedding training, and
generate embeddings for unseen entities. We demonstrate the efficacy of
EmbeddingTree and our visualization tool through embeddings generated for
industry-scale merchant data and the public 30Music listening/playlists
dataset.
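The paper's exact splitting criterion is not reproduced here; as a loose sketch of the general idea (recursively partitioning entities by the feature whose values best explain the structure of their embedding vectors), one could write something like the following, where the variance-reduction criterion, the size/depth thresholds, and all function names are assumptions.

import numpy as np

def split_score(embeddings, feature_values):
    """Assumed criterion: how much grouping entities by a categorical
    feature reduces the total variance of their embedding vectors."""
    total = embeddings.var(axis=0).sum()
    within = 0.0
    for v in np.unique(feature_values):
        group = embeddings[feature_values == v]
        within += (len(group) / len(embeddings)) * group.var(axis=0).sum()
    return total - within  # larger = feature explains more embedding structure

def build_embedding_tree(embeddings, features, depth=0, max_depth=3):
    """Recursively split entities by the most 'embedding-explaining' feature.

    embeddings: (n, d) array; features: dict of name -> (n,) categorical array.
    Returns a nested dict describing the hierarchy."""
    if depth >= max_depth or len(embeddings) < 10 or not features:
        return {"size": len(embeddings)}
    best = max(features, key=lambda f: split_score(embeddings, features[f]))
    rest = {f: v for f, v in features.items() if f != best}
    children = {}
    for val in np.unique(features[best]):
        mask = features[best] == val
        children[str(val)] = build_embedding_tree(
            embeddings[mask], {f: v[mask] for f, v in rest.items()},
            depth + 1, max_depth)
    return {"split_on": best, "children": children, "size": len(embeddings)}

The resulting nested structure is the kind of hierarchy an interactive tool could visualize, with each level labeled by the feature used for the split.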
TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems
There has been an explosion of interest in designing various Knowledge Graph
Neural Networks (KGNNs), which achieve state-of-the-art performance and provide
great explainability for recommendation. The promising performance mainly
results from their capability of capturing high-order proximity messages over
the knowledge graphs. However, training KGNNs at scale is challenging due to
their high memory usage. In the forward pass, automatic differentiation
engines (e.g., TensorFlow/PyTorch) generally need to cache all
intermediate activation maps in order to compute gradients in the backward
pass, which leads to a large GPU memory footprint. Existing work solves this
problem by utilizing multi-GPU distributed frameworks. Nonetheless, this poses
a practical challenge when seeking to deploy KGNNs in memory-constrained
environments, especially for industry-scale graphs.
Here we present TinyKG, a memory-efficient GPU-based training framework for
KGNNs for the task of recommendation. Specifically, TinyKG uses exact
activations in the forward pass while storing a quantized version of
activations in the GPU buffers. During the backward pass, these low-precision
activations are dequantized back to full-precision tensors, in order to compute
gradients. To reduce the quantization errors, TinyKG applies a simple yet
effective quantization algorithm to compress the activations, which ensures
unbiasedness with low variance. As such, the training memory footprint of KGNNs
is largely reduced with negligible accuracy loss. To evaluate the performance
of our TinyKG, we conduct comprehensive experiments on real-world datasets. We
find that TinyKG with INT2 quantization aggressively reduces the memory
footprint of activation maps with only a small loss in accuracy, allowing us to
deploy KGNNs on memory-constrained devices.
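The memory-saving mechanism the abstract describes (caching an unbiased, low-precision copy of each activation and dequantizing it only for the backward pass) can be sketched with per-tensor stochastic rounding as below; this is a generic unbiased scheme for illustration, not necessarily TinyKG's exact quantizer, and a real INT2 implementation would additionally bit-pack four values per byte.

import torch

def quantize_activation(x, num_bits=2):
    """Quantize a float tensor to num_bits levels with stochastic rounding,
    which keeps the reconstruction unbiased: E[dequant(quant(x))] = x."""
    levels = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / levels
    normalized = (x - x_min) / scale          # values in [0, levels]
    floor = normalized.floor()
    # Round up with probability equal to the fractional part (unbiased).
    q = floor + (torch.rand_like(normalized) < (normalized - floor)).float()
    return q.to(torch.uint8), x_min, scale    # store q instead of x

def dequantize_activation(q, x_min, scale):
    """Recover a full-precision approximation for gradient computation."""
    return q.float() * scale + x_min

In this sketch, the backward pass would see dequantize_activation(*quantize_activation(x)) in place of the cached activation x; the unbiasedness of stochastic rounding is the property the abstract relies on to keep gradients unbiased with low variance.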
Masked Graph Transformer for Large-Scale Recommendation
Graph Transformers have garnered significant attention for learning
graph-structured data, thanks to their superb ability to capture long-range
dependencies among nodes. However, the quadratic space and time complexity
hinders the scalability of Graph Transformers, particularly for large-scale
recommendation. Here we propose an efficient Masked Graph Transformer, named
MGFormer, capable of capturing all-pair interactions among nodes with linear
complexity. To achieve this, we treat all user/item nodes as independent
tokens, enhance them with positional embeddings, and feed them into a
kernelized attention module. Additionally, we incorporate learnable relative
degree information to appropriately reweight the attention scores. Experimental
results show the superior performance of our MGFormer, even with a single
attention layer.
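To make the linear-complexity claim concrete, here is a generic kernelized ("linear") attention sketch of the kind such a module builds on; the elu(x)+1 feature map is a common choice assumed here, and the paper's positional embeddings and learnable relative-degree reweighting are omitted.

import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """All-pair attention in O(n * d^2) instead of O(n^2 * d).

    q, k, v: (n_tokens, d). Uses the feature map phi(x) = elu(x) + 1 so that
    phi(q) @ phi(k).T is positive, then exploits associativity:
    (phi(q) phi(k)^T) v = phi(q) (phi(k)^T v)."""
    phi_q = F.elu(q) + 1                      # (n, d), strictly positive
    phi_k = F.elu(k) + 1                      # (n, d)
    kv = phi_k.t() @ v                        # (d, d) summary of keys/values
    z = phi_q @ phi_k.sum(dim=0, keepdim=True).t()  # (n, 1) normalizer
    return (phi_q @ kv) / (z + eps)

# Example: 100k user/item tokens with 64-dim embeddings never materialize
# a 100k x 100k attention matrix.
tokens = torch.randn(100_000, 64)
out = linear_attention(tokens, tokens, tokens)

Because the n x n attention matrix is never formed, memory and time grow linearly in the number of user/item tokens, which is what makes all-pair attention feasible for large-scale recommendation.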
- …