226 research outputs found

Super-Efficient Cross-Correlation (SEC-C): A Fast Matched Filtering Code Suitable for Desktop Computers

Author: Funning Gareth J
Keogh Eamonn
Mueen Abdullah
Shakibay Senobari Nader
Yeh Chin-Chia Michael
Zhu Yan
Zimmerman Zachary
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

eScholarship - University of California

Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation

Author: Cai Yiwei
Chen Huiyuan
Lai Vivian
Xu Minghua
Yang Hao
Yeh Chin-Chia Michael
Publication date: 20/08/2023

Transformer and its variants are a powerful class of architectures for sequential recommendation, owing to their ability of capturing a user's dynamic interests from their past interactions. Despite their success, Transformer-based models often require the optimization of a large number of parameters, making them difficult to train from sparse data in sequential recommendation. To address the problem of data sparsity, previous studies have utilized self-supervised learning to enhance Transformers, such as pre-training embeddings from item attributes or contrastive data augmentations. However, these approaches encounter several training issues, including initialization sensitivity, manual data augmentations, and large batch-size memory bottlenecks. In this work, we investigate Transformers from the perspective of loss geometry, aiming to enhance the models' data efficiency and generalization in sequential recommendation. We observe that Transformers (e.g., SASRec) can converge to extremely sharp local minima if not adequately regularized. Inspired by the recent Sharpness-Aware Minimization (SAM), we propose SAMRec, which significantly improves the accuracy and robustness of sequential recommendation. SAMRec performs comparably to state-of-the-art self-supervised Transformers, such as S^3Rec and CL4SRec, without the need for pre-training or strong data augmentations.

arXiv.org e-Print Archive
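
The SAMRec abstract above builds on Sharpness-Aware Minimization. As a rough illustration of the generic SAM update only (not the SAMRec training code), the sketch below applies it to a toy quadratic loss. The names `sam_step`, `grad_fn`, `loss`, the matrix `A`, and the values of `lr` and `rho` are all hypothetical choices made for this example.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05, eps=1e-12):
    """One generic SAM update on a parameter vector w (illustrative sketch)."""
    g = grad_fn(w)
    # Ascent step: approximate the worst-case point inside an L2 ball of radius rho.
    e = rho * g / (np.linalg.norm(g) + eps)
    # Descent step: apply the gradient measured at the perturbed point to the original w.
    g_perturbed = grad_fn(w + e)
    return w - lr * g_perturbed

# Toy quadratic loss 0.5 * w^T A w, a hypothetical stand-in for a recommender's loss.
A = np.diag([10.0, 0.1])
loss = lambda w: 0.5 * w @ A @ w
grad_fn = lambda w: A @ w

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(100):
    w = sam_step(w, grad_fn)

print(loss(w0), loss(w))
```

Each update costs two gradient evaluations, one at `w` and one at the perturbed point `w + e`, which is the main overhead of SAM compared with plain gradient descent.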

EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding

Author: Chen Huiyuan
Fan Yujie
Wang Junpeng
Wang Liang
Yeh Chin-Chia Michael
Zhang Wei
Zheng Yan
Publication date: 02/08/2023

Embedding learning transforms discrete data entities into continuous numerical representations, encoding features/properties of the entities. Despite the outstanding performance reported from different embedding learning algorithms, few efforts were devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features with the less-interpretable embedding vectors. An interactive visualization tool is also developed based on EmbeddingTree to explore high-dimensional embeddings. The tool helps users discover nuance features of data entities, perform feature denoising/injecting in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset.

Comment: 5 pages, 3 figures, accepted by PacificVis 202

arXiv.org e-Print Archive

TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems

Author: Chen Huiyuan
Hu Xia
Li Xiaoting
Yang Hao
Yeh Chin-Chia Michael
Zheng Yan
Zhou Kaixiong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/12/2022

There has been an explosion of interest in designing various Knowledge Graph Neural Networks (KGNNs), which achieve state-of-the-art performance and provide great explainability for recommendation. The promising performance is mainly resulting from their capability of capturing high-order proximity messages over the knowledge graphs. However, training KGNNs at scale is challenging due to the high memory usage. In the forward pass, the automatic differentiation engines (e.g., TensorFlow/PyTorch) generally need to cache all intermediate activation maps in order to compute gradients in the backward pass, which leads to a large GPU memory footprint. Existing work solves this problem by utilizing multi-GPU distributed frameworks. Nonetheless, this poses a practical challenge when seeking to deploy KGNNs in memory-constrained environments, especially for industry-scale graphs. Here we present TinyKG, a memory-efficient GPU-based training framework for KGNNs for the tasks of recommendation. Specifically, TinyKG uses exact activations in the forward pass while storing a quantized version of activations in the GPU buffers. During the backward pass, these low-precision activations are dequantized back to full-precision tensors, in order to compute gradients. To reduce the quantization errors, TinyKG applies a simple yet effective quantization algorithm to compress the activations, which ensures unbiasedness with low variance. As such, the training memory footprint of KGNNs is largely reduced with negligible accuracy loss. To evaluate the performance of our TinyKG, we conduct comprehensive experiments on real-world datasets. We found that our TinyKG with INT2 quantization aggressively reduces the memory footprint of activation maps with 7×, only with 2% loss in accuracy, allowing us to deploy KGNNs on memory-constrained devices.

arXiv.org e-Print Archive
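
The TinyKG abstract above describes storing quantized activations and dequantizing them in the backward pass, with a quantizer that is unbiased. The abstract does not name the quantization algorithm, so the sketch below uses stochastic rounding, a common unbiased scheme, purely as an assumption. The functions `quantize` and `dequantize` are hypothetical names for this example and are not the TinyKG API.

```python
import numpy as np

def quantize(x, bits=2, rng=None):
    """Stochastically round x onto 2**bits levels (assumed scheme, not TinyKG's code)."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    scaled = np.clip((x - lo) / scale, 0, levels)  # values in [0, levels]
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q] equals scaled.
    q = floor + (rng.random(x.shape) < (scaled - floor))
    # Each code is kept in one byte for simplicity; a real INT2 buffer would pack four codes per byte.
    return q.astype(np.uint8), lo, scale

def dequantize(q, lo, scale):
    """Map integer codes back to approximate full-precision values."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
acts = rng.standard_normal((4, 8)).astype(np.float32)  # stand-in activation map
q, lo, scale = quantize(acts, bits=2, rng=rng)
restored = dequantize(q, lo, scale)
print(np.abs(restored - acts).max(), scale)
```

With stochastic rounding, each restored value differs from the original by less than one quantization step, and averaging over many independent draws recovers the original value in expectation.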

Masked Graph Transformer for Large-Scale Recommendation

Author: Chen Huiyuan
Lai Vivian
Tong Hanghang
Xu Minghua
Xu Zhe
Yeh Chin-Chia Michael
Zheng Yan
Publication date: 07/05/2024

Graph Transformers have garnered significant attention for learning graph-structured data, thanks to their superb ability to capture long-range dependencies among nodes. However, the quadratic space and time complexity hinders the scalability of Graph Transformers, particularly for large-scale recommendation. Here we propose an efficient Masked Graph Transformer, named MGFormer, capable of capturing all-pair interactions among nodes with a linear complexity. To achieve this, we treat all user/item nodes as independent tokens, enhance them with positional embeddings, and feed them into a kernelized attention module. Additionally, we incorporate learnable relative degree information to appropriately reweigh the attentions. Experimental results show the superior performance of our MGFormer, even with a single attention layer.

arXiv.org e-Print Archive
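
The MGFormer abstract above relies on a kernelized attention module to reach linear complexity. The sketch below shows the general kernelized-attention idea only. The feature map `elu(x) + 1` is an assumption, since the abstract does not name the kernel, and the positional embeddings and learnable degree-based reweighting are left out. `feature_map`, `linear_attention`, and `quadratic_attention` are hypothetical names.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a strictly positive feature map (assumed; not specified in the abstract).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Kernelized attention in O(n * d * d_v) time, without forming the n x n matrix."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V            # (d, d_v): keys and values summarized once
    z = phi_k.sum(axis=0)       # (d,): statistics for the row normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_attention(Q, K, V):
    """Reference version that materializes all n x n pairwise scores."""
    scores = feature_map(Q) @ feature_map(K).T
    return (scores / scores.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
n, d = 6, 4                      # n tokens (user/item nodes), d-dimensional embeddings
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_linear = linear_attention(Q, K, V)
out_quadratic = quadratic_attention(Q, K, V)
print(np.allclose(out_linear, out_quadratic))
```

Both functions compute the same output. The linear version reorders the matrix products so that cost grows linearly with the number of nodes instead of quadratically.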