Developing a Recommendation Benchmark for MLPerf Training and Inference
Deep learning-based recommendation models are used pervasively, for example to
recommend movies, products, or other information most relevant to users and
thereby enhance the user experience. Unlike application domains that have
received significant industry and academia research attention, such as image
classification, object detection, and language and speech translation, the
performance of deep learning-based recommendation models is less well explored,
even though recommendation tasks account for a significant share of AI
inference cycles in large-scale datacenter fleets. To advance the state of
understanding and enable machine learning system development and optimization
for the commerce domain, we aim to define an industry-relevant recommendation
benchmark for the MLPerf Training and Inference Suites. The paper synthesizes
desirable modeling strategies for personalized recommendation systems. We lay
out desirable characteristics of recommendation model architectures and data
sets. We then summarize the discussions and advice from the MLPerf
Recommendation Advisory Board.
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
The use of GPUs has proliferated for machine learning workflows and is now
considered mainstream for many deep learning models. Meanwhile, when training
state-of-the-art personalized recommendation models, which consume the largest
number of compute cycles in our large-scale datacenters, the use of GPUs has
come with various challenges, because these models have both compute-intensive
and memory-intensive components. The GPU performance and efficiency of these
recommendation models are largely affected by model architecture configurations
such as dense and sparse features and MLP dimensions. Furthermore, these models
often contain large embedding tables that do not fit into limited GPU memory.
The goal of this paper is to explain the intricacies of using GPUs for training
recommendation models, factors affecting hardware efficiency at scale, and
learnings from a new scale-up GPU server design, Zion.

Comment: To appear in IEEE International Symposium on High-Performance
Computer Architecture (HPCA 2021).
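
The abstract describes models that combine compute-intensive MLPs with
memory-intensive embedding tables fed by sparse features. As a rough sketch of
that dense/sparse architecture (not the paper's actual model), the minimal
PyTorch example below shows how the two kinds of components fit together; the
class name TinyDLRM, the layer widths, and the table sizes are illustrative
assumptions.

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    """Minimal sketch of a dense/sparse recommendation model: dense features
    pass through a bottom MLP (compute-intensive), sparse categorical features
    are looked up in embedding tables (memory-intensive), and all feature
    vectors are concatenated before a top MLP produces a score."""

    def __init__(self, num_dense, table_sizes, emb_dim=16):
        super().__init__()
        # One embedding table per sparse feature. In production these tables
        # can hold billions of rows, which is why they often exceed GPU memory.
        self.tables = nn.ModuleList(
            nn.Embedding(rows, emb_dim) for rows in table_sizes
        )
        # Bottom MLP maps dense features into the same embedding space.
        self.bottom_mlp = nn.Sequential(
            nn.Linear(num_dense, 64), nn.ReLU(), nn.Linear(64, emb_dim)
        )
        # Top MLP consumes the concatenated dense and sparse representations.
        top_in = emb_dim * (1 + len(table_sizes))
        self.top_mlp = nn.Sequential(
            nn.Linear(top_in, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, dense, sparse):
        # dense: (batch, num_dense) floats; sparse: (batch, num_tables) int ids
        parts = [self.bottom_mlp(dense)]
        parts += [table(sparse[:, i]) for i, table in enumerate(self.tables)]
        return torch.sigmoid(self.top_mlp(torch.cat(parts, dim=1)))

# Hypothetical configuration: 13 dense features and 3 embedding tables.
model = TinyDLRM(num_dense=13, table_sizes=[1000, 5000, 200])
dense = torch.randn(32, 13)
sparse = torch.stack(
    [torch.randint(0, n, (32,)) for n in [1000, 5000, 200]], dim=1
)
scores = model(dense, sparse)  # (32, 1) click-probability estimates
```

The split this sketch illustrates is what makes GPU training of such models
hard at scale: the MLPs benefit from GPU compute, while the embedding tables
stress memory capacity and bandwidth, motivating scale-up server designs such
as Zion.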