2 research outputs found

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

Author: Brooks David
Cottel Bradford
Gupta Udit
Hazelwood Kim
Jia Bill
Lee Hsien-Hsin S.
Malevich Andrey
Mudigere Dheevatsa
Naumov Maxim
Reagen Brandon
Smelyanskiy Mikhail
Wang Xiaodong
Wu Carole-Jean
Xiong Liang
Zhang Xuan
Publication date: 15/02/2020

The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.

Comment: 11 page

arXiv.org e-Print Archive

Automatic Microprocessor Performance Bug Detection

Author: Barboza Erick Carvajal
Gratz Paul
Hu Jiang
Jacob Sara
Ketkar Mahesh
Kishinevsky Michael
Publication date: 19/11/2020

Processor design validation and debug is a difficult and complex task, which consumes the lion's share of the design process. Design bugs that affect processor performance rather than its functionality are especially difficult to catch, particularly in new microarchitectures. This is because, unlike functional bugs, the correct processor performance of new microarchitectures on complex, long-running benchmarks is typically not deterministically known. Thus, when performance benchmarking new microarchitectures, performance teams may assume that the design is correct when the performance of the new microarchitecture exceeds that of the previous generation, despite significant performance regressions existing in the design. In this work, we present a two-stage, machine learning-based methodology that is able to detect the existence of performance bugs in microprocessors. Our results show that our best technique detects 91.5% of microprocessor core performance bugs whose average IPC impact across the studied applications is greater than 1% versus a bug-free design with zero false positives. When evaluated on memory system bugs, our technique achieves 100% detection with zero false positives. Moreover, the detection is automatic, requiring very little performance engineer time.

Comment: 14 pages, 13 figures, to appear in the 27th International Symposium on High-Performance Computer Architecture (HPCA 2021)

arXiv.org e-Print Archive