The Architectural Implications of Facebook's DNN-based Personalized Recommendation
The widespread application of deep learning has changed the landscape of
computation in the data center. In particular, personalized recommendation for
content ranking is now largely accomplished leveraging deep neural networks.
However, despite the importance of these models and the amount of compute
cycles they consume, relatively little research attention has been devoted to
systems for recommendation. To facilitate research and to advance the
understanding of these workloads, this paper presents a set of real-world,
production-scale DNNs for personalized recommendation coupled with relevant
performance metrics for evaluation. In addition to releasing a set of
open-source workloads, we conduct in-depth analysis that underpins future
system design and optimization for at-scale recommendation: Inference latency
varies by 60% across three Intel server generations, batching and co-location
of inferences can drastically improve latency-bounded throughput, and the
diverse composition of recommendation models leads to different optimization
strategies.
Comment: 11 pages
Automatic Microprocessor Performance Bug Detection
Processor design validation and debug is a difficult and complex task, which
consumes the lion's share of the design process. Design bugs that affect
processor performance rather than its functionality are especially difficult to
catch, particularly in new microarchitectures. This is because, unlike
functional bugs, the correct processor performance of new microarchitectures on
complex, long-running benchmarks is typically not deterministically known.
Thus, when performance benchmarking new microarchitectures, performance teams
may assume that the design is correct when the performance of the new
microarchitecture exceeds that of the previous generation, despite significant
performance regressions existing in the design. In this work, we present a
two-stage, machine learning-based methodology that is able to detect the
existence of performance bugs in microprocessors. Our results show that our
best technique detects 91.5% of microprocessor core performance bugs whose
average IPC impact across the studied applications is greater than 1% versus a
bug-free design, with zero false positives. When evaluated on memory system
bugs, our technique achieves 100% detection with zero false positives.
Moreover, the detection is automatic, requiring very little performance
engineer time.
Comment: 14 pages, 13 figures, to appear in the 27th International Symposium
on High-Performance Computer Architecture (HPCA 2021)