2 research outputs found

A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors

Authors: Hu Jiayu, Jeong Ipoom, Kim Nam Sung, Kuper Reese, Ranganathan Narayan, Wang Ren, Yuan Yifan
Publication date: 31/05/2023

As semiconductor power density no longer stays constant as process technology scales down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets data movement operations in memory that are common sources of overhead in datacenter workloads and infrastructure. In addition, it becomes much more versatile by supporting a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations. This paper sets out to introduce the latest features supported by DSA, deep-dive into its versatility, and analyze its throughput benefits through a comprehensive evaluation. Along with the analysis of its characteristics and the rich software ecosystem of DSA, we summarize several insights and guidelines for the programmer to make the most out of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.

arXiv.org e-Print Archive

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices

Authors: Jeong Ipoom, Kim Nam Sung, Kuper Reese, Sun Yan, Wang Ren, Yu Zeduo, Yuan Yifan
Publication date: 27/03/2023

The high demand for memory capacity in modern datacenters has led to multiple lines of innovation in memory expansion and disaggregation. One such effort is Compute eXpress Link (CXL)-based memory expansion, which has gained significant attention. To better leverage CXL memory, researchers have built several emulation and experimental platforms to study its behavior and characteristics. However, due to the lack of commercial hardware supporting CXL memory, the full picture of its capabilities may still be unclear to the community. In this work, we explore CXL memory's performance characterization on a state-of-the-art experimental platform. First, we study the basic performance characteristics of CXL memory using our proposed microbenchmark. Based on our observations and comparisons to standard DRAM connected to local and remote NUMA nodes, we also study the impact of CXL memory on end-to-end applications with different offloading and interleaving policies. Finally, we provide several guidelines for future programmers to realize the full potential of CXL memory.

arXiv.org e-Print Archive