Near-Data-Processing (NDP) architectures present a promising way to alleviate
data movement costs and can provide significant performance and energy benefits
to parallel applications. Typically, NDP architectures support several NDP
units, each including multiple simple cores placed close to memory. To fully
leverage the benefits of NDP and achieve high performance for parallel
workloads, efficient synchronization among the NDP cores of a system is
necessary. However, supporting synchronization in many NDP systems is
challenging because they lack shared caches and hardware cache coherence
support, which are commonly used for synchronization in multicore systems, and
communication across different NDP units can be expensive.
This paper comprehensively examines the synchronization problem in NDP
systems, and proposes SynCron, an end-to-end synchronization solution for NDP
systems. SynCron adds low-cost hardware support near memory for synchronization
acceleration, and avoids the need for hardware cache coherence support. SynCron
has three components: 1) a specialized cache memory structure to avoid memory
accesses for synchronization and minimize latency overheads, 2) a hierarchical
message-passing communication protocol to minimize expensive communication
across NDP units of the system, and 3) a hardware-only overflow management
scheme to avoid performance degradation when hardware resources for
synchronization tracking are exceeded.
We evaluate SynCron using a variety of parallel workloads, covering various
contention scenarios. SynCron improves performance by 1.27× on average
(up to 1.78×) under high-contention scenarios, and by 1.35× on
average (up to 2.29×) under low-contention real applications, compared
to state-of-the-art approaches. SynCron reduces system energy consumption by
2.08× on average (up to 4.25×).Comment: To appear in the 27th IEEE International Symposium on
High-Performance Computer Architecture (HPCA-27