We propose the first Reversible Coherence Protocol (RCP), a new protocol
designed from the ground up to enable invisible speculative loads. RCP takes a
bold approach: it includes speculative loads and merge/purge operations in
the interface between the processor and cache coherence, allowing them to
participate in the coherence protocol. As a result, speculative loads, ordinary
loads/stores, and merge/purge operations can all affect the state of a given
cache line. RCP
is the first coherence protocol that enables the commit and squash of
speculative loads among distributed cache components in a general memory
hierarchy. RCP incurs an average slowdown of (3.0%, 8.3%, 7.4%) on
(SPEC2006, SPEC2017, PARSEC), lower than the (26.5%, 12%, 18.3%) of
InvisiSpec and the (3.2%, 9.4%, 24.2%) of CleanupSpec. The coherence traffic
overhead is 46% on average, compared to 40% and 27% for InvisiSpec and
CleanupSpec, respectively. Despite this higher traffic overhead, the
performance overhead of RCP is lower than that of InvisiSpec and comparable to
that of CleanupSpec. This
reveals a key advantage of RCP: the coherence actions triggered by merge
and purge operations are not on the critical path of execution and can be
performed in the cache hierarchy concurrently with processor execution.
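
As a rough illustration of the interface described above, the following toy
model shows how speculative loads, ordinary loads/stores, and merge/purge
operations could each change the state of a single cache line. It is only a
sketch under stated assumptions: the three states (INVALID, SPECULATIVE,
VALID), the names `Op`, `State`, and `ToyLine`, and the transition rules are
hypothetical and are not the states or transitions defined by RCP.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Op(Enum):
    """Requests the processor can issue to the cache (illustrative set)."""
    SPEC_LOAD = auto()  # speculative load
    LOAD = auto()       # ordinary load
    STORE = auto()      # ordinary store
    MERGE = auto()      # commit the effect of a speculative load
    PURGE = auto()      # squash the effect of a speculative load


class State(Enum):
    """Hypothetical line states; not the actual RCP state set."""
    INVALID = auto()
    SPECULATIVE = auto()  # line present only because of a speculative load
    VALID = auto()


@dataclass
class ToyLine:
    state: State = State.INVALID

    def apply(self, op: Op) -> State:
        """Apply one operation and return the resulting state."""
        if op is Op.SPEC_LOAD:
            # A speculative load fills an absent line in a reversible state.
            if self.state is State.INVALID:
                self.state = State.SPECULATIVE
        elif op in (Op.LOAD, Op.STORE):
            # Ordinary accesses make the line architecturally visible.
            self.state = State.VALID
        elif op is Op.MERGE:
            # Commit: the speculative fill becomes a normal valid line.
            if self.state is State.SPECULATIVE:
                self.state = State.VALID
        elif op is Op.PURGE:
            # Squash: undo the speculative fill, leave other lines untouched.
            if self.state is State.SPECULATIVE:
                self.state = State.INVALID
        return self.state


if __name__ == "__main__":
    line = ToyLine()
    for op in (Op.SPEC_LOAD, Op.PURGE, Op.SPEC_LOAD, Op.MERGE, Op.PURGE):
        print(op.name, "->", line.apply(op).name)
```

In this toy model a purge after a speculative load returns the line to its
prior state, while a merge makes the fill permanent; a real protocol would
additionally have to coordinate these transitions across multiple caches.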