Author retrospective for the dual data cache
In this paper we present a retrospective on our paper published in ICS 1995, which to the best of our knowledge was the first paper to introduce the concept of a cache memory with multiple subcaches, each tuned for a different type of locality. In this retrospective, we summarize the main ideas of the original paper and outline some of the later work that exploited similar ideas and could have been influenced by our original paper, including two actual industrial microprocessors. Peer Reviewed. Postprint (author’s final draft).
Application-Specific Memory Subsystems
The disparity in performance between processors and main memories has
led computer architects to incorporate large cache hierarchies in
modern computers. These cache hierarchies are designed to be
general-purpose in that they strive to provide the best possible
performance across a wide range of applications. However, such a memory
subsystem does not necessarily provide the best possible performance for
a particular application.
Although general-purpose memory subsystems are desirable when the
workload is unknown and the memory subsystem must remain fixed,
a custom memory subsystem may be beneficial when this is not the case.
For example, in an application-specific integrated circuit (ASIC) or
a field-programmable gate array (FPGA) designed to run a particular
application, a custom memory subsystem optimized for that application
would be desirable. In addition, when there are tunable
parameters in the memory subsystem, it may make sense to change these
parameters depending on the application being run. Such a situation
arises today with FPGAs and, to a lesser extent, GPUs, and it is
plausible that general-purpose computers will begin to support
greater flexibility in the memory subsystem in the future.
In this dissertation, we first show that it is possible to create
application-specific memory subsystems that provide much better
performance than a general-purpose memory subsystem. In addition,
we show a way to discover such memory subsystems automatically using
a superoptimization technique on memory address traces gathered
from applications. This allows one to generate a custom memory subsystem
with little effort.
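The idea of searching for a memory subsystem against an address trace can be illustrated with a small sketch. It is not the dissertation's superoptimizer: it swaps in a plain exhaustive search over a handful of set-associative LRU cache configurations under a fixed capacity budget, and scores each one by its miss count on the trace. The function names, the parameter grid, and the miss-count objective are all assumptions made for illustration.

```python
from collections import OrderedDict
from itertools import product


def count_misses(trace, num_sets, ways, line_size):
    """Replay an address trace through a set-associative LRU cache model."""
    sets = [OrderedDict() for _ in range(num_sets)]  # one LRU list per set
    misses = 0
    for addr in trace:
        line = addr // line_size
        lru = sets[line % num_sets]
        if line in lru:
            lru.move_to_end(line)          # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)    # evict the least recently used line
            lru[line] = True
    return misses


def best_config(trace, capacity_bytes, line_sizes=(16, 32, 64), way_options=(1, 2, 4)):
    """Hypothetical brute-force search: return (misses, line_size, ways, num_sets)."""
    best = None
    for line_size, ways in product(line_sizes, way_options):
        num_sets = capacity_bytes // (line_size * ways)
        if num_sets < 1:
            continue                       # configuration does not fit the budget
        candidate = (count_misses(trace, num_sets, ways, line_size),
                     line_size, ways, num_sets)
        if best is None or candidate < best:
            best = candidate
    return best
```

For a trace that alternates between two addresses mapping to the same set, such as `[0, 64, 0, 64]` with a 64-byte budget and 16-byte lines, the search prefers the 2-way configuration because the direct-mapped one misses on every access. A different objective, for instance counting writebacks instead of misses, would only require changing the scoring function.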
We next show that our memory subsystem superoptimization technique can
be used to optimize for objectives other than performance. As an example,
we show that it is possible to reduce the number of writes to the main
memory, which can be useful for main memories with limited write
durability, such as flash or Phase-Change Memory (PCM).
Finally, we show how to superoptimize memory subsystems for streaming
applications, which are a class of parallel applications. In particular, we
show that, through the use of ScalaPipe, we can author and deploy streaming
applications targeting FPGAs with superoptimized memory subsystems.
ScalaPipe is a domain-specific language (DSL) embedded in the Scala
programming language for generating streaming applications that can be
implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we
are able to demonstrate actual performance improvements using the
superoptimized memory subsystem with applications implemented in hardware.
Design and Performance Evaluation of a Cache Assist to implement Selective Caching
Efficient instruction and data caches are extremely important for achieving good performance from modern high-performance processors. Conventional cache architectures exploit locality, but do so rather blindly. By forcing all references through a single structure, the cache's effectiveness on many references is reduced. This paper presents a selective caching scheme for improving cache performance, implemented using a cache assist, namely the annex cache. Except for filling a main cache at cold start, all entries come to the cache via the annex cache. A block from the annex cache gets swapped with a main cache block only if it has been referenced twice after the conflicting main cache block was referenced. Essentially, low-usage items are not allowed to create conflict misses in the main cache. Items referenced only rarely will be excluded from the main cache, eliminating several conflict misses and swaps. The basic premise is that an item deserves to be in the main cache only if it can..
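The swap rule described in this abstract can be sketched as a toy simulator. This is a minimal illustration under stated assumptions, not the paper's design: the main cache is assumed direct-mapped, the annex cache is assumed to be a small fully associative LRU buffer, the miss that brings a block into the annex is assumed to count as its first reference, and a hit on the resident main cache block is assumed to reset the counters of annex blocks that conflict with it. The class name `AnnexCacheSim` and all parameters are hypothetical.

```python
from collections import OrderedDict


class AnnexCacheSim:
    """Toy model of a main cache filtered by an annex cache (assumed organization)."""

    def __init__(self, main_sets=4, annex_entries=2, block_size=16, promote_after=2):
        self.main = [None] * main_sets      # direct-mapped main cache (assumption)
        self.annex = OrderedDict()          # block -> reference count, kept in LRU order
        self.annex_entries = annex_entries
        self.block_size = block_size
        self.promote_after = promote_after  # references required before a swap
        self.stats = {"main_hit": 0, "annex_hit": 0, "miss": 0, "swap": 0}

    def access(self, addr):
        block = addr // self.block_size
        idx = block % len(self.main)
        if self.main[idx] == block:
            self.stats["main_hit"] += 1
            # The resident block was just referenced, so conflicting annex
            # blocks must earn their references again (assumed reading).
            for other in list(self.annex):
                if other % len(self.main) == idx:
                    self.annex[other] = 0
            return "main_hit"
        if block in self.annex:
            self.stats["annex_hit"] += 1
            self.annex[block] += 1
            self.annex.move_to_end(block)
            result = "annex_hit"
        else:
            self.stats["miss"] += 1
            if self.main[idx] is None:
                self.main[idx] = block      # cold start: fill the main cache directly
                return "miss"
            if len(self.annex) >= self.annex_entries:
                self.annex.popitem(last=False)  # evict the LRU annex block
            self.annex[block] = 1
            result = "miss"
        if self.annex[block] >= self.promote_after:
            self._swap(block, idx)
        return result

    def _swap(self, block, idx):
        """Exchange a proven annex block with the conflicting main cache block."""
        victim = self.main[idx]
        del self.annex[block]
        self.main[idx] = block
        self.annex[victim] = 0
        self.stats["swap"] += 1
```

With a single main cache set and one-byte blocks, the address sequence 0, 1, 0, 1, 1 fills the main cache with block 0, parks block 1 in the annex, and swaps the two only after block 1 has been referenced twice following the last hit on block 0.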