How can we effectively use the increasing number of transistors available on a single chip while avoiding the wire delay problem? This is one of the most interesting research questions for the microarchitecture community. We have reached the point where the time a signal needs to reach the opposite edge of a chip is becoming longer than one clock cycle. This makes it impossible to gain further performance simply by scaling superscalar architectures. One possible way to use the available transistors efficiently while hiding wire delay as much as possible is to parallelize resource usage through resource clustering and decoupling. For example, on-chip multiprocessor architectures are the most natural way to increase performance beyond what a single processor core can deliver. Generalizing this concept has led to several chip multiprocessor designs. This paper reviews and compares some recent proposals that employ the clustering/tiling paradigm to different extents, highlighting their main features and advantages. A good number of tiled/clustered architectures have been proposed recently, indicating that the field is attracting strong interest from both academia and industry: WaveScala