Abstract: This paper analyzes a range of architectures for efficient delivery of VLIW instructions for embedded media kernels. The analysis takes an efficient Filter Cache as a baseline and examines the benefits of 1) removing the tag overhead, 2) distributing the storage, 3) adding indirection, 4) adding efficient NOP generation, and 5) sharing instruction memory. The result is a hierarchical instruction register organization that provides 56% energy savings and 40% area savings over an already efficient Filter Cache.

Index Terms: energy-efficient embedded processor architecture, hierarchical and distributed instruction register organization, VLIW instruction delivery