Skip to main content
Article thumbnail
Location of Repository

Generating permutation instructions from a high-level description

By Manikandan Narayanan and Katherine A. Yelick


Fine-grained data parallelism, from media extensions to full streaming or vector instruction sets, offer enormous performance potential, if they can be effectively used from the application level. One critical aspect of their design is the organization of the registers and the generality of operations that move data between registers. In this paper we focus on this data-movement problem and demonstrate that starting with a high-level description of a data-parallel application, we can automatically map certain data-movements in the program onto a regular set of vector permutation instructions. Our approach to the problem is novel and significantly different from existing approaches in commercial vectorizing compilers that generate vector reduction instructions like sum reductions. Our language and compiler are based on StreamIt from MIT, and our target machine is the VIRAM processor from Berkeley. We devise new intermediate representations and operators for analysing data-movements, and demonstrate our technique on two benchmarks. We show that data-movement operations give an enormous performance boost for the benchmarks, and the performance of our technique is close to, and sometimes better than, handcoded assembly

Topics: permutation instructions, StreamIt, VIRAM
Year: 2008
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • (external link)
  • (external link)
  • Suggested articles

    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.