Program optimization for highly parallel systems has historically been considered an art, with experts doing much of the performance tuning by hand. With the introduction of inexpensive, single-chip, massively parallel platforms, more developers will be creating highly data-parallel applications for these platforms while lacking the substantial experience and knowledge needed to maximize application performance. In addition, hand-optimization, even by motivated and informed developers, takes a significant amount of time and generally still leaves double-digit percentages of the hardware's performance unused. This creates a need for structured and automatable optimization techniques that are capable of finding a near-optimal program configuration for this new class of architecture. My work discusses various strategies for optimizing programs on a highly data-parallel architecture with fine-grained sharing of resources. I first investigate useful strategies for optimizing a suite of applications. I then introduce progra