potential performance available in a modern microprocessor. For example, on an Intel Core 2 class workstation (a microprocessor capable of executing 4 instructions per cycle), the average instructions per cycle for the single-threaded DaCapo benchmark suite is 0.98, about a quarter of the peak rate. In other words, even in the multi-core era, there is still enormous potential to increase single-core program performance.

Recent work in PLDI has focused on multi-core performance: by our count, PLDI’10 has around 6 papers on multi-core optimizations and PLDI’11 has 6 papers as well. We believe this aggressive focus on multi-core may miss a critical trend in computing environments: power, or performance per watt. Power consumption and thermal dissipation are now major limiting factors in performance, even in environments with unconstrained power and cooling systems (e.g., desktops or servers). They are even more critical in mobile computing (e.g., phones or tablets) and data centers, which are increasingly important computing environments. We argue that, while research on multi-core optimizations is valuable, improvements in single-core performance via improved resource utilization are key to increasing performance with minimal impact on power consumption.

Increasing Performance Per Watt via SIMD. To better understand the performance and efficiency space, we examine two approaches: the Thread Level Parallelism (TLP) approach using OpenMP and the Single Instruction Multiple Data (SIMD) approach using Intel SSE intrinsics. Using an Intel Core 2 series workstation with 4 cores and a SIMD vector width of 128 bits (4 integers), we optimized the following simple loop, which is amenable to both the TLP and SIMD approaches
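
The loop itself is not reproduced in this excerpt. As a rough illustration only, the sketch below uses a hypothetical stand-in (element-wise addition of two integer arrays) to show what the two approaches might look like in C. The function names `add_scalar`, `add_tlp`, `add_simd`, and `check_agreement` are assumptions made for this example and are not taken from the paper. The SIMD variant assumes an x86 target with SSE2 and falls back to scalar code elsewhere; the TLP variant assumes the compiler is invoked with OpenMP enabled (e.g., `-fopenmp`), otherwise the pragma is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#endif

/* Hypothetical stand-in loop (not the paper's): c[i] = a[i] + b[i]. */
static void add_scalar(const int *a, const int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* TLP sketch: OpenMP divides the iterations among the available cores. */
static void add_tlp(const int *a, const int *b, int *c, size_t n) {
    long i;  /* signed index, accepted by all OpenMP versions */
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++)
        c[i] = a[i] + b[i];
}

/* SIMD sketch: one 128-bit SSE2 add handles 4 ints on a single core. */
static void add_simd(const int *a, const int *b, int *c, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads and stores, so the arrays need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(c + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail for the remaining 0 to 3 elements (or all, without SSE2). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Runs all three variants on the same input and asserts that they agree. */
static void check_agreement(void) {
    enum { N = 11 };  /* not a multiple of 4, so the tail loop is exercised */
    int a[N], b[N], ref[N], tlp[N], simd[N];
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 10 * i;
    }
    add_scalar(a, b, ref, N);
    add_tlp(a, b, tlp, N);
    add_simd(a, b, simd, N);
    for (int i = 0; i < N; i++) {
        assert(ref[i] == 11 * i);
        assert(tlp[i] == ref[i]);
        assert(simd[i] == ref[i]);
    }
}
```

Calling `check_agreement()` from a `main` function exercises all three variants. The difference relevant to the discussion above is where the parallelism comes from: `add_tlp` spreads work across several cores, while `add_simd` performs four additions per instruction on one core.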