The performance of emerging commercial chip-multithreaded multiprocessors is of great importance to the high-performance computing community. However, the growing power consumption of such systems is of increasing concern, and techniques that increase overall system power efficiency while sustaining performance are very desirable. Researchers have recently proposed various mechanisms to achieve this goal for multithreaded applications [2, 3, 4].
Previous research has shown that system noise may have a dramatic effect on the memory hierarchy and consequently on performance [3, 5]. The effect will be more pronounced with the introduction of larger chip multiprocessors. The competition for resources between highly CPU-intensive tasks and the operating system yields opportunities to increase overall system efficiency. Effectively handling the smaller operating system tasks while simultaneously preserving application thread synchronicity can lead to gains in overall efficiency.
I. PROPOSED SCHEDULERS
The first method of thread management, originally proposed in [4], masks off a single logical/physical core for operating system tasks only and scales down its frequency to save power, while user threads run on the remaining cores at maximum frequency. The resulting difference in core frequencies resembles an asymmetric multiprocessor (AMP) [1]. In the second method, proposed here, one core runs at its full clock speed and performs both OS and user tasks, while the remaining cores run user threads at lower operating frequencies. The difference in operating frequency between the OS/user core and the other cores helps to ensure that the OS core can maintain synchronicity while being interrupted to perform OS tasks.
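The two core assignments described above can be sketched as per-core plans. This is a minimal illustration and not the schedulers' actual implementation: the `CorePlan` type, the function names, the choice of core 0 as the OS core and the frequencies used in the example are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class CorePlan:
    """Hypothetical description of one core's role (not from the paper)."""
    core_id: int
    freq_mhz: int     # target operating frequency, illustrative only
    runs_os: bool     # core is designated for operating system tasks
    runs_user: bool   # core is eligible to run user threads


def _check(num_cores: int, f_max: int, f_low: int, os_core: int) -> None:
    if num_cores < 2:
        raise ValueError("need at least two cores")
    if not 0 <= os_core < num_cores:
        raise ValueError("os_core out of range")
    if not 0 < f_low <= f_max:
        raise ValueError("expected 0 < f_low <= f_max")


def plan_method1(num_cores: int, f_max: int, f_low: int,
                 os_core: int = 0) -> List[CorePlan]:
    """Method 1: one slowed core handles OS tasks only;
    user threads run on the remaining cores at maximum frequency."""
    _check(num_cores, f_max, f_low, os_core)
    return [
        CorePlan(c, f_low, runs_os=True, runs_user=False) if c == os_core
        else CorePlan(c, f_max, runs_os=False, runs_user=True)
        for c in range(num_cores)
    ]


def plan_method2(num_cores: int, f_max: int, f_low: int,
                 os_core: int = 0) -> List[CorePlan]:
    """Method 2: one full-speed core handles OS and user tasks;
    the remaining cores run user threads at a lower frequency."""
    _check(num_cores, f_max, f_low, os_core)
    return [
        CorePlan(c, f_max, runs_os=True, runs_user=True) if c == os_core
        else CorePlan(c, f_low, runs_os=False, runs_user=True)
        for c in range(num_cores)
    ]


def user_cores(plan: List[CorePlan]) -> Set[int]:
    """Core IDs on which user threads may be placed."""
    return {c.core_id for c in plan if c.runs_user}


if __name__ == "__main__":
    # Assumed example: 4 cores, 2800 MHz maximum, 2000 MHz reduced.
    for name, plan in (("method 1", plan_method1(4, 2800, 2000)),
                       ("method 2", plan_method2(4, 2800, 2000))):
        print(name, sorted(user_cores(plan)),
              [c.freq_mhz for c in plan])
```

On Linux, the set returned by `user_cores` could be passed to `os.sched_setaffinity` to restrict a process to those cores. Applying the per-core frequencies is platform specific and is left out of this sketch.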
II. EXPERIMENTAL RESULTS
We present results on a two-way dual-core SMT-capable Intel Xeon system. The results are differentiated by a three-part naming scheme. The first part indicates whether Hyper-Threading (SMT) is enabled, the second part gives the number of threads that the system is running, and the third part gives the number of physically independent chips on which the threads are executed.
The slowdown and resultant energy savings of both methods for a select group of architectures are shown in Figure 1. The average improvement for each architecture at its best operating point is detailed in Table 1, with energy savings based on real power measurements. Although method 2 performs better on average for the non-SMT architectures and at some operating points for the SMT architectures, method 1 is superior in overall average slowdown and energy savings for the SMT architectures. The AMP normalized energy-delay results for each configuration with each method are shown in Figure 2. The energy-delay products follow the same trend observed in Figure 1: method 1 is better on average for SMT-enabled systems, while method 2 is better on average for non-SMT systems.
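For readers unfamiliar with the metrics, the following sketch shows how slowdown, energy savings and a normalized energy-delay product are commonly computed from a run time and an average power. It assumes that normalization is against a baseline run, which the text does not specify, and the function names and the numbers in the example are invented for illustration; they are not measurements from this work.

```python
def energy_joules(avg_power_w: float, time_s: float) -> float:
    """Energy consumed by a run: average power times execution time."""
    return avg_power_w * time_s


def slowdown(time_s: float, base_time_s: float) -> float:
    """Fractional increase in execution time relative to the baseline."""
    return time_s / base_time_s - 1.0


def energy_savings(energy_j: float, base_energy_j: float) -> float:
    """Fractional reduction in energy relative to the baseline."""
    return 1.0 - energy_j / base_energy_j


def normalized_edp(energy_j: float, time_s: float,
                   base_energy_j: float, base_time_s: float) -> float:
    """Energy-delay product divided by the baseline's energy-delay product.
    Values below 1.0 mean the configuration beats the baseline."""
    return (energy_j * time_s) / (base_energy_j * base_time_s)


if __name__ == "__main__":
    # Invented numbers: baseline 100 s at 200 W, scaled run 105 s at 180 W.
    base_e = energy_joules(200.0, 100.0)
    run_e = energy_joules(180.0, 105.0)
    print("slowdown:", slowdown(105.0, 100.0))
    print("energy savings:", energy_savings(run_e, base_e))
    print("normalized EDP:", normalized_edp(run_e, 105.0, base_e, 100.0))
```

Because the energy-delay product weights energy and run time equally, a configuration that saves energy at the cost of a small slowdown can still score below 1.0, as in the invented example.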
The improvement that the second method provides over the first could be a direct result of proper synchronization of execution threads combined with efficient handling of operating system noise. The speed of the cores also plays an essential role in the performance difference between the two methods. We conclude that the methods studied are effective, but must be carefully managed depending on the system load, the operating system noise and the real power consumption of the system in any given phase.
