Thread imbalance is inevitable for multithreaded applications due to the necessity of synchronization
primitives to coordinate access to memory and system resources. This imbalance bounds
application performance and, more importantly for mobile devices,
also leads to energy inefficiency. Recent works have begun to quantify this imbalance and look
to leverage it not only for performance improvements, but for energy savings as well. All these
works, though, test the theory using simulators and power estimation tools. Such
results may show that the theory is sound, but the complexities of how a real machine handles synchronization
may diminish the benefit, through either too large a performance impact
or too little energy saved. In this work, we implement one such algorithm, PCSLB, and improve
upon it to determine whether the results reported for this technique are achievable on real machines.
With the improved algorithm, PCSLB-Max, and the CritScale Linux kernel module, we show that,
in fact, there are energy savings available to us while mitigating the performance