Addressing bandwidth contention in SMT multicores through scheduling by Feliu-Pérez, Josué et al.
 
 © Owner/Author 2014. This is the author's version of the work. It is posted here for your
personal use. Not for redistribution. The definitive Version of Record was published in ICS





Feliu-Pérez, J.; Sahuquillo Borrás, J.; Petit Martí, SV.; Duato Marín, JF. (2014). Addressing
bandwidth contention in SMT multicores through scheduling. ACM.
doi:10.1145/2597652.2600109.
Addressing Bandwidth Contention in SMT Multicores
Through Scheduling
Josué Feliu, Julio Sahuquillo, Salvador Petit, and José Duato
Department of Computer Engineering




To mitigate the impact of bandwidth contention, which can
degrade the performance of some processes by up to 40%, we
devise a scheduling algorithm that tackles both main memory
and L1 bandwidth contention. Experimental evaluation on a
real system shows that the proposal achieves an average
speedup of 5% with respect to Linux.
Categories and Subject Descriptors
D.4.1 [Operating Systems]: Process management—Scheduling
Keywords
bandwidth-aware scheduling; bandwidth contention
1. PROPOSED SCHEDULER
Algorithm 1 presents the pseudocode of the devised scheduler.
It consists of two phases: process selection (lines 2-8), which
addresses main memory bandwidth contention by balancing the
memory requests over the workload execution time, and process
allocation (lines 9-12), which addresses L1 bandwidth contention
by balancing the L1 requests among the L1 caches. Beforehand,
the scheduler calculates the average main memory transaction
rate of the workload (AVG WK TRMM), following an approach
similar to [2].
In the process selection phase, the scheduler chooses the set
of processes to run during the next quantum. The process that
has waited longest without being executed is always selected
to avoid starvation. The remaining processes are then selected
using the fitness function, which quantifies the gap between
the main memory transaction rate (TRMM) required by a given
process and the average bandwidth remaining for each
unallocated hardware thread [2]; the smaller the gap, the
higher the fitness.
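The selection phase can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the `Proc` class, the function names, and the sample rates are hypothetical, and the update rule (subtracting the selected process's TRMM from the remaining bandwidth and decrementing the remaining CPU count) is an assumption, since the text only states that both values are updated.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str      # hypothetical identifier
    tr_mm: float   # main memory transaction rate measured in the last quantum


def fitness(p, bw_remain, cpu_remain):
    """Inverse of the gap between p's TRMM and the average remaining bandwidth."""
    gap = abs(bw_remain / cpu_remain - p.tr_mm)
    return float("inf") if gap == 0 else 1.0 / gap


def select_processes(queue, avg_wk_tr_mm, n_cpus):
    """Pick up to n_cpus processes; queue[0] is the process that waited longest."""
    if not queue:
        return []
    candidates = list(queue)
    bw_remain, cpu_remain = avg_wk_tr_mm, n_cpus

    # The queue head is always selected to avoid starvation.
    head = candidates.pop(0)
    selected = [head]
    bw_remain -= head.tr_mm      # assumed update rule
    cpu_remain -= 1

    # Fill the remaining hardware threads with the best-fitting processes.
    while len(selected) < n_cpus and candidates:
        best = max(candidates, key=lambda p: fitness(p, bw_remain, cpu_remain))
        candidates.remove(best)
        selected.append(best)
        bw_remain -= best.tr_mm  # assumed update rule
        cpu_remain -= 1
    return selected


# Hypothetical workload: rates are in arbitrary units.
queue = [Proc("a", 10), Proc("b", 50), Proc("c", 30), Proc("d", 5)]
chosen = select_processes(queue, avg_wk_tr_mm=80, n_cpus=3)
print([p.name for p in chosen])  # ['a', 'c', 'b']
```

In this example, after the head process `a` is selected, 70 units of bandwidth remain for 2 hardware threads, so the process closest to 35 (`c`) is chosen next, followed by the process closest to the remaining 40 (`b`).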
In the process allocation phase, the selected processes are
allocated to the cores. Since the experimental platform
implements dual-threaded cores, the L1 bandwidth can be easily
balanced by sorting the processes according to their L1
transaction rate (TRL1) and then iteratively assigning the
processes with the highest and lowest bandwidth utilization to
the same core [1].
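The pairing step for dual-threaded cores can be sketched as below. The function name, the tuple layout, and the sample TRL1 values are hypothetical, and the handling of an odd number of processes (the middle process gets a core to itself) is an assumption that the text does not cover.

```python
def allocate_pairs(selected):
    """Pair processes so each core gets one high-TRL1 and one low-TRL1 process.

    selected: list of (name, tr_l1) tuples.
    Returns a list of tuples, one per core, holding the assigned process names.
    """
    ordered = sorted(selected, key=lambda p: p[1])  # ascending TRL1
    cores = []
    lo, hi = 0, len(ordered) - 1
    while lo < hi:
        # Co-schedule the highest and lowest remaining L1 bandwidth consumers.
        cores.append((ordered[hi][0], ordered[lo][0]))
        lo += 1
        hi -= 1
    if lo == hi:
        # Assumption: with an odd count, the middle process runs alone.
        cores.append((ordered[lo][0],))
    return cores


# Hypothetical TRL1 values in arbitrary units.
procs = [("a", 5.0), ("b", 1.0), ("c", 9.0), ("d", 3.0)]
print(allocate_pairs(procs))  # [('c', 'b'), ('a', 'd')]
```

Pairing from both ends of the sorted list keeps the combined L1 demand of each core close to the average, which is the balancing effect the allocation phase aims for.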
2. EXPERIMENTAL EVALUATION
The experimental evaluation is carried out on an Intel
Xeon E5645 processor with six dual-threaded SMT cores, a
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage, and that copies bear this notice and the full
citation on the first page. Copyrights for third-party components of this work must be
honored. For all other uses, contact the owner/author(s). Copyright is held by the
author/owner(s).
ICS’14, June 10–13 2014, Munich, Germany.
ACM 978-1-4503-2642-1/14/06.
http://dx.doi.org/10.1145/2597652.2600109.
Algorithm 1 Bandwidth-Aware Scheduler
Require: Prior calculation of AVG WK TRMM
1: while there are unfinished processes do
2:   Gather TRMM and TRL1 of the processes
3:   BWRemain = AVG WK TRMM, CPURemain = #CPUs
4:   Select the process p at the head of the process queue and update BWRemain and CPURemain
5:   while # selected processes < #CPUs do
6:     Select the process p that maximizes: $\mathrm{FITNESS}(p) = \dfrac{1}{\left| \dfrac{BW_{Remain}}{CPU_{Remain}} - TR^{p}_{MM} \right|}$
7:     Update BWRemain and CPURemain
8:   end while
9:   Sort the selected processes in ascending TRL1
10:   while there are unallocated processes do
11:     Assign the processes Phead and Ptail with maximum and minimum TRL1 to the same core
12:   end while
Figure 1: Speedup relative to Linux.
private L1 cache per core and a shared LLC. The algorithm
has been implemented in a user-level scheduler. To evaluate
the performance of the proposal, a set of ten 24-benchmark
mixes was designed.
Figure 1 presents the speedup that the devised scheduler
achieves over the Linux scheduler across all the mixes,
using the average IPC with 95% confidence intervals.
The results show that the scheduler effectively addresses
bandwidth contention and improves performance over Linux by
5% on average.
3. ACKNOWLEDGMENTS
This work was supported by the Spanish Ministerio de
Economía y Competitividad (MINECO) and Plan E funds,
under Grant TIN2012-38341-C04-01, and by the Intel Early
Career Faculty Honor Program Award.
4. REFERENCES
[1] J. Feliu, J. Sahuquillo, S. Petit, and J. Duato.
L1-Bandwidth Aware Thread Allocation in Multicore
SMT Processors. In PACT, pages 123–132, 2013.
[2] D. Xu, C. Wu, and P.-C. Yew. On mitigating memory
bandwidth contention through bandwidth-aware
scheduling. In PACT, pages 237–248, 2010.
