Performance models that statically predict the steady-state throughput of basic blocks on particular microarchitectures, such as IACA,
Ithemal, llvm-mca, OSACA, or CQA, can guide optimizing compilers and aid manual software optimization. However, their utility
heavily depends on the accuracy of their predictions. The average
error of existing models compared to measurements on the actual
hardware has been shown to lie between 9% and 36%. But how
good is this? To answer this question, we propose an extremely
simple analytical throughput model that may serve as a baseline.
Surprisingly, this model is already competitive with the state of the
art, indicating that there is significant potential for improvement.
To explore this potential, we develop a simulation-based throughput predictor. To this end, we propose a detailed parametric pipeline
model that supports all Intel Core microarchitecture generations
released between 2011 and 2021. We evaluate our predictor on an
improved version of the BHive benchmark suite and show that
its predictions are usually within 1% of measurement results, improving upon prior models by roughly an order of magnitude. The
experimental evaluation also demonstrates that several microarchitectural details considered to be rather insignificant in previous
work are in fact essential for accurate prediction.
Our throughput predictor is available as open source.
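
To make the error figures quoted above concrete, the following minimal sketch shows one common way to compute an average prediction error over a set of basic blocks. It assumes the metric is the mean absolute percentage error relative to the measured throughput, which this abstract does not state explicitly. The function name and the sample values are hypothetical and are not taken from the evaluation.

```python
def mean_absolute_percentage_error(measured, predicted):
    """Average relative error (in percent) of predictions vs. measurements.

    Assumption: throughputs are given in cycles per iteration and every
    measured value is strictly positive.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not measured:
        raise ValueError("at least one basic block is required")
    if any(m <= 0 for m in measured):
        raise ValueError("measured throughputs must be positive")

    # Relative error of each basic block, taken against the measurement.
    relative_errors = [abs(p - m) / m for m, p in zip(measured, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical throughputs for three basic blocks (cycles per iteration).
measured_cycles = [1.0, 2.0, 4.0]
predicted_cycles = [1.1, 2.0, 3.0]
error_percent = mean_absolute_percentage_error(measured_cycles, predicted_cycles)
```

Under this assumed metric, a predictor that is "within 1%" would yield a value below 1.0 for most benchmark blocks, whereas the prior models discussed above would yield values between 9 and 36.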