After Amdahl's trailblazing work, many other authors proposed analytical
speedup models but none have considered the limiting effect of the memory wall.
These models exploited aspects such as problem-size variation, memory size,
communication overhead, and synchronization overhead, but data-access delays
are assumed to be constant. Nevertheless, such delays can vary, for example,
according to the number of cores used and the ratio between processor and
memory frequencies. Given the large number of possible configurations of
operating frequency and number of cores that current architectures can offer,
suitable speedup models to describe such variations among these configurations
are quite desirable for off-line or on-line scheduling decisions. This work
proposes new parallel speedup models that account for variations of the average
data-access delay to describe the limiting effect of the memory wall on
parallel speedups. Analytical results indicate that the proposed modeling can
capture the desired behavior while experimental hardware results validate the
former. Additionally, we show that when accounting for parameters that reflect
the intrinsic characteristics of the applications, such as degree of
parallelism and susceptibility to the memory wall, our proposal has significant
advantages over machine-learning-based modeling. Moreover, besides being
black-box modeling, our experiments show that conventional machine-learning
modeling needs about one order of magnitude more measurements to reach the same
level of accuracy achieved in our modeling.Comment: 24 page