When parallel speedups hit the memory wall
After Amdahl's trailblazing work, many other authors proposed analytical
speedup models, but none considered the limiting effect of the memory wall.
These models exploit aspects such as problem-size variation, memory size,
communication overhead, and synchronization overhead, yet they assume
data-access delays to be constant. In practice, such delays vary, for example,
with the number of cores used and with the ratio between processor and memory
frequencies. Given the large number of configurations of operating frequency
and core count that current architectures offer, speedup models that describe
such variations across configurations are highly desirable for off-line or
on-line scheduling decisions. This work proposes new parallel speedup models
that account for variations of the average data-access delay in order to
describe the limiting effect of the memory wall on parallel speedups.
Analytical results indicate that the proposed models capture the desired
behavior, and experimental hardware results validate them. Additionally, we
show that by accounting for parameters that reflect intrinsic characteristics
of the applications, such as degree of parallelism and susceptibility to the
memory wall, our proposal has significant advantages over
machine-learning-based modeling. Moreover, besides being a black-box approach,
conventional machine-learning modeling needs, in our experiments, about one
order of magnitude more measurements to reach the same level of accuracy
achieved by our models.
Comment: 24 pages
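The abstract does not reproduce the paper's actual model, but the idea of a
memory-wall-limited speedup can be sketched with a minimal, hypothetical
variant of Amdahl's law in which the average data-access delay grows with the
number of cores (the `delay_growth` parameter and its linear form are
assumptions for illustration only):

```python
# Sketch of an Amdahl-style speedup model extended with a
# core-count-dependent average data-access delay. The linear
# delay-growth form is a hypothetical illustration, not the
# paper's actual model.

def amdahl_speedup(p, f):
    """Classic Amdahl speedup: f = parallel fraction, p = number of cores."""
    return 1.0 / ((1 - f) + f / p)

def memory_wall_speedup(p, f, delay_growth=0.05):
    """Amdahl speedup with execution time inflated by an average
    data-access delay that grows with the core count p.
    delay_growth is a hypothetical memory-contention parameter."""
    memory_factor = 1.0 + delay_growth * (p - 1)  # assumed linear growth
    return 1.0 / (((1 - f) + f / p) * memory_factor)

if __name__ == "__main__":
    for p in (1, 4, 16, 64):
        print(p,
              round(amdahl_speedup(p, 0.95), 2),
              round(memory_wall_speedup(p, 0.95), 2))
```

Under this toy parameterization, the classic curve keeps climbing with core
count while the memory-wall variant peaks and then declines, which is the
qualitative behavior the abstract describes.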
Limits on Fundamental Limits to Computation
An indispensable part of our lives, computing has also become essential to
industries and governments. Steady improvements in computer hardware have been
supported by periodic doubling of transistor densities in integrated circuits
over the last fifty years. Such Moore scaling now requires increasingly heroic
efforts, stimulating research in alternative hardware and stirring controversy.
To help evaluate emerging technologies and enrich our understanding of
integrated-circuit scaling, we review fundamental limits to computation: in
manufacturing, energy, physical space, design and verification effort, and
algorithms. To outline what is achievable in principle and in practice, we
recall how some limits were circumvented and compare loose limits with tight
ones. We also point out that engineering difficulties encountered by emerging
technologies may indicate yet-unknown limits.
Comment: 15 pages, 4 figures, 1 table
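As one concrete example of the energy limits reviewed in this literature, the
Landauer bound gives the minimum energy required to erase a single bit of
information at temperature T, namely kT ln 2 (the choice of this particular
limit as an illustration is ours; the abstract does not single it out):

```python
# Back-of-the-envelope evaluation of the Landauer bound, a fundamental
# thermodynamic limit on the energy cost of erasing one bit.
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact, SI 2019)

def landauer_bound(temperature_kelvin):
    """Minimum energy in joules to erase one bit at the given temperature."""
    return K_BOLTZMANN * temperature_kelvin * math.log(2)

if __name__ == "__main__":
    e_bit = landauer_bound(300.0)  # room temperature
    print(f"Landauer bound at 300 K: {e_bit:.3e} J per bit")
```

At room temperature this comes to roughly 3e-21 J per bit, many orders of
magnitude below the switching energy of present-day transistors, which is why
such limits are loose in practice even though they are tight in principle.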
Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis
In high performance computing environments, we observe an ongoing increase in
the available numbers of cores. This development calls for re-emphasizing
performance (scalability) analysis and speedup laws as suggested in the
literature (e.g., Amdahl's law and Gustafson's law), with a focus on asymptotic
performance. Understanding the speedup and efficiency of algorithmic
parallelism is useful for several purposes, including the optimization of
system operations, temporal predictions of program execution, the analysis of
asymptotic properties, and the determination of speedup bounds.
However, the literature is fragmented, exhibiting a large diversity and
heterogeneity of speedup models and laws. This fragmentation makes it challenging
to obtain an overview of the models and their relationships, to identify the
determinants of performance in a given algorithmic and computational context,
and, finally, to determine the applicability of performance models and laws to
a particular parallel computing setting. In this work, we provide a generic
speedup (and thus also efficiency) model for homogeneous computing
environments. Our approach generalizes many prominent models suggested in the
literature and shows that they can be considered special cases of a unifying
approach. The genericity of the unifying speedup model is achieved
through parameterization. Considering combinations of parameter ranges, we
identify six different asymptotic speedup cases and eight different asymptotic
efficiency cases. Jointly applying these speedup and efficiency cases, we
derive eleven scalability cases, from which we build a scalability typology.
Researchers can draw upon our typology to classify their speedup model and to
determine the asymptotic behavior when the number of parallel processing units
increases. In addition, our results may be used to address various extensions
of our setting.
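The abstract does not give the paper's generic model, but a common way to
unify Amdahl's and Gustafson's laws through parameterization is to let the
parallel workload scale as p**gamma with the number of processing units p
(this construction and the parameter name `gamma` are our illustrative
assumptions):

```python
# Sketch of a parameterized speedup model that recovers Amdahl's and
# Gustafson's laws as special cases. The workload-scaling exponent
# gamma is an illustrative parameterization, not the paper's model.

def generic_speedup(p, serial_fraction, gamma):
    """Speedup with parallel workload scaling g(p) = p**gamma.
    gamma = 0 recovers Amdahl's law (fixed workload);
    gamma = 1 recovers Gustafson's law (workload grows with p)."""
    s = serial_fraction
    g = p ** gamma  # workload scaling of the parallel part
    return (s + (1 - s) * g) / (s + (1 - s) * g / p)

def efficiency(p, serial_fraction, gamma):
    """Parallel efficiency: speedup per processing unit."""
    return generic_speedup(p, serial_fraction, gamma) / p

if __name__ == "__main__":
    p, s = 64, 0.05
    print("Amdahl   :", round(generic_speedup(p, s, 0.0), 2))
    print("Gustafson:", round(generic_speedup(p, s, 1.0), 2))
```

Sweeping gamma between such extremes, and examining the limits of speedup and
efficiency as p grows, is one way to arrive at the kind of asymptotic case
distinctions the abstract describes.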