Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis
In high performance computing environments, we observe an ongoing increase in
the available numbers of cores. This development calls for re-emphasizing
performance (scalability) analysis and speedup laws as suggested in the
literature (e.g., Amdahl's law and Gustafson's law), with a focus on asymptotic
performance. Understanding speedup and efficiency issues of algorithmic
parallelism is useful for several purposes, including the optimization of
system operations, temporal predictions on the execution of a program, and the
analysis of asymptotic properties and the determination of speedup bounds.
However, the literature is fragmented and shows a large diversity and
heterogeneity of speedup models and laws. These phenomena make it challenging
to obtain an overview of the models and their relationships, to identify the
determinants of performance in a given algorithmic and computational context,
and, finally, to determine the applicability of performance models and laws to
a particular parallel computing setting. In this work, we provide a generic
speedup (and thus also efficiency) model for homogeneous computing
environments. Our approach generalizes many prominent models suggested in the
literature and allows showing that they can be considered special cases of a
unifying approach. The genericity of the unifying speedup model is achieved
through parameterization. Considering combinations of parameter ranges, we
identify six different asymptotic speedup cases and eight different asymptotic
efficiency cases. Jointly applying these speedup and efficiency cases, we
derive eleven scalability cases, from which we build a scalability typology.
Researchers can draw upon our typology to classify their speedup model and to
determine the asymptotic behavior when the number of parallel processing units
increases. In addition, our results may be used to address various extensions
of our setting.
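For orientation, the two laws named in the abstract can be written in a few lines of Python. This is only a sketch of the textbook forms, not the generic model proposed in the paper. It assumes Amdahl's law as S(n) = 1 / ((1 - p) + p / n) and Gustafson's law as S(n) = (1 - p) + p * n, with efficiency E(n) = S(n) / n. The names `p` (parallel fraction) and `n` (number of processing units) are illustrative and do not come from the source.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Fixed-size speedup; p is the parallelizable fraction of the serial runtime."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Scaled-size speedup; p is the parallel fraction of the runtime on n units."""
    return (1.0 - p) + p * n


def efficiency(speedup: float, n: int) -> float:
    """Speedup per processing unit."""
    return speedup / n


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, chosen only for illustration
    for n in (1, 16, 256, 4096):
        s_a = amdahl_speedup(p, n)
        s_g = gustafson_speedup(p, n)
        print(
            f"n={n:5d}  Amdahl S={s_a:8.2f} E={efficiency(s_a, n):.3f}  "
            f"Gustafson S={s_g:8.2f} E={efficiency(s_g, n):.3f}"
        )
```

Under these assumptions the two laws already show different asymptotic behavior: Amdahl's speedup is bounded by 1 / (1 - p) and its efficiency tends to 0, whereas Gustafson's speedup grows linearly in n and its efficiency tends to p.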
Investigation into scalable energy and performance models for many-core systems
PhD Thesis
It is likely that many-core processor systems will continue to penetrate
emerging embedded and high-performance applications. Scalable energy and
performance models are two critical aspects that provide insights into the
conflicting trade-offs between them with growing hardware and software
complexity. Traditional performance models, such as Amdahl’s, Gustafson’s
and Sun-Ni’s laws, have helped the research community and industry
to better understand the bounds on system performance gain with given
processing resources, otherwise known as speedup. However, these models and
their existing extensions have limited applicability for energy and/or
performance-driven system optimization in practical systems. For instance,
these are typically based on software characteristics, assuming ideal and
homogeneous hardware platforms or limited forms of processor
heterogeneity. In addition, the measurement of speedup and parallelization
factors of an application running on a specific hardware platform requires
instrumenting the original software codes. Indeed, practical speedup and
parallelizability models of application workloads running on modern
heterogeneous hardware are critical for energy and performance models, as
they can be used to inform design and control decisions with an aim to
improve system throughput and energy efficiency.
This thesis addresses the limitations by firstly developing novel and
scalable speedup and energy consumption models based on a more general
representation of heterogeneity, referred to as the normal form heterogeneity.
A method is developed whereby standard performance counters found in
modern many-core platforms can be used to derive speedup, and therefore
the parallelizability of the software, without instrumenting applications. This
extends the usability of the new models to scenarios where the
parallelizability of software is unknown, potentially enabling Run-Time
Management (RTM) of speedup and/or energy efficiency optimization. The
models and optimization methods presented in this thesis are validated
through extensive experimentation, by running a number of different
applications in wide-ranging concurrency scenarios on a number of different
homogeneous and heterogeneous Multi/Many Core Processor (M/MCP)
systems. These include homogeneous and heterogeneous architectures and
range from existing off-the-shelf platforms to potential future system
extensions. The practical use of these models and methods is demonstrated
through real examples such as studying the effectiveness of the system load
balancer.
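The idea of obtaining parallelizability from an observed speedup, rather than from instrumented code, can be illustrated with a minimal sketch. The abstract does not describe the thesis's counter-based method, so this example makes its own assumptions: speedup is taken as a ratio of two measured execution times, the platform is homogeneous, and Amdahl's law is inverted to recover the parallel fraction, p = (1 - 1/S) / (1 - 1/n). The function names and the timing values are hypothetical.

```python
def measured_speedup(time_one_core: float, time_n_cores: float) -> float:
    """Speedup as the ratio of single-core to n-core execution time."""
    if time_one_core <= 0 or time_n_cores <= 0:
        raise ValueError("execution times must be positive")
    return time_one_core / time_n_cores


def parallel_fraction_from_speedup(speedup: float, n: int) -> float:
    """Invert Amdahl's law (homogeneous cores assumed) to estimate p."""
    if n < 2:
        raise ValueError("need at least two cores to separate serial and parallel parts")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    # Hypothetical timings in seconds, not data from the thesis.
    s = measured_speedup(time_one_core=40.0, time_n_cores=8.5)
    p = parallel_fraction_from_speedup(s, n=8)
    print(f"speedup={s:.2f}, estimated parallel fraction={p:.3f}")
```

An estimate of this kind is only as good as the underlying model. On heterogeneous platforms, which are the focus of the thesis, the plain Amdahl inversion would not apply directly.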
The models and methodologies proposed in this thesis provide guidance on
new opportunities for improving the energy efficiency of M/MCP systems.
Higher Committee of Education Development
(HCED) in Ira