4 research outputs found
On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution
Technology trends are making the cost of data movement increasingly dominant,
both in terms of energy and time, over the cost of performing arithmetic
operations in computer systems. The fundamental ratio of aggregate data
movement bandwidth to the total computational power (also referred to the
machine balance parameter) in parallel computer systems is decreasing. It is
there- fore of considerable importance to characterize the inherent data
movement requirements of parallel algorithms, so that the minimal architectural
balance parameters required to support it on future systems can be well
understood. In this paper, we develop an extension of the well-known red-blue
pebble game to develop lower bounds on the data movement complexity for the
parallel execution of computational directed acyclic graphs (CDAGs) on parallel
systems. We model multi-node multi-core parallel systems, with the total
physical memory distributed across the nodes (that are connected through some
interconnection network) and in a multi-level shared cache hierarchy for
processors within a node. We also develop new techniques for lower bound
characterization of non-homogeneous CDAGs. We demonstrate the use of the
methodology by analyzing the CDAGs of several numerical algorithms, to develop
lower bounds on data movement for their parallel execution
ENERGY-TIME PERFORMANCE OF HETEROGENEOUS COMPUTING SYSTEMS: MODELS AND ANALYSIS
Ph.DDOCTOR OF PHILOSOPH
A theoretical framework for algorithm-architecture co-design
Abstract—We consider the problem of how to enable computer architects and algorithm designers to reason directly and analytically about the relationship between high-level architectural features and algorithm characteristics. We propose a modeling framework designed to help understand the long-term and high-level impacts of algorithmic and technology trends. This model connects abstract communication complexity analysis—with respect to both the inter-core and inter-processor networks and the memory hierarchy—with current technology proposals and projections. We illustrate how one might use the framework by instantiating a particular model for a class of architectures and sample algorithms (threedimensional fast Fourier transforms, matrix multiply, and three-dimensional stencil). Then, as a suggestive demonstration, we analyze a number of what-if scenarios within the model in light of these trends to suggest broader statements and alternative futures for power-constrained architectures and algorithms. I