7 research outputs found

    A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior

    Adaptable computing is an increasingly important paradigm that specializes system resources to variable application requirements, environmental conditions, or user needs. Adapting computing resources to variable application requirements (or application phases) is known as phase-based optimization. Phase-based optimization exploits application phases, i.e., execution intervals of an application that behave similarly, to enable effective and beneficial adaptability. For phase-based optimization to be effective, the phases must first be classified to determine when application phases begin and end, ensuring that system resources are accurately specialized. In this paper, we present a survey of phase classification techniques that have been proposed to exploit the advantages of adaptable computing through phase-based optimization. We focus on recent techniques and classify them with respect to several factors in order to highlight their similarities and differences. We divide the techniques by their major defining characteristics: online/offline and serial/parallel. In addition, we discuss other characteristics such as prediction and detection techniques, the characteristics used for prediction, the interval type, etc. We also identify gaps in the state of the art and discuss future research directions to enable and fully exploit the benefits of adaptable computing. Comment: To appear in IEEE Transactions on Parallel and Distributed Systems (TPDS).
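    As a concrete illustration of the idea the survey organizes, a phase classifier groups fixed-length execution intervals whose behavior signatures are similar. The sketch below is a minimal, generic example under assumed inputs (per-interval signature vectors, a Manhattan-distance metric, and an arbitrary similarity threshold); it is not any specific technique from the survey.

    ```python
    # Minimal phase-classification sketch (illustrative; not a specific technique
    # from the survey). Each execution interval is summarized by a signature
    # vector (e.g., normalized hardware-counter or basic-block-vector values).
    # An interval joins the first phase whose representative signature is within
    # a similarity threshold; otherwise it starts a new phase.

    def manhattan(a, b):
        """Manhattan distance between two equal-length signature vectors."""
        return sum(abs(x - y) for x, y in zip(a, b))

    def classify_phases(signatures, threshold=0.2):
        """Return a phase label for each interval signature."""
        representatives = []   # one representative signature per phase
        labels = []
        for sig in signatures:
            for phase_id, rep in enumerate(representatives):
                if manhattan(sig, rep) <= threshold:
                    labels.append(phase_id)
                    break
            else:
                representatives.append(sig)            # new phase discovered
                labels.append(len(representatives) - 1)
        return labels

    # Example: three intervals, the first and third behave alike.
    print(classify_phases([(0.9, 0.1), (0.2, 0.8), (0.85, 0.15)]))  # [0, 1, 0]
    ```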

    Hybrid Designs for Caches and Cores

    Processor power constraints have come to the forefront over the last decade, heralded by the stagnation of clock frequency scaling. High-performance core and cache designs often utilize power-hungry techniques to increase parallelism. Conversely, the most energy-efficient designs opt for serial execution to avoid unnecessary overheads. While both of these extremes constitute one-size-fits-all approaches, a judicious mix of parallel and serial execution has the potential to achieve the best of both high-performing and energy-efficient designs. This dissertation examines such hybrid designs for cores and caches. First, we introduce a novel hybrid out-of-order/in-order core microarchitecture. Instructions that are steered towards in-order execution skip register allocation, reordering, and dynamic scheduling. At the same time, these instructions can interleave on an instruction-by-instruction basis with instructions that continue to benefit from the conventional out-of-order mechanisms. Second, this dissertation revisits a hybrid technique introduced for L1 caches, way-prediction, in the context of last-level caches, which are larger, have higher associativity, and experience less locality. PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113484/1/sleimanf_1.pd
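    Way-prediction, the L1 technique the dissertation revisits for last-level caches, probes a single predicted way before falling back to a full associative search. The sketch below is a minimal illustration under assumed details (a last-hit predictor per set and simplified fill logic); it is not the dissertation's actual design.

    ```python
    # Illustrative way-prediction sketch (an assumption for exposition, not the
    # dissertation's design). The predictor remembers the last way that hit in
    # each set; a lookup probes that way first and searches the remaining ways
    # only on a first-probe miss, saving tag-comparison energy on correct guesses.

    class WayPredictedCache:
        def __init__(self, num_sets, num_ways):
            self.num_sets, self.num_ways = num_sets, num_ways
            self.tags = [[None] * num_ways for _ in range(num_sets)]
            self.predicted_way = [0] * num_sets        # last hitting way per set

        def fill(self, set_idx, way, tag):
            """Install a tag; a real cache would also choose a victim way."""
            self.tags[set_idx][way] = tag

        def lookup(self, set_idx, tag):
            """Return (hit, first_probe_correct)."""
            guess = self.predicted_way[set_idx]
            if self.tags[set_idx][guess] == tag:
                return True, True                      # fast, single-way probe
            for way in range(self.num_ways):           # slow path: probe the rest
                if way != guess and self.tags[set_idx][way] == tag:
                    self.predicted_way[set_idx] = way  # retrain the predictor
                    return True, False
            return False, False

    # Example: the first lookup retrains the predictor, the second hits fast.
    c = WayPredictedCache(num_sets=64, num_ways=16)
    c.fill(set_idx=3, way=5, tag=0xABC)
    print(c.lookup(3, 0xABC))   # (True, False): hit, but first probe was wrong
    print(c.lookup(3, 0xABC))   # (True, True):  predictor now points at way 5
    ```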

    Maximizing heterogeneous processor performance under power constraints


    Heterogeneous processor composition: metrics and methods

    Heterogeneous processors intended for mobile devices are composed of a number of different CPU cores that enable the processor to optimize performance under strict power limits that vary over time. Design space exploration techniques can be used to discover a candidate set of potential cores that could be implemented on a heterogeneous processor. However, candidate sets contain far more cores than can feasibly be implemented. Heterogeneous processor composition therefore requires solutions to the selection problem and the evaluation problem: cores must be selected from the candidate set, and these cores must be shown to be quantitatively superior to alternative selections. The qualitative criterion for a selection of cores is diversity. A diverse set of heterogeneous cores allows a processor to execute tasks with varying dynamic behaviors at a range of power and performance levels that are appropriate for conditions during runtime. This thesis presents a detailed description of the selection and evaluation problems, and establishes a theoretical framework for reasoning about the runtime behavior of power-limited, heterogeneous processors. The evaluation problem is specifically concerned with evaluating the collective attributes of selections of cores rather than the features of individual cores. A suite of metrics is defined to address the evaluation problem; the metrics quantify considerations that could otherwise only be evaluated subjectively. The selection problem is addressed with an iterative, diversity-preserving algorithm that emphasizes the flexibility available to programs at runtime. The algorithm includes facilities for guiding the selection process with information from an expert, when available. Three variations on the selection algorithm are defined. A thorough analysis of the proposed selection algorithm is presented using data from a large-scale simulation involving 33 benchmarks and 3000 core types. The three variations of the algorithm are compared to each other and to current, state-of-the-art selection techniques. The analysis serves both as an evaluation of the proposed algorithm and as a case study of the metrics.
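    To make the selection problem concrete, the sketch below shows one generic, diversity-preserving heuristic: greedy farthest-point selection over assumed (power, performance) coordinates. It is an illustrative stand-in, not the iterative algorithm or the metrics proposed in the thesis.

    ```python
    # Illustrative core-selection sketch (assumed greedy farthest-point selection,
    # not the thesis's algorithm). Each candidate core is a (power, performance)
    # point; the selection greedily keeps the candidate farthest from the cores
    # chosen so far, which preserves diversity across the power/performance range.

    import math

    def select_diverse_cores(candidates, k):
        """Pick k cores from candidates, greedily maximizing pairwise spread."""
        chosen = [min(candidates, key=lambda c: c[0])]  # seed with lowest-power core
        while len(chosen) < k:
            best = max(
                (c for c in candidates if c not in chosen),
                key=lambda c: min(math.dist(c, s) for s in chosen),
            )
            chosen.append(best)
        return chosen

    # Example: a large candidate set reduces to a handful of diverse designs.
    candidates = [(0.5, 1.0), (0.6, 1.1), (2.0, 2.5), (4.0, 3.0), (4.1, 3.1)]
    print(select_diverse_cores(candidates, 3))   # spans low, mid, and high power
    ```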

    Architectural Contesting

    This paper presents results showing that workload behavior tends to vary considerably at granularities of less than a thousand instructions. If it were possible to adjust the microarchitecture to suit the workload behavior at such rates, significant single-thread performance enhancement would be achievable. However, previous techniques are too sluggish to respond effectively to such fine-grain change. An approach is proposed that exploits the multicore trend to enable swift adjustment of the employed microarchitecture upon variation in workload behavior. A number of cores that are each custom-designed for optimum performance under a class of workloads concurrently execute code in a leader-follower arrangement. In this manner, effective execution automatically and fluidly transfers to the most suitable microarchitecture as the workload behavior varies. We refer to this approach as architectural contesting. Two-way contesting yields an average speedup of 15% (maximum speedup of 25%) over a benchmark's own customized core. The paper also explores the interplay between contesting and the number of core types available in the heterogeneous multi-core. This exposes the broader issue of constrained heterogeneous multi-core design and how it influences, and may be influenced by, contesting.
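    The leader-follower idea can be modeled coarsely: per fine-grained interval, overall progress follows whichever specialized core type is fastest for that interval's behavior. The sketch below uses assumed per-interval cycle counts and core-type names; it is a simplified model, not the paper's mechanism.

    ```python
    # Illustrative model of the contesting idea (an assumption for exposition,
    # not the paper's mechanism). Several specialized core types execute the same
    # instruction stream; for each fine-grained interval, progress is set by
    # whichever core type is fastest for that interval's behavior, so execution
    # "fluidly transfers" to the best-suited microarchitecture.

    def contesting_cycles(interval_costs):
        """interval_costs: list of dicts {core_type: cycles} per interval."""
        total = 0
        leader_trace = []
        for costs in interval_costs:
            leader = min(costs, key=costs.get)      # fastest core wins the interval
            leader_trace.append(leader)
            total += costs[leader]
        return total, leader_trace

    # Example: two hypothetical core types, each winning a different interval.
    intervals = [{"coreA": 800, "coreB": 1200}, {"coreA": 1100, "coreB": 900}]
    print(contesting_cycles(intervals))   # (1700, ['coreA', 'coreB'])
    ```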
