A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior
Adaptable computing is an increasingly important paradigm that specializes system resources to variable application requirements, environmental conditions, or user requirements. Adapting computing resources to variable application requirements (or application phases) is otherwise known as phase-based optimization. Phase-based optimization takes advantage of application phases, or execution intervals of an application that behave similarly, to enable effective and beneficial adaptability. In order for phase-based optimization to be effective, the phases must first be classified to determine when application phases begin and end, and to ensure that system resources are accurately specialized. In this paper, we present a survey of phase classification techniques that have been proposed to exploit the advantages of adaptable computing through phase-based optimization. We focus on recent techniques and classify these techniques with respect to several factors in order to highlight their similarities and differences. We divide the techniques by their major defining characteristics---online/offline and serial/parallel. In addition, we discuss other characteristics such as prediction and detection techniques, the characteristics used for prediction, interval type, etc. We also identify gaps in the state-of-the-art and discuss future research directions to enable and fully exploit the benefits of adaptable computing.
Comment: To appear in IEEE Transactions on Parallel and Distributed Systems (TPDS).
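To make the survey's subject concrete, the following is a minimal sketch of interval-based phase classification, not any particular surveyed technique: each fixed-length execution interval is summarized by a feature vector (for example, normalized hardware-counter readings), and an interval is assigned to an existing phase when its vector is sufficiently similar to that phase's signature. The cosine-similarity measure and the 0.9 threshold are assumptions chosen for illustration.

# Minimal sketch of interval-based phase classification (illustrative only).
import math

SIMILARITY_THRESHOLD = 0.9   # assumed cut-off for "same phase"

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify_intervals(interval_vectors):
    """Assign a phase ID to every interval, creating new phases on demand."""
    signatures = []                      # one representative vector per phase
    phase_ids = []
    for vec in interval_vectors:
        best_id, best_sim = None, 0.0
        for pid, sig in enumerate(signatures):
            sim = cosine_similarity(vec, sig)
            if sim > best_sim:
                best_id, best_sim = pid, sim
        if best_id is not None and best_sim >= SIMILARITY_THRESHOLD:
            phase_ids.append(best_id)    # interval repeats a known phase
        else:
            signatures.append(vec)       # interval starts a new phase
            phase_ids.append(len(signatures) - 1)
    return phase_ids

# Two intervals that behave alike map to phase 0; the third starts phase 1.
print(classify_intervals([[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]))  # [0, 0, 1]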
Hybrid Designs for Caches and Cores.
Processor power constraints have come to the forefront over the last decade, heralded by the stagnation of clock frequency scaling. High-performance core and cache designs often utilize power-hungry techniques to increase parallelism. Conversely, the most energy-efficient designs opt for a serial execution to avoid unnecessary overheads. While both of these extremes constitute one-size-fits-all approaches, a judicious mix of parallel and serial execution has the potential to achieve the best of both high-performing and energy-efficient designs. This dissertation examines such hybrid designs for cores and caches. Firstly, we introduce a novel, hybrid out-of-order/in-order core microarchitecture. Instructions that are steered towards in-order execution skip register allocation, reordering and dynamic scheduling. At the same time, these instructions can interleave on an instruction-by-instruction basis with instructions that continue to benefit from these conventional out-of-order mechanisms. Secondly, this dissertation revisits a hybrid technique introduced for L1 caches, way-prediction, in the context of last-level caches that are larger, have higher associativity, and experience less locality.
PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113484/1/sleimanf_1.pd
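As a rough companion to the cache part of this abstract, here is a minimal sketch of way-prediction in a set-associative cache; it is not the dissertation's last-level-cache design. The MRU-based predictor, the fill policy, and the comparison counting are assumptions for illustration: a correct prediction needs a single tag comparison, while a mis-prediction falls back to probing the remaining ways.

# Minimal sketch of a way-predicted, set-associative cache (illustrative only).
class WayPredictedCache:
    def __init__(self, num_sets, ways):
        self.ways = ways
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.mru_way = [0] * num_sets          # predicted (most recently used) way

    def access(self, set_index, tag):
        """Return (hit, number_of_tag_comparisons)."""
        predicted = self.mru_way[set_index]
        comparisons = 1
        if self.tags[set_index][predicted] == tag:
            return True, comparisons           # prediction correct: one comparison
        for way in range(self.ways):           # mis-prediction: probe the rest
            if way == predicted:
                continue
            comparisons += 1
            if self.tags[set_index][way] == tag:
                self.mru_way[set_index] = way  # retrain the predictor
                return True, comparisons
        self.tags[set_index][predicted] = tag  # miss: simplistic fill policy
        return False, comparisons

cache = WayPredictedCache(num_sets=4, ways=8)
print(cache.access(0, 0x1A))   # (False, 8): cold miss probes every way
print(cache.access(0, 0x1A))   # (True, 1): the predicted way hits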
Dynamic Processor Reconfiguration for Power, Performance and Reliability Management
Technology advancements allowed more transistors to be packed into a smaller area, while improved transistor performance helped achieve higher clock frequencies. This, unfortunately, led to a power density problem, forcing the processor industry to lower clock frequencies and integrate multiple cores on the same die. Depending on core characteristics, the multiple cores on the die can be symmetric or asymmetric. Asymmetric multi-core processors (AMPs) have been proposed as an alternative to symmetric multi-cores to improve power efficiency. AMPs comprise cores that implement the same ISA but differ in performance and power characteristics due to varying sizes of micro-architectural resources. As the computational bottleneck of a workload shifts from one resource to another during its execution, reassigning it to another core (where it runs more efficiently) can improve overall power efficiency. Thus, achieving high power efficiency in AMPs requires (i) a diverse set of cores that are optimized for various program phases, (ii) runtime analysis to determine the best core to run on, and (iii) low overhead for reassigning a thread to a different core type.
Decisions to swap threads between cores of an AMP are made at a coarse granularity of millions of instructions to mitigate the impact of thread migration overhead. But the computational needs of a program change rapidly during its execution. The best core configuration for an application, the one that optimizes both power consumption and performance, changes rapidly over time, at a fine granularity of thousands of instructions. This dissertation explores ways to design the core micro-architecture so that high power efficiency can be achieved if switching overhead is lowered, enabling fine-grained switching.
To take advantage of power-saving opportunities at a fine granularity, this thesis explores reconfigurable/morphable architectures where core resources are reconfigured on demand to suit the needs of the executing application. At first, we explore reconfigurable architectures consisting of two kinds of cores: out-of-order (OOO) big cores and in-order (InO) small cores. The big cores provide higher performance while the small cores are more power efficient. In this proposed architecture, the OOO core reconfigures into an InO core at run time. Our proposed online management scheme decides when to switch between these core types such that we obtain significant power benefits without impacting performance. We also observe that the resource requirements of applications can be quite diverse and, consequently, resource bottlenecks or excesses can vary considerably. Thus, reconfiguration between just two core modes may not fully exploit power and performance improvement opportunities.
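As a hedged illustration of what such an online decision could look like (not the dissertation's actual management scheme), the sketch below compares per-interval IPC estimates for the two modes and stays in the power-efficient InO mode whenever the estimated performance loss is within an assumed 10% budget.

# Minimal sketch of an online OOO/InO switching heuristic (illustrative only;
# not the dissertation's management scheme). At every interval boundary the
# controller compares IPC estimates for both modes and picks the in-order mode
# whenever its performance loss relative to out-of-order is acceptable.

ACCEPTABLE_SLOWDOWN = 0.10   # assumed: tolerate up to 10% IPC loss in InO mode

def choose_mode(ipc_ooo_estimate, ipc_ino_estimate):
    """Return 'InO' when it is nearly as fast as OOO, otherwise 'OOO'."""
    if ipc_ino_estimate >= (1.0 - ACCEPTABLE_SLOWDOWN) * ipc_ooo_estimate:
        return "InO"       # power-efficient mode is good enough for this phase
    return "OOO"           # performance-critical phase: keep the big mode

# Invented per-interval IPC estimates (OOO, InO) for three program phases.
intervals = [(2.0, 1.9), (2.4, 1.3), (1.1, 1.05)]
print([choose_mode(ooo, ino) for ooo, ino in intervals])   # ['InO', 'OOO', 'InO']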
We therefore explore reconfigurable architectures consisting of diverse core types that are not limited to big and little cores. A single core can reconfigure into multiple core modes, where each mode has unique power and performance characteristics. Workload performance on a particular core mode depends on a large set of processor resources. Some workloads are highly memory intensive, some exhibit long instruction dependencies, some experience high rates of branch misprediction, while other workloads exhibit large exploitable instruction-level parallelism. A diverse set of core modes is needed that can address shifting resource needs during the various program phases of an application. Different trade-offs in power and performance can be achieved by reducing or expanding the sizes of various resources. Trade-offs for each core mode are also affected by operating voltage and frequency. We therefore propose joint core resource resizing with dynamic voltage and frequency scaling (DVFS), which is important for applications whose performance is sensitive to changes in frequency. Thus, at fine granularity, the core should adapt its instruction window size, execution bandwidth, and frequency to meet the demands of the workload at run time and improve power efficiency.
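The following is a minimal sketch of what joint core-mode and DVFS selection could look like; the operating points, the power and IPC numbers, and the rule of picking the lowest-power point that meets a throughput target are all invented for illustration and are not the scheme proposed in the dissertation.

# Minimal sketch of joint core-mode / DVFS selection (illustrative only).
# Every candidate operating point pairs a core configuration with a frequency;
# the controller picks the lowest-power point whose throughput still meets the
# current phase's performance target. All numbers are invented placeholders.

OPERATING_POINTS = [
    # (core mode,      freq_GHz, relative_power, relative_ipc)
    ("small-window",   1.0,      0.30,           0.55),
    ("small-window",   2.0,      0.55,           0.55),
    ("large-window",   1.0,      0.60,           0.90),
    ("large-window",   2.0,      1.00,           1.00),
]

def pick_operating_point(performance_target):
    """Return the lowest-power point whose throughput meets the target."""
    feasible = [p for p in OPERATING_POINTS
                if p[1] * p[3] >= performance_target]   # throughput ~ freq * IPC
    if not feasible:
        # Nothing meets the target: fall back to the fastest available point.
        return max(OPERATING_POINTS, key=lambda p: p[1] * p[3])
    return min(feasible, key=lambda p: p[2])

print(pick_operating_point(1.0))   # ('small-window', 2.0, 0.55, 0.55)
print(pick_operating_point(1.5))   # ('large-window', 2.0, 1.00, 1.00)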
Many current processors employ DVFS aggressively to improve power efficiency and maximize performance. This dissertation studies the power-efficiency trade-off between fine-grained DVFS and the reconfigurable architectures mentioned above. We also explore another important problem arising from continued device scaling: higher vulnerability to soft errors. We consider dynamic core reconfiguration from the perspectives of both power efficiency and vulnerability to soft errors. An online management scheme is proposed such that core reconfiguration upon a thread switch not only improves power efficiency but also does not increase vulnerability to soft errors.
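One hedged way to picture such a joint decision, not the proposed scheme itself, is to score each candidate core mode by an estimated energy per instruction plus a weighted vulnerability estimate and to reconfigure to the best-scoring mode; the weight and all numbers below are assumptions for illustration.

# Minimal sketch of reconfiguration that weighs power efficiency against
# soft-error vulnerability (illustrative only; not the proposed scheme).

VULNERABILITY_WEIGHT = 0.5   # assumed knob trading energy against reliability

def pick_mode(candidates):
    """candidates: list of (mode_name, energy_per_instr, vulnerability_estimate)."""
    return min(candidates, key=lambda c: c[1] + VULNERABILITY_WEIGHT * c[2])

# Invented candidate modes: a wide OOO mode, a narrower OOO mode, and an InO mode.
modes = [("wide-OOO", 1.0, 0.8), ("narrow-OOO", 0.7, 0.5), ("InO", 0.5, 1.0)]
print(pick_mode(modes))   # ('narrow-OOO', 0.7, 0.5) under these assumptions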
In summary, this thesis proposes several solutions for improving power efficiency by integrating heterogeneity within the core. We also examine how popular power-reduction techniques such as DVFS compare to our approach. Finally, we address reliability challenges alongside improving power efficiency.
Heterogeneous processor composition: metrics and methods
Heterogeneous processors intended for mobile devices are composed of a number of different CPU cores that enable the processor to optimize performance under strict power limits that vary over time. Design space exploration techniques can be used to discover a candidate set of potential cores that could be implemented on a heterogeneous processor. However, candidate sets contain far more cores than can feasibly be implemented. Heterogeneous processor composition therefore requires solutions to the selection problem and the evaluation problem. Cores must be selected from the candidate set, and these cores must be shown to be quantitatively superior to alternative selections. The qualitative criterion for a selection of cores is diversity. A diverse set of heterogeneous cores allows a processor to execute tasks with varying dynamic behaviors at a range of power and performance levels that are appropriate for conditions during runtime.
This thesis presents a detailed description of the selection and evaluation problems, and establishes a theoretical framework for reasoning about the runtime behavior of power-limited, heterogeneous processors. The evaluation problem is specifically concerned with evaluating the collective attributes of selections of cores rather than evaluating the features of individual cores. A suite of metrics is defined to address the evaluation problem. The metrics quantify considerations that could otherwise only be evaluated subjectively. The selection problem is addressed with an iterative, diversity-preserving algorithm that emphasizes the flexibility available to programs at runtime. The algorithm includes facilities for guiding the selection process with information from an expert, when available. Three variations on the selection algorithm are defined. A thorough analysis of the proposed selection algorithm is presented using data from a large-scale simulation involving 33 benchmarks and 3000 core types. The three variations of the algorithm are compared to each other and to current, state-of-the-art selection techniques. The analysis serves as both an evaluation of the proposed algorithm and a case study of the metrics.
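To illustrate the flavor of a diversity-preserving selection (this is not the thesis's algorithm or its metrics), the sketch below greedily builds a selection from a candidate set by repeatedly adding the candidate that is farthest, in a power/performance space, from the cores already chosen; the seeding rule, the distance measure, and the candidate numbers are assumptions.

# Minimal sketch of a diversity-preserving core selection (illustrative only;
# not the thesis's algorithm). The selection greedily keeps the candidate that
# is farthest, in power/performance space, from the cores already selected.
import math

def distance(a, b):
    return math.dist(a[1:], b[1:])     # compare (power, performance) tuples

def select_cores(candidates, budget):
    """candidates: list of (name, power, performance); budget: cores to keep."""
    selected = [min(candidates, key=lambda c: c[1])]   # assumed seed: lowest power
    remaining = [c for c in candidates if c != selected[0]]
    while len(selected) < budget and remaining:
        best = max(remaining, key=lambda c: min(distance(c, s) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Invented candidates; note the two near-duplicate mid-range cores.
candidates = [("tiny", 0.2, 0.3), ("mid", 0.5, 0.6),
              ("big", 1.0, 1.0), ("mid2", 0.55, 0.62)]
print([c[0] for c in select_cores(candidates, budget=3)])   # ['tiny', 'big', 'mid2']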
Architectural Contesting
This paper presents results showing that workload behavior tends to vary considerably at granularities of less than a thousand instructions. If it were possible to adjust the microarchitecture to suit the workload behavior at such rates, significant single-thread performance enhancement would be achievable. However, previous techniques are too sluggish to be able to effectively respond to such fine-grain change. An approach is proposed that exploits the multicore trend to enable swift adjustment in the employed microarchitecture upon variation in workload behavior. A number of cores that are each custom-designed for optimum performance under a class of workloads concurrently execute code in a leader-follower arrangement. In this manner, effective execution automatically and fluidly transfers to the most suitable microarchitecture as the workload behavior varies. We refer to this approach as architectural contesting. Two-way contesting yields an average speedup of 15% (maximum speedup of 25%) over a benchmark’s own customized core. The paper also explores the interplay between contesting and the number of core types available in the heterogeneous multi-core. This exposes the broader issue of constrained heterogeneous multi-core design and how it influences, and may be influenced by, contesting.
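As a hedged, back-of-the-envelope illustration of why contesting can beat any single core (not the paper's evaluation methodology), the toy calculation below lets each equal-length interval proceed at the pace of whichever core type is fastest for that interval and compares the result against committing to the best single core; the per-interval IPC figures are invented.

# Toy sketch of architectural contesting (illustrative only). Each interval is
# assumed to hold an equal number of instructions, so its time is ~ 1 / IPC;
# contesting proceeds at the pace of the fastest core type in every interval.

def contesting_speedup(per_interval_ipc):
    """per_interval_ipc: one {core_type: ipc} dictionary per interval."""
    contested_time = sum(1.0 / max(interval.values()) for interval in per_interval_ipc)
    best_fixed_time = min(
        sum(1.0 / interval[core] for interval in per_interval_ipc)
        for core in per_interval_ipc[0]
    )
    return best_fixed_time / contested_time

# Invented per-interval IPC for two core types whose strengths alternate.
trace = [{"A": 2.0, "B": 1.2}, {"A": 1.1, "B": 1.8}, {"A": 2.2, "B": 1.0}]
print(round(contesting_speedup(trace), 2))   # 1.23 with these made-up numbers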