A Detailed Analysis of Contemporary ARM and x86 Architectures
RISC vs. CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement-based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and Intel Atom and Sandybridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors' performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant
Clearing the Clouds: A Study of Emerging Scale-out Workloads on Modern Hardware
Emerging scale-out workloads require extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy. Continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out workloads. We use performance counters on modern servers to study a wide range of scale-out workloads, finding that today’s predominant processor micro-architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the workload needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core micro-architecture. Moreover, while today’s predominant micro-architecture is inefficient when executing scale-out workloads, we find that continuing the current trends will further exacerbate the inefficiency in the future. In this work, we identify the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers
Predictive power management for multi-core processors
Energy consumption by computing systems is rapidly increasing due to the growth of data centers and pervasive computing. In 2006 data center energy usage in the United States reached 61 billion kilowatt-hours (kWh) at an annual cost of 4.5 billion USD [Pl08]. It is projected to reach 100 billion kWh by 2011 at a cost of 7.4 billion USD. The nature of energy usage in these systems provides an opportunity to reduce consumption.
Specifically, the power and performance demand of computing systems vary widely in time and across workloads. This has led to the design of dynamically adaptive or power-managed systems. At runtime, these systems can be reconfigured to provide optimal performance and power capacity to match workload demand. This causes the system to frequently be over- or under-provisioned. Similarly, the power demand of the system is difficult to account for. The aggregate power consumption of a system is composed of many heterogeneous systems, each with a unique power consumption characteristic.
This research addresses the problem of when to apply dynamic power management in multi-core processors by accounting for and predicting power and performance demand at the core level. By tracking performance events at the processor core or thread level, power consumption can be accounted for at each of the major components of the computing system through empirical power models. This also provides accounting for individual components within a shared resource such as a power plane or top-level cache. This view of the system exposes the fundamental performance and power phase behavior, thus making prediction possible.
This dissertation also presents an extensive analysis of complete system power accounting for systems and workloads ranging from servers to desktops and laptops. The analysis leads to the development of a simple, effective prediction scheme for controlling power adaptations. The proposed Periodic Power Phase Predictor (PPPP) identifies patterns of activity in multi-core systems and predicts transitions between activity levels. This predictor is shown to increase performance and reduce power consumption compared to reactive, commercial power management schemes by achieving higher average frequency in active phases and lower average frequency in idle phases.
Electrical and Computer Engineering
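The core-level accounting idea in this abstract can be illustrated with a minimal sketch. It assumes a linear model in which each performance event costs a fixed amount of energy, and it splits the power measured on a shared plane across cores in proportion to their modeled power. The event names, the coefficients, and the proportional split are illustrative assumptions and are not taken from the dissertation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoreSample:
    """Per-core event rates over one sampling interval (hypothetical fields)."""
    instructions_per_s: float
    l2_misses_per_s: float


# Hypothetical coefficients; a real model would be fitted to measured power.
IDLE_WATTS = 2.0                # static power attributed to each core
JOULES_PER_INSTRUCTION = 1.5e-9
JOULES_PER_L2_MISS = 4.0e-8


def core_power(sample: CoreSample) -> float:
    """Estimate core power in watts as idle power plus energy per event times event rate."""
    return (IDLE_WATTS
            + JOULES_PER_INSTRUCTION * sample.instructions_per_s
            + JOULES_PER_L2_MISS * sample.l2_misses_per_s)


def attribute_shared_power(measured_watts: float,
                           samples: List[CoreSample]) -> List[float]:
    """Split power measured on a shared plane across cores,
    in proportion to each core's modeled power."""
    estimates = [core_power(s) for s in samples]
    total = sum(estimates)
    return [measured_watts * e / total for e in estimates]
```

For example, `attribute_shared_power(14.8, [CoreSample(2e9, 1e7), CoreSample(0.0, 0.0)])` assigns most of the measured power to the busy core and the remainder to the idle one.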
Fast, Accurate Processor Evaluation Through Heterogeneous, Sample-Based Benchmarking
Performance evaluation is a key task in computing and communication systems. Benchmarking is one of the most common techniques for evaluation purposes, where the performance of a set of representative applications is used to infer system responsiveness in a general usage scenario. Unfortunately, most benchmarking suites are limited to a reduced number of applications, and in some cases, rigid execution configurations. This makes it hard to extrapolate performance metrics for a general-purpose architecture, supposed to have a multi-year lifecycle, running dissimilar applications concurrently. The main culprit of this situation is that current benchmark-derived metrics lack generality and statistical soundness, and fail to represent general-purpose environments. Previous attempts to overcome these limitations through random app mixes significantly increase computational cost (workload population shoots up), making the evaluation process barely affordable. To circumvent this problem, in this article we present a more elaborate performance evaluation methodology named BenchCast. Our proposal provides more representative performance metrics, but with a drastic reduction of computational cost, limiting app execution to a small and representative fraction marked through code annotation. Thanks to this labeling and making use of synchronization techniques, we generate heterogeneous workloads where every app runs simultaneously inside its Region Of Interest, making a few execution seconds highly representative of full application execution
Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware
Emerging scale-out cloud applications need extensive amounts of computational resources. However, data centers using modern server hardware face physical constraints in space and power, limiting further expansion and calling for improvements in the computational density per server and in the per-operation energy use. Therefore, continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out cloud applications. We use performance counters on modern servers to study a wide range of cloud applications, finding that today’s predominant processor architecture is inefficient for running these workloads. We find that inefficiency comes from the mismatch between the application needs and modern processors, particularly in the organization of instruction and data memory systems and the processor core architecture. Moreover, while today’s predominant architectures are inefficient when executing scale-out cloud applications, we find that the current hardware trends further exacerbate the mismatch. In this work, we identify the key micro-architectural needs of cloud applications, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers