An Experimental Evaluation of Datacenter Workloads On Low-Power Embedded Micro Servers
This paper presents a comprehensive evaluation of an ultra-low power cluster built upon Intel Edison-based micro servers. The improved performance and high energy efficiency of micro servers have driven both academia and industry to explore the possibility of replacing conventional brawny servers with a larger swarm of embedded micro servers. Existing attempts mostly focus on mobile-class micro servers, whose capacities are similar to those of mobile phones. We, on the other hand, target sensor-class micro servers, which were originally intended for use in wearable technologies, sensor networks, and the Internet of Things. Although sensor-class micro servers have much less capacity, they are touted for minimal power consumption (< 1 Watt), which opens new possibilities for achieving higher energy efficiency in datacenter workloads. Our systematic evaluation of the Edison cluster and comparisons to conventional brawny clusters involve careful workload selection and laborious parameter tuning, which ensure maximum server utilization and thus fair comparisons. Results show that the Edison cluster achieves up to 3.5× improvement in work-done-per-joule for web service applications and data-intensive MapReduce jobs. In terms of scalability, the Edison cluster scales linearly in throughput for web service workloads, and also shows satisfactory scalability for MapReduce workloads despite coordination overhead. This research was supported in part by NSF grant 13-20209.
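The work-done-per-joule metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes that energy is average power multiplied by run time; the function names and all numbers are hypothetical and are not taken from the paper's measurements.

```python
def work_done_per_joule(work_units: float, avg_power_watts: float, duration_s: float) -> float:
    """Return work completed per joule of energy consumed.

    Energy in joules is average power (watts) times duration (seconds).
    """
    energy_joules = avg_power_watts * duration_s
    if energy_joules <= 0:
        raise ValueError("energy must be positive")
    return work_units / energy_joules


def efficiency_improvement(candidate: float, baseline: float) -> float:
    """Ratio of two work-done-per-joule values (candidate over baseline)."""
    return candidate / baseline


# Hypothetical numbers, for illustration only (not results from the paper):
# a micro-server cluster serves 1000 requests at 50 W over 10 s,
# a brawny cluster serves 2000 requests at 200 W over 10 s.
micro = work_done_per_joule(1000, 50, 10)
brawny = work_done_per_joule(2000, 200, 10)
ratio = efficiency_improvement(micro, brawny)
```

In this made-up case the brawny cluster finishes more work in the same time, yet the micro-server cluster does more work per joule, which is the kind of trade-off the metric is meant to expose.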
QoS-aware mechanisms for improving cost-efficiency of datacenters
Warehouse Scale Computers (WSCs) promise high cost-efficiency by amortizing power, cooling, and management overheads. WSCs today host a large variety of jobs that fall into two broad categories of performance requirements: latency-critical (LC) and best-effort (BE). Ideally, to fully utilize all hardware resources, WSC operators could simply fill all the nodes with computing jobs. Unfortunately, because colocated jobs contend for shared resources, systems with high loads often experience performance degradation, which negatively impacts the Quality of Service (QoS) for LC jobs. In fact, service providers usually over-provision resources to avoid any interference with LC jobs, leading to significant resource inefficiencies. In this dissertation, I explore opportunities across different system-abstraction layers to improve the cost-efficiency of datacenters by increasing the resource utilization of WSCs with little or no impact on the performance of LC jobs. The dissertation has three main components. First, I explore opportunities to improve the throughput of multicore systems by reducing the performance variation of LC jobs. The main insight is that by reshaping the latency distribution curve, the performance headroom of LC jobs can be effectively converted to improved BE throughput. I develop, implement, and evaluate a runtime system that achieves this goal with existing hardware. I leverage the cache partitioning, per-core frequency scaling, and thread masking of server processors. Evaluation results show that the proposed solution enables 30% higher system throughput compared to solutions proposed in prior works while maintaining at least as good QoS for LC jobs. Second, I study resource contention in near-future heterogeneous memory architectures (HMA). This study is motivated by recent developments in non-volatile memory (NVM) technologies, which enable higher storage density at the cost of some performance.
To understand the performance and QoS impact of HMAs, I design and implement a performance emulator in the Linux kernel that runs unmodified workloads with high accuracy, low overhead, and complete transparency. I further propose and evaluate multiple data and resource management QoS mechanisms, such as locality-aware page admission, occupancy management, and write buffer jailing. Third, I focus on accelerated machine learning (ML) systems. By profiling the performance of production workloads and accelerators, I show that accelerated ML tasks are highly sensitive to main memory interference due to fine-grained interaction between CPU and accelerator tasks. As a result, memory resource contention can significantly decrease the performance and efficiency gains of accelerators. I propose a runtime system that leverages existing hardware capabilities and show 17% higher system efficiency compared to previous approaches. This study further exposes opportunities for future processor architectures. Electrical and Computer Engineering
SoC-Cluster as an Edge Server: an Application-driven Measurement Study
Huge electricity consumption is a severe issue for edge data centers. To this
end, we propose a new form of edge server, namely SoC-Cluster, that
orchestrates many low-power mobile system-on-chips (SoCs) through an on-chip
network. For the first time, we have developed a concrete SoC-Cluster server
that consists of 60 Qualcomm Snapdragon 865 SoCs in a 2U rack. Such a server
has been commercialized successfully and deployed at large scale on edge
clouds. The current dominant workload on those deployed SoC-Clusters is cloud
gaming, as mobile SoCs can seamlessly run native mobile games.
The primary goal of this work is to demystify whether SoC-Cluster can
efficiently serve more general-purpose, edge-typical workloads. Therefore, we
built a benchmark suite that leverages state-of-the-art libraries for two
killer edge workloads, i.e., video transcoding and deep learning inference. The
benchmark comprehensively reports the performance, power consumption, and other
application-specific metrics. We then performed a thorough measurement study
and directly compared SoC-Cluster with traditional edge servers (with Intel CPU
and NVIDIA GPU) with respect to physical size, electricity, and billing. The
results reveal the advantages of SoC-Cluster, especially its high energy
efficiency and the ability to proportionally scale energy consumption with
various incoming loads, as well as its limitations. The results also provide
insightful implications and valuable guidance to further improve SoC-Cluster
and land it in broader edge scenarios.
RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design
Software-defined networking (SDN) and software-defined flash (SDF) have been
serving as the backbone of modern data centers. They are managed separately to
handle I/O requests. At first glance, this is a reasonable design by following
the rack-scale hierarchical design principles. However, it suffers from
suboptimal end-to-end performance, due to the lack of coordination between SDN
and SDF.
In this paper, we co-design the SDN and SDF stack by redefining the functions
of their control plane and data plane, and splitting them up within a new
architecture named RackBlox. RackBlox decouples the storage management
functions of flash-based solid-state drives (SSDs), and allows the SDN to track
and manage the states of SSDs in a rack. Therefore, we can enable the state
sharing between SDN and SDF, and facilitate global storage resource management.
RackBlox has three major components: (1) coordinated I/O scheduling, in which
it dynamically adjusts the I/O scheduling in the storage stack with the
measured and predicted network latency, such that it can coordinate the effort
of I/O scheduling across the network and storage stack for achieving
predictable end-to-end performance; (2) coordinated garbage collection (GC), in
which it coordinates the GC activities across the SSDs in a rack to
minimize their impact on incoming I/O requests; (3) rack-scale wear leveling,
in which it enables global wear leveling among SSDs in a rack by periodically
swapping data, for achieving improved device lifetime for the entire rack. We
implement RackBlox using programmable SSDs and switch. Our experiments
demonstrate that RackBlox can reduce the tail latency of I/O requests by up to
5.8x over state-of-the-art rack-scale storage systems. Comment: 14 pages. Published in ACM SIGOPS 29th Symposium on
Operating Systems Principles (SOSP'23)
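The rack-scale wear leveling idea described in the abstract (periodically swapping data among SSDs in a rack) can be sketched roughly as follows. This is a toy illustration, not RackBlox's actual algorithm: the `Ssd` class, the use of erase counts as the wear indicator, the gap threshold, and the choice to swap only the most-worn and least-worn devices are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ssd:
    name: str
    erase_count: int  # assumed wear indicator (hypothetical)
    data: str         # label standing in for the data placed on the device


def pick_swap_pair(ssds: List[Ssd], gap_threshold: int) -> Optional[Tuple[Ssd, Ssd]]:
    """Return (most worn, least worn) if their wear gap exceeds the threshold."""
    most_worn = max(ssds, key=lambda s: s.erase_count)
    least_worn = min(ssds, key=lambda s: s.erase_count)
    if most_worn.erase_count - least_worn.erase_count <= gap_threshold:
        return None  # wear is already balanced enough; no swap this period
    return most_worn, least_worn


def swap_data(a: Ssd, b: Ssd) -> None:
    """Exchange the data placed on two devices (the periodic swap step)."""
    a.data, b.data = b.data, a.data


# One leveling period over a hypothetical three-SSD rack.
rack = [Ssd("ssd0", 900, "hot"), Ssd("ssd1", 100, "cold"), Ssd("ssd2", 500, "warm")]
pair = pick_swap_pair(rack, gap_threshold=200)
if pair is not None:
    swap_data(*pair)
```

After the swap, the write-heavy data sits on the least-worn device, so future wear accumulates where there is the most remaining lifetime. A real system would also have to account for the cost of moving the data, which this sketch ignores.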