MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability. Comment: ISCA 2020.
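The abstract's core concern, namely reproducible, comparable latency and throughput measurements across wildly different systems, can be illustrated with a minimal harness. This is a hedged sketch, not MLPerf's actual load generator: `infer` and `inputs` are placeholders for any model callable and dataset, and the warm-up and percentile choices are assumptions.

```python
import time
import statistics

def run_inference_benchmark(infer, inputs, warmup=10, percentile=0.99):
    """Measure per-query latency of an infer() callable.

    Illustrative only: real benchmark suites (MLPerf among them) add
    strict rules on query generation, accuracy checks, and run length.
    """
    # Warm up to keep one-time costs (JIT, cache fills) out of the results.
    for x in inputs[:warmup]:
        infer(x)

    latencies = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    idx = min(int(len(latencies) * percentile), len(latencies) - 1)
    return {
        "mean_s": statistics.mean(latencies),
        "p99_s": latencies[idx],       # tail latency, not just the average
        "qps": len(latencies) / sum(latencies),
    }
```

Reporting a tail percentile alongside the mean matters because two systems with equal average latency can differ sharply in worst-case behavior.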
Oracle SuperCluster: Taking Oracle Clustered Engineering Systems to the Next Level
Oracle SuperCluster is a robust, coherent Oracle Database and application environment. It is an engineered system that integrates servers, storage, networking, and software to deliver extreme end-to-end database and application capacity with minimal initial and ongoing administration and maintenance effort, and low complexity, at a low total cost of ownership. It is ideal for Oracle Database and Oracle application customers who need to maximize the return on their software investments, increase their IT agility, and improve application usability and overall IT productivity.
DOI: 10.17762/ijritcc2321-8169.150610
Workload Behavior Driven Memory Subsystem Design for Hyperscale
Hyperscalers run services across a large fleet of servers, serving billions
of users worldwide. These services, however, behave differently than commonly
available benchmark suites, resulting in server architectures that are not
optimized for cloud workloads. With datacenters becoming a primary server
processor market, optimizing server processors for cloud workloads by better
understanding their behavior has become crucial. To address this, in this
paper, we present MemProf, a memory profiler that profiles the three major
reasons for stalls in cloud workloads: code-fetch, memory bandwidth, and memory
latency. We use MemProf to understand the behavior of cloud workloads and
propose and evaluate micro-architectural and memory-system design improvements
that improve cloud workloads' performance.
MemProf's code analysis shows that cloud workloads execute the same code
across CPU cores. Using this, we propose shared micro-architectural
structures--a shared L2 I-TLB and a shared L2 cache. Next, to help with memory
bandwidth stalls, using workloads' memory bandwidth distribution, we find that
only a few pages contribute to most of the system bandwidth. We use this
finding to evaluate a new high-bandwidth, small-capacity memory tier and show
that it performs 1.46x better than the current baseline configuration. Finally,
we look into ways to improve memory latency for cloud workloads. Profiling
using MemProf reveals that L2 hardware prefetchers, a common solution to reduce
memory latency, have very low coverage and consume a significant amount of
memory bandwidth. To help improve hardware prefetcher performance, we built a
memory tracing tool to collect and validate production memory access traces.
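The bandwidth finding above, that only a few pages contribute most of the system bandwidth, can be sketched as a simple trace analysis. This is an illustration, not MemProf's implementation: the flat list-of-addresses trace format, the 4 KiB page size, and the 80% coverage threshold are all assumptions.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed 4 KiB pages

def hot_pages(trace, coverage=0.8):
    """Return the smallest set of pages that together account for
    `coverage` of all accesses in a list of accessed addresses."""
    counts = Counter(addr // PAGE_SIZE for addr in trace)
    total = sum(counts.values())
    covered, hot = 0, []
    for page, n in counts.most_common():  # hottest pages first
        hot.append(page)
        covered += n
        if covered >= coverage * total:
            break
    return hot

# Example: a skewed trace where one page absorbs 90% of the accesses.
trace = [0x1000] * 90 + [0x2000] * 5 + [0x3000] * 5
print(hot_pages(trace))  # → [1]: a single page covers 80% of traffic
```

A distribution this skewed is exactly what makes a small, high-bandwidth memory tier attractive: pinning only the hot pages there captures most of the bandwidth demand.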
SoC-Cluster as an Edge Server: an Application-driven Measurement Study
High electricity consumption is a severe issue for edge data centers. To
address this, we propose a new form of edge server, namely SoC-Cluster, that
orchestrates many low-power mobile system-on-chips (SoCs) through an on-chip
network. For the first time, we have developed a concrete SoC-Cluster server
that consists of 60 Qualcomm Snapdragon 865 SoCs in a 2U rack. Such a server
has been commercialized successfully and deployed at large scale on edge
clouds. The current dominant workload on those deployed SoC-Clusters is cloud
gaming, as mobile SoCs can seamlessly run native mobile games.
The primary goal of this work is to demystify whether SoC-Cluster can
efficiently serve more general-purpose, edge-typical workloads. Therefore, we
built a benchmark suite that leverages state-of-the-art libraries for two
killer edge workloads, i.e., video transcoding and deep learning inference. The
benchmark comprehensively reports the performance, power consumption, and other
application-specific metrics. We then performed a thorough measurement study
and directly compared SoC-Cluster with traditional edge servers (with Intel CPU
and NVIDIA GPU) with respect to physical size, electricity, and billing. The
results reveal the advantages of SoC-Cluster, especially its high energy
efficiency and the ability to proportionally scale energy consumption with
various incoming loads, as well as its limitations. The results also provide
insightful implications and valuable guidance to further improve SoC-Cluster
and land it in broader edge scenarios.
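The claimed ability to scale energy consumption proportionally with load can be quantified. One common formulation scores how closely measured power tracks the ideal of load times peak power; the sketch below uses that formulation with made-up wattages, not the study's measured data.

```python
def proportionality_score(loads, powers):
    """Score in [0, 1]: 1 minus the mean absolute deviation of measured
    power from ideal proportional power (load * peak), normalized by
    peak power. A perfectly proportional server scores 1.0."""
    p_peak = max(powers)
    dev = [abs(p - l * p_peak) / p_peak for l, p in zip(loads, powers)]
    return 1.0 - sum(dev) / len(dev)

# Hypothetical numbers: a conventional server idling at 60% of peak
# versus a SoC-cluster-like server idling at 15% of peak.
loads  = [0.0, 0.25, 0.5, 0.75, 1.0]
server = [180, 210, 240, 270, 300]   # watts, high idle floor
soc    = [45, 110, 170, 240, 300]    # watts, near-proportional

print(proportionality_score(loads, server))  # → 0.7
print(proportionality_score(loads, soc))     # → ~0.92, more proportional
```

A high score matters at the edge because load there is bursty: a server with a low idle floor wastes far less energy between bursts.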
An Analysis for Evaluating the Cost/Profit Effectiveness of Parallel Systems
A new domain of commercial applications demands inexpensive parallel computing platforms that lower the cost of operations and increase business profit. Calculating the return on an IT investment is now important to justify decisions to upgrade or replace parallel systems. This thesis presents a framework of the performance and economic factors considered when evaluating a parallel system. We introduce a metric, the cost/profit effectiveness metric, which measures the effectiveness of a parallel system in terms of performance, cost, and profit. The metric describes the profit obtained from performance in three scaling domains: speed-up, throughput, and scale-up. Cost is measured by the actual costs of the parallel system. We present two case studies to demonstrate the application of the metric and analyze the results to support the evaluation of the parallel system in each case.
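A metric relating performance, cost, and profit can be sketched as a simple ratio; the thesis's exact formulation is not reproduced in the abstract, so the function below is an assumed illustrative form (profit attributable to delivered performance, divided by actual system cost), with hypothetical numbers.

```python
def cost_profit_effectiveness(profit_per_unit, performance, total_cost):
    """Illustrative cost/profit effectiveness: profit generated by the
    delivered performance divided by the system's actual cost.
    An upgrade is justified only if it raises this ratio."""
    return (profit_per_unit * performance) / total_cost

# Hypothetical comparison: does scaling out the cluster pay off?
small = cost_profit_effectiveness(0.02, 1_000_000, 50_000)  # 1M tx/day
large = cost_profit_effectiveness(0.02, 1_700_000, 90_000)  # sub-linear scale-up
print(small, large)  # the larger system is less cost/profit effective here
```

The example shows why raw speed-up alone cannot justify an investment: sub-linear scaling can raise absolute profit while lowering effectiveness per unit of cost.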