665 research outputs found

    An Adaptable Optimal Network Topology Model for Efficient Data Centre Design in Storage Area Networks

    Get PDF
    In this research, we look at how different network topologies affect the energy consumption of modular data centre (DC) setups. We use a mixed-integer optimisation approach to assess the benefits of rack-scale and pod-scale disaggregation across a variety of electrical, optoelectronic, and composite network architectures in comparison to a conventional DC. When the optical transport architecture is implemented and the appropriate resource components are distributed, the findings reveal that disaggregation at the rack level is adequate, even compared to a pod-scale DC. Composable DCs can operate at peak efficiency because of the optical network topology. Logical separation of conventional DC servers across an optical network architecture is also investigated in this article. When compared to physical disaggregation at the rack scale, logical disaggregation of servers inside each rack offers a small decrease in overall DC energy usage thanks to better allocation of resource needs. This allows for a flexible, composable architecture that can accommodate performance-sensitive in-memory applications. Moreover, we examine the fundamental model and its use in both static and dynamic data centres. According to our findings, typical DCs become more energy efficient as workload modularity increases, although excessive resource use still exists. By enabling optimal resource use and energy savings, disaggregation and micro-services were able to reduce the typical DC's energy consumption by up to 30%. Furthermore, we offer a heuristic that reproduces the mixed-integer model's output trends for energy-efficient allocation of workloads in modularised DCs.
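
    The heuristic itself is not detailed in this abstract; as a hedged illustration of the general shape such allocation heuristics take, the sketch below packs workload demands onto disaggregated resource pools greedily so that unused pools can be powered down. All names, capacities, and the first-fit policy are illustrative assumptions, not the paper's model.

        # Illustrative only: a greedy first-fit heuristic for placing workloads
        # onto disaggregated resource pools, in the spirit of (but not taken
        # from) the paper's mixed-integer model.
        from dataclasses import dataclass, field

        @dataclass
        class Pool:
            name: str
            cpu_free: int      # free CPU cores (assumed units)
            mem_free: int      # free memory in GB (assumed units)
            placed: list = field(default_factory=list)

        def place_workloads(workloads, pools):
            """Greedy first-fit: pack the biggest demands first onto the fewest
            pools, so the remaining pools can be powered down for energy savings."""
            for name, cpu, mem in sorted(workloads, key=lambda w: -(w[1] + w[2])):
                for pool in pools:
                    if pool.cpu_free >= cpu and pool.mem_free >= mem:
                        pool.cpu_free -= cpu
                        pool.mem_free -= mem
                        pool.placed.append(name)
                        break
                else:
                    raise RuntimeError(f"no capacity for {name}")
            return [p for p in pools if p.placed]   # pools that must stay on

        pools = [Pool("rack0-pool", 64, 256), Pool("rack1-pool", 64, 256)]
        active = place_workloads([("db", 16, 128), ("web", 8, 16), ("batch", 32, 64)], pools)
        print([p.name for p in active])             # everything fits on one pool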

    Venice: Exploring Server Architectures for Effective Resource Sharing

    Get PDF
    Consolidated server racks are quickly becoming the backbone of IT infrastructure for science, engineering, and business alike. These servers are still largely built and organized as they were when they were distributed, individual entities. Given that many fields increasingly rely on analytics of huge datasets, it makes sense to support flexible resource utilization across servers to improve cost-effectiveness and performance. We introduce Venice, a family of data-center server architectures that builds a strong communication substrate as a first-class resource for server chips. Venice provides a diverse set of resource-joining mechanisms that enables user programs to efficiently leverage non-local resources. To better understand the implications of design decisions about system support for resource sharing, we have constructed a hardware prototype that allows us to more accurately measure the end-to-end performance of at-scale applications and to explore tradeoffs among performance, power, and resource-sharing transparency. We present results from our initial studies analyzing these tradeoffs when sharing memory, accelerators, or NICs. We find that it is particularly important to reduce or hide latency, that data-sharing access patterns should match the features of the communication channels employed, and that inter-channel collaboration can be exploited for better performance.
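
    One way to read the latency finding is the classic overlap pattern: keep the communication substrate and the processor busy at the same time. The sketch below is a hypothetical illustration of latency hiding with Python's asyncio, not Venice's actual resource-joining API.

        # Hypothetical sketch: hiding remote-resource latency by overlapping
        # in-flight fetches with local work (not Venice's actual mechanisms).
        import asyncio

        async def remote_fetch(block_id: int) -> bytes:
            await asyncio.sleep(0.01)          # stand-in for an interconnect round trip
            return bytes(64)                   # stand-in for a fetched block

        async def process(blocks):
            # Issue the next fetch before processing the current block, so the
            # communication channel and the CPU are busy simultaneously.
            pending = asyncio.create_task(remote_fetch(blocks[0]))
            for nxt in blocks[1:] + [None]:
                data = await pending
                if nxt is not None:
                    pending = asyncio.create_task(remote_fetch(nxt))
                _ = sum(data)                  # stand-in for local computation

        asyncio.run(process(list(range(8))))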

    Towards Power- and Energy-Efficient Datacenters

    Full text link
    As the Internet evolves, cloud computing is now a dominant form of computation in modern life. Warehouse-scale computers (WSCs), or datacenters, comprising the foundation of this cloud-centric web, have been able to deliver satisfactory performance to both Internet companies and their customers. With the increased focus on and popularity of the cloud, however, datacenter loads are rising rapidly, and Internet companies need more computing capacity to serve this demand. Unfortunately, power and energy are often the major limiting factors prohibiting datacenter growth: it is often the case that no more servers can be added to a datacenter without surpassing the capacity of the existing power infrastructure. This dissertation investigates the issues of power and energy usage in a modern datacenter environment. We identify the sources of power and energy inefficiency at three levels in a modern datacenter environment and provide insights and solutions to address each of these problems, aiming to prepare datacenters for critical future growth. We start at the datacenter level and find that peak provisioning and improper service placement in multi-level power delivery infrastructures fragment the power budget inside production datacenters, degrading the compute capacity the existing infrastructure can support. We find that the heterogeneity among datacenter workloads is key to addressing this issue and design systematic methods to reduce the fragmentation and improve the utilization of the power budget. The dissertation then narrows its focus to examine the energy usage of individual servers running cloud workloads. In particular, we examine the power management mechanisms employed in these servers and find that the coarse time granularity of these mechanisms is one critical factor leading to excessive energy consumption. We propose an intelligent, low-overhead solution on top of emerging finer-granularity voltage/frequency boosting circuits that effectively pinpoints and boosts queries that are likely to lengthen the latency tail and can reap more benefit from the voltage/frequency boost, improving energy efficiency without sacrificing quality of service. The final part of this dissertation takes a further step to investigate how a fundamentally more efficient computing substrate, field programmable gate arrays (FPGAs), benefits datacenter power and energy efficiency. Unlike other types of hardware acceleration, FPGAs can be reconfigured on the fly, providing fine-grain control over hardware resource allocation and presenting a unique set of challenges for optimal workload scheduling and resource allocation. We aim to design a set of coordinated algorithms to manage these two key factors simultaneously and to fully explore the benefit of deploying FPGAs in the highly varying cloud environment.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/144043/1/hsuch_1.pd
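
    As a hedged sketch of the query-boosting idea (the dissertation's actual predictor and circuit-level mechanism are more involved), the policy below boosts only queries that a hypothetical latency predictor flags as likely tail queries; all constants are assumptions.

        # Illustrative policy sketch (not the dissertation's actual mechanism):
        # boost only queries predicted to land in the latency tail, where a
        # short voltage/frequency bump helps most per joule spent.

        BASE_FREQ_GHZ = 2.0      # assumed energy-efficient default
        BOOST_FREQ_GHZ = 3.4     # assumed short-lived boost frequency
        TAIL_TARGET_MS = 8.0     # assumed 99th-percentile latency target

        def choose_frequency(predicted_latency_ms: float) -> float:
            """Per-query decision based on a (hypothetical) latency predictor."""
            # Likely-tail queries get the boost; the rest stay at base frequency,
            # so extra energy is spent only on the few queries that need it.
            if predicted_latency_ms > TAIL_TARGET_MS:
                return BOOST_FREQ_GHZ
            return BASE_FREQ_GHZ

        print(choose_frequency(3.1), choose_frequency(11.8))  # -> 2.0 3.4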

    A cross-stack, network-centric architectural design for next-generation datacenters

    Get PDF
    This thesis proposes a full-stack, cross-layer datacenter architecture based on in-network computing and near-memory processing paradigms. The proposed datacenter architecture is built atop two principles: (1) utilizing commodity, off-the-shelf hardware (i.e., processors, DRAM, and network devices) with minimal changes to their architecture, and (2) providing a standard interface for programmers to use the novel hardware. More specifically, the proposed datacenter architecture enables a smart network adapter to collectively compress/decompress the data exchanged between distributed DNN training nodes and to assist the operating system in performing aggressive processor power management. It also deploys specialized memory modules in the servers, capable of general-purpose computation and network connectivity. This thesis unlocks the potential of hardware and operating system co-design in architecting application-transparent, near-data processing hardware for improving datacenter performance, energy efficiency, and scalability. We evaluate the proposed datacenter architecture using a combination of full-system simulation, FPGA prototyping, and real-system experiments.
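
    The abstract does not spell out the compression scheme; as one hedged example of compressing the gradient exchange between distributed DNN training nodes, the sketch below applies generic top-k sparsification before values cross the wire. The scheme and the ratio are assumptions, not necessarily what the thesis's smart network adapter implements.

        # Hedged illustration of gradient compression for distributed DNN
        # training (a generic top-k scheme, not the thesis's actual design).
        import numpy as np

        def compress_topk(grad: np.ndarray, ratio: float = 0.01):
            """Keep only the largest-magnitude `ratio` fraction of entries."""
            k = max(1, int(grad.size * ratio))
            idx = np.argpartition(np.abs(grad.ravel()), -k)[-k:]
            return idx.astype(np.int32), grad.ravel()[idx]   # indices + values

        def decompress_topk(idx, vals, shape):
            out = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
            out[idx] = vals
            return out.reshape(shape)

        g = np.random.randn(1024, 1024).astype(np.float32)
        idx, vals = compress_topk(g)                 # ~100x fewer bytes on the wire
        g_hat = decompress_topk(idx, vals, g.shape)  # sparse approximation of g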

    SGER: dynamic partitioned global address spaces for future large scale systems

    Get PDF
    Issued as final report. National Science Foundation (U.S.)

    Disaggregated Memory Architectures for Blade Servers.

    Full text link
    Current trends in memory capacity and power of servers indicate the need for memory system redesign. Memory capacity is projected to grow at a smaller rate relative to the growth in compute capacity, leading to a potential memory capacity wall in future systems. Furthermore, per-server memory demands are increasing due to large-memory applications, virtual machine consolidation, and bigger operating system footprints. The large amount of memory required is leading to memory power being a substantial and growing portion of server power budgets. As these capacity and power trends continue, a new memory architecture is needed that provides increased capacity and maximizes resource efficiency. This thesis presents the design of a disaggregated memory architecture for blade servers that provides expanded memory capacity and dynamic capacity sharing across multiple servers. Unlike traditional architectures that co-locate compute and memory resources, the proposed design disaggregates a portion of the servers’ memory, which is then assembled in separate memory blades optimized for both capacity and power usage. The servers access memory blades through a redesigned memory hierarchy that is extended to include a remote level that augments local memory. Through the shared interconnect of blade enclosures, multiple compute blades can connect to a single memory blade and dynamically share its capacity. This sharing increases resource efficiency by taking advantage of the differing memory utilization patterns of the compute blades. This thesis evaluates two system architectures that provide operating system-transparent access to the memory blade; one uses virtualization and a commodity-based interconnect, and the other uses minor hardware additions and a high-speed interconnect. The ability to extend and share memory can achieve orders of magnitude performance improvements in cases where applications run out of memory capacity, and similar improvements in performance-per-dollar in cases where systems are overprovisioned for peak memory usage. To complement the evaluation, a hypervisor-based prototype of one system architecture is developed. Finally, by extending the principles of disaggregation to both compute and memory resources, new server architectures are proposed for large-scale data centers that can double performance-per-dollar when considering total cost of ownership compared to traditional servers.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/76007/1/ktlim_1.pd
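
    As a hedged, toy-scale illustration of the extended memory hierarchy described above (not the thesis's design), the sketch below models a small local memory backed by a shared memory blade: local misses fetch the page remotely and evict the least recently used local page back to the blade. Capacities and page size are assumptions.

        # Toy model of a memory hierarchy extended with a remote memory-blade
        # level; pages missing locally are fetched from the blade.
        from collections import OrderedDict

        LOCAL_CAPACITY_PAGES = 4            # assumed tiny local memory

        class DisaggregatedMemory:
            def __init__(self):
                self.local = OrderedDict()  # LRU-ordered local pages
                self.remote = {}            # shared memory blade (assumed)

            def access(self, page: int) -> str:
                if page in self.local:
                    self.local.move_to_end(page)       # local hit
                    return "local"
                # Local miss: evict the LRU page back to the blade if full,
                # then fetch the requested page from the blade.
                if len(self.local) >= LOCAL_CAPACITY_PAGES:
                    victim, data = self.local.popitem(last=False)
                    self.remote[victim] = data
                self.local[page] = self.remote.pop(page, b"\x00" * 4096)
                return "remote"

        mem = DisaggregatedMemory()
        print([mem.access(p) for p in [1, 2, 3, 4, 1, 5, 2]])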

    System Design for Intelligent Web Services

    Full text link
    The devices and software systems we interact with on a daily basis are more intelligent than ever. The computing required to deliver these experiences for end-users is hosted in Warehouse Scale Computers (WSCs), where intelligent web services are employed to process user images, speech, and text. These intelligent web services are emerging as one of the fastest growing classes of web services. Given that users increasingly expect experiences powered by intelligent web services, the demand for this type of processing is only going to increase. However, today’s cloud infrastructures, tuned for traditional workloads such as Web Search and social networks, are not adequately equipped to sustain this increase in demand. This dissertation shows that applications that use intelligent web service processing on the path of a single query require orders of magnitude more computational resources than traditional Web Search. Intelligent web services use large pretrained machine learning models to process image, speech, and text based inputs and generate a prediction. As this dissertation investigates, we find that hosting intelligent web services in today’s infrastructures exposes three critical problems: 1) current infrastructures are computationally inadequate to host this new class of services, 2) system designers are unaware of the bottlenecks exposed by these services and the implications for future designs, and 3) the rapid algorithmic churn of these intelligent services deprecates current designs at an even faster rate. This dissertation investigates and addresses each of these problems. After building a representative workload to show the computational resources required by an application composed of three intelligent web services, this dissertation first argues that hardware acceleration is required on the path of a query to sustain demand moving forward. We show that GPU- and FPGA-accelerated servers can improve query latency on average by 10x and 16x. Leveraging the latency reduction, GPU- and FPGA-accelerated servers reduce the Total Cost of Ownership (TCO) by 2.6x and 1.4x, respectively. Second, we focus on Deep Neural Networks (DNNs), a state-of-the-art algorithm for intelligent web services, and design a DNN-as-a-Service infrastructure enabling application-agnostic acceleration and a single point of optimization. We identify compute bottlenecks that inform the design of a Graphics Processing Unit (GPU) based system; addressing the compute bottlenecks translates to a throughput improvement of 133x across seven DNN based applications. GPU-enabled datacenters show a TCO improvement over CPU-only designs of 4-20x. Finally, we design a runtime system based on a GPU-equipped server that improves on current systems by accounting for recent advances in intelligent web service algorithms. Specifically, we identify asynchronous processing as key to accelerating dynamically configured intelligent services. We achieve on average 7.6x throughput improvements over an optimized CPU baseline and 2.8x over the current GPU system. By thoroughly addressing these problems, we produce designs for WSCs that are equipped to handle the future demand for intelligent web services. The investigations in this thesis address significant computational bottlenecks and lead to system designs that are more efficient and cost-effective for this new class of web services.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/137055/1/jahausw_1.pd
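
    A recurring design point behind DNN-as-a-Service systems is batching many queries into one accelerator launch; the sketch below shows generic dynamic batching with a size cap and a wait deadline. It is a hedged illustration, not the dissertation's runtime; the knobs MAX_BATCH and MAX_WAIT_S are assumptions.

        # Hedged sketch of dynamic batching for DNN serving (generic pattern,
        # not the dissertation's exact runtime).
        import queue, threading, time

        requests = queue.Queue()
        MAX_BATCH, MAX_WAIT_S = 32, 0.005      # assumed batching knobs

        def gpu_infer(batch):                  # stand-in for one batched GPU kernel
            time.sleep(0.002)                  # cost amortized over the whole batch
            return [f"result-{r}" for r in batch]

        def serving_loop():
            while True:
                batch = [requests.get()]       # block until the first query arrives
                deadline = time.monotonic() + MAX_WAIT_S
                while len(batch) < MAX_BATCH:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        break
                    try:
                        batch.append(requests.get(timeout=remaining))
                    except queue.Empty:
                        break
                gpu_infer(batch)               # one launch serves many queries

        threading.Thread(target=serving_loop, daemon=True).start()
        for i in range(100):
            requests.put(i)
        time.sleep(0.1)                        # let the daemon drain the queue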

    Global value chain breadth and firm productivity:the enhancing effect of Industry 4.0

    Get PDF
    Purpose: Global value chains (GVCs) incorporate internationally fragmented sources of knowledge so as to increase global competitiveness and performance. This paper sheds light on the role of Industry 4.0 technological capabilities in facilitating knowledge access from international linkages and improving firm productivity. Design/methodology/approach: Drawing on organizational learning research, the present study argues that the relationship between GVC breadth, analyzed with respect to the geographical fragmentation of production facilities, and productivity follows an inverted U-shaped pattern that can be explained by the interplay between external knowledge access and the coordination costs associated with GVC breadth. We test our predictions using a purpose-built survey carried out among a sample of 426 Spanish manufacturing firms. Findings: Our results indicate that organizations adhering to a traditional manufacturing system are able to benefit from only a limited number of transnational relationships (concretely, 11 foreign facilities) in the search for productivity improvements. This can be largely attributed to the marginal value of the knowledge accessed and the costs of coordinating production and knowledge transfer with international counterparts. However, our study reveals that the adoption of Industry 4.0 technologies has the potential to broaden the optimal GVC breadth, in terms of the number of linkages to interrelate with (concretely, 131 foreign facilities), so as to obtain productivity gains while mitigating the complexities associated with coordination. Originality/value: The study unveils that Industry 4.0 technologies enable the management of broader GVC breadth, facilitating knowledge access and counteracting coordination costs from international counterparts. © 2021, Emerald Publishing Limited
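
    The inverted U-shape reported here is conventionally captured with a quadratic specification; the following is a hedged reconstruction (the paper's exact econometric model is not given in this abstract), with P denoting firm productivity and B GVC breadth:

        % Hedged reconstruction, not the paper's exact specification
        P(B) = \beta_0 + \beta_1 B + \beta_2 B^2 + \varepsilon,
        \qquad \beta_1 > 0, \quad \beta_2 < 0
        % Optimal breadth at the turning point where dP/dB = 0:
        B^{*} = -\frac{\beta_1}{2\beta_2}

    Under this reading, the reported optima correspond to B* ≈ 11 foreign facilities for traditional manufacturers and B* ≈ 131 once Industry 4.0 technologies flatten the coordination-cost (quadratic) term.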

    Mitigating the Performance-Efficiency Tradeoff in Resilient Memory Disaggregation

    Full text link
    Memory disaggregation has received attention in recent years as a promising idea to reduce the total cost of ownership (TCO) of memory in modern datacenters. However, relying on remote memory expands an application's failure domain and makes it susceptible to tail latency variations. In attempting to make disaggregated memory resilient, state-of-the-art solutions face the classic tradeoff between performance and efficiency: some double the memory overhead of disaggregation by replicating to remote memory, while many others limit performance by replicating to the local disk. We present Hydra, a configurable, erasure-coded resilience mechanism for common memory disaggregation solutions. It can transparently handle uncertainties arising from remote failures, evictions, memory corruptions, and stragglers from network imbalance with a significantly better performance-efficiency tradeoff than the state-of-the-art. We design a fine-tuned data path to achieve single-microsecond read/write latency to remote memory, develop decentralized algorithms for cluster-wide memory management, and analyze how to select parameters to mitigate independent and correlated uncertainties. Our integration of Hydra with two major memory disaggregation systems and evaluation on a 50-machine RDMA cluster demonstrates that it achieves the best of both worlds: it improves the latency and throughput of memory-intensive applications by up to 64.78x and 20.61x, respectively, over the state-of-the-art disk backup-based solution. At the same time, it provides performance similar to that of in-memory replication with 1.6x lower memory overhead.
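
    To make the erasure-coding idea concrete, the sketch below implements a toy k+1 XOR-parity code: a page is split into k pieces plus one parity piece placed on different remote hosts, and any single lost or straggling piece can be rebuilt from the rest. Hydra's actual code is more general (it tolerates multiple losses); k and the page size here are assumptions.

        # Minimal sketch of erasure-coded resilience for remote memory writes:
        # a toy k+1 XOR-parity code (Hydra uses a more general erasure code).
        K = 4  # assumed number of data splits per page

        def xor_bytes(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        def encode(page: bytes):
            """Split a page into K equal splits plus one XOR parity split."""
            assert len(page) % K == 0
            n = len(page) // K
            splits = [page[i * n:(i + 1) * n] for i in range(K)]
            parity = splits[0]
            for s in splits[1:]:
                parity = xor_bytes(parity, s)
            return splits, parity     # place each piece on a different host

        def recover(splits, parity, lost: int):
            """Rebuild one lost split (e.g., a failed or straggling host)."""
            rebuilt = parity
            for i, s in enumerate(splits):
                if i != lost:
                    rebuilt = xor_bytes(rebuilt, s)
            return rebuilt

        page = bytes(range(16)) * 256                 # a 4 KB page
        splits, parity = encode(page)
        assert recover(splits, parity, lost=2) == splits[2]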