Search CORE

95 research outputs found

Enhancing HPC on Virtual Systems in Clouds through Optimizing Virtual Overlay Networks

Author: Cui Zheng
Publication venue: UNM Digital Repository
Publication date: 01/07/2013
Field of study

Virtual Ethernet overlay provides a powerful model for realizing virtual distributed and parallel computing systems with strong isolation, portability, and recoverability properties. However, in extremely high throughput and low latency networks, such overlays can suffer from bandwidth and latency limitations, which is of particular concern in HPC environments. Through a careful and quantitative analysis, I iden- tify three core issues limiting performance: delayed and excessive virtual interrupt delivery into guests, copies between host and guest data buffers during encapsulation, and the semantic gap between virtual Ethernet features and underlying physical network features. I propose three novel optimizations in response: optimistic timer- free virtual interrupt injection, zero-copy cut-through data forwarding, and virtual TCP offload. These optimizations improve the latency and bandwidth of the overlay network on 10 Gbps Ethernet and InfiniBand interconnects, resulting in near-native performance for a wide range of microbenchmarks and MPI application benchmarks

Optimizing Network Virtualization in Xen

Author: Cox Alan L.
Menon Aravind
Zwaenepoel Willy
Publication venue
Publication date: 08/05/2006
Field of study

BEST PAPER AWARDIn this paper, we propose and evaluate three techniques for optimizing network performance in the Xen virtualized environment. Our techniques retain the basic Xen architecture of locating device drivers in a privileged `driver' domain with access to I/O devices, and providing network access to unprivileged `guest' domains through virtualized network interfaces. First, we redefine the virtual network interfaces of guest domains to incorporate high-level network offfload features available in most modern network cards. We demonstrate the performance benefits of high-level offload functionality in the virtual interface, even when such functionality is not supported in the underlying physical interface. Second, we optimize the implementation of the data transfer path between guest and driver domains. The optimization avoids expensive data remapping operations on the transmit path, and replaces page remapping by data copying on the receive path. Finally, we provide support for guest operating systems to effectively utilize advanced virtual memory features such as superpages and global page mappings. The overall impact of these optimizations is an improvement in transmit performance of guest domains by a factor of 4.4. The receive performance of the driver domain is improved by 35% and reaches within 7% of native Linux performance. The receive performance in guest domains improves by 18%, but still trails the native Linux performance by 61%. We analyse the performance improvements in detail, and quantify the contribution of each optimization to the overall performance

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

New Architectures and Mechanisms for the Network Subsystem in Virtualized Servers

Author: Ram Kaushik Kumar
Publication venue
Publication date: 24/07/2013
Field of study

Machine virtualization has become a cornerstone of modern datacenters. It enables server consolidation as a means to reduce costs and increase efficiencies. The communication endpoints within the datacenter are now virtual machines (VMs), not physical servers. Consequently, the datacenter network now extends into the server and last hop switching occurs inside the server. Today, thanks to increasing core counts on processors, server VM densities are on the rise. This trend is placing enormous pressure on the network I/O subsystem and the last hop virtual switch to support efficient communication, both internal and external to the server. But the current state-of-the-art solutions fall short of these requirements. This thesis presents new architectures and mechanisms for the network subsystem in virtualized servers to build efficient virtualization platforms. Specifically, there are three primary contributions in this thesis. First, it presents a new mechanism to reduce memory sharing overheads in driver domain-based I/O architectures. The key idea is to enable a guest operating system to reuse its I/O buffers that are shared with a driver domain. Second, it describes Hyper-Switch, a highly streamlined, efficient, and scalable software-based virtual switching architecture, specifically for hypervisors that support driver domains. The Hyper-Switch combines the best of the existing architectures by hosting the device drivers in a driver domain to isolate any faults and placing the virtual switch in the hypervisor to perform efficient packet switching. Further, the Hyper-Switch implements several optimizations, such as virtual machine state-aware batching, preemptive copying, and dynamic offloading of packet processing to idle CPU cores, to enable efficient packet processing, better utilization of the available CPU resources, and higher concurrency. This architecture eliminates the memory sharing overheads associated with driver domains. Third, this thesis proposes an alternate virtual switching architecture, called sNICh, which explores the idea of server/switch integration. The sNICh is a combined network interface card (NIC) and datacenter switching accelerator. This takes the Hyper-Switch architecture one step further. It offloads the data plane of the switch to the network device, eliminating driver domains entirely

DSpace at Rice University

Unikernels Everywhere: The Case for Elastic CDNs

Author: Honda Michio
Huici Felipe
Ivanov Anton
Kuenzer Simon
Manco Filipe
Mendes Jose
Schmidt Florian
Volchkov Yuri
Yasukata Kenichi
Publication venue: ACM Press
Publication date: 01/01/2017
Field of study

peer reviewedVideo streaming dominates the Internet’s overall traffic mix, with reports stating that it will constitute 90% of all consumer traffic by 2019. Most of this video is delivered by Content Delivery Networks (CDNs), and, while they optimize QoE metrics such as buffering ratio and start-up time, no single CDN provides optimal performance. In this paper we make the case for elastic CDNs, the ability to build virtual CDNs on-the-fly on top of shared, third-party infrastructure at a scale. To bring this idea closer to reality we begin by large-scale simulations to quantify the effects that elastic CDNs would have if deployed, and build and evaluate MiniCache, a specialized, minimalistic virtualized content cache that runs on the Xen hypervisor. MiniCache is able to serve content at rates of up to 32 Gb/s and handle up to 600K reqs/sec on a single CPU core, as well as boot in about 90 milliseconds on x86 and around 370 milliseconds on ARM32

Open Repository and Bibliography - Liège

Evaluating the Performance Impact of Xen on MPI and Process Execution For HPC Systems

Author
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date
Field of study

Crossref

Supporting soft real-time tasks in the xen hypervisor

Author: A. S. Krishnakumar
Min Lee
Navjot Singh
P. Krishnan
Shalini Yajnik
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010
Field of study

Virtualization technology enables server consolidation and has given an impetus to low-cost green data centers. However, current hypervisors do not provide adequate support for real-time applications, and this has limited the adoption of virtualization in some domains. Soft real-time applications, such as media-based ones, are impeded by components of virtualization including low-performance virtualization I/O, increased scheduling latency, and shared-cache contention. The virtual machine scheduler is central to all these issues. The goal in this paper is to adapt the virtual machine scheduler to be more soft-real-time friendly. We improve two aspects of the VMM scheduler – managing scheduling latency as a first-class resource and managing shared caches. We use enterprise IP telephony as an illustrative soft real-time workload and design a scheduler S that incorporates th

CiteSeerX

Crossref

Virtuoso: High Resource Utilization and {\mu}s-scale Performance Isolation in a Shared Virtual Machine TCP Network Stack

Author: Arzola Liam
Kaufmann Antoine
Peter Simon
Stolet Matheus
Publication venue
Publication date: 25/09/2023
Field of study

Virtualization improves resource efficiency and ensures security and performance isolation for cloud applications. To that end, operators today use a layered architecture that runs a separate network stack instance in each VM and container connected to a separate virtual switch. Decoupling through layering reduces complexity, but induces performance and resource overheads that are at odds with increasing demands for network bandwidth, communication requirements for large distributed applications, and low latency. We present Virtuoso, a new software networking stack for VMs and containers. Virtuoso performs a fundamental re-organization of the networking stack to maximize CPU utilization, enforce isolation, and minimize networking stack overheads. We maximize utilization by running one elastically shared network stack instance on dedicated cores; we enforce isolation by performing central and fine-grained per-packet resource accounting and scheduling; we reduce overheads by building a single-layer data path with a one-shot fast-path incorporating all processing from the TCP transport layer through network virtualization and virtual switching. Virtuoso improves resource utilization by up to 50%, latencies by up to 42% compared to other virtualized network stacks without sacrificing isolation, and keeps processing overhead within 11.5% of unvirtualized network stacks.Comment: Under submission for conference peer revie

arXiv.org e-Print Archive

GMEM: Generalized Memory Management for Peripheral Devices

Author: Cox Alan L.
Rixner Scott
Zhu Weixi
Publication venue
Publication date: 19/10/2023
Field of study

This paper presents GMEM, generalized memory management, for peripheral devices. GMEM provides OS support for centralized memory management of both CPU and devices. GMEM provides a high-level interface that decouples MMU-specific functions. Device drivers can thus attach themselves to a process's address space and let the OS take charge of their memory management. This eliminates the need for device drivers to "reinvent the wheel" and allows them to benefit from general memory optimizations integrated by GMEM. Furthermore, GMEM internally coordinates all attached devices within each virtual address space. This drastically improves user-level programmability, since programmers can use a single address space within their program, even when operating across the CPU and multiple devices. A case study on device drivers demonstrates these benefits. A GMEM-based IOMMU driver eliminates around seven hundred lines of code and obtains 54% higher network receive throughput utilizing 32% less CPU compared to the state-of-the-art. In addition, the GMEM-based driver of a simulated GPU takes less than 70 lines of code, excluding its MMU functions.Comment: Finished before Weixi left Rice and submitted to ASPLOS'2

arXiv.org e-Print Archive

Re-designing Dynamic Content Delivery in the Light of a Virtualized Infrastructure

Author: Bifulco Roberto
Blefari-Melazzi Nicola
Huici Felipe
Jacobs Tobias
Kuenzer Simon
Salsano Stefano
Siracusano Giuseppe
Trevisan Martino
Publication venue
Publication date: 13/09/2017
Field of study

We explore the opportunities and design options enabled by novel SDN and NFV technologies, by re-designing a dynamic Content Delivery Network (CDN) service. Our system, named MOSTO, provides performance levels comparable to that of a regular CDN, but does not require the deployment of a large distributed infrastructure. In the process of designing the system, we identify relevant functions that could be integrated in the future Internet infrastructure. Such functions greatly simplify the design and effectiveness of services such as MOSTO. We demonstrate our system using a mixture of simulation, emulation, testbed experiments and by realizing a proof-of-concept deployment in a planet-wide commercial cloud system.Comment: Extended version of the paper accepted for publication in JSAC special issue on Emerging Technologies in Software-Driven Communication - November 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Trieste

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

ART

Recommended from our members

Elastic Resource Management in Distributed Clouds

Author: Guo Tian
Publication venue: ScholarWorks@UMass Amherst
Publication date: 14/11/2016
Field of study

The ubiquitous nature of computing devices and their increasing reliance on remote resources have driven and shaped public cloud platforms into unprecedented large-scale, distributed data centers. Concurrently, a plethora of cloud-based applications are experiencing multi-dimensional workload dynamics---workload volumes that vary along both time and space axes and with higher frequency. The interplay of diverse workload characteristics and distributed clouds raises several key challenges for efficiently and dynamically managing server resources. First, current cloud platforms impose certain restrictions that might hinder some resource management tasks. Second, an application-agnostic approach might not entail appropriate performance goals, therefore, requires numerous specific methods. Third, provisioning resources outside LAN boundary might incur huge delay which would impact the desired agility. In this dissertation, I investigate the above challenges and present the design of automated systems that manage resources for various applications in distributed clouds. The intermediate goal of these automated systems is to fully exploit potential benefits such as reduced network latency offered by increasingly distributed server resources. The ultimate goal is to improve end-to-end user response time with novel resource management approaches, within a certain cost budget. Centered around these two goals, I first investigate how to optimize the location and performance of virtual machines in distributed clouds. I use virtual desktops, mostly serving a single user, as an example use case for developing a black-box approach that ranks virtual machines based on their dynamic latency requirements. Those with high latency sensitivities have a higher priority of being placed or migrated to a cloud location closest to their users. Next, I relax the assumption of well-provisioned virtual machines and look at how to provision enough resources for applications that exhibit both temporal and spatial workload fluctuations. I propose an application-agnostic queueing model that captures the resource utilization and server response time. Building upon this model, I present a geo-elastic provisioning approach---referred as geo-elasticity---for replicable multi-tier applications that can spin up an appropriate amount of server resources in any cloud locations. Last, I explore the benefits of providing geo-elasticity for database clouds, a popular platform for hosting application backends. Performing geo-elastic provisioning for backend database servers entails several challenges that are specific to database workload, and therefore requires tailored solutions. In addition, cloud platforms offer resources at various prices for different locations. Towards this end, I propose a cost-aware geo-elasticity that combines a regression-based workload model and a queueing network capacity model for database clouds. In summary, hosting a diverse set of applications in an increasingly distributed cloud makes it interesting and necessary to develop new, efficient and dynamic resource management approaches

ScholarWorks@UMass Amherst