1,302 research outputs found

    On the Fly Orchestration of Unikernels: Tuning and Performance Evaluation of Virtual Infrastructure Managers

    Full text link
    Network operators are facing significant challenges meeting the demand for more bandwidth, agile infrastructures, innovative services, while keeping costs low. Network Functions Virtualization (NFV) and Cloud Computing are emerging as key trends of 5G network architectures, providing flexibility, fast instantiation times, support of Commercial Off The Shelf hardware and significant cost savings. NFV leverages Cloud Computing principles to move the data-plane network functions from expensive, closed and proprietary hardware to the so-called Virtual Network Functions (VNFs). In this paper we deal with the management of virtual computing resources (Unikernels) for the execution of VNFs. This functionality is performed by the Virtual Infrastructure Manager (VIM) in the NFV MANagement and Orchestration (MANO) reference architecture. We discuss the instantiation process of virtual resources and propose a generic reference model, starting from the analysis of three open source VIMs, namely OpenStack, Nomad and OpenVIM. We improve the aforementioned VIMs introducing the support for special-purpose Unikernels and aiming at reducing the duration of the instantiation process. We evaluate some performance aspects of the VIMs, considering both stock and tuned versions. The VIM extensions and performance evaluation tools are available under a liberal open source licence

    Virtual Organization Clusters: Self-Provisioned Clouds on the Grid

    Get PDF
    Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model. As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization

    Kestrel: Job Distribution and Scheduling using XMPP

    Get PDF
    A new distributed computing framework, named Kestrel, for Many-Task Computing (MTC) applications and implementing Virtual Organization Clusters (VOCs) is proposed. Kestrel is a lightweight, highly available system based on the Extensible Messaging and Presence Protocol (XMPP), and has been developed to explore XMPP-based techniques for improving MTC and VOC tolerance to faults due to scaling and intermittently connected heterogeneous resources. Kestrel provides a VOC with a special purpose scheduler for VOCs which can provide better scalability under certain workload assumptions, namely CPU bound processes and bag-of-task applications. Experimental results have shown that Kestrel is capable of operating a VOC of at least 1600 worker nodes with all nodes visible to the scheduler at once. When using multiple sites located in both North America and Europe, the latencies introduced to the round trip time of messages were on the order of 0.3 seconds. To offset the overhead of XMPP processing, a task execution time of 2 seconds is sufficient for a pool of 900 workers on a single site to operate at near 100% use. Requiring tasks that take on the order of 30 seconds to a minute to execute would compensate for increased latency during job dispatch across multiple sites. Kestrel\u27s architecture is rooted in pilot job frameworks heavily used in Grid computing, it is also modeled after the use of IRC by botnets to communicate between compromised machines and command and control servers. For Kestrel, the extensibility of XMPP has allowed development of protocols for identifying manager nodes, discovering the capabilities of worker agents, and for distributing tasks. The presence notifications provided by XMPP allow Kestrel to monitor the global state of the pool and to perform task dispatching based on worker availability. In this work it is argued that XMPP is by design a very good fit for cloud computing frameworks. It offers scalability, federation between servers and some autonomicity of the agents. During the summer of 2010, Kestrel was used and modified based on feedback from the STAR group at Brookhaven National Laboratories. STAR provided a virtual machine image with applications for simulating proton collisions using PYTHIA and GEANT3. A Kestrel-based virtual organization cluster, created on top of Clemson University\u27s Palmetto cluster, was able to provide over 400,000 CPU hours of computation over the course of a month using an average of 800 virtual machine instances every day, generating nearly seven terabytes of data and the largest PYTHIA production run that STAR ever achieved. Several architectural issues were encountered during the course of the experiment and were resolved by moving from the original JSON protocols used by Kestrel to native XMPP equivalents that offered better message delivery confirmation and integration with existing tools

    iCanCloud: a flexible and scalable cloud infrastructure simulator

    Get PDF
    Simulation techniques have become a powerful tool for deciding the best starting conditions on pay-as-you-go scenarios. This is the case of public cloud infrastructures, where a given number and type of virtual machines (in short VMs) are instantiated during a specified time, being this reflected in the final budget. With this in mind, this paper introduces and validates iCanCloud, a novel simulator of cloud infrastructures with remarkable features such as flexibility, scalability, performance and usability. Furthermore, the iCanCloud simulator has been built on the following design principles: (1) it's targeted to conduct large experiments, as opposed to others simulators from literature; (2) it provides a flexible and fully customizable global hypervisor for integrating any cloud brokering policy; (3) it reproduces the instance types provided by a given cloud infrastructure; and finally, (4) it contains a user-friendly GUI for configuring and launching simulations, that goes from a single VM to large cloud computing systems composed of thousands of machines.This research was partially supported by the following projects: Spanish MEC project TESIS (TIN2009-14312-C02-01), and Spanish Ministry of Science and Innovation under the grant TIN2010-16497.Publicad

    A Design of a Generic Profile-Based Queue System

    Get PDF
    Website and server hosting accounts impose resource limits which restrict the processing power available to applications. One technique to bypass these restrictions is to split up large jobs into smaller tasks that can then be queued and processed task by task. This is a fairly common need. However, different application jobs can differ widely in nature and in their requirements. Thus, a queue system built for one job type may not be entirely suitable for another. This situation could result in the having to implement separate, additional queue systems for different needs. This research proposes a generic queue core design that can accommodate a large variety of job types by providing a basic set of features which can be easily extended to add specificity. The design includes a detailed discussion on queue implementation, scheduling, directory structure and business tier logic. Furthermore, it features highly configurable, time-sensitive performance management that can be customized for any job type. This is provided as the ability to indicate desired performance profiles for any given slot of time during the week. Actual performance data based on the usage of a prototype is also included to demonstrate the significant advantage of using the queue system

    Evaluating and Enabling Scalable High Performance Computing Workloads on Commercial Clouds

    Get PDF
    Performance, usability, and accessibility are critical components of high performance computing (HPC). Usability and performance are especially important to academic researchers as they generally have little time to learn a new technology and demand a certain type of performance in order to ensure the quality and quantity of their research results. We have observed that while not all workloads run well in the cloud, some workloads perform well. We have also observed that although commercial cloud adoption by industry has been growing at a rapid pace, its use by academic researchers has not grown as quickly. We aim to help close this gap and enable researchers to utilize the commercial cloud more efficiently and effectively. We present our results on architecting and benchmarking an HPC environment on Amazon Web Services (AWS) where we observe that there are particular types of applications that are and are not suited for the commercial cloud. Then, we present our results on architecting and building a provisioning and workflow management tool (PAW), where we developed an application that enables a user to launch an HPC environment in the cloud, execute a customizable workflow, and after the workflow has completed delete the HPC environment automatically. We then present our results on the scalability of PAW and the commercial cloud for compute intensive workloads by deploying a 1.1 million vCPU cluster. We then discuss our research into the feasibility of utilizing commercial cloud infrastructure to help tackle the large spikes and data-intensive characteristics of Transportation Cyberphysical Systems (TCPS) workloads. Then, we present our research in utilizing the commercial cloud for urgent HPC applications by deploying a 1.5 million vCPU cluster to process 211TB of traffic video data to be utilized by first responders during an evacuation situation. Lastly, we present the contributions and conclusions drawn from this work

    Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

    Get PDF
    International audienceThis paper presents the architecture of the Virtual Imaging Platform sup- porting the execution of medical image simulation workflows on multiple comput- ing infrastructures. The system relies on the MOTEUR engine for workflow execu- tion and on the DIRAC pilot-job system for workload management. The jGASW code wrapper is extended to describe applications running on multiple infrastruc- tures and a DIRAC cluster agent that can securely involve personal cluster re- sources with no administrator intervention is proposed. Grid data management is complemented with local storage used as a failover in case of file transfer errors. Between November 2010 and April 2011 the platform was used by 10 users to run 484 workflow instances representing 10.8 CPU years. Tests show that a small per- sonal cluster can significantly contribute to a simulation running on EGI and that the improved data manager can decrease the job failure rate from 7.7% to 1.5%
    • …
    corecore