194 research outputs found

    Joint dimensioning of server and network infrastructure for resilient optical grids/clouds

    We address the dimensioning of infrastructure, comprising both network and server resources, for large-scale decentralized distributed systems such as grids or clouds. We design the resulting grid/cloud to be resilient against network link or server failures. To this end, we exploit relocation: under failure conditions, a grid job or cloud virtual machine may be served at an alternate destination (i.e., different from the one used under failure-free conditions). We thus consider grid/cloud requests to have a known origin, but assume a degree of freedom as to where they end up being served, which is the case for grid applications of the bag-of-tasks (BoT) type or hosted virtual machines in the cloud case. We present a generic methodology based on integer linear programming (ILP) that: 1) chooses, in a given network topology, a given number of sites at which to install server infrastructure; and 2) determines the amount of both network and server capacity needed to cater for both the failure-free scenario and failures of links or nodes. For the latter, we consider either failure-independent (FID) or failure-dependent (FD) recovery. Case studies on European-scale networks show that relocation allows a considerable reduction of the total amount of network and server resources, especially in sparse topologies and for higher numbers of server sites. Adopting a failure-dependent backup routing strategy does lead to lower resource dimensions, but only when we adopt relocation (especially for a high number of server sites): without exploiting relocation, potential savings of FD versus FID are not meaningful.

    Resilient network dimensioning for optical grid/clouds using relocation

    In this paper we address the problem of dimensioning infrastructure, comprising both network and server resources, for large-scale decentralized distributed systems such as grids or clouds. We will provide an overview of our work in this area, and in particular focus on how to design the resulting grid/cloud to be resilient against network link and/or server site failures. To this end, we will exploit relocation: under failure conditions, a request may be sent to a destination other than the one used under failure-free conditions. We will provide a comprehensive overview of related work in this area, and focus in some detail on our own most recent work. The latter comprises a case study where traffic has a known origin, but we assume a degree of freedom as to where it ends up being processed, which is typically the case for, e.g., grid applications of the bag-of-tasks (BoT) type or for providing cloud services. In particular, we will provide in this paper a new integer linear programming (ILP) formulation to solve the resilient grid/cloud dimensioning problem using failure-dependent backup routes. Our algorithm will simultaneously decide on server and network capacity. We find that in the anycast routing problem we address, the benefit of using failure-dependent (FD) rerouting is limited compared to failure-independent (FID) backup routing. We confirm our earlier findings in terms of network capacity savings achieved by relocation compared to not exploiting relocation (on the order of 6-10% in the current case studies).

    Selecting the best locations for data centers in resilient optical grid/cloud dimensioning

    For optical grid/cloud scenarios, the dimensioning problem comprises not only deciding on the network dimensions (i.e., link bandwidths), but also choosing appropriate locations to install server infrastructure (i.e., data centers), as well as determining the amount of required server resources (for storage and/or processing). Given that users of such grid/cloud systems in general do not care about the exact physical locations of the server resources, a degree of freedom arises in choosing the most appropriate server location for each of their requests. We will also exploit this anycast routing principle (i.e., the source of traffic is given, but the destination can be chosen rather freely) to provide resilience: traffic may be relocated to alternate destinations in case of network/server failures. In this study, we propose to jointly optimize the link dimensioning and the location of the servers in an optical grid/cloud, where the anycast principle is applied for resiliency against either link or server node failures. While the data center location problem has some resemblance to either the classical p-center or k-means location problems, the anycast principle makes it much more difficult due to the requirement of link-disjoint paths for ensuring grid resiliency.
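
To make the server location question above concrete, here is a minimal sketch of a brute-force site selection under anycast, assuming a small connected topology, hop count as the only cost, and no resilience requirement. It is not the optimization model of the listed paper; the function names `hop_counts` and `best_sites`, the six-node ring, and the unit demands are all hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations


def hop_counts(adj, source):
    """BFS hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def best_sites(adj, demand, k):
    """Try every k-subset of nodes as server sites (assumes a connected graph).

    Each source is served by its nearest site (anycast); the cost of a
    subset is the demand-weighted sum of hop counts. Returns (cost, sites)
    for the first subset, in sorted order, that reaches the minimum cost.
    """
    dist = {n: hop_counts(adj, n) for n in adj}
    best = None
    for sites in combinations(sorted(adj), k):
        cost = sum(d * min(dist[src][s] for s in sites)
                   for src, d in demand.items())
        if best is None or cost < best[0]:
            best = (cost, sites)
    return best


# Hypothetical topology: a ring of six nodes, one unit of demand per node.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
demand = {i: 1 for i in range(6)}
result = best_sites(ring, demand, 2)
print(result)
```

Exhaustive enumeration only scales to toy instances, and it ignores the link-disjoint backup paths that the abstract identifies as the hard part of the problem.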

    Resilience options for provisioning anycast cloud services with virtual optical networks

    Optical networks are crucial to support increasingly demanding cloud services. Delivering the requested quality of service (in particular latency) is key to successfully provisioning end-to-end services in clouds. Therefore, as for traditional optical network services, it is of utmost importance to guarantee that clouds are resilient to any failure of either network infrastructure (links and/or nodes) or data centers. A crucial concept in establishing cloud services is that of network virtualization: the physical infrastructure is logically partitioned into separate virtual networks. To guarantee end-to-end resilience for cloud services in such a set-up, we need to simultaneously route the services and map the virtual network, in such a way that an alternate routing is always available in case of physical resource failures. Note that combined control of the network and data center resources is exploited, and the anycast routing concept applies: we can choose the data center that provides the server resources requested by the customer so as to optimize resource usage and/or resiliency. This paper investigates the design of scalable optimization models to perform the virtual network mapping resiliently. We compare various resilience options, and analyze the trade-off each makes between bandwidth requirements and resiliency quality.

    Dimensioning backbone networks for multi-site data centers: exploiting anycast routing for resilience

    In the current era of big data, applications increasingly rely on powerful computing infrastructure residing in large data centers (DCs), often adopting cloud computing technology. Clearly, this necessitates efficient and resilient networking infrastructure to connect the users of these applications with the data centers hosting them. In this paper, we focus on backbone network infrastructure on large geographical scales (i.e., the so-called wide area networks), which typically adopts optical network technology. In particular, we study the problem of dimensioning such backbone networks: what bandwidth should each of the links provide for the traffic, originating at known sources, to reach the data centers? And possibly even: how many such DCs should we deploy, and at what locations? More concretely, we summarize our recent work that essentially addresses the following fundamental research questions: (1) Does the anycast routing strategy influence the amount of required network resources? (2) Can we exploit anycast routing for resilience purposes, i.e., relocate to a different DC under failure conditions, to reduce resource capacity requirements? (3) Is it advantageous to change anycast request destinations from one DC location to another, from one time period to the next, if service requests vary over time?

    Scalable algorithms for QoS-aware virtual network mapping for cloud services

    Both business and consumer applications increasingly depend on cloud solutions. Yet, many are still reluctant to move to cloud-based solutions, mainly due to concerns about service quality and reliability. Since cloud platforms depend both on IT resources (located in data centers, DCs) and on the network infrastructure connecting to them, both QoS and resilience should be offered with end-to-end guarantees up to and including the server resources. The latter currently is largely impeded by the fact that the network and cloud DC domains are typically operated by disjoint entities. Network virtualization, together with combined control of network and IT resources, can solve that problem. Here, we formally state the combined network and IT provisioning problem for a set of virtual networks, incorporating resilience as well as QoS in physical and virtual layers. We provide a scalable column generation model to address real-world network sizes. We analyze the latter in extensive case studies, to answer the question of at which layer to provision QoS and resilience in virtual networks for cloud services.

    Design and optimization of optical grids and clouds


    Anycast end-to-end resilience for cloud services over virtual optical networks

    Optical networks are crucial to support increasingly demanding cloud services. Delivering the requested quality of service is key to successfully provisioning end-to-end services in clouds. Therefore, as for traditional optical network services, it is of utmost importance to guarantee that clouds are resilient to any failure of either network infrastructure or data centers. A crucial concept in establishing cloud services is that of network virtualization: the physical infrastructure is logically partitioned into separate virtual networks. Also, combined control of the network and data center (IT) resources is exploited. To guarantee end-to-end resilience for cloud services in such a set-up, we need to simultaneously route the services and map the virtual network, while ensuring that an alternate routing is always available. Note that the anycast routing concept applies: assigning server resources requested by the customer to a particular (physical) data center can be done transparently. This paper investigates the design of scalable optimization models to perform the virtual network mapping resiliently (for single bidirectional link failures), thus supporting resilient anycast cloud virtual networks. We compare two resilience approaches: PIP-resilience maps each virtual link to two alternate physical routes, whereas VNO-resilience provides alternate paths in the virtual topology (while enforcing physical link disjointness).
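
As a rough illustration of mapping one virtual link to two alternate physical routes, the sketch below finds a primary route and a link-disjoint backup with a simple remove-and-reroute heuristic. This is an assumption made for illustration and not the optimization model investigated in the paper; the function names and the four-node square topology are hypothetical. The greedy approach can also fail on topologies where a disjoint pair does exist, which an algorithm such as Suurballe's would find.

```python
from collections import deque


def shortest_path(adj, src, dst, banned=frozenset()):
    """BFS shortest path that avoids the undirected links in `banned`.

    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and frozenset((u, v)) not in banned:
                parent[v] = u
                queue.append(v)
    return None


def two_disjoint_routes(adj, src, dst):
    """Greedy heuristic: route once, ban the links used, route again."""
    primary = shortest_path(adj, src, dst)
    if primary is None:
        return None
    used = frozenset(frozenset(e) for e in zip(primary, primary[1:]))
    backup = shortest_path(adj, src, dst, banned=used)
    return (primary, backup) if backup else None


# Hypothetical physical topology: a square a-b-c-d-a.
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
routes = two_disjoint_routes(square, "a", "c")
print(routes)
```

Because the two routes share no physical link, a single bidirectional link failure leaves at least one of them intact, which is the failure model named in the abstract.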