40 research outputs found

    Scheduling of data-intensive workloads in a brokered virtualized environment

    Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, for which performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker's incurred costs. Moreover, considering the brokered nature of such workloads, we define a payment model that provides incentives for these workloads to be scheduled as part of a batch, and we analyze this model theoretically. Finally, we evaluate the proposed scheduling algorithm and exemplify the fairness of the payment model in practical settings via trace-based experiments.
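    As an illustration of the batch-packing objective described above, the following Python sketch greedily places the largest workloads on the cheapest hosts with remaining capacity and totals the broker's cost. The `Workload`/`Host` model, the cost function, and the first-fit-decreasing heuristic are assumptions for illustration only, not the mechanism analyzed in the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Workload:
    name: str
    cpu: int          # vCPUs requested
    transfer_gb: int  # expected data-transfer volume

@dataclass
class Host:
    capacity: int           # vCPUs available
    cost_per_cpu: float     # broker's unit cost on this host
    placed: List[Workload] = field(default_factory=list)

    def used(self) -> int:
        return sum(w.cpu for w in self.placed)

def pack_batch(batch: List[Workload], hosts: List[Host]) -> float:
    """Greedy first-fit-decreasing packing: place the largest workloads first
    on the cheapest host that still has room; return the broker's total cost."""
    total_cost = 0.0
    for w in sorted(batch, key=lambda w: w.cpu, reverse=True):
        for h in sorted(hosts, key=lambda h: h.cost_per_cpu):
            if h.used() + w.cpu <= h.capacity:
                h.placed.append(w)
                total_cost += w.cpu * h.cost_per_cpu
                break
        else:
            raise RuntimeError(f"no host can fit workload {w.name}")
    return total_cost

if __name__ == "__main__":
    batch = [Workload("etl", 8, 500), Workload("analytics", 4, 200), Workload("backup", 2, 800)]
    hosts = [Host(capacity=8, cost_per_cpu=1.0), Host(capacity=8, cost_per_cpu=1.5)]
    print("broker cost:", pack_batch(batch, hosts))
```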

    On the placement of security-related Virtualised Network Functions over data center networks

    Middleboxes are typically hardware-accelerated appliances such as firewalls, proxies, WAN optimizers, and NATs that play an important role in service provisioning over today's data centers. Reports show that the number of middleboxes is on par with the number of routers, and that middleboxes consequently represent a significant commitment from an operator's capital and operational expenditure budgets. Over the past few years, software middleboxes known as Virtual Network Functions (VNFs) have been replacing hardware appliances to reduce cost, improve the flexibility of deployment, and allow network functionality to be extended on short timescales. This dissertation aims at identifying the unique characteristics of implementing security modules as VNFs in virtualised environments. We focus on the placement of security VNFs to minimise resource usage without violating the imposed security constraints, a challenge faced by operators today who want to increase the usable capacity of their infrastructures. The work presented here focuses on multi-tenant environments where customised security services are provided to tenants. The services are implemented as software modules deployed as VNFs co-located with network switches to reduce overhead. Furthermore, the thesis presents a formalisation of the resource-aware placement of security VNFs and provides a constraint programming solution, along with heuristic, meta-heuristic and near-optimal/subset-sum solutions that solve larger problem instances in reduced time. The results of this work identify the unique and vital constraints on the placement of security functions. They demonstrate that the traffic granularity required by the security functions imposes traffic constraints that increase the resource overhead of the deployment. The work identifies north-south traffic in data centers, rather than east-west traffic, as the traffic intended for processing by security functions. It asserts that a non-sharing strategy for security modules reduces complexity in multi-tenant environments. Furthermore, the work adopts an on-path deployment strategy for security VNF traffic, which is shown to reduce resource overhead compared to previous approaches.
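    As a rough illustration of on-path, non-shared placement under capacity constraints, the sketch below greedily assigns each tenant's security VNF to the first switch on that tenant's traffic path with spare capacity. The switch/VNF model and the greedy rule are hypothetical simplifications, not the dissertation's constraint-programming formulation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Switch:
    name: str
    cpu_capacity: int   # resources available for co-located VNFs
    cpu_used: int = 0

@dataclass
class SecurityVNF:
    tenant: str
    cpu_demand: int
    path: List[str]     # switches along the tenant's north-south path

def place_on_path(vnfs: List[SecurityVNF], switches: Dict[str, Switch]) -> Dict[Tuple[str, int], str]:
    """Place each tenant's security VNF on the first switch along its own
    traffic path that still has capacity (on-path, non-shared deployment)."""
    placement = {}
    for i, vnf in enumerate(vnfs):
        for sw_name in vnf.path:
            sw = switches[sw_name]
            if sw.cpu_used + vnf.cpu_demand <= sw.cpu_capacity:
                sw.cpu_used += vnf.cpu_demand
                placement[(vnf.tenant, i)] = sw_name
                break
        else:
            raise RuntimeError(f"no on-path switch can host the VNF for {vnf.tenant}")
    return placement

if __name__ == "__main__":
    switches = {s.name: s for s in [Switch("edge1", 4), Switch("agg1", 8), Switch("core1", 16)]}
    vnfs = [SecurityVNF("tenantA", 2, ["edge1", "agg1", "core1"]),
            SecurityVNF("tenantB", 4, ["edge1", "agg1", "core1"])]
    print(place_on_path(vnfs, switches))
```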

    A Pattern-Based Approach to Scaffold the IT Infrastructure Design Process

    Context. The design of Information Technology (IT) infrastructures is a challenging task since it implies proficiency in several areas that are rarely mastered by a single person, thus raising communication problems among those in charge of conceiving, deploying, operating and maintaining/managing them. Most IT infrastructure designs are based on proprietary models, known as blueprints or product-oriented architectures, defined by vendors to facilitate the configuration of a particular solution based upon their services and products portfolio. Existing blueprints can be facilitators in the design of solutions for a particular vendor or technology. However, since organizations may have infrastructure components from multiple vendors, the use of blueprints aligned with commercial product(s) may cause integration problems among these components and can lead to vendor lock-in. Additionally, these blueprints have a short lifecycle, due to their association with product version(s) or a specific technology, which hampers their usage as a tool for the reuse of IT infrastructure knowledge. Objectives. The objectives of this dissertation are (i) to mitigate the inability to reuse knowledge in terms of best practices in the design of IT infrastructures and (ii) to simplify the usage of this knowledge, making IT infrastructure designs simpler, quicker and better documented, while facilitating the integration of components from different vendors and minimizing the communication problems between teams. Method. We conducted an online survey and performed a systematic literature review to support the state of the art and to provide evidence that this research was relevant and had not been conducted before. A model-driven approach was also used for the formalization and empirical validation of well-formedness rules to enhance the overall process of designing IT infrastructures. To simplify and support the design process, a modeling tool, including its abstract and concrete syntaxes, was also extended to include the main contributions of this dissertation. Results. We obtained 123 responses to the online survey, the majority from people with more than 15 years of experience with IT infrastructures. The respondents confirmed our claims regarding the lack of formality and the documentation problems in knowledge transfer, and only 19% considered their current practices for representing IT infrastructures efficient. A language for modeling IT infrastructures, including an abstract and a concrete syntax, is proposed to address the problem of informality in their design. A catalog of IT infrastructure patterns is also proposed to allow expressing best practices in their design. The modeling tool was also evaluated: according to 84% of the respondents, this approach decreases the effort associated with IT infrastructure design, and 89% considered that the use of a repository of infrastructure patterns will help to improve the overall quality of IT infrastructure representations. A controlled experiment was also performed to assess the effectiveness of both the proposed language and the pattern-based IT infrastructure design process supported by the tool. Conclusion. With this work, we contribute to improving the current state of the art in the design of IT infrastructures, replacing ad-hoc methods with more formal ones to address the problems of ambiguity, traceability and documentation, among others, that characterize most IT infrastructure representations.
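    To give a flavour of what a machine-checkable well-formedness rule over an infrastructure model might look like, the sketch below encodes one hypothetical rule as a predicate over a toy node/link model. Both the model and the rule are assumptions for illustration; they are not the metamodel or rules proposed in the dissertation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    name: str
    kind: str   # e.g. "server", "switch", "firewall"

@dataclass
class Link:
    src: str
    dst: str

@dataclass
class InfraModel:
    nodes: List[Node]
    links: List[Link]

def servers_reach_only_switches(model: InfraModel) -> List[str]:
    """Hypothetical well-formedness rule: every link leaving a server must
    terminate at a switch. Returns a list of violation messages."""
    kinds = {n.name: n.kind for n in model.nodes}
    violations = []
    for link in model.links:
        if kinds.get(link.src) == "server" and kinds.get(link.dst) != "switch":
            violations.append(f"server {link.src} is linked to non-switch {link.dst}")
    return violations

if __name__ == "__main__":
    model = InfraModel(
        nodes=[Node("web1", "server"), Node("sw1", "switch"), Node("db1", "server")],
        links=[Link("web1", "sw1"), Link("web1", "db1")])
    print(servers_reach_only_switches(model))  # flags the direct web1 -> db1 link
```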
Categories and Subject Descriptors: C.0 [Computer Systems Organization]: System architecture; D.2.10 [Software Engineering]: Design - Methodologies; D.2.11 [Software Engineering]: Software Architectures - Patterns

    Multicloud Resource Allocation:Cooperation, Optimization and Sharing

    Nowadays our daily life is powered not only by water, electricity, gas and telephony but by the "cloud" as well. Big cloud vendors such as Amazon, Microsoft and Google have built large-scale centralized data centers to achieve economies of scale, on-demand resource provisioning, high resource availability and elasticity. However, those massive data centers also bring about many other problems, e.g., bandwidth bottlenecks, privacy, security, huge energy consumption, and legal and physical vulnerabilities. One possible solution to those problems is to employ multicloud architectures. In this thesis, our work provides research contributions to multicloud resource allocation from the three perspectives of cooperation, optimization and data sharing. We address the following problems in the multicloud: how resource providers cooperate in a multicloud, how to reduce information leakage in a multicloud storage system, and how to share big data in a cost-effective way. More specifically, we make the following contributions: Cooperation in the decentralized cloud. We propose a decentralized cloud model in which a group of small data centers (SDCs) can cooperate with each other to improve performance. Moreover, we design a general strategy function for SDCs to evaluate the performance of cooperation based on different dimensions of resource sharing. Through extensive simulations using a realistic data center model, we show that strategies based on reciprocity are more effective than other strategies, e.g., those using prediction based on historical data. Our results show that the reciprocity-based strategy can thrive in a heterogeneous environment with competing strategies. Multicloud optimization on information leakage. In this work, we first study an important information leakage problem caused by unplanned data distribution in multicloud storage services. Then, we present StoreSim, an information-leakage-aware storage system for the multicloud. StoreSim aims to store syntactically similar data on the same cloud, thereby minimizing the user's information leakage across multiple clouds. We design an approximate algorithm to efficiently generate similarity-preserving signatures for data chunks based on MinHash and Bloom filters, and also design a function to compute the information leakage based on these signatures. Next, we present an effective storage-plan generation algorithm based on clustering for distributing data chunks with minimal information leakage across multiple clouds. Finally, we evaluate our scheme using two real datasets from Wikipedia and GitHub. We show that our scheme can reduce the information leakage by up to 60% compared to unplanned placement. Furthermore, our analysis in terms of system attackability demonstrates that our scheme makes attacks on information much more complex. Smart data sharing. Moving large amounts of distributed data into the cloud, or from one cloud to another, can incur high costs in both time and bandwidth. The optimization of data sharing in the multicloud can be conducted from two different angles: inter-cloud scheduling and intra-cloud optimization. We first present CoShare, a P2P-inspired, decentralized, cost-effective sharing system for data replication that optimizes network transfer among small data centers. Then we propose a data summarization method to reduce the total size of the dataset, thereby reducing network transfer.
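    The similarity-preserving signatures mentioned above can be illustrated with a plain MinHash sketch over byte shingles, where the fraction of matching signature positions approximates the Jaccard similarity of two chunks. The shingle size, number of hash functions, and function names below are assumptions for illustration, not StoreSim's actual implementation (which also uses Bloom filters).

```python
import hashlib
from typing import List, Set

def shingles(data: bytes, k: int = 8) -> Set[bytes]:
    """Break a chunk into overlapping k-byte shingles."""
    return {data[i:i + k] for i in range(max(1, len(data) - k + 1))}

def minhash_signature(data: bytes, num_hashes: int = 64) -> List[int]:
    """Similarity-preserving signature: for each of num_hashes seeded hash
    functions, keep the minimum hash value over the chunk's shingles."""
    sig = []
    for seed in range(num_hashes):
        min_val = min(
            int.from_bytes(hashlib.sha1(seed.to_bytes(4, "big") + s).digest()[:8], "big")
            for s in shingles(data))
        sig.append(min_val)
    return sig

def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching MinHash positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

if __name__ == "__main__":
    a = minhash_signature(b"the quick brown fox jumps over the lazy dog")
    b = minhash_signature(b"the quick brown fox jumps over the lazy cat")
    print("estimated similarity:", estimated_jaccard(a, b))
```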

    Change Management Systems for Seamless Evolution in Data Centers

    Revenue for data centers today is highly dependent on the satisfaction of their enterprise customers. These customers often require various features to migrate their businesses and operations to the cloud. Thus, clouds today introduce new features at a swift pace to onboard new customers and to meet the needs of existing ones. This pace of innovation continues to grow super-linearly; e.g., Amazon deployed 1400 new features in 2017. However, such a rapid pace of evolution adds complexity both for users and for the cloud. Clouds struggle to keep up with the deployment speed, and users struggle to learn which features they need and how to use them. The pace of these evolutions has brought us to a tipping point: we can no longer use rules of thumb to deploy new features, and customers need help to identify which features they need. We have built two systems, Janus and Cherrypick, to address these problems. Janus helps data center operators roll out new changes to the data center network. It automatically adapts to the data center topology, routing, traffic, and failure settings. The system reduces the risk of new deployments for network operators, as they can now pick deployment strategies that are less likely to impact users' performance. Cherrypick finds near-optimal cloud configurations for big-data analytics. It allows users to search through available configurations and find ones that achieve near-optimal performance within their budget. Cherrypick can adapt to new big-data frameworks and applications as well as the new machine types the clouds are constantly introducing. As the pace of cloud innovation increases, it is critical to have tools that allow operators to deploy new changes as well as tools that enable users to achieve good performance at low cost. The tools and algorithms discussed in this thesis help accomplish these goals.
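    As a toy illustration of the configuration-search problem Cherrypick addresses, the sketch below exhaustively evaluates a handful of candidate configurations and returns the cheapest one that fits a per-run budget. The candidate list, runtime model, and exhaustive search are assumptions for illustration; Cherrypick itself uses a far more sample-efficient search.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class CloudConfig:
    machine_type: str
    num_nodes: int
    price_per_node_hour: float

def find_config(candidates: List[CloudConfig],
                run_time_hours: Callable[[CloudConfig], float],
                budget_per_run: float) -> Optional[CloudConfig]:
    """Naive exhaustive search: run (or estimate) the job on every candidate
    and return the cheapest configuration whose cost per run fits the budget."""
    best, best_cost = None, float("inf")
    for cfg in candidates:
        cost = run_time_hours(cfg) * cfg.num_nodes * cfg.price_per_node_hour
        if cost <= budget_per_run and cost < best_cost:
            best, best_cost = cfg, cost
    return best

if __name__ == "__main__":
    # Hypothetical candidates and a toy runtime model (bigger/faster clusters finish sooner).
    candidates = [CloudConfig("m4.large", 4, 0.10),
                  CloudConfig("m4.xlarge", 4, 0.20),
                  CloudConfig("r4.xlarge", 8, 0.27)]
    runtime = lambda c: 16.0 / (c.num_nodes * (2 if "xlarge" in c.machine_type else 1))
    print(find_config(candidates, runtime, budget_per_run=2.0))
```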