40 research outputs found
Scheduling of data-intensive workloads in a brokered virtualized environment
Providing performance predictability guarantees is increasingly important in cloud platforms, especially for data-intensive applications, for which performance depends greatly on the available rates of data transfer between the various computing/storage hosts underlying the virtualized resources assigned to the application. With the increased prevalence of brokerage services in cloud platforms, there is a need for resource management solutions that consider the brokered nature of these workloads, as well as the special demands of their intra-dependent components. In this paper, we present an offline mechanism for scheduling batches of brokered data-intensive workloads, which can be extended to an online setting. The objective of the mechanism is to decide on a packing of the workloads in a batch that minimizes the broker's incurred costs, Moreover, considering the brokered nature of such workloads, we define a payment model that provides incentives to these workloads to be scheduled as part of a batch, which we analyze theoretically. Finally, we evaluate the proposed scheduling algorithm, and exemplify the fairness of the payment model in practical settings via trace-based experiments
Recommended from our members
Transiency-driven Resource Management for Cloud Computing Platforms
Modern distributed server applications are hosted on enterprise or cloud data centers that provide computing, storage, and networking capabilities to these applications. These applications are built using the implicit assumption that the underlying servers will be stable and normally available, barring for occasional faults. In many emerging scenarios, however, data centers and clouds only provide transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered using renewable intermittent sources, and cloud platforms that provide lower-cost transient servers which can be unilaterally revoked by the cloud operator.
Transient computing resources are increasingly important, and existing fault-tolerance and resource management techniques are inadequate for transient servers because applications typically assume continuous resource availability. This thesis presents research in distributed systems design that treats transiency as a first-class design principle. I show that combining transiency-specific fault-tolerance mechanisms with resource management policies to suit application characteristics and requirements, can yield significant cost and performance benefits. These mechanisms and policies have been implemented and prototyped as part of software systems, which allow a wide range of applications, such as interactive services and distributed data processing, to be deployed on transient servers, and can reduce cloud computing costs by up to 90\%.
This thesis makes contributions to four areas of computer systems research: transiency-specific fault-tolerance, resource allocation, abstractions, and resource reclamation. For reducing the impact of transient server revocations, I develop two fault-tolerance techniques that are tailored to transient server characteristics and application requirements. For interactive applications, I build a derivative cloud platform that masks revocations by transparently moving application-state between servers of different types. Similarly, for distributed data processing applications, I investigate the use of application level periodic checkpointing to reduce the performance impact of server revocations. For managing and reducing the risk of server revocations, I investigate the use of server portfolios that allow transient resource allocation to be tailored to application requirements.
Finally, I investigate how resource providers (such as cloud platforms) can provide transient resource availability without revocation, by looking into alternative resource reclamation techniques. I develop resource deflation, wherein a server\u27s resources are fractionally reclaimed, allowing the application to continue execution albeit with fewer resources. Resource deflation generalizes revocation, and the deflation mechanisms and cluster-wide policies can yield both high cluster utilization and low application performance degradation
On the placement of security-related Virtualised Network Functions over data center networks
Middleboxes are typically hardware-accelerated appliances such as firewalls, proxies, WAN optimizers, and NATs that play an important role in service provisioning over today's data centers. Reports show that the number of middleboxes is on par with the number of routers, and consequently represent a significant commitment from an operator's capital and operational expenditure budgets. Over the past few years, software middleboxes known as Virtual Network Functions (VNFs) are replacing the hardware appliances to reduce cost, improve the flexibility of deployment, and allow for extending network functionality in short timescales.
This dissertation aims at identifying the unique characteristics of security modules implementation as VNFs in virtualised environments. We focus on the placement of the security VNFs to minimise resource usage without violating the security imposed constraints as a challenge faced by operators today who want to increase the usable capacity of their infrastructures. The work presented here, focuses on the multi-tenant environment where customised security services are provided to tenants. The services are implemented as a software module deployed as a VNF collocated with network switches to reduce overhead. Furthermore, the thesis presents a formalisation for the resource-aware placement of security VNFs and provides a constraint programming solution along with examining heuristic, meta-heuristic and near-optimal/subset-sum solutions to solve larger size problems in reduced time.
The results of this work identify the unique and vital constraints of the placement of security functions. They demonstrate that the granularity of the traffic required by the security functions imposes traffic constraints that increase the resource overhead of the deployment. The work identifies the north-south traffic in data centers as the traffic designed for processing for security functions rather than east-west traffic. It asserts that the non-sharing strategy of security modules will reduce the complexity in case of the multi-tenant environment. Furthermore, the work adopts on-path deployment of security VNF traffic strategy, which is shown to reduce resources overhead compared to previous approaches
A Pattern-Based Approach to Scaffold the IT Infrastructure Design Process
Context. The design of Information Technology (IT) infrastructures is a challenging task
since it implies proficiency in several areas that are rarely mastered by a single person,
thus raising communication problems among those in charge of conceiving, deploying,
operating and maintaining/managing them. Most IT infrastructure designs are based
on proprietary models, known as blueprints or product-oriented architectures, defined by vendors to facilitate the configuration of a particular solution, based upon their services and products portfolio. Existing blueprints can be facilitators in the design of solutions for a particular vendor or technology. However, since organizations may have infrastructure components from multiple vendors, the use of blueprints aligned with commercial product(s) may cause integration problems among these components and can lead to vendor lock-in. Additionally, these blueprints have a short lifecycle, due to their association with product version(s) or a specific technology, which hampers their usage as a tool for the reuse of IT infrastructure knowledge.
Objectives. The objectives of this dissertation are (i) to mitigate the inability to reuse
knowledge in terms of best practices in the design of IT infrastructures and, (ii) to simplify the usage of this knowledge, making the IT infrastructure designs simpler, quicker and better documented, while facilitating the integration of components from different vendors and minimizing the communication problems between teams.
Method. We conducted an online survey and performed a systematic literature review
to support the state of the art and to provide evidence that this research was relevant
and had not been conducted before. A model-driven approach was also used for the
formalization and empirical validation of well-formedness rules to enhance the overall
process of designing IT infrastructures. To simplify and support the design process, a
modeling tool, including its abstract and concrete syntaxes was also extended to include the main contributions of this dissertation.
Results. We obtained 123 responses to the online survey. Their majority were from
people with more than 15 years experience with IT infrastructures. The respondents
confirmed our claims regarding the lack of formality and documentation problems on
knowledge transfer and only 19% considered that their current practices to represent IT Infrastructures are efficient. A language for modeling IT Infrastructures including an abstract and concrete syntax is proposed to address the problem of informality in their
design. A catalog of IT Infrastructure patterns is also proposed to allow expressing best
practices in their design. The modeling tool was also evaluated and according to 84%
of the respondents, this approach decreases the effort associated with IT infrastructure
design and 89% considered that the use of a repository with infrastructure patterns, will
help to improve the overall quality of IT infrastructures representations. A controlled
experiment was also performed to assess the effectiveness of both the proposed language and the pattern-based IT infrastructure design process supported by the tool.
Conclusion. With this work, we contribute to improve the current state of the art in
the design of IT infrastructures replacing the ad-hoc methods with more formal ones to
address the problems of ambiguity, traceability and documentation, among others, that
characterize most of IT infrastructure representations. Categories and Subject Descriptors:C.0 [Computer Systems Organization]: System architecture; D.2.10 [Software Engineering]: Design-Methodologies; D.2.11 [Software
Engineering]: Software Architectures-Patterns
Multicloud Resource Allocation:Cooperation, Optimization and Sharing
Nowadays our daily life is not only powered by water, electricity, gas and telephony but by "cloud" as well. Big cloud vendors such as Amazon, Microsoft and Google have built large-scale centralized data centers to achieve economies of scale, on-demand resource provisioning, high resource availability and elasticity. However, those massive data centers also bring about many other problems, e.g., bandwidth bottlenecks, privacy, security, huge energy consumption, legal and physical vulnerabilities. One of the possible solutions for those problems is to employ multicloud architectures. In this thesis, our work provides research contributions to multicloud resource allocation from three perspectives of cooperation, optimization and data sharing. We address the following problems in the multicloud: how resource providers cooperate in a multicloud, how to reduce information leakage in a multicloud storage system and how to share the big data in a cost-effective way. More specifically, we make the following contributions: Cooperation in the decentralized cloud. We propose a decentralized cloud model in which a group of SDCs can cooperate with each other to improve performance. Moreover, we design a general strategy function for SDCs to evaluate the performance of cooperation based on different dimensions of resource sharing. Through extensive simulations using a realistic data center model, we show that the strategies based on reciprocity are more effective than other strategies, e.g., those using prediction based on historical data. Our results show that the reciprocity-based strategy can thrive in a heterogeneous environment with competing strategies. Multicloud optimization on information leakage. In this work, we firstly study an important information leakage problem caused by unplanned data distribution in multicloud storage services. Then, we present StoreSim, an information leakage aware storage system in multicloud. StoreSim aims to store syntactically similar data on the same cloud, thereby minimizing the user's information leakage across multiple clouds. We design an approximate algorithm to efficiently generate similarity-preserving signatures for data chunks based on MinHash and Bloom filter, and also design a function to compute the information leakage based on these signatures. Next, we present an effective storage plan generation algorithm based on clustering for distributing data chunks with minimal information leakage across multiple clouds. Finally, we evaluate our scheme using two real datasets from Wikipedia and GitHub. We show that our scheme can reduce the information leakage by up to 60% compared to unplanned placement. Furthermore, our analysis in terms of system attackability demonstrates that our scheme makes attacks on information much more complex. Smart data sharing. Moving large amounts of distributed data into the cloud or from one cloud to another can incur high costs in both time and bandwidth. The optimization on data sharing in the multicloud can be conducted from two different angles: inter-cloud scheduling and intra-cloud optimization. We first present CoShare, a P2P inspired decentralized cost effective sharing system for data replication to optimize network transfer among small data centers. Then we propose a data summarization method to reduce the total size of dataset, thereby reducing network transfer
Change Management Systems for Seamless Evolution in Data Centers
Revenue for data centers today is highly dependent on the satisfaction of their enterprise customers. These customers often require various features to migrate their businesses and operations to the cloud. Thus, clouds today introduce new features at a swift pace to onboard new customers and to meet the needs of existing ones. This pace of innovation continues to grow on super linearly, e.g., Amazon deployed 1400 new features in 2017. However, such a rapid pace of evolution adds complexities both for users and the cloud. Clouds struggle to keep up with the deployment speed, and users struggle to learn which features they need and how to use them. The pace of these evolutions has brought us to a tipping point: we can no longer use rules of thumb to deploy new features, and customers need help to identify which features they need. We have built two systems: Janus and Cherrypick, to address these problems. Janus helps data center operators roll out new changes to the data center network. It automatically adapts to the data center topology, routing, traffic, and failure settings. The system reduces the risk of new deployments for network operators as they can now pick deployment strategies which are less likely to impact users’ performance. Cherrypick finds near-optimal cloud configurations for big data analytics. It adapts allows users to search through the new machine types the clouds are constantly introducing and find ones with a near-optimal performance that meets their budget. Cherrypick can adapt to new big-data frameworks and applications as well as the new machine types the clouds are constantly introducing. As the pace of cloud innovations increases, it is critical to have tools that allow operators to deploy new changes as well as those that would enable users to adapt to achieve good performance at low cost. The tools and algorithms discussed in this thesis help accomplish these goals