Notes on Cloud computing principles
This letter provides a review of fundamental distributed systems and economic
Cloud computing principles. These principles are frequently deployed in their
respective fields, but their inter-dependencies are often neglected. Given that
Cloud Computing first and foremost is a new business model, a new model to sell
computational resources, the understanding of these concepts is facilitated by
treating them in unison. Here, we review some of the most important concepts
and how they relate to each other.
The state of SQL-on-Hadoop in the cloud
Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies a quick entry point and on-demand deployment of ready-made SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark.
The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has been recently extended to support SQL-on-Hadoop engines.
The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization.
The study benchmarks cloud providers across a diverse range of instance types and uses input data scales from 1GB to 1TB in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which subsequent research can be carried out by the project. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as similarities and architectural differences among the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep the local processing model from Hadoop for high performance at the cost of reduced flexibility. Results also show the importance of application-level tuning and how keeping up-to-date hardware and software stacks can influence performance even more than replicating the on-premises model in the cloud.
This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051).
Peer reviewed. Postprint (author's final draft).
Cloud Computing cost and energy optimization through Federated Cloud SoS
2017 Fall. Includes bibliographical references.
The two most significant differentiators amongst contemporary Cloud Computing service providers are increased green energy use and datacenter resource utilization. This work addresses these two issues from a system's architectural optimization viewpoint. The proposed approach allows multiple cloud providers to utilize their individual computing resources in three ways: by (1) cutting the number of datacenters needed, (2) scheduling available datacenter grid energy via aggregators to reduce costs and power outages, and (3) utilizing, where appropriate, more renewable and carbon-free energy sources. Altogether, our proposed approach creates an alternative paradigm for a Federated Cloud SoS approach. The proposed paradigm employs a novel control methodology that is tuned to obtain both financial and environmental advantages. It also supports dynamic expansion and contraction of computing capabilities for handling sudden variations in service demand as well as for maximizing usage of time-varying green energy supplies. Herein we analyze the core SoS requirements, concept synthesis, and functional architecture with an eye on avoiding inadvertent cascading conditions. We suggest a physical architecture that diminishes unwanted outcomes while encouraging desirable results. Finally, in our approach, the constituent cloud services retain their independent ownership, objectives, funding, and sustainability means. This work analyzes the core SoS requirements, concept synthesis, and functional architecture. It suggests a physical structure that simulates the primary SoS emergent behavior to diminish unwanted outcomes while encouraging desirable results.
The report analyzes optimal computing generation methods and optimal energy utilization for computing generation, as well as a procedure for building optimal datacenters using a unique hardware computing system design based on the openCompute community as an illustrative collaboration platform. Finally, the research concludes with the security features a cloud federation must support to protect its constituents, its constituents' tenants, and itself from security risks.
Elastic Multi-resource Network Slicing: Can Protection Lead to Improved Performance?
In order to meet the performance/privacy requirements of future
data-intensive mobile applications, e.g., self-driving cars, mobile data
analytics, and AR/VR, service providers are expected to draw on shared
storage/computation/connectivity resources at the network "edge". To be
cost-effective, a key functional requirement for such infrastructure is
enabling the sharing of heterogeneous resources amongst tenants/service
providers supporting spatially varying and dynamic user demands. This paper
proposes a resource allocation criterion, namely, Share Constrained Slicing
(SCS), for slices allocated predefined shares of the network's resources, which
extends the traditional alpha-fairness criterion by striking a balance among
inter- and intra-slice fairness vs. overall efficiency. We show that SCS has
several desirable properties including slice-level protection, envy-freeness,
and load driven elasticity. In practice, mobile users' dynamics could make the
cost of implementing SCS high, so we discuss the feasibility of using a simpler
(dynamically) weighted max-min as a surrogate resource allocation scheme. For a
setting with stochastic loads and elastic user requirements, we establish a
sufficient condition for the stability of the associated coupled network
system. Finally, and perhaps surprisingly, we show via extensive simulations
that while SCS (and/or the surrogate weighted max-min allocation) provides
inter-slice protection, it can also achieve improved job delay and/or perceived
throughput compared to other weighted max-min based allocation schemes
whose intra-slice weight allocation is not share-constrained, e.g., traditional
max-min or discriminatory processor sharing.
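The weighted max-min allocation that the abstract names as a surrogate can be illustrated on a single shared resource. The following is a minimal sketch of the generic progressive-filling procedure, not the paper's SCS criterion or its multi-resource setting; the function name `weighted_max_min` and the example capacity, demands, and weights are assumptions made for illustration.

```python
def weighted_max_min(capacity, demands, weights):
    """Weighted max-min fair split of one resource via progressive filling.

    Hypothetical helper: users whose demand fits within their weighted
    share are fully served, and the leftover is re-split among the rest.
    """
    alloc = [0.0] * len(demands)
    active = {i for i, d in enumerate(demands) if d > 0}
    remaining = float(capacity)
    eps = 1e-12
    while active and remaining > eps:
        total_weight = sum(weights[i] for i in active)
        # Share per unit of weight if every active user stayed unsatisfied.
        level = remaining / total_weight
        satisfied = {i for i in active if demands[i] <= level * weights[i] + eps}
        if not satisfied:
            # Nobody can be fully served: split what is left by weight.
            for i in active:
                alloc[i] = level * weights[i]
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
        active -= satisfied
    return alloc


if __name__ == "__main__":
    # Illustrative inputs: user 0 is lightly loaded, user 2 has twice the weight.
    print(weighted_max_min(10, [2, 8, 8], [1, 1, 2]))
```

In this example the lightly loaded user receives its full demand, and the unused part of its share is redistributed to the two remaining users in proportion to their weights.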
A study on performance measures for auto-scaling CPU-intensive containerized applications
Autoscaling of containers can leverage performance measures from the different layers of the computational stack. This paper investigates the problem of selecting the most appropriate performance measure to activate auto-scaling actions aimed at guaranteeing QoS constraints. First, the correlation between absolute and relative usage measures, and how a resource allocation decision can be influenced by them, is analyzed in different workload scenarios. Absolute and relative measures can assume quite different values. The former account for the actual utilization of resources in the host system, while the latter account for the share that each container has of the resources used. Then, the performance of a variant of Kubernetes' auto-scaling algorithm that transparently uses the absolute usage measures to scale containers in and out is evaluated through a wide set of experiments. Finally, a detailed analysis of the state of the art is presented.
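For context, the standard Kubernetes Horizontal Pod Autoscaler documents its scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipping the action when the ratio is within a tolerance (0.1 by default). The sketch below implements only that documented baseline rule, not the variant evaluated in the paper; the function name, the floor of one replica, and the example metric values are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Baseline HPA-style rule: scale in proportion to metric / target.

    current_value and target_value must use the same measure (for example,
    both absolute or both relative CPU utilization, in percent).
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    # Floor of one replica is an assumption standing in for minReplicas.
    return max(1, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # Hypothetical readings: the same pods can look overloaded under one
    # measure and underloaded under another, leading to opposite decisions.
    print(desired_replicas(4, 90, 60))  # scale out
    print(desired_replicas(4, 40, 60))  # scale in
```

Because the rule depends only on the ratio of measured to target value, the choice between an absolute and a relative usage measure directly changes the scaling decision, which is the sensitivity the study examines.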
Scavenger: A Cloud Service for Optimizing Cost and Performance of ML Training
While the pay-as-you-go nature of cloud virtual machines (VMs) makes it easy
to spin-up large clusters for training ML models, it can also lead to
ballooning costs. The hundreds of virtual machine sizes provided by cloud platforms
also make it extremely challenging to select the "right" cloud cluster
configuration for training. Furthermore, the training time and cost of
distributed model training are highly sensitive to the cluster configuration
and present a large and complex tradeoff space.
In this paper, we develop principled and practical techniques for optimizing
the training time and cost of distributed ML model training on the cloud. Our
key insight is that both parallel and statistical efficiency must be considered
when selecting the optimum job configuration parameters such as the number of
workers and the batch size. By combining conventional parallel scaling concepts
and new insights into SGD noise, our models accurately estimate the time and
cost on different cluster configurations with < 5% error. Using the repetitive
nature of training and our models, we can search for optimum cloud
configurations in a black-box, online manner. Our approach reduces training
times by a factor of 2 and costs by more than 50%. Compared to an oracle-based
approach, our performance models are accurate to within 2% such that the search
imposes an overhead of just 10%.
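The interplay between parallel and statistical efficiency can be sketched with a toy model. This is not Scavenger's performance model: the step count uses the commonly cited form steps = s_min * (1 + b_noise / batch) for SGD with a noise scale, the per-step time is a simple compute-plus-synchronization term, and every constant and function name below is a hypothetical placeholder.

```python
from itertools import product


def training_hours(workers, batch, s_min=1000, b_noise=512,
                   sec_per_sample=0.002, sync_sec=0.05):
    """Toy estimate of training time; all default constants are made up."""
    # Statistical efficiency: small batches need more optimization steps.
    steps = s_min * (1 + b_noise / batch)
    # Parallel efficiency: compute is split across workers, but each step
    # pays a fixed synchronization cost once there is more than one worker.
    step_sec = sec_per_sample * batch / workers
    if workers > 1:
        step_sec += sync_sec
    return steps * step_sec / 3600


def cheapest_config(worker_options, batch_options, price_per_worker_hour,
                    deadline_hours):
    """Exhaustively pick the lowest-cost (workers, batch) meeting a deadline."""
    best = None
    for workers, batch in product(worker_options, batch_options):
        hours = training_hours(workers, batch)
        cost = workers * price_per_worker_hour * hours
        if hours <= deadline_hours and (best is None or cost < best[0]):
            best = (cost, hours, workers, batch)
    return best  # None when no configuration meets the deadline


if __name__ == "__main__":
    print(cheapest_config([1, 2, 4, 8], [128, 512, 2048], 1.0, 0.5))
```

Even in this simplified form, adding workers shortens each step but raises the hourly bill, while enlarging the batch reduces the number of steps with diminishing returns, so the cheapest configuration depends on the deadline.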
rFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing
The rigid MPI programming model and batch scheduling dominate
high-performance computing. While clouds brought new levels of elasticity into
the world of computing, supercomputers still suffer from low resource
utilization rates. To enhance supercomputing clusters with the benefits of
serverless computing, a modern cloud programming paradigm for pay-as-you-go
execution of stateless functions, we present rFaaS, the first RDMA-aware
Function-as-a-Service (FaaS) platform. With hot invocations and decentralized
function placement, we overcome the major performance limitations of FaaS
systems and provide low-latency remote invocations in multi-tenant
environments. We evaluate the new serverless system through a series of
microbenchmarks and show that remote functions execute with negligible
performance overheads. We demonstrate how serverless computing can bring
elastic resource management into MPI-based high-performance applications.
Overall, our results show that MPI applications can benefit from modern cloud
programming paradigms to guarantee high performance at lower resource costs.