Monitoring Large-Scale Cloud Systems with Layered Gossip Protocols
Monitoring is an essential aspect of maintaining and developing computer
systems, and it grows more difficult as systems grow larger. The need for
robust monitoring tools has become more evident with the advent of cloud
computing. Infrastructure as a Service (IaaS) clouds allow end users to
deploy vast numbers of virtual machines as part of dynamic and transient
architectures. Current monitoring solutions, including many in the
open-source domain, rely on outdated concepts such as manual deployment and
configuration and centralised data collection, and they adapt poorly to
membership churn.
In this paper we propose the development of a cloud monitoring suite to
provide scalable and robust lookup, data collection and analysis services for
large-scale cloud systems. In lieu of centrally managed monitoring, we propose
a multi-tier architecture using a layered gossip protocol to aggregate
monitoring information and to facilitate lookup, information collection and
the identification of redundant capacity. This allows for a resource-aware
data collection and storage architecture that operates over the system being
monitored, which in turn enables monitoring to be done in situ without the
need for significant additional infrastructure. We evaluate this approach
against alternative monitoring paradigms and demonstrate how our solution is
well adapted to use in a cloud-computing context.

Comment: Extended Abstract for the ACM International Symposium on
High-Performance Parallel and Distributed Computing (HPDC 2013) Poster Track
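As a rough illustration of the aggregation idea, here is a minimal Python
sketch of averaging-style gossip over a single flat layer; the node names,
metric, and fanout are hypothetical, and the paper's multi-tier, layered
design is not reproduced here.

```python
import random

def gossip_round(states, fanout=2):
    """One synchronous round of push-pull averaging gossip.

    `states` maps node id -> current estimate of a global average
    (e.g. mean CPU load). Each node exchanges with `fanout` random
    peers, and each pair adopts the pairwise mean, so all estimates
    converge toward the true average without a central collector.
    """
    nodes = list(states)
    for node in nodes:
        peers = random.sample([n for n in nodes if n != node], fanout)
        for peer in peers:
            mean = (states[node] + states[peer]) / 2.0
            states[node] = states[peer] = mean
    return states

# Ten hypothetical VMs report local CPU load; a few rounds of gossip
# drive every local estimate toward the global mean (46.0 here).
loads = [10, 80, 30, 55, 5, 70, 20, 90, 40, 60]
states = {f"vm-{i}": float(load) for i, load in enumerate(loads)}
for _ in range(8):
    gossip_round(states)
print(states)
```

Because each pairwise exchange preserves the sum of the estimates, every node
converges on the global average after a handful of rounds with no central
collector; a layered design composes this property across tiers.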
Workflow Partitioning and Deployment on the Cloud using Orchestra
Orchestrating service-oriented workflows is typically based on a design model
that routes both data and control through a single point: the centralised
workflow engine. This causes scalability problems, including unnecessary
consumption of network bandwidth, high latency in transmitting data between
services, and performance bottlenecks. These problems are especially prominent
when orchestrating workflows composed of services dispersed across distant
geographical locations. This paper presents a novel workflow partitioning
approach that attempts to improve the scalability of orchestrating large-scale
workflows. It permits the workflow computation to be moved towards the
services providing the data in order to achieve optimal performance. This is
done by decomposing the workflow into smaller sub-workflows for parallel
execution, and by determining the most appropriate network locations to which
these sub-workflows are transmitted and subsequently executed. The paper
demonstrates the efficiency of the approach using a set of experimental
workflows orchestrated over Amazon EC2 and across several geographic network
regions.

Comment: To appear in Proceedings of the IEEE/ACM 7th International Conference
on Utility and Cloud Computing (UCC 2014)
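To make the partitioning idea concrete, here is a small Python sketch that
groups a workflow graph by the region hosting each task's data, so that each
sub-workflow can be shipped to an engine near that data. The task names,
regions, and `partition_by_region` helper are hypothetical, not Orchestra's
actual interface.

```python
from collections import defaultdict

def partition_by_region(tasks, region_of):
    """Split a workflow graph into per-region sub-workflows.

    `tasks` maps task id -> list of upstream task ids, and `region_of`
    maps task id -> hosting region. Tasks in the same region form one
    sub-workflow; edges that cross regions are the only inter-partition
    data transfers that remain.
    """
    partitions = defaultdict(dict)
    cross_region_edges = []
    for task, deps in tasks.items():
        partitions[region_of[task]][task] = deps
        for dep in deps:
            if region_of[dep] != region_of[task]:
                cross_region_edges.append((dep, task))
    return dict(partitions), cross_region_edges

# A hypothetical four-task pipeline spanning two EC2 regions.
tasks = {"fetch": [], "clean": ["fetch"], "train": ["clean"], "report": ["train"]}
region_of = {"fetch": "us-east-1", "clean": "us-east-1",
             "train": "eu-west-1", "report": "eu-west-1"}
parts, edges = partition_by_region(tasks, region_of)
print(edges)  # [('clean', 'train')]: the single cross-region transfer
```

Minimising the number and weight of such cross-region edges is the objective
that motivates moving computation towards the data.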
Observing the clouds: a survey and taxonomy of cloud monitoring
This research was supported by a Royal Society Industry Fellowship and an Amazon Web Services (AWS) grant. Date of acceptance: 10/12/2014.

Monitoring is an important aspect of designing and maintaining large-scale systems. Cloud computing presents a unique set of challenges to monitoring, including on-demand infrastructure, unprecedented scalability, rapid elasticity and performance uncertainty. There is a wide range of monitoring tools originating from cluster and high-performance computing, grid computing and enterprise computing, as well as a series of newer bespoke tools designed exclusively for cloud monitoring. These tools share a number of common elements and designs, which address the demands of cloud monitoring to varying degrees. This paper performs an exhaustive survey of contemporary monitoring tools, from which we derive a taxonomy that examines how effectively existing tools and designs meet the challenges of cloud monitoring. We conclude by examining the socio-technical aspects of monitoring, and investigate the engineering challenges and practices behind implementing monitoring strategies for cloud computing.
A Dataflow Language for Decentralised Orchestration of Web Service Workflows
Orchestrating centralised service-oriented workflows presents significant
scalability challenges, including consumption of network bandwidth,
degradation of performance, and single points of failure. This paper presents
a high-level dataflow specification language that attempts to address these
challenges. The language provides simple abstractions for orchestrating
large-scale web service workflows and separates the workflow logic from its
execution. It is based on a data-driven model that permits parallelism to
improve workflow performance. We provide a decentralised architecture that
allows the computation logic to be moved "closer" to the services involved in
the workflow. This is achieved by partitioning the workflow specification into
smaller fragments that may be sent to remote orchestration services for
execution. The orchestration services rely on proxies that exploit
connectivity to services in the workflow. These proxies perform service
invocations and compositions on behalf of the orchestration services, and
carry out data collection, retrieval and mediation tasks. The evaluation of
our architecture implementation concludes that our decentralised approach
reduces workflow execution time and scales well as the size of the data sets
increases.

Comment: To appear in Proceedings of the IEEE 2013 7th International Workshop
on Scientific Workflows, in conjunction with IEEE SERVICES 2013
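The data-driven execution model can be illustrated with a short Python sketch
in which a node fires as soon as all of its inputs are available, so
independent branches run in parallel. The `run_dataflow` helper and the toy
graph are illustrative stand-ins, not the paper's actual language or proxy
machinery.

```python
import concurrent.futures

def run_dataflow(graph, funcs):
    """Execute an acyclic dataflow graph in a data-driven fashion.

    A node fires as soon as all of its inputs are available, so
    independent branches run in parallel. `graph` maps node -> list of
    input nodes; `funcs` maps node -> a callable over the input values
    (a stand-in for a remote service invocation performed by a proxy).
    """
    results = {}
    pending = dict(graph)
    with concurrent.futures.ThreadPoolExecutor() as pool:
        while pending:
            # All nodes whose inputs have arrived fire in this wave.
            ready = [n for n, deps in pending.items()
                     if all(d in results for d in deps)]
            futures = {n: pool.submit(funcs[n], *(results[d] for d in pending[n]))
                       for n in ready}
            for n, fut in futures.items():
                results[n] = fut.result()
                del pending[n]
    return results

# 'a' and 'b' have no inputs and run concurrently; 'merge' fires
# only once both of their results are available.
graph = {"a": [], "b": [], "merge": ["a", "b"]}
funcs = {"a": lambda: 1, "b": lambda: 2, "merge": lambda x, y: x + y}
print(run_dataflow(graph, funcs)["merge"])  # 3
```

In a decentralised deployment, each fragment of such a graph would be
evaluated by a separate orchestration service placed near the web services it
invokes, rather than by one central engine.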
Routing and Staffing when Servers are Strategic
Traditionally, research on the design of routing and staffing policies for
service systems has modeled servers as having fixed (possibly heterogeneous)
service rates. However, service systems are generally staffed by people, and
people respond to workload incentives: how hard a person works can depend both
on how much work there is and on how the work is divided between the people
responsible for it. In a service system, the routing and staffing policies
control such workload incentives, and so the rate at which servers work will
be affected by the system's routing and staffing policies. This observation
has consequences when modeling service system performance, and our objective
is to investigate those consequences.
We do this in the context of the M/M/N queue, the canonical model for large
service systems. First, we present a model of "strategic" servers that choose
their service rate to maximize a trade-off between an "effort cost", which
captures the idea that servers exert more effort when working at a faster
rate, and a "value of idleness", which assumes that servers value having idle
time. Next, we characterize the symmetric Nash equilibrium service rate under
any routing policy that routes based on server idle time. We find that, for an
equilibrium to exist, the system must operate in a quality-driven regime in
which servers have idle time; this implies that the staffing must have a
first-order term that strictly exceeds that of the common square-root staffing
policy. Then, within the class of policies that admit an equilibrium, we
(asymptotically) solve the problem of minimizing the total cost when there are
linear staffing costs and linear waiting costs. Finally, we explore whether
routing policies based on the service rate, instead of the server idle time,
can improve system performance.

Comment: First submitted for journal publication in 2014; accepted for
publication in Operations Research in 2016. Presented in select conferences
throughout 201
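One plausible formalization of the trade-off described above, with symbols
assumed rather than taken verbatim from the paper:

```latex
% Illustrative only: server i in an M/M/N system chooses a service
% rate \mu_i to trade steady-state idle time against effort cost.
\[
  U_i(\mu_i;\,\mu_{-i}) \;=\; I_i(\mu_i,\mu_{-i}) \;-\; c(\mu_i),
\]
% where I_i is server i's steady-state idle fraction and c is an
% increasing, convex effort cost. A symmetric Nash equilibrium rate
% \mu^* then satisfies
\[
  \mu^\ast \in \arg\max_{\mu}\; U\!\left(\mu;\,\mu^\ast\right),
\]
% which can only hold when servers accrue strictly positive idle
% time, matching the quality-driven regime the abstract identifies.
```

Intuitively, idle time is the only reward for effort in this model, so an
equilibrium requires staffing generous enough that idleness is actually
attainable; this is consistent with the abstract's finding that the staffing
level must exceed square-root staffing to first order.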
Academic Cloud Computing Research: Five Pitfalls and Five Opportunities
This discussion paper argues that five fundamental pitfalls can restrict
academics from conducting cloud computing research at the infrastructure
level, which is currently where the vast majority of academic research lies.
Instead, academics should be conducting higher-risk research in order to gain
understanding and open up entirely new areas.
We call for a renewed mindset and argue that academic research should focus
less upon physical infrastructure and embrace the abstractions provided by
clouds through five opportunities: user-driven research, new programming
models, PaaS environments, and improved tools to support elasticity and
large-scale debugging. The objective of this paper is to foster discussion and
to define a roadmap forward that will allow academia to make longer-term
impacts on the cloud computing community.

Comment: Accepted and presented at the 6th USENIX Workshop on Hot Topics in
Cloud Computing (HotCloud'14)