Monitoring is an essential aspect of maintaining and developing computer
systems, and its difficulty increases in proportion to the size of the system.
The need for robust monitoring tools has become more evident with the advent of
cloud computing. Infrastructure as a Service (IaaS) clouds allow end users to
deploy vast numbers of virtual machines as part of dynamic and transient
architectures. Current monitoring solutions, including many of those in the
open-source domain, rely on outdated concepts such as manual deployment and
configuration and centralised data collection, and they adapt poorly to
membership churn.
In this paper we propose the development of a cloud monitoring suite to
provide scalable and robust lookup, data collection and analysis services for
large-scale cloud systems. In lieu of centrally managed monitoring we propose a
multi-tier architecture using a layered gossip protocol to aggregate monitoring
information and facilitate lookup, information collection and the
identification of redundant capacity. This allows for a resource-aware data
collection and storage architecture that operates over the system being
monitored. This in turn enables monitoring to be done in situ, without the need
for significant additional infrastructure to support monitoring services. We
evaluate this approach against alternative monitoring paradigms and demonstrate
how our solution is well adapted to usage in a cloud-computing context.
Comment: Extended Abstract for the ACM International Symposium on
High-Performance Parallel and Distributed Computing (HPDC 2013) Poster Track
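The abstract does not specify how the gossip protocol aggregates monitoring information. As an illustration of the general idea only, the following minimal sketch shows generic push-pull gossip averaging, in which every node converges on the system-wide mean of a metric without any central collector. The function name `gossip_average`, its parameters, and the single-tier, fully connected peer selection are assumptions made for this example and are not taken from the proposed system.

```python
import random


def gossip_average(loads, rounds=50, seed=0):
    """Simulate push-pull gossip averaging (hypothetical helper).

    Each round, every node contacts one random peer, and both replace
    their estimate with the mean of the two. The sum of all estimates
    is preserved, so each estimate converges to the global mean.
    """
    rng = random.Random(seed)      # fixed seed keeps the simulation repeatable
    estimates = list(loads)        # each node starts from its local reading
    n = len(estimates)
    if n < 2:
        return estimates           # nothing to exchange with a single node
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1             # skip i so a node never gossips with itself
            mean = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = mean    # pull: node i adopts the pairwise mean
            estimates[j] = mean    # push: the peer adopts it as well
    return estimates


if __name__ == "__main__":
    cpu_loads = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
    print(gossip_average(cpu_loads))
```

In a layered variant, the same exchange could run within each tier and again between tier representatives, but that arrangement is an assumption here and is not described in the abstract.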