
    Monitoring Large-Scale Cloud Systems with Layered Gossip Protocols

    Monitoring is an essential aspect of maintaining and developing computer systems, and it increases in difficulty in proportion to the size of the system. The need for robust monitoring tools has become more evident with the advent of cloud computing. Infrastructure as a Service (IaaS) clouds allow end users to deploy vast numbers of virtual machines as part of dynamic and transient architectures. Current monitoring solutions, including many of those in the open-source domain, rely on outdated concepts such as manual deployment and configuration and centralised data collection, and they adapt poorly to membership churn. In this paper we propose the development of a cloud monitoring suite to provide scalable and robust lookup, data collection and analysis services for large-scale cloud systems. In lieu of centrally managed monitoring, we propose a multi-tier architecture using a layered gossip protocol to aggregate monitoring information and facilitate lookup, information collection and the identification of redundant capacity. This allows for a resource-aware data collection and storage architecture that operates over the system being monitored. This in turn enables monitoring to be done in situ, without the need for significant additional infrastructure to facilitate monitoring services. We evaluate this approach against alternative monitoring paradigms and demonstrate how our solution is well adapted to usage in a cloud-computing context. Comment: Extended Abstract for the ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2013) Poster Trac
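
As a rough illustration of how gossip can aggregate monitoring data without a central collector, the sketch below uses plain pairwise-averaging gossip on a single layer. It is not the layered protocol the abstract proposes: the function name `gossip_average`, the sample `cpu_load` values, and the uniform random peer choice are all assumptions made for the example.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Pairwise-averaging gossip over a fully connected set of nodes.

    Hypothetical sketch: in each round every node picks one random peer
    and both replace their local value with the mean of the two. The sum
    of all values is preserved, so every node drifts toward the global
    average. Requires at least two nodes.
    """
    rng = random.Random(seed)      # fixed seed keeps the example repeatable
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:             # skip over i so a node never picks itself
                j += 1
            mean = (state[i] + state[j]) / 2.0
            state[i] = state[j] = mean
    return state


# Assumed per-node CPU load readings (illustrative numbers only).
cpu_load = [0.10, 0.90, 0.50, 0.30, 0.70]
estimates = gossip_average(cpu_load)
print(estimates)
```

Each node ends up holding an estimate of the system-wide mean load using only peer-to-peer exchanges, which is the property that makes gossip attractive when membership changes frequently.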

    Bringing Introspection Into the BlobSeer Data-Management System Using the MonALISA Distributed Monitoring Framework

    Held in conjunction with CISIS 2010 Conference. International audience. Introspection is the prerequisite of an autonomic behavior, the first step towards a performance improvement and a resource-usage optimization for large-scale distributed systems. In grid environments, the task of observing the application behavior is assigned to monitoring systems. However, most of them are designed to provide general resource information and do not consider specific information for higher-level services. More specifically, in the context of data-intensive applications, a specific introspection layer is required in order to collect data about the usage of storage resources, about data access patterns, etc. This paper discusses the requirements for an introspection layer in a data-management system for large-scale distributed infrastructures. We focus on the case of BlobSeer, a large-scale distributed system for storing massive data. The paper explains why and how to enhance BlobSeer with introspective capabilities and proposes a three-layered architecture relying on the MonALISA monitoring framework. This approach has been evaluated on the Grid'5000 testbed, with experiments that prove the feasibility of generating relevant information related to the state and the behavior of the system

    DNA-inspired Scheme for Building the Energy Profile of HPC Systems

    International audience. Energy usage is becoming a challenge for the design of next-generation large-scale distributed systems. This paper explores an innovative approach to profiling such systems. It proposes a DNA-like solution that makes no assumptions about the running applications or the hardware used. This profiling, based on internal counter usage and energy monitoring, makes it possible to isolate specific phases during execution and enables some control and prediction of energy consumption. First experimental validations of the system modeling are presented and analyzed

    WallMon : Interactive distributed monitoring of process-level resource usage on display and compute clusters

    To achieve low overhead, traditional cluster monitoring systems sample data at low frequencies and with coarse granularity. However, interactive monitoring requires frequent (up to 60 Hz) sampling of fine-grained data and visualization tools that can explore and display data in near real-time. This makes traditional cluster monitoring systems unsuited for interactive monitoring of distributed cluster applications: they fail to capture short-duration events, which makes it difficult to understand the performance relationship between processes on the same or different nodes. To address this issue, WallMon was developed, a tool for interactive visual exploration of performance behaviors in distributed systems. For gathering data, WallMon is centered around an abstraction of collectors and handlers: collectors gather data of interest, such as CPU and memory usage, and forward it to handlers in a push-based fashion, while handlers take action upon the data. WallMon captures and visualizes data for every process on every node, as well as overall node statistics. Data is visualized using a technique inspired by the concept of information flocking. WallMon's design is based on the client-server model, and it is extensible through a module system that encapsulates functionality specific to monitoring (collectors) and visualization (handlers). A set of experiments has been carried out on a cluster of 29 nodes with 180 processes per node. Performance results show 7% (of 100) CPU usage at 64 Hz sampling rate when performing process-level monitoring with WallMon. Using WallMon's interactive visualization, we have observed interesting patterns in different parallel and distributed systems, such as an unexpected ratio of user- and kernel-level execution among processes in a particular distributed system
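
The collector/handler split described above can be pictured with a small, single-process sketch. This is not WallMon's API: the `Collector` class, the `run` loop, and the `store` handler are hypothetical names, and the only metric sampled is this process's own CPU time from Python's standard library.

```python
import time
from typing import Callable, Dict, List

Sample = Dict[str, float]
Handler = Callable[[str, Sample], None]


class Collector:
    """Hypothetical collector: reads one kind of data and pushes it on."""

    def __init__(self, name: str, read: Callable[[], Sample]) -> None:
        self.name = name
        self.read = read
        self.handlers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self.handlers.append(handler)

    def sample_once(self) -> None:
        sample = self.read()
        for handler in self.handlers:   # push-based: handlers never poll
            handler(self.name, sample)


def run(collectors: List[Collector], hz: float, n_samples: int) -> None:
    """Sample every collector n_samples times at roughly hz samples/second."""
    period = 1.0 / hz
    for _ in range(n_samples):
        start = time.monotonic()
        for collector in collectors:
            collector.sample_once()
        remaining = period - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)


received = []


def store(source: str, sample: Sample) -> None:
    """Example handler: keep samples in memory for later visualisation."""
    received.append((source, sample))


cpu = Collector("cpu", lambda: {"process_seconds": time.process_time()})
cpu.subscribe(store)
run([cpu], hz=60.0, n_samples=3)
print(len(received), "samples received")
```

Keeping collection and handling behind separate interfaces is what lets new metrics or new visualisations be added as modules without touching the sampling loop.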

    A Low Cost UWB Based Solution for Direct Georeferencing UAV Photogrammetry

    Thanks to their flexibility and availability at reduced costs, Unmanned Aerial Vehicles (UAVs) have recently been used in a wide range of applications and conditions. Among these, they can play an important role in monitoring critical events (e.g., disaster monitoring) when the presence of humans close to the scene must be avoided for safety reasons, as well as in precision farming and surveying. Despite the very large number of possible applications, their usage is mainly limited by the availability of the Global Navigation Satellite System (GNSS) in the considered environment: indeed, GNSS is of fundamental importance in reducing the positioning error caused by the drift of (low-cost) Micro-Electro-Mechanical Systems (MEMS) internal sensors. In order to make the usage of UAVs possible even in critical environments (when GNSS is not available or not reliable, e.g., close to mountains or in city centers, close to high buildings), this paper considers the use of a low-cost Ultra Wide-Band (UWB) system as the positioning method. Furthermore, assuming the use of a calibrated camera, UWB positioning is exploited to achieve metric reconstruction on a local coordinate system. Once the georeferenced positions of at least three points (e.g., positions of three UWB devices) are known, georeferencing can be obtained as well. The proposed approach is validated on a specific case study, the reconstruction of the façade of a university building. The average error on 90 check points distributed over the building façade, obtained by georeferencing by means of the georeferenced positions of four UWB devices at fixed positions, is 0.29 m. For comparison, the average error obtained by using four ground control points is 0.18 m

    Master of Science in Computing

    Thesis. Current Intrusion Detection Systems (IDS) in a typical enterprise or campus network are limited by having a fixed number of static monitoring points and static IDS resources deployed. The monitoring points are typically deployed using hardware optical taps or span ports which are fed directly into the IDS. The IDS is a compute resource requiring dedicated server-grade hardware, and these resources are statically configured when installing the network for an enterprise or campus. We designed a framework for making a distributed elastic Intrusion Detection System (IDS) for a Software Defined Network (SDN) capable network, called Distributed Elastic Intrusion DeTECTion (DEIDtect). We combine the flexibility of SDN and the elastic resource usage of a cloud infrastructure with a DEIDtect orchestrating controller to achieve an elastic IDS framework. DEIDtect enables simpler and more dynamic management of IDS deployments. The flexibility of our approach also enables new IDS use cases and deployment strategies

    Resource state monitoring of service transactions in cloud systems

    In cloud systems, services constituting a transaction may spread over a large number of servers or clusters. Theoretically, these services could consume cloud resources without limit. To avoid financial loss due to resource overuse, clouds have to monitor the state of resources consumed by the services: collect values of consumption, and evaluate whether the combined usage of resources has exceeded a pre-defined upper bound or not. The distributed nature of the services introduces a challenge to the monitoring system: how to summarise distributed state information at low cost. We present our resource state monitoring solution to address the challenge introduced by services hosted in clouds. Our solution tracks the resource consumed by each service constituting a transaction individually whilst ensuring that the whole transaction does not overuse the allocated resource. It improves availability by avoiding single points of failure, and achieves scalability by minimising message exchanges. We performed experimental analyses that indicate this work can provide an inexpensive resource monitoring solution for transactions in clouds
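
One common way to check a combined upper bound with few messages is to split the bound into per-service local limits and only compute the global sum when a local limit is crossed. The sketch below illustrates that general idea. It is an assumption, not the scheme from the abstract, and its single `Monitor` object does not reproduce the paper's avoidance of single points of failure. The names `Service`, `Monitor`, `local_limit`, and the example workload are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    local_limit: float      # this service's share of the global bound
    usage: float = 0.0

    def consume(self, amount: float) -> bool:
        """Record consumption; return True if the local limit is exceeded."""
        self.usage += amount
        return self.usage > self.local_limit


class Monitor:
    """Hypothetical monitor that polls only after a local limit is crossed."""

    def __init__(self, bound: float, names: list) -> None:
        share = bound / len(names)      # equal split, an arbitrary choice
        self.bound = bound
        self.services = {n: Service(n, share) for n in names}
        self.polls = 0                  # number of global summations

    def record(self, name: str, amount: float) -> bool:
        """Return True if the whole transaction now exceeds the bound."""
        if not self.services[name].consume(amount):
            # All services within their limits implies total <= bound,
            # because the local limits sum to the bound.
            return False
        self.polls += 1
        total = sum(s.usage for s in self.services.values())
        return total > self.bound


monitor = Monitor(bound=100.0, names=["auth", "db", "render"])
events = [("auth", 10), ("db", 30), ("db", 10), ("render", 20), ("db", 45)]
results = [monitor.record(name, amount) for name, amount in events]
print(results, "global checks:", monitor.polls)
```

In this run only two of the five updates trigger a global check, which shows how local limits can cut message exchanges while still catching the moment the combined usage passes the bound.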