
    Monitoring Cluster on Online Compiler with Ganglia

    Ganglia is an open source monitoring system for high performance computing (HPC) that collects the status of both the cluster as a whole and of every node, and reports it to the user. We use Ganglia to monitor spasi.informatika.lipi.go.id (SPASI), a customized Fedora 10-based cluster, for our online cluster compiler CLAW (cluster access through web). Our experience shows that Ganglia lets us view the status of our cluster and track its nodes over time.
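    As a rough illustration of the kind of monitoring the abstract describes, the sketch below polls a Ganglia gmond daemon, which publishes cluster and per-node status as XML on TCP port 8649 by default, and extracts one metric per node. The host name and the choice of metric are placeholders, not the actual SPASI configuration.

```python
# Minimal sketch: read cluster/node status from a Ganglia gmond daemon.
# gmond publishes its state as XML on TCP port 8649 by default; the host
# name below is only a placeholder.
import socket
import xml.etree.ElementTree as ET

def read_gmond_xml(host="localhost", port=8649):
    """Fetch the raw XML dump that gmond emits on connect."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def node_load(xml_bytes):
    """Return {host: load_one} for every node reported by the cluster."""
    root = ET.fromstring(xml_bytes)
    loads = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                loads[host.get("NAME")] = float(metric.get("VAL"))
    return loads

if __name__ == "__main__":
    print(node_load(read_gmond_xml()))
```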

    LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses

    System monitoring is an established tool to measure the utilization and health of HPC systems. Usually, system monitoring infrastructures make no connection to job information and do not utilize hardware performance monitoring (HPM) data. To increase the efficient use of HPC systems, automatic and continuous performance monitoring of jobs is an essential component. It can help to identify pathological cases, provide instant performance feedback to users, offer initial data for judging the optimization potential of applications, and help build a statistical foundation about application-specific system usage. The LIKWID monitoring stack is a modular framework built on top of the LIKWID tools library. It aims at enabling job-specific performance monitoring using HPM data, system metrics and application-level data for small to medium-sized commodity clusters. Moreover, it is designed to integrate into existing monitoring infrastructures to speed up the change from pure system monitoring to job-aware monitoring.
    Comment: 4 pages, 4 figures. Accepted for HPCMASPA 2017, the Workshop on Monitoring and Analysis for High Performance Computing Systems Plus Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI, September 5, 2017.
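    A hedged sketch of the job-aware idea follows: collect a hardware-performance measurement for one job step with the LIKWID tools' likwid-perfctr command (-C selects cores, -g the event group) and tag it with the job ID before shipping it to a metrics backend. The job-ID lookup and the push_metric stub are illustrative assumptions, not part of the LIKWID monitoring stack itself.

```python
# Hedged sketch: run one job step under likwid-perfctr, then tag the raw
# report with the job ID so it can be forwarded to a metrics backend.
import os
import subprocess
import time

def measure_flops(cores="0-3", group="FLOPS_DP", command=("./app",)):
    """Run a command under likwid-perfctr and return its raw text report."""
    result = subprocess.run(
        ["likwid-perfctr", "-C", cores, "-g", group, *command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def push_metric(job_id, payload):
    """Placeholder for shipping the measurement to a monitoring backend."""
    print(f"[{time.strftime('%H:%M:%S')}] job={job_id}\n{payload}")

if __name__ == "__main__":
    job_id = os.environ.get("SLURM_JOB_ID", "interactive")  # assumed scheduler
    push_metric(job_id, measure_flops())
```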

    Developing a MongoDB Monitoring System using NoSQL Databases for Monitored Data Management

    MongoDB is a NoSQL database, specifically used to efficiently store and access a large quantity of unstructured data over a distributed cluster of nodes. As the number of nodes in the cluster increases, it becomes difficult to manually monitor the different components of the database. This poses an interesting problem: monitoring the MongoDB database so that the state of the system can be viewed at any point. Although a few proprietary monitoring tools exist for MongoDB clusters, they are not freely available for use in academia. Therefore, the focus of this project is to create a monitoring system built entirely from open-source resources. To automatically monitor a MongoDB cluster, several components have to be built: monitoring agents that obtain status information from the nodes in the cluster, storage mechanisms to save this information for future use, and write buffers to temporarily hold monitored records before they are written to storage. The monitoring agents are designed to collect only the information that a user of a monitoring system is likely to find useful. Since monitored data is expected to be of high volume and velocity, NoSQL databases are ideal candidates for the storage component of the monitoring system. MongoDB, Cassandra, and OpenTSDB are identified as suitable candidates and their performance is compared with respect to several aspects such as read and write performance and storage requirements. In an attempt to improve the write performance of the system, the performance impact of adding a BigQueue as a write buffer in front of the storage is also studied.
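    The sketch below illustrates the monitoring-agent component described above: it polls serverStatus from a MongoDB node via pymongo, keeps a few fields a user is likely to care about, and stages the records in an in-memory buffer before a bulk flush to the storage backend. The field selection and the simple deque (standing in for the project's BigQueue write buffer) are assumptions rather than the project's actual implementation.

```python
# Illustrative monitoring-agent loop for one MongoDB node.
import time
from collections import deque
from pymongo import MongoClient

buffer = deque()

def poll_node(client):
    """Collect a trimmed-down serverStatus document from one mongod."""
    status = client.admin.command("serverStatus")
    return {
        "ts": time.time(),
        "host": status["host"],
        "connections": status["connections"]["current"],
        "resident_mb": status["mem"]["resident"],
        "ops": status["opcounters"],
    }

def flush(records):
    """Placeholder for a bulk write into the chosen storage backend."""
    print(f"flushing {len(records)} monitored records")

if __name__ == "__main__":
    client = MongoClient("mongodb://localhost:27017")  # assumed node address
    for _ in range(10):
        buffer.append(poll_node(client))
        if len(buffer) >= 5:
            flush([buffer.popleft() for _ in range(len(buffer))])
        time.sleep(2)
```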

    C2MS: Dynamic Monitoring and Management of Cloud Infrastructures

    Server clustering is a common design principle employed by many organisations that require high availability, scalability and easier management of their infrastructure. Servers are typically clustered according to the service they provide, for example the application(s) installed, the role of the server, or server accessibility. In order to optimize performance, manage load and maintain availability, servers may migrate from one cluster group to another, making it difficult for server monitoring tools to continuously monitor these dynamically changing groups. Server monitoring tools are usually statically configured, and any change of group membership requires manual reconfiguration; an unreasonable task to undertake on large-scale cloud infrastructures. In this paper we present the Cloudlet Control and Management System (C2MS), a system for monitoring and controlling dynamic groups of physical or virtual servers within cloud infrastructures. The C2MS extends Ganglia, an open source scalable system performance monitoring tool, by allowing system administrators to define, monitor and modify server groups without the need for server reconfiguration. In turn, administrators can easily monitor group and individual server metrics on large-scale dynamic cloud infrastructures where the roles of servers may change frequently. Furthermore, we complement group monitoring with a control element allowing administrator-specified actions to be performed over servers within service groups, as well as introduce further customized monitoring metrics. This paper outlines the design, implementation and evaluation of the C2MS.
    Comment: Proceedings of the 5th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013), 8 pages.
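    To make the dynamic-grouping idea concrete, here is a minimal sketch in which group membership is a mutable mapping consulted at query time, so moving a server between groups requires no agent reconfiguration. The registry class and the metric stub are illustrative; the real C2MS builds this on top of Ganglia.

```python
# Sketch of dynamic server groups: membership is looked up at query time,
# so migrating a server needs no monitoring-agent reconfiguration.
from collections import defaultdict

class GroupRegistry:
    def __init__(self):
        self.groups = defaultdict(set)

    def assign(self, server, group):
        """Move a server into a group, removing it from any previous one."""
        for members in self.groups.values():
            members.discard(server)
        self.groups[group].add(server)

    def group_metric(self, group, fetch_metric):
        """Aggregate one metric over the group's current members."""
        return {s: fetch_metric(s) for s in self.groups[group]}

if __name__ == "__main__":
    registry = GroupRegistry()
    registry.assign("node01", "web")
    registry.assign("node02", "web")
    registry.assign("node01", "database")    # migration: no reconfiguration
    fake_load = lambda server: 0.42          # stands in for a Ganglia query
    print(registry.group_metric("web", fake_load))
```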

    Aplikasi Monitoring Kinerja Processor Pada Lingkungan Linux Cluster Secara Real Time (Real-Time Processor Performance Monitoring Application in a Linux Cluster Environment)

    Linux clusters have become the paradigm of choice for the large-scale execution of scientific, engineering and commercial applications. This is because cluster computing is cheaper and offers high performance and availability, and many of the hardware and software components needed to build cluster applications are freely available. This work discusses Linux cluster technology, the system architecture, and the software used to develop parallel applications. The aim of this study is to evaluate the performance of an 8-processor computer cluster running Debian Linux. As the test case for this final project, an application sorts thousands of numbers while the performance of the cluster's processors is monitored during execution of the parallel program. The monitoring results are displayed in real time in graphical form.
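    A minimal sketch of the real-time processor monitoring described above: sample /proc/stat twice and derive per-CPU utilization from the idle and total jiffy deltas. A graphical front end, as in the project, would plot these values instead of printing them.

```python
# Minimal real-time CPU monitor for Linux based on /proc/stat deltas.
import time

def read_cpu_times():
    """Return {cpu_name: (idle, total)} parsed from /proc/stat."""
    times = {}
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu") and line[3].isdigit():
                fields = line.split()
                values = [int(v) for v in fields[1:]]
                idle = values[3] + values[4]      # idle + iowait jiffies
                times[fields[0]] = (idle, sum(values))
    return times

def utilization(interval=1.0):
    """Percentage busy time per CPU over the sampling interval."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    usage = {}
    for cpu, (idle1, total1) in after.items():
        idle0, total0 = before[cpu]
        busy = (total1 - total0) - (idle1 - idle0)
        usage[cpu] = 100.0 * busy / max(total1 - total0, 1)
    return usage

if __name__ == "__main__":
    while True:
        print(utilization())
```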

    An ANFIS estimator based data aggregation scheme for fault tolerant Wireless Sensor Networks

    Wireless Sensor Networks (WSNs) are widely used in many mission-critical applications such as battlefield surveillance, environmental monitoring and forest fire monitoring. A lot of research is being done to reduce the energy consumption and to enhance the network lifetime and fault tolerance capability of WSNs. This paper proposes an ANFIS estimator based data aggregation scheme called the Neuro-Fuzzy Optimization Model (NFOM) for the design of fault-tolerant WSNs. The proposed scheme employs an Adaptive Neuro-Fuzzy Inference System (ANFIS) estimator for intra-cluster and inter-cluster fault detection in WSNs. The Cluster Head (CH) acts as the intra-cluster fault detection and data aggregation manager. It identifies the faulty Non-Cluster Head (NCH) nodes in a cluster by applying the proposed ANFIS estimator. The CH then aggregates data from only the normal NCHs in that cluster and forwards it to the high-energy gateway nodes. The gateway nodes act as the inter-cluster fault detection and data aggregation manager. They proactively identify faulty CHs by applying the proposed ANFIS estimator and perform inter-cluster fault-tolerant data aggregation. The simulation results confirm that the proposed NFOM data aggregation scheme can significantly improve network performance compared to other existing schemes with respect to different performance metrics.
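    The intra-cluster step can be sketched as follows: the cluster head scores each non-cluster-head reading with a fault estimator and aggregates only the readings judged normal. The simple median-distance test below is a stand-in for the paper's trained ANFIS estimator, which is not reproduced here.

```python
# Conceptual sketch of cluster-head aggregation over non-faulty sensor nodes.
from statistics import mean, median

def is_faulty(reading, neighbourhood, tolerance=3.0):
    """Toy fault test: flag readings far from the cluster median."""
    return abs(reading - median(neighbourhood)) > tolerance

def aggregate_cluster(readings):
    """Aggregate only the readings classified as normal; report the rest."""
    values = list(readings.values())
    normal = {node: r for node, r in readings.items()
              if not is_faulty(r, values)}
    return mean(normal.values()), sorted(readings.keys() - normal.keys())

if __name__ == "__main__":
    readings = {"nch1": 24.1, "nch2": 23.8, "nch3": 51.0, "nch4": 24.4}
    aggregate, faulty = aggregate_cluster(readings)
    print(f"aggregate={aggregate:.2f}, faulty nodes={faulty}")
```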

    Towards an Autonomic Cluster Management System (ACMS) with Reflex Autonomicity

    Cluster computing, whereby a large number of simple processors or nodes are combined to apparently function as a single powerful computer, has emerged as a research area in its own right. The approach offers a relatively inexpensive means of providing a fault-tolerant environment and achieving significant computational capabilities for high-performance computing applications. However, the task of manually managing and configuring a cluster quickly becomes daunting as the cluster grows in size. Autonomic computing, with its vision to provide self-management, can potentially solve many of the problems inherent in cluster management. We describe the development of a prototype Autonomic Cluster Management System (ACMS) that exploits autonomic properties in automating cluster management, and its evolution to include reflex reactions via pulse monitoring.
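    A loose sketch of the pulse-monitoring reflex mentioned above: nodes emit periodic pulses, and a manager that misses a pulse beyond a deadline triggers an immediate local reaction before any higher-level self-management logic runs. All names and timings are assumptions, not the ACMS design.

```python
# Illustrative pulse (heartbeat) monitor with a simple reflex check.
import time

class PulseMonitor:
    def __init__(self, deadline=5.0):
        self.deadline = deadline
        self.last_pulse = {}

    def pulse(self, node):
        """Record a heartbeat from a node."""
        self.last_pulse[node] = time.monotonic()

    def reflex_check(self):
        """Return nodes whose pulse is overdue; a reflex action would follow."""
        now = time.monotonic()
        return [n for n, t in self.last_pulse.items()
                if now - t > self.deadline]

if __name__ == "__main__":
    monitor = PulseMonitor(deadline=1.0)
    monitor.pulse("node07")
    time.sleep(1.5)
    print("overdue:", monitor.reflex_check())
```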

    A scalable monitoring for the CMS Filter Farm based on elasticsearch

    A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
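    A hedged sketch of the local indexing step: an HLT node writes a small JSON monitoring document into its leaf elasticsearch cluster using the Python client. The index name, document fields and URL are placeholders; recent elasticsearch-py releases accept document= (older ones use body=).

```python
# Hedged sketch: index one JSON monitoring document into a leaf es cluster.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed leaf cluster on this node

def index_hlt_doc(run, lumisection, events_accepted, rate_hz):
    """Index a small monitoring document; fields here are illustrative."""
    doc = {
        "timestamp": int(time.time() * 1000),
        "run": run,
        "lumisection": lumisection,
        "events_accepted": events_accepted,
        "rate_hz": rate_hz,
    }
    # Schema-free: later documents may add new fields without any changes.
    return es.index(index="hlt-monitoring", document=doc)

if __name__ == "__main__":
    print(index_hlt_doc(run=1, lumisection=42, events_accepted=1234,
                        rate_hz=987.5))
```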