Monitoring Cluster on Online Compiler with Ganglia
Ganglia is an open source monitoring system for high performance computing (HPC) that collects the status of both the whole cluster and every node and reports it to the user. We use Ganglia to monitor spasi.informatika.lipi.go.id (SPASI), a customized Fedora 10-based cluster, for our cluster online compiler, CLAW (cluster access through web). Our experience with Ganglia shows that it can display our cluster's status and lets us track it.
LIKWID Monitoring Stack: A flexible framework enabling job specific performance monitoring for the masses
System monitoring is an established tool to measure the utilization and
health of HPC systems. Usually system monitoring infrastructures make no
connection to job information and do not utilize hardware performance
monitoring (HPM) data. To increase the efficient use of HPC systems, automatic
and continuous performance monitoring of jobs is an essential component. It can
help to identify pathological cases, provide instant performance feedback to
the users, offer initial data to judge the optimization potential of
applications and help to build a statistical foundation about application
specific system usage. The LIKWID monitoring stack is a modular framework built
on top of the LIKWID tools library. It aims to enable job specific
performance monitoring using HPM data, system metrics and application-level
data for small to medium sized commodity clusters. Moreover, it is designed to
integrate into existing monitoring infrastructures to speed up the change from
pure system monitoring to job-aware monitoring.
Comment: 4 pages, 4 figures. Accepted for HPCMASPA 2017, the Workshop on
Monitoring and Analysis for High Performance Computing Systems Plus
Applications, held in conjunction with IEEE Cluster 2017, Honolulu, HI,
September 5, 201
Developing a MongoDB Monitoring System using NoSQL Databases for Monitored Data Management
MongoDB is a NoSQL database, specifically used to efficiently store and access a large quantity of unstructured data over a distributed cluster of nodes. As the number of nodes in the cluster increases, it becomes difficult to manually monitor different components of the database. This poses an interesting problem of monitoring the MongoDB database to view the state of the system at any point. Although a few proprietary monitoring tools exist to monitor MongoDB clusters, they are not freely available for use in academia. Therefore, the focus of this project is to create a monitoring system that is completely built from open-source resources. To automatically monitor a MongoDB cluster, several components are to be built: monitoring agents that obtain this information from the nodes in the cluster, storage mechanisms to save this information for future use and write buffers to temporarily hold monitored records before they are written to storage. The monitoring agents have to be created to obtain only the information that a user of a monitoring system might find useful. Since monitored data is expected to be of high volume and velocity, NoSQL databases are ideal candidates for the storage component of the monitoring system. MongoDB, Cassandra, and OpenTSDB are identified as suitable candidates and their performances are compared with respect to several aspects such as read and write performance and storage requirements. In an attempt to improve the write performance of the system, the performance impact of adding a BigQueue as a write buffer to the storage is also studied
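The write-buffer idea described in this abstract can be illustrated with a minimal sketch. It is not the project's implementation and does not use BigQueue: it assumes a plain in-memory queue, and the names `WriteBuffer`, `sink`, and `batch_size`, as well as the record fields, are hypothetical. Records from monitoring agents accumulate in the buffer and are handed to the storage component in batches.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Record = Dict[str, object]


class WriteBuffer:
    """Holds monitored records in memory and passes them to storage in batches."""

    def __init__(self, sink: Callable[[List[Record]], None], batch_size: int = 3):
        self._sink = sink              # hypothetical storage writer (one call per batch)
        self._batch_size = batch_size  # flush threshold, an assumed tuning parameter
        self._pending: Deque[Record] = deque()

    def add(self, record: Record) -> None:
        """Queue one record and flush once a full batch has accumulated."""
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all pending records as one batch; do nothing if empty."""
        if not self._pending:
            return
        batch = list(self._pending)
        self._pending.clear()
        self._sink(batch)


# Usage: a list stands in for the storage backend.
stored_batches: List[List[Record]] = []
buffer = WriteBuffer(stored_batches.append, batch_size=3)
for i in range(7):
    buffer.add({"node": f"node{i % 2}", "metric": "connections", "value": i})
buffer.flush()  # write the remaining partial batch
batch_sizes = [len(batch) for batch in stored_batches]
```

Batching trades a small delay before records become durable for fewer, larger writes; a persistent queue would additionally survive a crash of the buffering process, which this in-memory sketch does not.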
C2MS: Dynamic Monitoring and Management of Cloud Infrastructures
Server clustering is a common design principle employed by many organisations
that require high availability, scalability and easier management of their
infrastructure. Servers are typically clustered according to the service they
provide, for example by the application(s) installed, the role of the server or
server accessibility. In order to optimize performance, manage load
and maintain availability, servers may migrate from one cluster group to
another, making it difficult for server monitoring tools to continuously monitor
these dynamically changing groups. Server monitoring tools are usually
statically configured, and any change of group membership requires manual
reconfiguration, an unreasonable task to undertake on large-scale cloud
infrastructures.
In this paper we present the Cloudlet Control and Management System (C2MS), a
system for monitoring and controlling dynamic groups of physical or virtual
servers within cloud infrastructures. The C2MS extends Ganglia - an open source
scalable system performance monitoring tool - by allowing system administrators
to define, monitor and modify server groups without the need for server
reconfiguration. In turn administrators can easily monitor group and individual
server metrics on large-scale dynamic cloud infrastructures where roles of
servers may change frequently. Furthermore, we complement group monitoring with
a control element allowing administrator-specified actions to be performed over
servers within service groups as well as introduce further customized
monitoring metrics. This paper outlines the design, implementation and
evaluation of the C2MS.
Comment: Proceedings of the 5th IEEE International Conference on Cloud
Computing Technology and Science (CloudCom 2013), 8 page
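The idea of monitoring dynamic server groups without reconfiguring servers can be sketched as follows. This is an illustration only, not C2MS code and not the Ganglia API: the class `GroupRegistry`, its methods, and the sample load values are all hypothetical. Group membership lives in a registry separate from the per-server metrics, so moving a server changes only the registry.

```python
from statistics import mean
from typing import Dict, Optional, Set


class GroupRegistry:
    """Maps group names to member servers, independently of metric collection."""

    def __init__(self) -> None:
        self._groups: Dict[str, Set[str]] = {}

    def assign(self, server: str, group: str) -> None:
        """Move a server into a group, removing it from any previous group."""
        for members in self._groups.values():
            members.discard(server)
        self._groups.setdefault(group, set()).add(server)

    def members(self, group: str) -> Set[str]:
        return set(self._groups.get(group, set()))

    def group_mean(self, group: str, metrics: Dict[str, float]) -> Optional[float]:
        """Average a per-server metric over the current members of a group."""
        values = [metrics[s] for s in self._groups.get(group, set()) if s in metrics]
        return mean(values) if values else None


# Per-server load values, as a monitoring agent might report them (made-up numbers).
load = {"web1": 0.2, "web2": 0.6, "db1": 0.9}

registry = GroupRegistry()
registry.assign("web1", "web")
registry.assign("web2", "web")
registry.assign("db1", "db")
web_before = registry.group_mean("web", load)

# web2 changes role: only the registry is updated, the server itself is untouched.
registry.assign("web2", "db")
web_after = registry.group_mean("web", load)
db_after = registry.group_mean("db", load)
```

Keeping membership out of the per-server configuration is the design choice that makes regrouping cheap: group-level views are recomputed from the same metric stream.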
Aplikasi Monitoring Kinerja Processor Pada Lingkungan Linux Cluster Secara Real Time (Real-Time Processor Performance Monitoring Application in a Linux Cluster Environment)
Linux clusters have become the paradigm of choice for running large-scale scientific, engineering and commercial applications. Cluster computing is cheaper, offers high performance, and benefits from the availability of many hardware and software components, some of them free, that can be used to develop cluster applications. This work discusses Linux cluster technology, the system architecture, and the software used to develop parallel programs. The aim of this study is to determine the performance of 8 processor computers in a cluster running Debian Linux. The test for this final project is a parallel application that sorts thousands of numbers. Processor performance on the cluster is monitored while the parallel program executes, and the monitoring results are displayed in real time in graphical for
An ANFIS estimator based data aggregation scheme for fault tolerant Wireless Sensor Networks
Wireless Sensor Networks (WSNs) are used widely in many mission-critical applications like battlefield surveillance, environmental monitoring, forest fire monitoring etc. A lot of research is being done to reduce the energy consumption, enhance the network lifetime and fault tolerance capability of WSNs. This paper proposes an ANFIS estimator based data aggregation scheme called Neuro-Fuzzy Optimization Model (NFOM) for the design of fault-tolerant WSNs. The proposed scheme employs an Adaptive Neuro-Fuzzy Inference System (ANFIS) estimator for intra-cluster and inter-cluster fault detection in WSNs. The Cluster Head (CH) acts as the intra-cluster fault detection and data aggregation manager. It identifies the faulty Non-Cluster Head (NCH) nodes in a cluster by the application of the proposed ANFIS estimator. The CH then aggregates data from only the normal NCHs in that cluster and forwards it to the high-energy gateway nodes. The gateway nodes act as the inter-cluster fault detection and data aggregation manager. They pro-actively identify the faulty CHs by the application of the proposed ANFIS estimator and perform inter-cluster fault tolerant data aggregation. The simulation results confirm that the proposed NFOM data aggregation scheme can significantly improve the network performance as compared to other existing schemes with respect to different performance metrics
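The intra-cluster step described above (flag faulty NCH nodes, then aggregate only the normal ones) can be sketched as below. Note the substitution: the ANFIS estimator is replaced here by a much simpler stand-in, the median of the cluster's readings, so this shows the data flow rather than the paper's method. The function name, the `tolerance` parameter, and the sample readings are assumptions.

```python
from statistics import median
from typing import Dict, List, Tuple


def aggregate_at_cluster_head(
    readings: Dict[str, float], tolerance: float
) -> Tuple[float, List[str]]:
    """Return (aggregated value, faulty node ids) for one cluster.

    A node is flagged as faulty when its reading deviates from the
    estimate by more than `tolerance`. The estimate is the median of
    all readings, a simple placeholder for the ANFIS estimator.
    """
    estimate = median(readings.values())
    faulty = sorted(
        node for node, value in readings.items() if abs(value - estimate) > tolerance
    )
    normal = [value for node, value in readings.items() if node not in faulty]
    if not normal:
        # Every node was flagged: fall back to the estimate itself.
        return estimate, faulty
    return sum(normal) / len(normal), faulty


# Made-up temperature readings; n4 reports an implausible value.
readings = {"n1": 21.0, "n2": 21.4, "n3": 20.8, "n4": 55.0}
aggregate, faulty_nodes = aggregate_at_cluster_head(readings, tolerance=2.0)
```

The same function could be applied one level up, with gateway nodes checking the aggregates reported by cluster heads, which mirrors the inter-cluster stage in the abstract.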
Towards an Autonomic Cluster Management System (ACMS) with Reflex Autonomicity
Cluster computing, whereby a large number of simple processors or nodes are combined together to apparently function as a single powerful computer, has emerged as a research area in its own right. The approach offers a relatively inexpensive means of providing a fault-tolerant environment and achieving significant computational capabilities for high-performance computing applications. However, the task of manually managing and configuring a cluster quickly becomes daunting as the cluster grows in size. Autonomic computing, with its vision to provide self-management, can potentially solve many of the problems inherent in cluster management. We describe the development of a prototype Autonomic Cluster Management System (ACMS) that exploits autonomic properties in automating cluster management and its evolution to include reflex reactions via pulse monitoring
A scalable monitoring for the CMS Filter Farm based on elasticsearch
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2
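The leaf/central hierarchy described above can be illustrated without an elasticsearch installation. The sketch below uses plain Python lists in place of es indices and does not call any elasticsearch API; the field names (`host`, `pid`, `events`), the leaf names, and the helper functions are hypothetical. Small JSON documents stay in the leaf stores at per-process granularity, while the central store holds one aggregated summary per leaf.

```python
import json
from typing import Dict, List


def make_document(host: str, pid: int, events: int) -> str:
    """Encode one per-process monitoring record as a small JSON document."""
    return json.dumps({"host": host, "pid": pid, "events": events})


def summarize_leaf(leaf_name: str, documents: List[str]) -> Dict[str, object]:
    """Aggregate a leaf's fine-grained documents into one central record."""
    parsed = [json.loads(doc) for doc in documents]
    return {
        "leaf": leaf_name,
        "processes": len(parsed),
        "events": sum(doc["events"] for doc in parsed),
    }


# Leaf stores keep every per-process document (made-up values).
leaf_indices: Dict[str, List[str]] = {
    "leaf-a": [make_document("fu-01", 101, 120), make_document("fu-01", 102, 80)],
    "leaf-b": [make_document("fu-02", 201, 50)],
}

# The central store holds only the aggregated, high-level view.
central_index = [
    summarize_leaf(name, docs) for name, docs in sorted(leaf_indices.items())
]
```

Because the documents are schema-free JSON, a new field added by one process would simply appear in that leaf's documents; only `summarize_leaf` would need to change if the central view should include it.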