
Monitoring the LHCb Experiment Computing Infrastructure with NAGIOS

Abstract

LHCb has a large and complex infrastructure consisting of thousands of servers and embedded computers, hundreds of network devices, and a wide range of common infrastructure services such as shared storage, login and time services, and databases. All operationally critical aspects are integrated into the standard Experiment Control System (ECS), which is based on PVSSII. This enables non-expert operators to carry out first-line responses. At the lower level, and in particular for monitoring the infrastructure, the Control System itself depends on a secondary infrastructure, whose monitoring is based on NAGIOS. We present the design and implementation of the fabric management based on NAGIOS. Care has been taken to complement rather than duplicate functionality available in the Experiment Control System.
