DiG: Enabling Out-of-Band Scalable High-Resolution Monitoring for Data-Center Analytics, Automation and Control (Extended)
Data centers are increasing in size and complexity, and we need scalable
approaches to support their automated analysis and control. Performance
counters and power consumption are their key "vital signs". State-of-the-art
(SoA) monitoring systems provide built-in tools to collect performance
measurements, and custom solutions to gain insight into their power consumption.
However, with the increase in measurement resolution (in time and space) and
the ensuing huge amount of measurement data to handle, new challenges arise,
such as bottlenecks in network bandwidth and storage, and software overhead on
the monitoring units. To address these challenges, we propose a novel monitoring
platform for data centers, which enables real-time high-resolution profiling
(i.e., all available performance counters and the entire signal bandwidth of
the power consumption at the plug, sampled at intervals down to 20 µs, with an error below
1%) and analytics, both at the edge (node-level analysis) and on a centralized
unit (cluster-level analysis). The monitoring infrastructure is completely
out-of-band, scalable, technology-agnostic and low-cost, and it is already
installed in a SoA high-performance compute cluster (D.A.V.I.D.E., ranked 18th
in the Green500 list of November 2017).
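
To give a rough sense of why this sampling resolution stresses network and storage, the sketch below converts the 20 µs sampling interval from the abstract into a sample rate and a raw data rate. The bytes per sample (2) and the node count (100) are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of raw power-measurement traffic.
# Only the 20 us sampling interval comes from the abstract; the sample
# size and node count below are assumed values for illustration.

SAMPLE_PERIOD_US = 20          # from the abstract
BYTES_PER_SAMPLE = 2           # assumption: one 16-bit reading per sample
NODES = 100                    # assumption: hypothetical cluster size


def samples_per_second(period_us: int) -> float:
    """Convert a sampling interval in microseconds to samples per second."""
    return 1_000_000 / period_us


def raw_bytes_per_second(period_us: int, bytes_per_sample: int, nodes: int) -> float:
    """Raw (uncompressed) data rate if every sample from every node is shipped."""
    return samples_per_second(period_us) * bytes_per_sample * nodes


if __name__ == "__main__":
    rate = samples_per_second(SAMPLE_PERIOD_US)
    total = raw_bytes_per_second(SAMPLE_PERIOD_US, BYTES_PER_SAMPLE, NODES)
    print(f"{rate:.0f} samples/s per node")
    print(f"{total / 1e6:.1f} MB/s across {NODES} nodes")
    print(f"{total * 86_400 / 1e9:.0f} GB/day across {NODES} nodes")
```

Under these assumptions, a single power channel per node already produces data volumes in the hundreds of gigabytes per day, which illustrates the motivation for performing part of the analysis at the edge before sending results to a centralized unit.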