Monitoring the behavior of automated real-time stream processing systems has
become one of the most relevant problems in real-world applications. Such
systems have grown in complexity, relying heavily on high-dimensional input
data and data-hungry Machine Learning (ML) algorithms. We propose a flexible
system, Feature Monitoring (FM), that detects data drifts in such data sets,
with a small and constant memory footprint and a small computational cost in
streaming applications. The method is based on a multivariate statistical test
and is data-driven by design (full reference distributions are estimated from
the data). It monitors all features that are used by the system, while
providing an interpretable feature ranking whenever an alarm occurs (to aid in
root cause analysis). The computational and memory lightness of the system
results from the use of Exponential Moving Histograms. In our experimental
study, we analyze how the system's behavior varies with its parameters and, more
importantly, show examples where it detects problems that are not directly
related to a single feature. This illustrates that FM eliminates the need to add
custom signals to detect specific types of problems and that monitoring the
available space of features is often enough.

Comment: 10 pages, 5 figures. AutoML, KDD22, August 14-17, 2022, Washington,
DC, U
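
The abstract attributes the constant memory footprint to Exponential Moving Histograms but does not define them here. The following is a minimal sketch of one plausible reading, assuming fixed bin edges and a per-event exponential decay of the bin counts. The class name, the `half_life` parameter, and the update rule are illustrative assumptions, not the paper's implementation.

```python
import bisect


class ExponentialMovingHistogram:
    """Hypothetical sketch: a fixed-bin histogram whose counts decay per event."""

    def __init__(self, edges, half_life):
        # Sorted bin edges; values below the first edge or at/above the last
        # edge fall into two extra overflow bins.
        self.edges = list(edges)
        self.counts = [0.0] * (len(self.edges) + 1)
        # Assumed decay rule: a count halves after `half_life` further events.
        self.decay = 0.5 ** (1.0 / half_life)

    def update(self, value):
        # Down-weight all past observations, then add the new one to its bin.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[bisect.bisect_right(self.edges, value)] += 1.0

    def frequencies(self):
        # Normalized bin weights; all zeros if nothing has been observed yet.
        total = sum(self.counts)
        return [c / total for c in self.counts] if total else list(self.counts)


hist = ExponentialMovingHistogram(edges=[0.0, 1.0], half_life=1)
hist.update(0.5)
hist.update(2.0)
print(hist.frequencies())
```

Under these assumptions, memory depends only on the number of bins and not on the number of events seen, and each update costs time linear in the number of bins. A drift test would then compare such a histogram against a reference distribution estimated from earlier data.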