Data-driven soft sensors are extensively used in industrial and chemical
processes to predict hard-to-measure process variables whose true values are
difficult to track during routine operation. The regression models used by
these sensors often require a large number of labeled examples, yet obtaining
the labels can be very expensive because quality inspections are
time-consuming and costly. In this context, active learning methods can
be highly beneficial, as they can suggest the most informative data points to label.
However, most of the active learning strategies proposed for regression focus
on the offline setting. In this work, we adapt some of these approaches to the
stream-based scenario and show how they can be used to select the most
informative data points. We also demonstrate how to use a semi-supervised
architecture based on orthogonal autoencoders to learn salient features in a
lower dimensional space. The Tennessee Eastman Process is used to compare the
predictive performance of the proposed approaches.

Comment: ICML 2022 Workshop on Adaptive Experimental Design and Active
Learning in the Real World
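
To make the stream-based setting concrete, the following is a minimal sketch of one possible querying rule, not the method proposed in this work. It assumes a Bayesian linear regressor with known noise variance and requests a label only when the predictive variance of an incoming point exceeds a fixed threshold. The class and function names, the threshold value, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


class BayesianLinearRegressor:
    """Bayesian linear regression with a Gaussian prior and known noise variance."""

    def __init__(self, n_features, prior_precision=1.0, noise_var=0.1):
        self.noise_var = noise_var
        # Posterior precision matrix, initialised to the prior precision.
        self.precision = prior_precision * np.eye(n_features)
        # Precision-weighted posterior mean (zero-mean prior).
        self.b = np.zeros(n_features)

    def predict(self, x):
        """Return the predictive mean and variance for a single input x."""
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return float(x @ mean), float(x @ cov @ x) + self.noise_var

    def update(self, x, y):
        """Incorporate one labeled example (x, y) into the posterior."""
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * y / self.noise_var


def stream_active_learning(stream, oracle, model, threshold):
    """Visit each point once; query its label only if the model is uncertain."""
    queried = []
    for t, x in enumerate(stream):
        _, var = model.predict(x)
        if var > threshold:  # assumed rule: variance-threshold sampling
            model.update(x, oracle(x))
            queried.append(t)
    return queried


# Synthetic stream (illustrative data only, unrelated to the Tennessee Eastman Process).
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(500, 3))


def oracle(x):
    """Stand-in for an expensive quality inspection returning a noisy label."""
    return float(x @ true_w + rng.normal(scale=0.1))


model = BayesianLinearRegressor(n_features=3, noise_var=0.01)
queried = stream_active_learning(X, oracle, model, threshold=0.02)
print(f"labels requested: {len(queried)} of {len(X)}")
```

Unlike the offline (pool-based) setting, the decision for each point is made immediately and cannot be revisited, so the threshold controls the trade-off between labeling cost and model uncertainty.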