Refineries are complex industrial systems that transform crude oil into more valuable
subproducts. Owing to advances in sensor technology, easily measurable variables are continuously monitored,
and several data-driven soft-sensors have been proposed to control the distillation process and the quality
of the resultant subproducts. However, data preprocessing and soft-sensor modelling are still
complex and time-consuming tasks that are expected to be automated in the context of Industry
4.0. Although several automated machine learning (autoML) approaches have recently been proposed, these
focus on model configuration and hyper-parameter optimisation. This paper advances the state of the
art by proposing an autoML approach that selects, among different normalisation and feature
weighting preprocessing techniques and various well-known Machine Learning (ML) algorithms,
the best configuration to create a reliable soft-sensor for the problem at hand. As shown in this
research, each normalisation method transforms a given dataset differently, which ultimately affects
the performance of the ML algorithm. The presented autoML approach recognises the importance of feature
preprocessing and treats it, together with algorithm selection and configuration, as a fundamental stage of the
methodology. The proposed autoML approach is applied to real data from a refinery in the Basque
Country to create a soft-sensor that complements the operators’ decision-making: based on
the operational variables of a distillation process, it detects, 400 min in advance and with 98.925% precision,
whether the resultant product will fail to reach the quality standards.

This research received no external funding.
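The abstract does not spell out how the search over preprocessing and learners is carried out, so the following is only a minimal sketch, assuming an exhaustive grid search: every combination of a normalisation method, a feature-weighting method and a learner is scored by k-fold cross-validated precision on the "off-spec" class (the metric the abstract reports), and the best triple is kept. The specific candidates (min-max and z-score normalisation, feature weighting by absolute Pearson correlation, a 3-nearest-neighbour classifier and a nearest-centroid classifier), all function names, and the toy dataset from `make_synthetic` are hypothetical illustrations, not the paper's configuration space or refinery data. Only the Python standard library is used.

```python
import itertools
import math
import random
import statistics

# --- Normalisation methods (illustrative): fitted on training rows, applied to both splits ---
def identity(train, test):
    return train, test

def min_max(train, test):
    lo, hi = [min(c) for c in zip(*train)], [max(c) for c in zip(*train)]
    def scale(rows):
        return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)] for r in rows]
    return scale(train), scale(test)

def z_score(train, test):
    mu = [statistics.fmean(c) for c in zip(*train)]
    sd = [statistics.pstdev(c) for c in zip(*train)]
    def scale(rows):
        return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, mu, sd)] for r in rows]
    return scale(train), scale(test)

# --- Feature weighting methods (illustrative): one non-negative weight per feature ---
def uniform_weights(X, y):
    return [1.0] * len(X[0])

def correlation_weights(X, y):
    """Weight = |Pearson r| between each feature and the 0/1 label."""
    my = statistics.fmean(y)
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    weights = []
    for col in zip(*X):
        mx = statistics.fmean(col)
        sx = math.sqrt(sum((a - mx) ** 2 for a in col))
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        weights.append(abs(cov) / (sx * sy) if sx > 0 and sy > 0 else 0.0)
    return weights

# --- Candidate learners (illustrative): fit(X, y) returns predict(x) -> 0 or 1 ---
def knn_model(X, y, k=3):
    def predict(x):
        nearest = sorted((math.dist(r, x), lbl) for r, lbl in zip(X, y))[:k]
        return int(2 * sum(lbl for _, lbl in nearest) > len(nearest))  # majority vote
    return predict

def nearest_centroid_model(X, y):
    centroids = {cls: [statistics.fmean(c) for c in zip(*[r for r, lbl in zip(X, y) if lbl == cls])]
                 for cls in set(y)}
    def predict(x):
        return min(centroids, key=lambda cls: math.dist(centroids[cls], x))
    return predict

NORMALISERS = {"none": identity, "min-max": min_max, "z-score": z_score}
WEIGHTINGS = {"uniform": uniform_weights, "correlation": correlation_weights}
MODELS = {"3-NN": knn_model, "nearest-centroid": nearest_centroid_model}

def evaluate(X, y, norm, weigh, model, folds=5, seed=0):
    """k-fold cross-validated precision on the positive (off-spec = 1) class."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    tp = fp = 0
    for f in range(folds):
        test_idx = idx[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        Xtr, Xte = norm([X[i] for i in train_idx], [X[i] for i in test_idx])
        ytr = [y[i] for i in train_idx]
        w = weigh(Xtr, ytr)  # weights are learned on the training fold only
        Xtr = [[v * wi for v, wi in zip(r, w)] for r in Xtr]
        Xte = [[v * wi for v, wi in zip(r, w)] for r in Xte]
        predict = model(Xtr, ytr)
        for r, i in zip(Xte, test_idx):
            if predict(r) == 1:
                tp += int(y[i] == 1)
                fp += int(y[i] == 0)
    return tp / (tp + fp) if tp + fp else 0.0

def select_best(X, y):
    """Score every (normaliser, weighting, model) triple and keep the highest-scoring one."""
    scores = [(evaluate(X, y, norm, weigh, model), n_name, w_name, m_name)
              for (n_name, norm), (w_name, weigh), (m_name, model)
              in itertools.product(NORMALISERS.items(), WEIGHTINGS.items(), MODELS.items())]
    return max(scores, key=lambda s: s[0]), scores

def make_synthetic(n=200, seed=42):
    """Hypothetical toy data: a small-scale predictive feature plus a large-scale noise feature."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        X.append([rng.gauss(float(label), 0.3), rng.gauss(0.0, 1000.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    best, _ = select_best(*make_synthetic())
    print(best)  # (precision, normaliser, weighting, model)
```

On the toy data the informative feature varies on a scale of about 1 while the noise feature varies on a scale of about 1000, so the un-normalised 3-NN configuration scores close to chance, whereas z-score normalisation with correlation weighting recovers the signal. This mirrors the abstract's point that the choice of normalisation changes what a distance-based learner sees. Both the scaling statistics and the feature weights are fitted inside each training fold so that the held-out fold does not leak into the preprocessing.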