6 research outputs found
Soft-Sensor for Class Prediction of the Percentage of Pentanes in Butane at a Debutanizer Column
Refineries are complex industrial systems that transform crude oil into more valuable
subproducts. Due to the advances in sensors, easily measurable variables are continuously monitored
and several data-driven soft-sensors are proposed to control the distillation process and the quality
of the resultant subproducts. However, data preprocessing and soft-sensor modelling are still
complex and time-consuming tasks that are expected to be automated in the context of Industry
4.0. Although several automated learning (autoML) approaches have recently been proposed, they
rely on model configuration and hyper-parameter optimisation. This paper advances the state-of-the-art
by proposing an autoML approach that selects, among different normalisation and feature
weighting preprocessing techniques and various well-known Machine Learning (ML) algorithms,
the best configuration to create a reliable soft-sensor for the problem at hand. As proven in this
research, each normalisation method transforms a given dataset differently, which ultimately affects
the ML algorithm's performance. The presented autoML approach treats feature preprocessing, together with algorithm selection and configuration, as a fundamental stage of the methodology. The proposed autoML approach is applied to real data from a refinery in the Basque Country to create a soft-sensor that complements the operators' decision-making: based on the operational variables of a distillation process, it detects 400 min in advance, with 98.925% precision, whether the resultant product will fail to reach the quality standards.
This research received no external funding.
Prediction of Metabolic Syndrome Based on Machine Learning Techniques with Emphasis on Feature Relevances and Explainability Analysis
Publisher Copyright: © 2023 IEEE.
Metabolic syndrome (MetS) is considered a major public health problem worldwide, leading to a high risk of diabetes and cardiovascular disease. In this paper, data collected by the Precision Medicine Initiative of the Basque Country, named the AKRIBEA project, are employed to infer, via Machine Learning (ML) techniques, the features that have the most influence on predicting MetS, both in the general case and separately by gender. Different Feature Normalization (FN) and Feature Weighting (FW) methods are applied, and an exhaustive explainability analysis is performed by means of Shapley Additive Explanations (SHAP) and feature relevance methods. Validation results show that Extreme Gradient Boosting (XGB) with Min-Max FN and Mutual Information FW achieves the best trade-off between the precision and recall performance metrics.
Peer reviewed
Feature weighting methods: A review
Publisher Copyright: © 2021 Elsevier Ltd.
In the last decades, a wide portfolio of Feature Weighting (FW) methods has been proposed in the literature. Their main potential is the capability to transform the features so that they contribute to the Machine Learning (ML) algorithm's metric proportionally to their estimated relevance for inferring the output pattern. Nevertheless, the extensive number of FW-related works makes it difficult to conduct a systematic study of this field of knowledge. Therefore, in this paper a global taxonomy for FW methods is proposed, focusing on: (1) the learning approach (supervised or unsupervised), (2) the methodology used to calculate the weights (global or local), and (3) the feedback obtained from the ML algorithm when estimating the weights (filter or wrapper). Across the different taxonomy levels, an extensive review of the state-of-the-art is presented, followed by considerations and guidelines for selecting FW strategies with regard to significant aspects of real-world data analysis problems. Finally, a summary of conclusions and challenges in the FW field is briefly outlined.
This work has been supported in part by the ELKARTEK Research Programme of the Basque Government (Argia KK-2019/00068), the HAZITEK program (DATALYSE ZL-2018/00765), the University of the Basque Country (GIU19/045), the DATAinc program (48-AF-W1-2019-00002), and a TECNALIA Research and Innovation PhD Scholarship.
Peer reviewed
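As a concrete instance of one branch of such a taxonomy (supervised / global / filter), the sketch below weights each feature by its absolute Pearson correlation with the class label and then rescales the data accordingly. This is a generic illustration of the filter idea, with assumed toy data, not a specific method from the review.

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two numeric sequences
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_weights(rows, labels):
    # Supervised global filter: one weight per feature, computed once,
    # without any feedback from the downstream ML algorithm
    return [abs(pearson(list(col), labels)) for col in zip(*rows)]

# Toy data: feature 0 tracks the label, feature 1 is nearly irrelevant
rows = [[0.0, 5.0], [0.2, 9.0], [0.8, 4.0], [1.0, 8.0]]
labels = [0, 0, 1, 1]

w = filter_weights(rows, labels)
weighted = [[wi * v for wi, v in zip(w, r)] for r in rows]
```

The informative feature receives a weight close to 1 and the irrelevant one a weight close to 0, so a distance-based learner applied to `weighted` is dominated by the relevant feature.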
Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0
Publisher Copyright: © 2022 Elsevier Ltd.
Normalisation is a preprocessing technique widely employed in Machine Learning (ML)-based solutions for industry to equalise the features' contribution. However, few researchers have analysed the normalisation effect and its implications for ML algorithm performance, especially for Euclidean distance-based algorithms such as the well-known K-Nearest Neighbours and K-means. In this sense, this paper formally analyses the effect of normalisation, yielding results that differ significantly from the traditional claims of the state-of-the-art. In particular, this paper shows that normalisation does not equalise the contribution of the features, with the consequent impact on the performance of the learning process for a particular problem. More concretely, this is demonstrated for the K-Nearest Neighbours and K-means Euclidean distance-based ML algorithms. This paper concludes that normalisation can be viewed as an unsupervised Feature Weighting method. In this context, a new metric (Normalisation weight) for measuring the impact of normalisation on the features is presented. Likewise, an analysis of the normalisation effect on the Euclidean distance is conducted, and a new metric referred to as Proportional influence, which measures the features' influence on the Euclidean distance, is proposed. Both metrics enable the automatic selection of the most appropriate normalisation method for a particular engineering problem, which can significantly improve both the computational cost and the classification performance of the K-Nearest Neighbours and K-means algorithms. The analytical conclusions are validated on well-known datasets from the UCI repository and on a real-life application from the refinery industry.
This work has been supported by a DATAinc fellowship (48-AF-W1-2019-00002) and a TECNALIA Research and Innovation PhD Scholarship. In addition, this work is part of the OILTWIN project (KK-2020/00052), funded by the ELKARTEK program of the SPRI-Basque Government.
Peer reviewed
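The observation that normalisation acts as an implicit, unsupervised feature weight can be illustrated numerically. The sketch below is not the paper's Normalisation weight or Proportional influence metric; it only shows the underlying point: a linear normalisation multiplies feature j by a scale factor (1/range for Min-Max, 1/std for standardisation), and the two factors generally imply different relative weightings.

```python
import statistics

# Assumed toy process variables; the outlier in "flow" inflates its range
features = {
    "flow": [10.0, 11.0, 12.0, 50.0],
    "temp": [300.0, 310.0, 320.0, 330.0],
}

weights = {}
for name, col in features.items():
    weights[name] = {
        "minmax": 1.0 / (max(col) - min(col)),        # Min-Max scale factor
        "std": 1.0 / statistics.pstdev(col),          # standardisation factor
    }

# Relative weight of "temp" vs "flow" under each normalisation method
ratio_minmax = weights["temp"]["minmax"] / weights["flow"]["minmax"]
ratio_std = weights["temp"]["std"] / weights["flow"]["std"]
```

Because the two ratios differ, the two normalisation methods hand the downstream Euclidean distance different effective feature weightings, which is why the choice of method matters for K-NN and K-means.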
Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy
Publisher Copyright: © 2020, Springer Nature Switzerland AG.
Normalization methods are widely employed for transforming the variables or features of a given dataset. In this paper, three classical feature normalization methods, Standardization (St), Min-Max (MM) and Median Absolute Deviation (MAD), are studied on different synthetic datasets from the UCI repository. An exhaustive analysis of the transformed features' ranges and their influence on the Euclidean distance is performed, concluding that knowledge about the group structure gathered by each feature is needed to select the best normalization method for a given dataset. In order to effectively capture the features' importance and adjust their contribution, this paper proposes a two-stage methodology for normalization and supervised feature weighting based on the Pearson correlation coefficient and on a Random Forest feature importance estimation method. Simulations on five different datasets reveal that, in terms of accuracy, our proposed two-stage methodology outperforms or at least maintains the K-means performance obtained when only normalization is applied.
Acknowledgement: This work has been supported in part by the ELKARTEK program (SeNDANEU KK-2018/00032), the HAZITEK program (DATALYSE ZL-2018/00765) of the Basque Government, and a TECNALIA Research and Innovation PhD Scholarship.
Peer reviewed
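The two-stage transform itself is compact enough to sketch. This is an assumed reading of the methodology, not the authors' code: stage 1 Min-Max normalises each feature to [0, 1], and stage 2 rescales it by a supervised relevance weight (which in the paper would come from the Pearson correlation coefficient or a Random Forest importance estimate; here the weights are simply assumed values).

```python
def two_stage(rows, weights):
    # Stage 1: Min-Max normalise each column; Stage 2: multiply by its weight
    cols = list(zip(*rows))
    out = []
    for col, w in zip(cols, weights):
        lo, hi = min(col), max(col)
        rng = hi - lo if hi > lo else 1.0
        out.append([w * (v - lo) / rng for v in col])
    return [list(r) for r in zip(*out)]

rows = [[10.0, 300.0], [20.0, 310.0], [30.0, 320.0]]
weights = [0.9, 0.1]  # assumed supervised relevance weights
transformed = two_stage(rows, weights)
```

After the transform, a K-means run on `transformed` sees the first feature span [0, 0.9] and the second only [0, 0.1], so cluster assignments are driven by the feature estimated to be relevant.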
Soft-sensor design for vacuum distillation bottom product penetration classification
Publisher Copyright: © 2020.
Petroleum oil refineries are complex systems that convert crude oil into subproducts of value. The profit of the refinery depends on the quality of the resultant subproducts, which is usually determined by a laboratory analysis called "Needle penetration". This laboratory analysis is normally costly and time-consuming, since it takes around four hours to complete. In order to overcome this limitation, this paper proposes a novel soft-sensor design for online classification of vacuum distillation bottom product penetration. The design of the soft-sensor is based on a new approach, the two-stage methodology, which considers the joint effect of both Normalization and Supervised Filter Feature Weighting methods to transform the features. This methodology builds on an analysis of the real impact of applying normalization methods on the contribution of each feature, providing results that differ significantly from the traditional premises of the state-of-the-art. The analysis includes the impact of normalization on distance metrics such as the Euclidean distance. In addition, a new adaptation of the Pearson correlation for estimating feature weights with respect to categorical labels is proposed in this work. Once the features are transformed, five well-known Machine Learning (ML) algorithms (K-means, K-NN, RFc, SVC and MLP) are considered for the design of the soft-sensor. The final soft-sensor design is selected based on the feature space transformation strategy and the ML algorithm that achieve the best results in terms of accuracy, precision, generalization and explainability. In order to validate the proposal, real monitored data from a petroleum refinery plant located in the Basque Country are employed. Results show that the proposed two-stage methodology improves on the results obtained by the normalization methods alone.
Peer reviewed