13 research outputs found

    A comparison of machine learning methods for extremely unbalanced industrial quality data

    The Industry 4.0 revolution is impacting manufacturing companies, which need to adopt more data intelligence processes in order to compete in the markets in which they operate. In particular, quality control is a key manufacturing process that has been addressed by Machine Learning (ML), aiming to improve productivity (e.g., reduce costs). However, modern industries produce only a tiny portion of defective products, which results in extremely unbalanced datasets. In this paper, we analyze recent big data collected from a major automotive assembly manufacturer and related to the quality of eight products. The eight datasets include millions of records but only a tiny percentage of failures (less than 0.07%). To handle such datasets, we perform a two-stage ML comparison study. Firstly, we consider two products and explore four ML algorithms, Random Forest (RF), two Automated ML (AutoML) methods and a deep Autoencoder (AE), and three balancing training strategies, namely None, Synthetic Minority Oversampling Technique (SMOTE) and Gaussian Copula (GC). When considering both classification performance and computational effort, interesting results were obtained by RF. Then, the selected RF was further explored by considering all eight datasets and five balancing methods: None, SMOTE, GC, Random Undersampling (RU) and Tomek Links (TL). Overall, competitive results were achieved by the combination of GC with RF.
    This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 39479; Funding Reference: POCI-01-0247-FEDER-39479]
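    The core idea of SMOTE-style balancing described above can be sketched in a few lines: synthetic minority (failure) records are generated by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python illustration of that interpolation idea, not the paper's actual implementation; the function name and toy values are hypothetical.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate synthetic minority samples by interpolating between a
    sample and one of its k nearest neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (squared Euclidean)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        n = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, n)))
    return synthetic

# Tiny unbalanced toy set: a handful of failure records among many passes.
failures = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3)]
new_points = smote_like_oversample(failures, n_new=10)
```

    Each synthetic point lies on a segment between two real failures, so the minority region is densified without simply duplicating records.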

    A machine learning approach for spare parts lifetime estimation

    Under the Industry 4.0 concept, there is increased usage of data-driven analytics to enhance the production process. In particular, equipment maintenance is a key industrial area that can benefit from using Machine Learning (ML) models. In this paper, we propose a novel Remaining Useful Life (RUL) ML-based spare part prediction that considers maintenance historical records, which are commonly available in several industries and thus easier to collect when compared with specific equipment measurement data. As a case study, we consider 18,355 RUL records from an automotive multimedia assembly company, where each RUL value is defined as the full amount of units produced between two consecutive corrective maintenance actions. Under regression modeling, two categorical input transforms and eight ML algorithms were explored by considering a realistic rolling window evaluation. The best prediction model, which adopts an Inverse Document Frequency (IDF) data transformation and the Random Forest (RF) algorithm, produced high-quality RUL prediction results under a reasonable computational effort. Moreover, we have executed an eXplainable Artificial Intelligence (XAI) approach, based on the SHapley Additive exPlanations (SHAP) method, over the selected RF model, showing its potential value to extract useful explanatory knowledge for the maintenance domain.
    This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020
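    A rolling window evaluation, as used in this and several of the studies below, trains on a block of time-ordered records and tests only on the records that immediately follow, then slides both blocks forward. A minimal sketch of the index bookkeeping (window sizes are illustrative, not the paper's settings):

```python
def rolling_windows(n, train_size, test_size, step):
    """Yield (train_idx, test_idx) pairs that slide forward in time,
    so each model is tested only on records newer than its training data."""
    windows = []
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        windows.append((train, test))
        start += step
    return windows

# e.g., 20 time-ordered RUL records: train on 10, test on the next 5, slide by 5
w = rolling_windows(20, train_size=10, test_size=5, step=5)
```

    Because the test indices always come after the training indices, this procedure avoids the look-ahead leakage of a random train-test split on temporal data.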

    A comparison of machine learning approaches for predicting in-car display production quality

    In this paper, we explore eight Machine Learning (ML) approaches (binary and one-class) to predict the quality of in-car displays, measured using Black Uniformity (BU) tests. During production, the industrial manufacturer routinely executes intermediate assembly (screwing and gluing) and functional tests that can signal potential causes for abnormal display units. By using these intermediate tests as inputs, the ML model can be used to identify the unknown relationships between intermediate and BU tests, helping to detect failure causes. In particular, we compare two sets of input variables (A and B) with hundreds of intermediate quality measures related to assembly and functional tests. Using recently collected industrial data, regarding around 147 thousand in-car display records, we performed two evaluation procedures, using first a time-ordered train-test split and then a more robust rolling window procedure. Overall, the best predictive results (92%) were obtained using the full set of inputs (B) and an Automated ML (AutoML) Stacked Ensemble (ASE). We further demonstrate the value of the selected ASE model, by selecting distinct decision threshold scenarios and by using a Sensitivity Analysis (SA) eXplainable Artificial Intelligence (XAI) method.
    This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 39479; Funding Reference: POCI-01-0247-FEDER-39479]
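    The "distinct decision threshold scenarios" mentioned above amount to sliding the probability cutoff of the classifier: a strict threshold raises precision (fewer false alarms), a lenient one raises recall (fewer missed failures). A toy sketch of that trade-off, with invented scores and labels:

```python
def confusion_at_threshold(scores, labels, threshold):
    """Classify score >= threshold as 'fail' and count the confusion matrix,
    so distinct thresholds trade recall of failures against false alarms."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn

# Toy model scores (probability of a BU failure) and true labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   0,   0,   0]
strict = confusion_at_threshold(scores, labels, 0.5)    # misses one failure
lenient = confusion_at_threshold(scores, labels, 0.25)  # catches all, one false alarm
```

    In practice, the threshold scenario is chosen from the relative cost of scrapping a good display versus shipping a defective one.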

    A scalable and automated machine learning framework to support risk management

    Due to the growth of data and the widespread usage of Machine Learning (ML) by non-experts, automation and scalability are becoming key issues for ML. This paper presents an automated and scalable framework for ML that requires minimum human input. We designed the framework for the domain of telecommunications risk management. This domain often requires non-ML-experts to continuously update supervised learning models that are trained on huge amounts of data. Thus, the framework uses Automated Machine Learning (AutoML), to select and tune the ML models, and distributed ML, to deal with Big Data. The modules included in the framework are task detection (to detect classification or regression), data preprocessing, feature selection, model training, and deployment. In this paper, we focus the experiments on the model training module. We first analyze the capabilities of eight AutoML tools: Auto-Gluon, Auto-Keras, Auto-Sklearn, Auto-Weka, H2O AutoML, Rminer, TPOT, and TransmogrifAI. Then, to select the tool for model training, we performed a benchmark with the only two tools that support distributed ML (H2O AutoML and TransmogrifAI). The experiments used three real-world datasets from the telecommunications domain (churn, event forecasting, and fraud detection), as provided by an analytics company. The experiments allowed us to measure the computational effort and predictive capability of the AutoML tools. Both tools obtained high-quality results and did not present substantial predictive differences. Nevertheless, H2O AutoML was selected by the analytics company for the model training module, since it was considered a more mature technology that presented a more interesting set of features (e.g., integration with more platforms).
    After choosing H2O AutoML for the ML training, we selected the technologies for the remaining components of the architecture (e.g., data preprocessing and web interface).
    This work was executed under the project IRMDA - Intelligent Risk Management for the Digital Age, Individual Project, NUP: POCI-01-0247-FEDER-038526, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020
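    The task detection module mentioned above (classification versus regression) is typically a small heuristic over the target column. The sketch below illustrates one plausible rule, which is an assumption for illustration and not the framework's actual logic: a non-numeric or low-cardinality target suggests classification, otherwise regression.

```python
def detect_task(target_values, max_classes=10):
    """Heuristic task detection: a non-numeric or low-cardinality target
    suggests classification, otherwise regression."""
    distinct = set(target_values)
    if any(not isinstance(v, (int, float)) for v in distinct):
        return "classification"
    if len(distinct) <= max_classes:
        return "classification"
    return "regression"

detect_task(["churn", "stay", "churn"])                # -> "classification"
detect_task([round(x * 0.37, 2) for x in range(100)])  # -> "regression"
```

    Such a rule lets non-ML-experts submit a dataset without declaring the learning task, at the cost of a tunable cardinality cutoff.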

    Reconstruction Algorithms in Compressive Sensing: An Overview

    The theory of Compressive Sensing (CS) has provided a new acquisition and recovery strategy with good results in the image processing area. This theory guarantees the recovery of a signal, with high probability, from a reduced sampling rate below the Nyquist-Shannon limit. The problem of recovering the original signal from the samples consists in solving an optimization problem. This article presents an overview of reconstruction algorithms for sparse signal recovery in CS; these algorithms may be broadly divided into six types. We have provided a comprehensive survey of the numerous reconstruction algorithms in CS aiming to achieve computational efficiency.
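    One family of CS reconstruction algorithms surveyed in such overviews is the greedy pursuits (MP/OMP): at each step, pick the dictionary atom most correlated with the residual and subtract its contribution. A minimal pure-Python sketch of that greedy step, using a toy orthonormal dictionary (values are illustrative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(y, atoms, steps=1):
    """Greedy sparse recovery: at each step pick the dictionary atom most
    correlated with the residual and subtract its contribution (the core
    idea behind greedy CS reconstruction such as MP/OMP)."""
    residual = list(y)
    coeffs = {}
    for _ in range(steps):
        best = max(range(len(atoms)), key=lambda i: abs(dot(residual, atoms[i])))
        c = dot(residual, atoms[best])  # atoms assumed unit-norm
        coeffs[best] = coeffs.get(best, 0.0) + c
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual

# Orthonormal toy dictionary; the measurement y is 1-sparse: y = 3 * atom 1
atoms = [(1.0, 0.0), (0.0, 1.0)]
coeffs, residual = matching_pursuit((0.0, 3.0), atoms, steps=1)  # coeffs {1: 3.0}
```

    For a 1-sparse signal over an orthonormal dictionary a single greedy step recovers the signal exactly, which is why these methods are attractive when computational efficiency matters.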

    An empirical study on anomaly detection algorithms for extremely imbalanced datasets

    Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data related to this domain is usually imbalanced. In this paper, we compare diverse preprocessing and Machine Learning (ML) state-of-the-art algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted by adopting seven extremely imbalanced public domain datasets. Overall, the IF and AE unsupervised methods obtained competitive anomaly detection results, which also have the advantage of not requiring labeled data.
    This work has been supported by the European Regional Development Fund (FEDER) through a grant of the Operational Programme for Competitiveness and Internationalization of Portugal 2020 Partnership Agreement (PRODUTECH4S&C, POCI-01-0247-FEDER-046102)
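    The Isolation Forest intuition behind the IF results above is that anomalies sit in sparse regions and are therefore isolated by fewer random splits than normal points. A tiny 1-D sketch of that idea (not a full IForest implementation; the data and tree count are illustrative):

```python
import random

def isolation_depth(x, data, rng, depth=0, max_depth=20):
    """Depth at which x is isolated by random axis-aligned splits:
    anomalies need fewer splits, so shallower depth = more anomalous."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # keep only the points that fall on the same side of the split as x
    side = [v for v in data if (v < split) == (x < split)]
    return isolation_depth(x, side, rng, depth + 1, max_depth)

def avg_depth(x, data, n_trees=200, seed=0):
    rng = random.Random(seed)
    return sum(isolation_depth(x, data, rng) for _ in range(n_trees)) / n_trees

data = [0.1, 0.2, 0.3, 0.4, 0.5, 10.0]  # 10.0 plays the rare abnormal event
outlier_depth = avg_depth(10.0, data)
inlier_depth = avg_depth(0.3, data)     # expected: deeper than the outlier
```

    Averaging the isolation depth over many random trees turns this into a stable anomaly score, and no labels are needed, which is the advantage highlighted in the abstract.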

    Isolation forests and deep autoencoders for industrial screw tightening anomaly detection

    Within the context of Industry 4.0, quality assessment procedures using data-driven techniques are becoming more critical due to the generation of massive amounts of production data. In this paper, we address the detection of abnormal screw tightening processes, which is a key industrial task. Since labeling is costly, requiring a manual effort, we focus on unsupervised detection approaches. In particular, we assume a computationally light low-dimensional problem formulation based on angle-torque pairs. Our work is focused on two unsupervised machine learning (ML) algorithms: isolation forest (IForest) and a deep learning autoencoder (AE). Several computational experiments were conducted by assuming distinct datasets and a realistic rolling window evaluation procedure. First, we compared the two ML algorithms with two other methods, a local outlier factor method and a supervised Random Forest, on older data related to two production days collected in November 2020. Since competitive results were obtained, during a second stage, we further compared the AE and IForest methods by adopting a more recent and larger dataset (from February to March 2021, totaling 26.9 million observations and related to three distinct assembled products). Both anomaly detection methods obtained an excellent quality class discrimination (higher than 90%) under a realistic rolling window with several training and testing updates. Turning to the computational effort, the AE is much lighter than the IForest for training (around 2.7 times faster) and inference (requiring 3.0 times less computation). This AE property is valuable within this industrial domain since it tends to generate big data.
    Finally, using the anomaly detection estimates, we developed an interactive visualization tool that provides explainable artificial intelligence (XAI) knowledge for the human operators, helping them to better identify the angle-torque regions associated with screw tightening failures.
    This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 39479; Funding Reference: POCI-01-0247-FEDER-39479]. The work of Diogo Ribeiro is supported by the grant FCT PD/BDE/135105/2017
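    The AE scoring principle used above is reconstruction error: a model fitted only on normal angle-torque pairs reconstructs normal inputs well and abnormal ones poorly. The sketch below replaces the deep AE with a trivial centroid "reconstruction" purely to illustrate that scoring principle; all values and names are invented, and a real AE would learn a far richer reconstruction.

```python
def fit_centroid(pairs):
    """Toy stand-in for an autoencoder: 'reconstruct' every angle-torque
    pair as the centroid of the normal training pairs."""
    n = len(pairs)
    return (sum(a for a, _ in pairs) / n, sum(t for _, t in pairs) / n)

def reconstruction_error(pair, centroid):
    return ((pair[0] - centroid[0]) ** 2 + (pair[1] - centroid[1]) ** 2) ** 0.5

def is_anomaly(pair, centroid, threshold):
    # The AE principle: poorly reconstructed inputs are flagged as abnormal.
    return reconstruction_error(pair, centroid) > threshold

normal = [(30.0, 1.0), (31.0, 1.1), (29.0, 0.9), (30.5, 1.05)]  # angle, torque
centroid = fit_centroid(normal)
threshold = max(reconstruction_error(p, centroid) for p in normal)
flag = is_anomaly((80.0, 3.0), centroid, threshold)  # far from normal operation
```

    Setting the threshold from the training reconstruction errors, as here, is one common unsupervised choice, since no failure labels are available.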

    A Comparison of AutoML Tools for Machine Learning, Deep Learning and XGBoost

    This paper presents a benchmark of supervised Automated Machine Learning (AutoML) tools. Firstly, we analyze the characteristics of eight recent open-source AutoML tools (Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT and TransmogrifAI) and describe twelve popular OpenML datasets that were used in the benchmark (divided into regression, binary and multi-class classification tasks). Then, we perform a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), Deep Learning (DL) and XGBoost (XGB). To select the best tool, we used a lexicographic approach, considering first the average prediction score for each task and then the computational effort. The best predictive results were achieved for GML, which were further compared with the best OpenML public results. Overall, the best GML AutoML tools obtained competitive results, outperforming the best OpenML models in five datasets. These results confirm the potential of the general-purpose AutoML tools to fully automate the Machine Learning (ML) algorithm selection and tuning.
    Opti-Edge: 5G Digital Services Optimization at the Edge, Individual Project, NUP: POCI-01-0247-FEDER-045220, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020
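    The lexicographic selection described above can be sketched directly: first keep the tools whose average prediction score is within a tolerance of the best, then break the tie by computational effort. Tool names, scores and the tolerance below are illustrative, not the benchmark's actual figures.

```python
def lexicographic_select(results, score_tolerance=0.01):
    """Lexicographic choice: keep tools whose average score is within a
    tolerance of the best, then pick the cheapest among them."""
    best_score = max(score for _, score, _ in results)
    contenders = [r for r in results if best_score - r[1] <= score_tolerance]
    return min(contenders, key=lambda r: r[2])[0]

# (tool, average prediction score, training effort in seconds) -- toy values
results = [("tool_a", 0.952, 840.0), ("tool_b", 0.948, 120.0), ("tool_c", 0.900, 60.0)]
choice = lexicographic_select(results)  # "tool_b": near-best score, far cheaper
```

    Ordering the criteria this way encodes the judgment that predictive quality dominates, with effort only deciding among statistically similar tools.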

    A comparison of anomaly detection methods for industrial screw tightening

    Within the context of Industry 4.0, quality assessment procedures using data-driven techniques are becoming more critical due to the generation of massive amounts of production data. In this paper, we address the detection of abnormal screw tightening processes, which is a relevant industrial task. Since labeling is costly, requiring a manual effort, we focus on unsupervised approaches. In particular, we assume a low-dimensional input screw fastening approach that is based only on angle-torque pairs. Using such pairs, we explore three main unsupervised Machine Learning (ML) algorithms: Local Outlier Factor (LOF), Isolation Forest (iForest) and a deep learning Autoencoder (AE). For benchmarking purposes, we also explore a supervised Random Forest (RF) algorithm. Several computational experiments were conducted by using recent industrial data with 2.8 million angle-torque pair records and a realistic and robust rolling window evaluation. Overall, high-quality anomaly discrimination results were achieved by the iForest (99%) and AE (95% and 96%) unsupervised methods, which compared well against the supervised RF (99% and 91%). When compared with iForest, the AE requires less computation effort and provides faster anomaly detection response times.
    This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 39479; Funding Reference: POCI-01-0247-FEDER-39479]

    An intelligent decision support system for road freight transport

    This paper presents an Intelligent Decision Support System (IDSS) to optimize transport and logistics activities in a set of Portuguese companies currently operating in the freight transport sector. This IDSS comprises three main modules that can be used individually or chained together, dedicated to: a geographic clustering detection of transport services; a transport driver suggestion; and a route and truck-load optimization. The IDSS was entirely designed and developed to support real-time data and it consists of an end-to-end solution (E2ES), given that it covers all the main transport and logistics processes, from registration in the database to the optimized transport plan. The entire set of functionalities inserted in the IDSS was designed and validated by freight transport sector experts from the different companies that will use the proposed system.
    ERDF - European Regional Development Fund. The authors would like to express the most significant recognition to the project on which this IDSS has arisen, "aDyTrans - Dynamic Transportations Platform" reference NORTE-01-0247-FEDER-045174, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF)
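    Route optimization modules such as the one above often start from a simple constructive baseline before applying heavier solvers. The sketch below shows one such baseline, the nearest-neighbour heuristic, as an assumption for illustration; the paper does not state which algorithm its route module uses, and the coordinates are toy values.

```python
def nearest_neighbour_route(depot, stops):
    """Greedy route construction: from the current location, always visit
    the closest unvisited stop (a common baseline for route optimization)."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    route, current, remaining = [depot], depot, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda s: dist(current, s))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route

# Depot at the origin, three delivery stops (toy 2-D coordinates)
route = nearest_neighbour_route((0, 0), [(5, 5), (1, 0), (2, 1)])
```

    Greedy construction gives a feasible plan in milliseconds, which suits a real-time IDSS; a production system would typically refine it with local search or an exact solver.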