
    Using supervised and one-class automated machine learning for predictive maintenance

    Predictive Maintenance (PdM) is a critical area that is benefiting from the advent of Industry 4.0. Recently, several attempts have been made to apply Machine Learning (ML) to PdM, with the majority of the research studies assuming an expert-based ML modeling. In contrast with these works, this paper explores a purely Automated Machine Learning (AutoML) modeling for PdM under two main approaches. Firstly, we adapt and compare ten recent open-source AutoML technologies focused on Supervised Learning. Secondly, we propose a novel AutoML approach focused on One-Class (OC) Learning (AutoOneClass) that employs Grammatical Evolution (GE) to search for the best PdM model using three types of learners (OC Support Vector Machines, Isolation Forests, and deep Autoencoders). Using recently collected data from a client of a Portuguese software company, we performed a benchmark comparison study with the Supervised AutoML tools and the proposed AutoOneClass method to predict the number of days until the next failure of an equipment unit and also to determine whether the equipment will fail within a fixed number of days. Overall, the results were close among the compared AutoML tools, with supervised AutoGluon obtaining the best results for all ML tasks. Moreover, the best supervised AutoML and AutoOneClass predictive results were compared with two manual ML modeling approaches (using an ML expert and a non-ML expert), revealing competitive results. This work was executed under the project Cognitive CMMS - Cognitive Computerized Maintenance Management System, NUP: POCI-01-0247-FEDER-033574, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020. We wish to thank the anonymous reviewers for their helpful comments.
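The learner families searched by AutoOneClass are standard one-class methods that are fit on normal records only. A minimal sketch of two of them (scikit-learn's OneClassSVM and IsolationForest) on synthetic data; the data, parameters, and library choice are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for maintenance sensor data (not the paper's dataset)
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 3))     # healthy equipment readings
failures = rng.normal(5.0, 1.0, size=(10, 3))    # failure-like readings

# One-class learning: fit only on the "normal" class
ocsvm = OneClassSVM(nu=0.05).fit(normal)
iforest = IsolationForest(random_state=0).fit(normal)

# predict() returns +1 for inliers and -1 for outliers
ocsvm_flagged = (ocsvm.predict(failures) == -1).mean()
iforest_flagged = (iforest.predict(failures) == -1).mean()
print(ocsvm_flagged, iforest_flagged)
```

Both models should flag most of the failure-like records as outliers, despite never having seen a failure during training.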

    A machine learning approach for spare parts lifetime estimation

    Under the Industry 4.0 concept, there is increased usage of data-driven analytics to enhance the production process. In particular, equipment maintenance is a key industrial area that can benefit from using Machine Learning (ML) models. In this paper, we propose a novel Remaining Useful Life (RUL) ML-based spare part prediction that considers maintenance historical records, which are commonly available in several industries and thus easier to collect when compared with specific equipment measurement data. As a case study, we consider 18,355 RUL records from an automotive multimedia assembly company, where each RUL value is defined as the total number of units produced between two consecutive corrective maintenance actions. Under regression modeling, two categorical input transforms and eight ML algorithms were explored by considering a realistic rolling window evaluation. The best prediction model, which adopts an Inverse Document Frequency (IDF) data transformation and the Random Forest (RF) algorithm, produced high-quality RUL prediction results under a reasonable computational effort. Moreover, we have executed an eXplainable Artificial Intelligence (XAI) approach, based on the SHapley Additive exPlanations (SHAP) method, over the selected RF model, showing its potential value to extract useful explanatory knowledge for the maintenance domain. This work has been supported by FCT - Fundação para a CiĂȘncia e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020
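A rolling window evaluation, as used above, repeatedly trains on a sliding block of time-ordered records and tests on the block that follows, mimicking real deployment. A sketch with synthetic data and scikit-learn's Random Forest; the window sizes and data are assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Toy time-ordered regression data standing in for RUL records
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

# Rolling window: train on `window` records, test on the next `horizon`
window, horizon, errors = 300, 50, []
for start in range(0, len(X) - window - horizon + 1, horizon):
    train = slice(start, start + window)
    test = slice(start + window, start + window + horizon)
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train], y[train])
    errors.append(mean_absolute_error(y[test], model.predict(X[test])))

print(len(errors), float(np.mean(errors)))
```

Averaging the per-window errors gives an estimate of how the model would have performed if retrained periodically over time.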

    A comparison of machine learning approaches for predicting in-car display production quality

    In this paper, we explore eight Machine Learning (ML) approaches (binary and one-class) to predict the quality of in-car displays, measured using Black Uniformity (BU) tests. During production, the industrial manufacturer routinely executes intermediate assembly (screwing and gluing) and functional tests that can signal potential causes for abnormal display units. By using these intermediate tests as inputs, the ML model can be used to identify the unknown relationships between intermediate and BU tests, helping to detect failure causes. In particular, we compare two sets of input variables (A and B) with hundreds of intermediate quality measures related to assembly and functional tests. Using recently collected industrial data, comprising around 147 thousand in-car display records, we performed two evaluation procedures, first using a time-ordered train-test split and then a more robust rolling window scheme. Overall, the best predictive results (92%) were obtained using the full set of inputs (B) and an Automated ML (AutoML) Stacked Ensemble (ASE). We further demonstrate the value of the selected ASE model by selecting distinct decision threshold scenarios and by using a Sensitivity Analysis (SA) eXplainable Artificial Intelligence (XAI) method. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nÂș 39479; Funding Reference: POCI-01-0247-FEDER-39479]
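The decision threshold scenarios mentioned above amount to sweeping the probability cutoff of a binary classifier, trading precision against recall. A small illustration on synthetic imbalanced data; the model and numbers are assumptions, not the paper's:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

# Synthetic imbalanced binary task standing in for pass/fail quality labels
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
model = RandomForestClassifier(random_state=0).fit(X[:800], y[:800])
proba = model.predict_proba(X[800:])[:, 1]

# Each threshold defines a different operating scenario
results = {}
for t in (0.3, 0.5, 0.7):
    pred = (proba >= t).astype(int)
    results[t] = (precision_score(y[800:], pred, zero_division=0),
                  recall_score(y[800:], pred))
print(results)
```

Lowering the threshold catches more true failures (higher recall) at the cost of more false alarms, which is why threshold selection is a business decision rather than a purely statistical one.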

    An automated and distributed machine learning framework for telecommunications risk management

    Automation and scalability are currently two of the main challenges of Machine Learning. This paper proposes an automated and distributed ML framework that automatically trains a supervised learning model and produces predictions independently of the dataset and with minimum human input. The framework was designed for the domain of telecommunications risk management, which often requires supervised learning models that need to be quickly updated by non-ML-experts and trained on vast amounts of data. Thus, the architecture assumes a distributed environment, in order to deal with big data, and Automated Machine Learning (AutoML), to select and tune the ML models. The framework includes several modules: task detection (to detect whether the task is classification or regression), data preprocessing, feature selection, model training, and deployment. In this paper, we detail the model training module. In order to select the computational technologies to be used in this module, we first analyzed the capabilities of an initial set of five modern AutoML tools: Auto-Keras, Auto-Sklearn, Auto-Weka, H2O AutoML, and TransmogrifAI. Then, we performed a benchmarking of the only two tools that address distributed ML (H2O AutoML and TransmogrifAI). Several comparison experiments were conducted using three real-world datasets from the telecommunications domain (churn, event forecasting, and fraud detection), allowing us to measure the computational effort and predictive capability of the AutoML tools. This work was executed under the project IR-MDA - Intelligent Risk Management for the Digital Age, Individual Project, NUP: POCI-01-0247-FEDER-038526, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020
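At its core, the AutoML step of the model training module selects among candidate learners by validated performance. A toy stand-in using scikit-learn; the candidates and data are illustrative assumptions, and the paper's tools (such as H2O AutoML) automate this far more extensively, including hyperparameter tuning and ensembling:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic classification task standing in for a telecom dataset
X, y = make_classification(n_samples=400, random_state=0)

# Minimal "AutoML": score each candidate learner by cross-validation
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

A real AutoML tool additionally searches hyperparameters, respects time budgets, and handles distributed training, but the selection-by-validation principle is the same.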

    A comparison of machine learning methods for extremely unbalanced industrial quality data

    The Industry 4.0 revolution is impacting manufacturing companies, which need to adopt more data intelligence processes in order to compete in the markets in which they operate. In particular, quality control is a key manufacturing process that has been addressed by Machine Learning (ML), aiming to improve productivity (e.g., reduce costs). However, modern industries produce a tiny portion of defective products, which results in extremely unbalanced datasets. In this paper, we analyze recent big data collected from a major automotive assembly manufacturer and related to the quality of eight products. The eight datasets include millions of records but only a tiny percentage of failures (less than 0.07%). To handle such datasets, we perform a two-stage ML comparison study. Firstly, we consider two products and explore four ML algorithms, namely Random Forest (RF), two Automated ML (AutoML) methods, and a deep Autoencoder (AE), and three balancing training strategies, namely None, Synthetic Minority Oversampling Technique (SMOTE), and Gaussian Copula (GC). When considering both classification performance and computational effort, interesting results were obtained by RF. Then, the selected RF was further explored by considering all eight datasets and five balancing methods: None, SMOTE, GC, Random Undersampling (RU), and Tomek Links (TL). Overall, competitive results were achieved by the combination of GC with RF. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nÂș 39479; Funding Reference: POCI-01-0247-FEDER-39479]
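SMOTE, one of the balancing strategies compared above, synthesizes minority-class samples by interpolating between nearest minority-class neighbours. A minimal reimplementation for illustration only; production work would use a maintained library such as imbalanced-learn:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: create synthetic minority samples by interpolating
    between each sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, k + 1)]  # skip index 0 (the point itself)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 3))   # tiny minority (failure) class
X_new = smote(X_min, n_new=100)
print(X_new.shape)
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays within the region the minority data already occupies.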

    A scalable and automated machine learning framework to support risk management

    Due to the growth of data and the widespread usage of Machine Learning (ML) by non-experts, automation and scalability are becoming key issues for ML. This paper presents an automated and scalable framework for ML that requires minimum human input. We designed the framework for the domain of telecommunications risk management. This domain often requires non-ML-experts to continuously update supervised learning models that are trained on huge amounts of data. Thus, the framework uses Automated Machine Learning (AutoML), to select and tune the ML models, and distributed ML, to deal with Big Data. The modules included in the framework are task detection (to detect classification or regression), data preprocessing, feature selection, model training, and deployment. In this paper, we focus the experiments on the model training module. We first analyze the capabilities of eight AutoML tools: Auto-Gluon, Auto-Keras, Auto-Sklearn, Auto-Weka, H2O AutoML, Rminer, TPOT, and TransmogrifAI. Then, to select the tool for model training, we performed a benchmark with the only two tools that address distributed ML (H2O AutoML and TransmogrifAI). The experiments used three real-world datasets from the telecommunications domain (churn, event forecasting, and fraud detection), as provided by an analytics company. The experiments allowed us to measure the computational effort and predictive capability of the AutoML tools. Both tools obtained high-quality results and did not present substantial predictive differences. Nevertheless, H2O AutoML was selected by the analytics company for the model training module, since it was considered a more mature technology that presented a more interesting set of features (e.g., integration with more platforms).
    After choosing H2O AutoML for the ML training, we selected the technologies for the remaining components of the architecture (e.g., data preprocessing and web interface). This work was executed under the project IRMDA - Intelligent Risk Management for the Digital Age, Individual Project, NUP: POCI-01-0247-FEDER-038526, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020
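The task detection module must infer whether a target column calls for classification or regression before any model is trained. A toy rule for illustration; this heuristic is an assumption, not the framework's actual logic:

```python
import numpy as np

def detect_task(y, max_classes=10):
    """Toy task-detection rule (illustrative assumption): treat a target
    with a non-numeric dtype or few unique values as classification,
    otherwise regression."""
    y = np.asarray(y)
    if not np.issubdtype(y.dtype, np.number) or len(np.unique(y)) <= max_classes:
        return "classification"
    return "regression"

print(detect_task(["churn", "stay", "churn"]))
print(detect_task(np.linspace(0.0, 1.0, 100)))
```

Real frameworks typically combine dtype checks with cardinality thresholds and user overrides, since rules like this misfire on integer-coded continuous targets.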

    An empirical study on anomaly detection algorithms for extremely imbalanced datasets

    Anomaly detection attempts to identify abnormal events that deviate from normality. Since such events are often rare, data related to this domain is usually imbalanced. In this paper, we compare diverse preprocessing and Machine Learning (ML) state-of-the-art algorithms that can be adopted within this anomaly detection context. These include two unsupervised learning algorithms, namely Isolation Forests (IF) and deep dense AutoEncoders (AE), and two supervised learning approaches, namely Random Forest and an Automated ML (AutoML) method. Several empirical experiments were conducted by adopting seven extremely imbalanced public domain datasets. Overall, the IF and AE unsupervised methods obtained competitive anomaly detection results, which also have the advantage of not requiring labeled data. This work has been supported by the European Regional Development Fund (FEDER) through a grant of the Operational Programme for Competitivity and Internationalization of Portugal 2020 Partnership Agreement (PRODUTECH4S&C, POCI-01-0247-FEDER-046102)
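The autoencoder approach flags anomalies by reconstruction error: the model is fit on (mostly) normal data, and records it reconstructs poorly are marked abnormal. A linear sketch of the same idea using PCA as a stand-in for a deep autoencoder; this is an illustrative simplification, not the paper's architecture:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
normal = rng.normal(0.0, 1.0, size=(300, 5))    # abundant normal records
outliers = rng.normal(6.0, 1.0, size=(5, 5))    # rare abnormal records

# Fit the "autoencoder" (here a 2-component PCA) on normal data only
pca = PCA(n_components=2).fit(normal)

def recon_error(X):
    # Distance between a record and its compressed-then-decompressed version
    return np.linalg.norm(X - pca.inverse_transform(pca.transform(X)), axis=1)

# Flag records whose error exceeds the 95th percentile of normal errors
threshold = np.quantile(recon_error(normal), 0.95)
flags = recon_error(outliers) > threshold
print(flags)
```

A deep autoencoder replaces the linear projection with nonlinear encoder and decoder networks, but the thresholded reconstruction error works the same way, and no anomaly labels are needed for training.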

    Predicting the tear strength of woven fabrics via automated machine learning: an application of the CRISP-DM methodology

    Textile and clothing is an important industry that is currently being transformed by the adoption of the Industry 4.0 concept. In this paper, we use the CRoss-Industry Standard Process for Data Mining (CRISP-DM) methodology to model the textile testing process. Real-world data were collected from a Portuguese textile company. Predicting the outcome of a given textile test is beneficial to the company because it can reduce the number of physical samples that need to be produced when designing new fabrics. In particular, we target two important textile regression tasks: the tear strength in warp and weft directions. To better focus on feature engineering and data transformations, we adopt Automated Machine Learning (AutoML) during the modeling stage of the CRISP-DM. Several iterations of the CRISP-DM methodology were employed, using different data preprocessing procedures (e.g., removal of outliers). The best predictive models were achieved after 2 (for warp) and 3 (for weft) CRISP-DM iterations. FEDER - European Regional Development Fund (P2020)
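Outlier removal, one of the preprocessing procedures iterated over, is often done with a simple interquartile-range rule. A sketch of such a rule; this generic procedure is an assumption, since the paper does not specify its exact method here:

```python
import numpy as np

def remove_outliers_iqr(x, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR], the classic
    boxplot fence (a generic rule, assumed for illustration)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    return x[mask]

# One extreme value among otherwise similar tear-strength-like readings
x = np.array([10.0, 11.0, 12.0, 11.5, 10.5, 99.0])
print(remove_outliers_iqr(x))
```

Repeating the CRISP-DM cycle with and without such a filter, as the paper does across iterations, makes the effect of each preprocessing choice on model quality measurable.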

    Production time prediction for contract manufacturing industries using automated machine learning

    The estimation of production time is an essential part of the manufacturing domain, allowing companies to optimize their production plan and meet the dates required by the customers. In recent years, there have been several approaches that use Machine Learning (ML) to predict the time needed to finish production orders. In this paper, we use the CRISP-DM methodology and Automated Machine Learning (AutoML) to address production time prediction for a Portuguese contract manufacturing company that produces metal containers. We performed three CRISP-DM iterations using real data provided by the company related to production orders and production operations. We compared four open-source modern AutoML technologies to predict production time across the three iterations: AutoGluon, H2O AutoML, rminer, and TPOT. Overall, the best results were achieved in the third CRISP-DM iteration by the H2O AutoML tool, which obtained an average error of 3.03 days. The obtained results suggest that the inclusion of data about individual manufacturing operations is useful for improving production time predictions for the entire production order. This work has been supported by the European Regional Development Fund (FEDER) through a grant of the Operational Programme for Competitivity and Internationalization of Portugal 2020 Partnership Agreement (POCI-01-0247-FEDER-046102, PRODUTECH4S&C)

    AI4CITY - An automated machine learning platform for smart cities

    Nowadays, the general interest in Machine Learning (ML) based solutions is increasing. However, developing and deploying an ML solution often requires experience and involves writing large code scripts. In this paper, we propose AI4CITY, an automated technological platform that aims to reduce the complexity of designing ML solutions, with a particular focus on Smart Cities applications. We compare our solution with popular Automated ML (AutoML) tools (e.g., H2O, AutoGluon), and the results achieved by AI4CITY were quite interesting and competitive. This work has been supported by FCT - Fundação para a CiĂȘncia e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020, and was carried out within the project "City Catalyst - Catalisador para Cidades SustentĂĄveis", reference POCI/LISBOA-01-0247-FEDER-046119, co-funded by Fundo Europeu de Desenvolvimento Regional (FEDER), through Portugal 2020 (P2020)