
    Learning With Imbalanced Data in Smart Manufacturing: A Comparative Analysis

    The Internet of Things (IoT) paradigm is revolutionising manufacturing into what is known as Smart Manufacturing or Industry 4.0. A main pillar of smart manufacturing is harnessing IoT data and leveraging machine learning (ML) to automate the prediction of faults, thus cutting maintenance time and cost and improving product quality. However, faults in real industries are overwhelmingly outweighed by instances of good performance (faultless samples), and this bias is reflected in the data captured by IoT devices. Imbalanced data limits the success of ML in predicting faults, thus presenting a significant hindrance to the progress of smart manufacturing. Although various techniques have been proposed to tackle this challenge in general, this work is the first to present a framework for evaluating the effectiveness of these remedies in the context of manufacturing. We present a comprehensive comparative analysis in which we apply our proposed framework to benchmark the performance of different combinations of algorithm components using a real-world manufacturing dataset. We draw key insights into the effectiveness of each component and the inter-relatedness of the dataset, the application context, and the design of the ML algorithm.
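    Oversampling the minority (faulty) class is one of the standard remedies such a framework would benchmark. A minimal sketch of random oversampling in plain Python (illustrative only; the function name and toy data are not from the paper):

```python
import random

def oversample_minority(samples, labels, minority_label=1, seed=0):
    """Randomly duplicate minority-class samples until classes are balanced."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority = [s for s, y in zip(samples, labels) if y != minority_label]
    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return samples + extra, labels + [minority_label] * len(extra)

# Toy dataset: four faultless samples, one faulty sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
Xb, yb = oversample_minority(X, y)
# yb now contains four samples of each class.
```

    Alternatives the paper's framework could equally cover include class weighting and synthetic-sample generation; oversampling is simply the easiest to show self-contained.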

    Federated Learning for Predictive Maintenance and Quality Inspection in Industrial Applications

    Data-driven machine learning is playing a crucial role in the advancement of Industry 4.0, specifically in enhancing predictive maintenance and quality inspection. Federated learning (FL) enables multiple participants to develop a machine learning model without compromising the privacy and confidentiality of their data. In this paper, we evaluate the performance of different FL aggregation methods and compare them to central and local training approaches. Our study is based on four datasets with varying data distributions. The results indicate that the performance of FL is highly dependent on the data and its distribution among clients; in some scenarios, FL can be an effective alternative to traditional central or local training methods. Additionally, we introduce a new federated learning dataset from a real-world quality inspection setting.
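    The best-known FL aggregation method is weighted parameter averaging in the style of FedAvg, where each client's update is weighted by its local data volume. A minimal sketch (not the paper's implementation; the parameter vectors and client sizes are illustrative):

```python
def fedavg(client_weights, client_sizes):
    """Aggregate model parameters by a FedAvg-style weighted average.

    client_weights: one flat parameter vector per client.
    client_sizes: number of local training samples per client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with unequal data volumes: the larger client dominates.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], [30, 10])
# global_w == [1.5, 2.5]
```

    Other aggregation methods the paper compares would replace only this averaging step, keeping the same client/server round structure.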

    Generating real-valued failure data for prognostics under the conditions of limited Data Availability

    Data-driven prognostics solutions underperform under conditions of limited failure data availability, since the number of failure data samples is insufficient for training prognostics models effectively. In order to address this problem, we present a novel methodology for generating real-valued failure data, which allows training datasets to be augmented so that the number of failure data samples is increased. In contrast to existing data generation techniques, which duplicate or randomly generate data, the proposed methodology is capable of generating new and realistic failure data samples. To this end, we utilised a conditional generative adversarial network together with auxiliary information pertaining to the failure modes. The proposed methodology is evaluated in a real-world case study involving the prediction of air purge valve failures in heavy trucks. Two prognostics models are developed using gradient boosting machine and random forest classifiers. It is shown that when these models are trained on the augmented training dataset, they outperform the best prognostics solution previously proposed in the literature for this case study by a large margin. More specifically, costs due to breakdowns and false alarms are reduced by 44%.
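    The reported 44% reduction concerns a cost metric combining breakdowns (missed failures) and false alarms. A hedged sketch of such a metric (the cost figures are placeholders, not values from the paper):

```python
def maintenance_cost(y_true, y_pred, cost_breakdown=3500.0, cost_false_alarm=350.0):
    """Total cost from missed failures and false alarms.

    A false negative means an unpredicted breakdown; a false positive means
    an unnecessary workshop visit. Cost parameters here are illustrative.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_breakdown + fp * cost_false_alarm

# One missed failure and one false alarm on five toy samples.
baseline = maintenance_cost([1, 1, 0, 0, 0], [0, 1, 1, 0, 0])
# baseline == 3850.0
```

    Comparing this cost on the original versus the augmented training set is how a "costs reduced by 44%" style claim would be quantified.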

    Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests

    Being able to accurately predict the impending failures of truck components is often associated with significant cost savings, customer satisfaction and flexibility in maintenance service plans. However, because of the diversity in how trucks are configured and used under different conditions, creating accurate prediction models is not an easy task. This paper describes an effort to create such a prediction model for the NOx sensor, i.e., a component measuring the level of nitrogen oxide emitted in the exhaust of the engine. This component was chosen because it is vital for the truck to function properly, while at the same time being fragile and costly to repair. As input to the model, technical specifications of trucks and their operational data are used. The process of collecting the data and making it ready for training the model via a slightly modified random forest learning algorithm is described, along with various challenges encountered during this process. The operational data consists of features represented as histograms, posing an additional challenge for the data analysis task. The modified version of the random forest algorithm exploits the fact that the individual bins in the histograms are related, in contrast to the standard approach, which would consider the bins as independent features. Experiments clearly show that the modified version is beneficial compared to the standard random forest algorithm. The performance of the resulting prediction model for the NOx sensor is promising and may be adopted for the benefit of operators of heavy trucks.
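    One simple way to make the relatedness of adjacent histogram bins explicit is to encode each histogram as normalised cumulative counts, so a split on any one feature implicitly involves all bins below it. This is only an illustration of the idea, not the paper's actual modification to the random forest algorithm:

```python
def cumulative_bins(histogram):
    """Turn raw bin counts into normalised cumulative counts.

    Adjacent bins become explicitly ordered, unlike the standard approach
    that treats each bin as an independent feature.
    """
    total = sum(histogram) or 1  # guard against an all-zero histogram
    running, out = 0, []
    for count in histogram:
        running += count
        out.append(running / total)
    return out

# A hypothetical operational histogram, e.g. time spent per engine-load bin.
encoded = cumulative_bins([10, 30, 40, 20])
# encoded == [0.1, 0.4, 0.8, 1.0]
```

    A threshold split on `encoded[i]` then asks "what fraction of operation fell at or below load level i", which respects the bin ordering.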

    Integration of MLOps with IoT edge

    Abstract. Edge computing and machine learning have become increasingly vital in today's digital landscape. Edge computing brings computational power closer to the data source, enabling reduced latency and bandwidth usage, increased privacy, and real-time decision-making. Running machine learning models on edge devices further enhances these advantages by reducing reliance on the cloud. This empowers industries such as transport, healthcare and manufacturing to harness the full potential of machine learning. MLOps, or Machine Learning Operations, plays a major role in streamlining the deployment, monitoring and management of machine learning models in production. With MLOps, organisations can achieve faster model iteration, reduced deployment time, improved collaboration between developers, optimised performance, and ultimately meaningful business outcomes. Integrating MLOps with edge devices poses unique challenges; overcoming them requires careful planning, customised deployment strategies, and efficient model optimisation techniques. This thesis project introduces a set of tools that enable the integration of MLOps practices with edge devices. The solution consists of two sets of tools: one for setting up infrastructure within edge devices so that they can receive, monitor, and run inference on machine learning models, and another for MLOps pipelines to package models to be compatible with the inference and monitoring components of the respective edge devices. The platform was evaluated using a public dataset for predicting breakdowns of air pressure systems in trucks, an ideal use case for running ML inference on the edge and connecting MLOps pipelines with edge devices. A simulation was created from the data in order to control the volume of data flowing into the edge devices, and the performance of the platform was then tested against the scenario created by the simulation. Response time and CPU usage of the different components were the metrics tested. Additionally, the platform was evaluated against a set of commercial and open-source tools and services that serve similar purposes. The overall performance of this solution matches that of existing tools and services, while giving end users setting up Edge-MLOps infrastructure complete freedom to build their system without relying on third-party licensed software.
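    Response time was one of the tested metrics. A minimal sketch of how per-request latency of an edge inference handler could be measured (the handler below is a hypothetical stand-in for a deployed model, not part of the thesis tooling):

```python
import time

def measure_response(handler, payload, repeats=100):
    """Average per-request latency of an inference handler, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        handler(payload)
    return (time.perf_counter() - start) / repeats

# Hypothetical stand-in for a deployed edge model's predict function.
def dummy_model(features):
    return sum(features) > 1.0

avg_latency = measure_response(dummy_model, [0.2, 0.5, 0.4])
```

    In a real evaluation the handler would wrap the on-device model runtime, and CPU usage would be sampled alongside via the operating system's process statistics.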

    A Comprehensive Survey on Rare Event Prediction

    Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires specialised methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrence of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistics and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions that can help guide practitioners and researchers.
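    Under heavy imbalance, accuracy is uninformative, which is why rare-event evaluation leans on precision, recall and F1 for the rare class. A minimal sketch of these metrics:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the rare (positive) class.

    Accuracy misleads under imbalance: predicting 'common' everywhere
    scores high accuracy but zero recall on the rare class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One rare event in ten samples, caught correctly but with one false alarm.
p, r, f = precision_recall_f1([0] * 9 + [1], [0] * 8 + [1, 1])
# p == 0.5, r == 1.0
```

    Note that a trivial always-negative predictor on the same data would score 90% accuracy yet zero on all three metrics above.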

    Design of software-oriented technician for vehicle’s fault system prediction using AdaBoost and random forest classifiers

    Detecting and isolating faults in heavy-duty vehicles is very important because it helps maintain high vehicle performance, low emissions, fuel economy and vehicle safety, and ensures repair and service efficiency. These factors are important because they help reduce the overall life-cycle cost of a vehicle. The aim of this paper is to deliver a web application that aids a professional technician, or a vehicle user with basic automobile knowledge, in assessing the working condition of a vehicle and detecting its faulty subsystem. The scope of this system is to visualise the data acquired from the vehicle, diagnose the faulty component using a fault model trained with improved machine learning (ML) classifiers, and generate a report. The visualisation page is built with the Plotly Python package and prepared with selected parameters from On-Board Diagnostics (OBD) tool data. The histogram data is pre-processed with techniques such as null-value imputation, standardisation and balancing methods in order to increase the quality of training, and is then used to train the classifiers. Finally, each classifier is tested, and performance metrics such as accuracy, precision, recall and F1 measure are calculated from the confusion matrix. The proposed methodology for fault model prediction uses supervised algorithms such as Random Forest (RF) and ensemble algorithms like AdaBoost, which offer reasonable accuracy and recall. The Python package joblib is used to save the model weights and reduce computational time. Google Colab is used as the Python environment, as it offers versatile features, and PyCharm is utilised for the development of the web application.
    Hence, the web application resulting from this work can not only serve as a companion that minimises the time and money spent on unnecessary checks during fault detection, but also helps quickly detect and isolate the faulty system to avoid the propagation of errors that can lead to more dangerous situations.
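    The pre-processing steps mentioned (null-value imputation and standardisation) can be sketched in plain Python as follows (an illustration of the techniques named, not the paper's actual pipeline):

```python
def impute_and_standardise(column):
    """Mean-impute missing values (None), then z-score standardise."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    # Fill gaps with the column mean before computing the spread.
    filled = [mean if v is None else v for v in column]
    variance = sum((v - mean) ** 2 for v in filled) / len(filled)
    std = variance ** 0.5 or 1.0  # avoid division by zero for constant columns
    return [(v - mean) / std for v in filled]

out = impute_and_standardise([1.0, None, 3.0])
# out is approximately [-1.22, 0.0, 1.22]
```

    In practice each column's mean and standard deviation must be computed on the training split only and reused on the test split, or the metrics computed afterwards will be optimistically biased.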

    Developing Leading and Lagging Indicators to Enhance Equipment Reliability in a Lean System

    With increasing complexity in equipment, failure rates are becoming a critical metric because of unplanned maintenance in production environments. Unplanned maintenance in a manufacturing process creates issues with downtime and decreases the reliability of equipment. Failures in equipment have resulted in loss of revenue to organisations, encouraging maintenance practitioners to analyse ways to turn unplanned maintenance into planned maintenance. Efficient failure prediction models are being developed to learn about failures in advance; with this information, predicted failures can reduce downtime in the system and improve throughput. The goal of this thesis is to predict failure in centrifugal pumps using machine learning models such as random forest, stochastic gradient boosting, and extreme gradient boosting. For accurate prediction, historical sensor measurements were transformed into leading and lagging indicators that explain the failure patterns of the equipment. The best subset of indicators was selected by filtering with a random forest and utilised in the developed model. Finally, the models give a probability of failure before the failure occurs, and appropriate evaluation metrics were used to select the most accurate model. The proposed methodology is illustrated with two case studies: first, on the centrifugal pump asset performance data provided by Meridium, Inc., and second, on data collected from aircraft turbine engines provided in the NASA prognostics data repository. The automated methodology was shown to develop and identify appropriate leading and lagging failure indicators in both cases and to facilitate machine learning model development.
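    Leading and lagging indicators can be derived from a raw sensor series, for instance a trailing moving average that summarises recent history (lagging) and a first difference that captures an emerging trend (leading). A sketch under those assumptions (the vibration trace and window size are hypothetical, not from the thesis):

```python
def lagging_indicator(series, window=3):
    """Trailing moving average over up to `window` past readings (lagging)."""
    return [
        sum(series[max(0, i - window + 1): i + 1]) / min(window, i + 1)
        for i in range(len(series))
    ]

def leading_indicator(series):
    """First difference (rate of change): a rising trend can flag an
    approaching failure before an absolute threshold is crossed (leading)."""
    return [series[i] - series[i - 1] for i in range(1, len(series))]

vibration = [1.0, 1.1, 1.0, 1.4, 2.0]  # hypothetical sensor trace
lag = lagging_indicator(vibration)
lead = leading_indicator(vibration)
```

    A model trained on such derived columns, rather than the raw readings, is what allows a probability of failure to be emitted before the failure itself occurs.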

    Predictive Maintenance of Lead-Acid Batteries with Sparse Vehicle Operational Data

    Predictive maintenance aims to predict failures in the components of a system, a heavy-duty vehicle in this work, and to perform maintenance before any actual fault occurs. Predictive maintenance is increasingly important in the automotive industry due to the development of new services and of autonomous vehicles with no driver who can notice the first signs of a component problem. The lead-acid battery in a heavy vehicle is mostly used during engine starts, but also for heating and cooling the cockpit, and is an important part of the electrical system that is essential for reliable operation. This paper develops and evaluates two machine-learning based methods for battery prognostics, one based on Long Short-Term Memory (LSTM) neural networks and one on Random Survival Forests (RSF). The objective is to estimate the time of battery failure based on sparse and non-equidistant vehicle operational data, obtained from workshop visits or over-the-air readouts. The dataset has three characteristics: 1) no sensor measurements are directly related to battery health, 2) the number of data readouts varies from one vehicle to another, and 3) readouts are collected at different time periods. Missing data is common and is addressed by comparing different imputation techniques. RSF- and LSTM-based models are proposed and evaluated for the case of sparse multiple readouts. How to measure model performance is discussed, as well as how the amount of available vehicle information influences performance.
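    For sparse, non-equidistant readouts, one candidate imputation technique is time-aware linear interpolation. A minimal sketch (illustrative only; not necessarily among the techniques the paper compares):

```python
def interpolate_readouts(times, values):
    """Fill missing readouts (None) by linear interpolation that honours
    non-equidistant timestamps; gaps at the edges take the nearest value."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    out = []
    for t, v in zip(times, values):
        if v is not None:
            out.append(v)
            continue
        before = [kv for kv in known if kv[0] < t]
        after = [kv for kv in known if kv[0] > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        elif before:
            out.append(before[-1][1])  # trailing gap: carry last value forward
        else:
            out.append(after[0][1])    # leading gap: backfill first value
    return out

# Readouts at irregular times (e.g. workshop visits); two are missing.
filled = interpolate_readouts([0, 2, 5, 10], [1.0, None, None, 5.0])
# filled == [1.0, 1.8, 3.0, 5.0]
```

    Weighting by elapsed time matters here precisely because the readouts are non-equidistant: a naive index-based interpolation would treat a two-day gap and a five-day gap identically.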