239 research outputs found

    A comparison of machine learning approaches for predicting in-car display production quality

    In this paper, we explore eight Machine Learning (ML) approaches (binary and one-class) to predict the quality of in-car displays, measured using Black Uniformity (BU) tests. During production, the industrial manufacturer routinely executes intermediate assembly (screwing and gluing) and functional tests that can signal potential causes for abnormal display units. By using these intermediate tests as inputs, the ML model can identify the unknown relationships between intermediate and BU tests, helping to detect failure causes. In particular, we compare two sets of input variables (A and B) with hundreds of intermediate quality measures related to assembly and functional tests. Using recently collected industrial data, comprising around 147 thousand in-car display records, we performed two evaluation procedures, first using a time-ordered train-test split and then a more robust rolling-window scheme. Overall, the best predictive results (92%) were obtained using the full set of inputs (B) and an Automated ML (AutoML) Stacked Ensemble (ASE). We further demonstrate the value of the selected ASE model by examining distinct decision threshold scenarios and by using a Sensitivity Analysis (SA) eXplainable Artificial Intelligence (XAI) method. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 39479; Funding Reference: POCI-01-0247-FEDER-39479]
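    The rolling-window evaluation mentioned above can be sketched as follows. This is a minimal illustration only: the industrial dataset and the AutoML Stacked Ensemble are not public, so synthetic data and a plain Random Forest stand in for them, and the window sizes are arbitrary choices.

```python
# Rolling-window evaluation sketch: train on a sliding window of past
# records, test on the records immediately after it (synthetic stand-in
# data; the paper's dataset and ASE model are not public).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 10))  # stand-in for intermediate test measures
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 1).astype(int)  # stand-in BU pass/fail

window, horizon, step = 1000, 200, 200
aucs = []
for start in range(0, n - window - horizon + 1, step):
    tr = slice(start, start + window)                      # training window (past)
    te = slice(start + window, start + window + horizon)   # test window (future)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))

print(f"mean AUC over {len(aucs)} windows: {np.mean(aucs):.2f}")
```

    Each iteration only ever tests on data recorded after the training window, which mimics how the model would actually be deployed in production.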

    Using supervised and one-class automated machine learning for predictive maintenance

    Predictive Maintenance (PdM) is a critical area that is benefiting from the advent of Industry 4.0. Recently, several attempts have been made to apply Machine Learning (ML) to PdM, with the majority of the research studies assuming an expert-based ML modeling. In contrast with these works, this paper explores a purely Automated Machine Learning (AutoML) modeling for PdM under two main approaches. Firstly, we adapt and compare ten recent open-source AutoML technologies focused on Supervised Learning. Secondly, we propose a novel AutoML approach focused on One-Class (OC) Learning (AutoOneClass) that employs Grammatical Evolution (GE) to search for the best PdM model using three types of learners (OC Support Vector Machines, Isolation Forests and deep Autoencoders). Using recently collected data from a Portuguese software company client, we performed a benchmark comparison study with the Supervised AutoML tools and the proposed AutoOneClass method to predict the number of days until the next equipment failure and also to determine whether the equipment will fail within a fixed number of days. Overall, the results were close among the compared AutoML tools, with supervised AutoGluon obtaining the best results for all ML tasks. Moreover, the best supervised AutoML and AutoOneClass predictive results were compared with two manual ML modeling approaches (using a ML expert and a non-ML expert), revealing competitive results. This work was executed under the project Cognitive CMMS - Cognitive Computerized Maintenance Management System, NUP: POCI-01-0247-FEDER-033574, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020. We wish to thank the anonymous reviewers for their helpful comments
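    Two of the three one-class learner families searched by AutoOneClass can be sketched with scikit-learn (the GE search itself, the deep Autoencoder, and the company data are not public; the synthetic "healthy" and "failure-like" readings below are assumptions for illustration).

```python
# One-class learning sketch: fit on normal equipment readings only, then
# flag failure-like readings as anomalies (-1). Data is synthetic; this
# is not the paper's AutoOneClass pipeline.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal = rng.normal(0, 1, size=(500, 5))   # hypothetical healthy readings
anomal = rng.normal(5, 1, size=(20, 5))    # hypothetical failure-like readings

flagged = {}
for model in (OneClassSVM(nu=0.05), IsolationForest(random_state=1)):
    model.fit(normal)                       # train on normal data only
    pred = model.predict(anomal)            # -1 = anomaly, +1 = normal
    flagged[type(model).__name__] = int((pred == -1).sum())

print(flagged)
```

    The appeal for PdM is that only normal operating data is needed for training, which matters when failures are too rare to label in quantity.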

    A machine learning approach for spare parts lifetime estimation

    Under the Industry 4.0 concept, there is increased usage of data-driven analytics to enhance the production process. In particular, equipment maintenance is a key industrial area that can benefit from using Machine Learning (ML) models. In this paper, we propose a novel Remaining Useful Life (RUL) ML-based spare part prediction that considers maintenance historical records, which are commonly available in several industries and thus easier to collect when compared with specific equipment measurement data. As a case study, we consider 18,355 RUL records from an automotive multimedia assembly company, where each RUL value is defined as the full amount of units produced between two consecutive corrective maintenance actions. Under regression modeling, two categorical input transforms and eight ML algorithms were explored by considering a realistic rolling window evaluation. The best prediction model, which adopts an Inverse Document Frequency (IDF) data transformation and the Random Forest (RF) algorithm, produced high-quality RUL prediction results under a reasonable computational effort. Moreover, we have executed an eXplainable Artificial Intelligence (XAI) approach, based on the SHapley Additive exPlanations (SHAP) method, over the selected RF model, showing its potential value to extract useful explanatory knowledge for the maintenance domain. This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020
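    An IDF-style encoding of a categorical input followed by a Random Forest regressor can be sketched as below. Note that the exact IDF formula, the spare-part identifiers, and the RUL values are assumptions for illustration, not the authors' data or implementation.

```python
# IDF-style categorical encoding + Random Forest regression sketch.
# Rarer category levels receive larger IDF values (one common
# formulation: log(N / count)); data is hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

parts = np.array(["A", "B", "A", "C", "B", "A", "C", "A"])  # hypothetical part ids
levels, counts = np.unique(parts, return_counts=True)
idf = {lv: np.log(len(parts) / c) for lv, c in zip(levels, counts)}
x_idf = np.array([idf[p] for p in parts])

rul = np.array([120.0, 80.0, 110.0, 40.0, 75.0, 130.0, 35.0, 125.0])  # hypothetical RUL
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(x_idf.reshape(-1, 1), rul)
pred = model.predict([[idf["C"]]])[0]
print(f"predicted RUL for part 'C': {pred:.1f}")
```

    Encoding categories by their (inverse) frequency keeps the input numeric and low-dimensional, which suits tree ensembles better than wide one-hot matrices when a categorical variable has many levels.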

    Spam email filtering using network-level properties

    Spam is a serious problem that affects email users (e.g. phishing attacks, viruses and time spent reading unwanted messages). We propose a novel spam email filtering approach based on network-level attributes (e.g. the IP sender geographic coordinates) that are more persistent over time when compared to message content. This approach was tested using two classifiers, Naive Bayes (NB) and Support Vector Machines (SVM), and compared against bag-of-words models and eight blacklists. Several experiments were held with recently collected legitimate (ham) and non-legitimate (spam) messages, in order to simulate distinct user profiles from two countries (USA and Portugal). Overall, the network-level based SVM model achieved the best discriminatory performance. Moreover, preliminary results suggest that this method is more robust to phishing attacks. Fundação para a Ciência e a Tecnologia (FCT) - PTDC/EIA/64541/200
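    The core idea, classifying messages from sender-side network attributes rather than message content, can be sketched as follows. The features and both message populations below are synthetic stand-ins, not the paper's corpus or its exact attribute set.

```python
# Network-level spam filtering sketch: NB and SVM classifiers trained on
# sender attributes (synthetic stand-in features, not the paper's data).
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# columns: sender latitude, longitude, sending hour (illustrative attributes)
ham = np.column_stack([rng.normal(40, 2, 300), rng.normal(-8, 2, 300),
                       rng.normal(14, 3, 300)])
spam = np.column_stack([rng.normal(20, 15, 300), rng.normal(60, 30, 300),
                        rng.normal(4, 3, 300)])
X = np.vstack([ham, spam])
y = np.array([0] * 300 + [1] * 300)   # 0 = ham, 1 = spam

scores = {}
for clf in (GaussianNB(), SVC()):
    scores[type(clf).__name__] = cross_val_score(clf, X, y, cv=5).mean()
print(scores)
```

    Because such attributes change more slowly than message text, a filter built on them degrades less quickly as spammers rewrite content.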

    Optimization of a gravimetric standard for measuring fluid flow rates between 20 mL/h and 0.006 mL/h and extension of its capacity to 600 mL/h

    Dissertation for the degree of Master in Mechanical Engineering. Metrology, as the science of measurement, is responsible for ensuring the accuracy required by various production processes. Its main objective is to guarantee the quality of products and services through the calibration of measuring instruments and the performance of tests, forming the fundamental basis for comparison and competitiveness among companies and society in general. Since virtually all branches of science are in constant evolution, the scientific field dedicated to the study of micro flow rates is no exception, which demands an adequate development of the metrology applied to this area in order to meet the necessary legal and economic requirements. The topic of this dissertation was proposed by the Instituto Português da Qualidade (IPQ), one of the participants in a project funded by the European Union that aims to update the metrological infrastructure, creating the conditions for the proper calibration of devices operating with flow rates below 0.6 mL/h. This dissertation focused on the conception, design, implementation and experimental testing of two primary standards for micro flow measurement, developed during an internship at the Volume Laboratory (LVO) of the Instituto Português da Qualidade (IPQ), with the goal of providing traceability to several flow measurement instruments. Throughout the internship that supported this thesis, several calibrations of other laboratory instruments were also performed, such as laboratory glassware, pycnometers, balances, reservoirs and infusion systems used for medical purposes. Following the work carried out during the internship and this dissertation, the IPQ Volume Laboratory became able to calibrate infusion systems used for drug dosing by the dynamic gravimetric method

    An automated and distributed machine learning framework for telecommunications risk management

    Automation and scalability are currently two of the main challenges of Machine Learning. This paper proposes an automated and distributed ML framework that automatically trains a supervised learning model and produces predictions independently of the dataset and with minimum human input. The framework was designed for the domain of telecommunications risk management, which often requires supervised learning models that need to be quickly updated by non-ML-experts and trained on vast amounts of data. Thus, the architecture assumes a distributed environment, in order to deal with big data, and Automated Machine Learning (AutoML), to select and tune the ML models. The framework includes several modules: task detection (to detect whether the task is classification or regression), data preprocessing, feature selection, model training, and deployment. In this paper, we detail the model training module. In order to select the computational technologies to be used in this module, we first analyzed the capabilities of an initial set of five modern AutoML tools: Auto-Keras, Auto-Sklearn, Auto-Weka, H2O AutoML, and TransmogrifAI. Then, we performed a benchmarking of the only two tools that address distributed ML (H2O AutoML and TransmogrifAI). Several comparison experiments were held using three real-world datasets from the telecommunications domain (churn, event forecasting, and fraud detection), allowing us to measure the computational effort and predictive capability of the AutoML tools. This work was executed under the project IR-MDA - Intelligent Risk Management for the Digital Age, Individual Project, NUP: POCI-01-0247-FEDER-038526, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020
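    The task-detection module described above can be sketched with a simple heuristic. The rule below (non-numeric or few-valued targets imply classification) is our assumption for illustration, not the framework's exact logic.

```python
# Task-detection sketch: decide classification vs. regression from the
# target column alone (heuristic is illustrative, not the paper's rule).
import numpy as np

def detect_task(target, max_classes=10):
    target = np.asarray(target)
    if target.dtype.kind in ("U", "S", "O", "b"):  # strings, objects, booleans
        return "classification"
    if np.unique(target).size <= max_classes:      # few distinct numeric values
        return "classification"
    return "regression"

print(detect_task(["churn", "stay", "churn"]))                    # classification
print(detect_task(np.random.default_rng(0).normal(size=100)))     # regression
```

    Automating even this small decision matters for the framework's goal: a non-ML-expert should be able to point it at a new dataset without specifying the learning task.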

    A comparison of the A.P.U. and ABC costing methods for unit price analysis in construction

    The purpose of this comparison is to obtain more accurate information and a better view of costs, profitability and planning, helping to make fundamental decisions that improve the operational performance of different construction projects. Every work built by people is motivated by some need, and this requires a technique to plan it, time to build it and resources to carry it out, so before starting any work it is necessary to budget the work to be done. By budgeting a project we mean establishing what it is composed of (qualitative composition) and how many units of each component are required (quantitative composition) in order to finally apply prices to each one and obtain its value at a given time. For this, a unit price analysis (APU) is needed; this unit price analysis is a mathematical model that anticipates the result, expressed in currency, of a situation related to an activity under study, in which the type of labor, the materials, and the equipment and/or tools that the project would need are analyzed in order to determine the cost of each activity or line item in an empirical way. In actual practice this process is not followed to assign the unit price to a line item; instead, prices are based on information from previous projects or on acquired experience. This study is therefore necessary to identify the differences between what the methodology prescribes and what is actually done in the field to define the cost of each line item

    A comparison of machine learning methods for extremely unbalanced industrial quality data

    The Industry 4.0 revolution is impacting manufacturing companies, which need to adopt more data intelligence processes in order to compete in the markets in which they operate. In particular, quality control is a key manufacturing process that has been addressed by Machine Learning (ML), aiming to improve productivity (e.g., reduce costs). However, modern industries produce a tiny portion of defective products, which results in extremely unbalanced datasets. In this paper, we analyze recent big data collected from a major automotive assembly manufacturer and related to the quality of eight products. The eight datasets include millions of records but only a tiny percentage of failures (less than 0.07%). To handle such datasets, we perform a two-stage ML comparison study. Firstly, we consider two products and explore four ML algorithms, Random Forest (RF), two Automated ML (AutoML) methods and a deep Autoencoder (AE), and three balancing training strategies, namely None, Synthetic Minority Oversampling Technique (SMOTE) and Gaussian Copula (GC). When considering both classification performance and computational effort, interesting results were obtained by RF. Then, the selected RF was further explored by considering all eight datasets and five balancing methods: None, SMOTE, GC, Random Undersampling (RU) and Tomek Links (TL). Overall, competitive results were achieved by the combination of GC with RF. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 39479; Funding Reference: POCI-01-0247-FEDER-39479]
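    The SMOTE idea used above, creating synthetic minority samples by interpolating between near minority neighbours, can be sketched in a few lines. This is a simplified re-implementation for illustration; the paper used standard SMOTE and Gaussian Copula tooling on industrial data that is not public.

```python
# SMOTE-style oversampling sketch for an extremely unbalanced target:
# new minority points are interpolations between a minority sample and
# one of its k nearest minority neighbours (simplified, illustrative).
import numpy as np

def smote_like(X_min, n_new, k=3, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]           # k nearest, skipping the point itself
        j = rng.choice(nbrs)
        lam = rng.random()                      # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(3)
X_min = rng.normal(size=(20, 4))                # tiny minority class (e.g., failures)
synth = smote_like(X_min, n_new=100, rng=rng)
print(synth.shape)
```

    With failure rates below 0.07%, oversampling like this (or generative approaches such as Gaussian Copula) gives the classifier enough minority examples to learn a usable decision boundary.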

    A scalable and automated machine learning framework to support risk management

    Due to the growth of data and the widespread usage of Machine Learning (ML) by non-experts, automation and scalability are becoming key issues for ML. This paper presents an automated and scalable framework for ML that requires minimum human input. We designed the framework for the domain of telecommunications risk management. This domain often requires non-ML-experts to continuously update supervised learning models that are trained on huge amounts of data. Thus, the framework uses Automated Machine Learning (AutoML), to select and tune the ML models, and distributed ML, to deal with Big Data. The modules included in the framework are task detection (to detect classification or regression), data preprocessing, feature selection, model training, and deployment. In this paper, we focus the experiments on the model training module. We first analyze the capabilities of eight AutoML tools: Auto-Gluon, Auto-Keras, Auto-Sklearn, Auto-Weka, H2O AutoML, Rminer, TPOT, and TransmogrifAI. Then, to select the tool for model training, we performed a benchmark with the only two tools that address distributed ML (H2O AutoML and TransmogrifAI). The experiments used three real-world datasets from the telecommunications domain (churn, event forecasting, and fraud detection), as provided by an analytics company. The experiments allowed us to measure the computational effort and predictive capability of the AutoML tools. Both tools obtained high-quality results and did not present substantial predictive differences. Nevertheless, H2O AutoML was selected by the analytics company for the model training module, since it was considered a more mature technology that presented a more interesting set of features (e.g., integration with more platforms).
    After choosing H2O AutoML for the ML training, we selected the technologies for the remaining components of the architecture (e.g., data preprocessing and web interface). This work was executed under the project IRMDA - Intelligent Risk Management for the Digital Age, Individual Project, NUP: POCI-01-0247-FEDER-038526, co-funded by the Incentive System for Research and Technological Development, from the Thematic Operational Program Competitiveness of the national framework program - Portugal2020

    Production time prediction for contract manufacturing industries using automated machine learning

    The estimation of production time is an essential part of the manufacturing domain, allowing companies to optimize their production plan and meet the dates required by the customers. In recent years, there have been several approaches that use Machine Learning (ML) to predict the time needed to finish production orders. In this paper, we use the CRISP-DM methodology and Automated Machine Learning (AutoML) to address production time prediction for a Portuguese contract manufacturing company that produces metal containers. We performed three CRISP-DM iterations using real data provided by the company related to production orders and production operations. We compared four open-source modern AutoML technologies to predict production time across the three iterations: AutoGluon, H2O AutoML, rminer, and TPOT. Overall, the best results were achieved in the third CRISP-DM iteration by the H2O AutoML tool, which obtained an average error of 3.03 days. The obtained results suggest that the inclusion of data about individual manufacturing operations is useful for improving production time predictions for the entire production order. This work has been supported by the European Regional Development Fund (FEDER) through a grant of the Operational Programme for Competitiveness and Internationalization of Portugal 2020 Partnership Agreement (POCI-01-0247-FEDER-046102, PRODUTECH4S&C)