
    A machine learning approach for spare parts lifetime estimation

    Under the Industry 4.0 concept, data-driven analytics are increasingly used to enhance the production process. In particular, equipment maintenance is a key industrial area that can benefit from Machine Learning (ML) models. In this paper, we propose a novel ML-based Remaining Useful Life (RUL) prediction for spare parts that relies on historical maintenance records. These records are commonly available in many industries and are therefore easier to collect than equipment-specific measurement data. As a case study, we consider 18,355 RUL records from an automotive multimedia assembly company, where each RUL value is defined as the total number of units produced between two consecutive corrective maintenance actions. Under a regression formulation, two categorical input transforms and eight ML algorithms were explored using a realistic rolling window evaluation. The best prediction model, which combines an Inverse Document Frequency (IDF) data transformation with the Random Forest (RF) algorithm, produced high-quality RUL predictions at a reasonable computational cost. Moreover, we applied an eXplainable Artificial Intelligence (XAI) approach, based on the SHapley Additive exPlanations (SHAP) method, to the selected RF model, showing its potential to extract useful explanatory knowledge for the maintenance domain. This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope: UIDB/00319/2020.
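The abstract names an IDF transform for categorical inputs and a rolling window evaluation but does not spell out either. The sketch below assumes a common form of IDF encoding for a categorical attribute (each level replaced by log(n / f), where n is the number of records and f is that level's frequency) and a fixed-size training window that slides forward in time. The function names, the machine identifiers and the window sizes are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Iterator, List, Sequence, Tuple


def idf_encode(values: Sequence[str]) -> List[float]:
    """Assumed IDF transform: replace each categorical level with log(n / f),
    where n is the number of records and f is the level's frequency.
    Rare levels get large values; a level present in every record gets 0."""
    n = len(values)
    freq = Counter(values)
    return [math.log(n / freq[v]) for v in values]


def rolling_windows(n_records: int, train_size: int, test_size: int,
                    step: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges over time-ordered records: a fixed-size
    training window slides forward by `step`, and each test window holds the
    records that immediately follow its training window."""
    start = 0
    while start + train_size + test_size <= n_records:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step


if __name__ == "__main__":
    # Hypothetical machine identifiers attached to maintenance records.
    machines = ["M1", "M2", "M1", "M3", "M1", "M2"]
    print([round(x, 3) for x in idf_encode(machines)])
    # [0.693, 1.099, 0.693, 1.792, 0.693, 1.099]
    for train, test in rolling_windows(10, train_size=4, test_size=2, step=2):
        print(list(train), list(test))
    # [0, 1, 2, 3] [4, 5]
    # [2, 3, 4, 5] [6, 7]
    # [4, 5, 6, 7] [8, 9]
```

In a realistic rolling evaluation, the level frequencies would be computed on each training window only and then applied to the following test window, so that no future information leaks into the encoding. A regressor such as scikit-learn's RandomForestRegressor would then be refit on every window.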

    Modeling Events and Interactions through Temporal Processes -- A Survey

    In real-world scenarios, many phenomena produce collections of events that occur in continuous time. Point processes provide a natural mathematical framework for modeling these event sequences. In this survey, we investigate probabilistic models of event sequences based on temporal processes. We revisit the notion of event modeling and provide the mathematical foundations that characterize the literature on the topic. We define an ontology that categorizes existing approaches into three families: simple, marked, and spatio-temporal point processes. For each family, we systematically review the existing approaches based on deep learning. Finally, we analyze the scenarios in which the proposed techniques can be used to address prediction and modeling tasks.
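As a concrete instance of the "simple" point-process family, the sketch below evaluates the conditional intensity and log-likelihood of a univariate Hawkes process with an exponential kernel, a classical self-exciting model. It is not drawn from the survey. The parameter names mu, alpha and beta and the event times are our own illustrative choices. Neural approaches of the kind the survey reviews commonly keep this conditional-intensity formulation but let a network produce the intensity instead of a fixed formula.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, history: Sequence[float], mu: float,
                     alpha: float, beta: float) -> float:
    """Conditional intensity of a univariate Hawkes process with kernel
    alpha * beta * exp(-beta * s): a baseline rate mu plus a decaying boost
    from every event strictly before t. alpha < 1 keeps the process stationary."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)


def hawkes_log_likelihood(events: Sequence[float], T: float, mu: float,
                          alpha: float, beta: float) -> float:
    """Point-process log-likelihood of sorted event times on [0, T]:
    sum_i log lambda*(t_i) minus the integral of lambda* over [0, T]."""
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta))
                   for ti in events)
    # For the exponential kernel the integral (compensator) has a closed form.
    compensator = mu * T + alpha * sum(1.0 - math.exp(-beta * (T - ti))
                                       for ti in events if ti <= T)
    return log_term - compensator


if __name__ == "__main__":
    events = [0.5, 0.9, 1.0, 3.2]  # hypothetical event times
    print(hawkes_intensity(1.05, events, mu=0.2, alpha=0.5, beta=2.0))
    print(hawkes_log_likelihood(events, T=4.0, mu=0.2, alpha=0.5, beta=2.0))
```

Setting alpha to 0 removes self-excitation and recovers a homogeneous Poisson process, whose log-likelihood is n log(mu) minus mu T. This gives a quick sanity check on the implementation.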

    A Distributional Perspective on Remaining Useful Life Prediction with Deep Learning and Quantile Regression

    With the rapid development of information and sensor technology, data-driven remaining useful life (RUL) prediction methods have advanced considerably. Current data-driven RUL methods focus on estimating a single RUL value. However, it is more important to quantify the uncertainty associated with that value, because increasingly complex industrial systems give rise to various sources of uncertainty. This paper proposes a novel distributional RUL prediction method that quantifies RUL uncertainty by identifying a confidence interval from the cumulative distribution function (CDF). The proposed learning method is built on quantile regression and implemented from a distributional perspective within a deep neural network framework. Results of run-to-failure degradation experiments on rolling bearings demonstrate the effectiveness and good performance of the proposed method compared with other state-of-the-art methods. Visualizations obtained with t-SNE further support the effectiveness and generalization ability of the proposed method.
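The abstract builds on quantile regression but does not state the loss it uses. The standard training objective for quantile regression is the pinball (quantile) loss, which is sketched below together with an empirical coverage check for a predicted interval. The network architecture is not shown. The function names and example values are hypothetical.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float],
                 tau: float) -> float:
    """Mean pinball loss for quantile level tau in (0, 1). Under-predictions
    cost tau per unit and over-predictions cost (1 - tau), so minimising it
    pushes y_pred toward the tau-quantile of the target distribution."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def interval_coverage(y_true: Sequence[float], lower: Sequence[float],
                      upper: Sequence[float]) -> float:
    """Fraction of true values inside the predicted [lower, upper] band,
    e.g. the band between the 0.1 and 0.9 quantile predictions."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


if __name__ == "__main__":
    rul = [120.0, 95.0, 210.0]   # hypothetical true RUL values
    q10 = [100.0, 90.0, 150.0]   # hypothetical 0.1-quantile predictions
    q90 = [140.0, 130.0, 200.0]  # hypothetical 0.9-quantile predictions
    print(pinball_loss(rul, q10, 0.1), pinball_loss(rul, q90, 0.9))
    print(interval_coverage(rul, q10, q90))  # 2 of the 3 values fall inside
```

A network trained with one output head per quantile level, each with its own pinball loss, yields a discretized approximation of the predictive CDF. The band between two such quantiles is then the interval whose coverage is checked above.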

    Advancing systems biology of yeast through machine learning and comparative genomics

    Synthetic biology has played a pivotal role in enabling the production of high-value commodities, pharmaceuticals, and bulk chemicals. Fueled by breakthroughs in synthetic biology and metabolic engineering, Saccharomyces cerevisiae and various other yeasts (such as Yarrowia lipolytica and Pichia pastoris) have proven to be promising microbial cell factories and are frequently used in scientific studies. However, the cellular metabolism and physiological properties of most yeast species have not been characterized in detail. To address these knowledge gaps, this thesis leverages the large amounts of data available for yeast species and uses state-of-the-art machine learning techniques and comparative genomic analysis to gain deeper insight into yeast traits and metabolism. In this thesis, machine learning was applied to several unresolved biological problems in yeasts, namely gene essentiality, enzyme turnover number (kcat), and protein production. In the first part of the work, machine learning approaches were employed to predict gene essentiality from sequence features and evolutionary features. It was demonstrated that essential gene prediction could be substantially improved by integrating evolution-based features. Second, a high-quality deep learning model, DLKcat, was developed to predict kcat values by combining a graph neural network for substrates and a convolutional neural network for proteins. By predicting kcat profiles for 343 yeast/fungi species, enzyme-constrained models were reconstructed and used to further elucidate cellular metabolism on a large scale. Lastly, a random forest algorithm was used for feature importance analysis on protein production. Post-translational modifications (PTMs) were found to have a relatively higher impact on protein production than amino acid composition.
In comparative genomics, a comprehensive toolbox, HGTphyloDetect, was developed to facilitate the identification of horizontal gene transfer (HGT) events. Case studies on some yeast species demonstrated the ability of HGTphyloDetect to identify horizontally acquired genes with high accuracy. In addition, through systematic evolutionary analysis (e.g., HGT and gene family expansion) and genome-scale metabolic model simulation, the mechanisms underlying substrate utilization were further probed across a large number of yeast species.

    An integrated deep learning-based approach for automobile maintenance prediction with GIS data

    Predictive maintenance (PdM) can benefit industry by lowering maintenance costs and improving productivity. Remaining useful life (RUL) prediction is an important task in PdM. The RUL of an automobile can be affected by various surrounding factors such as weather, traffic, and terrain, which can be captured by a geographical information system (GIS). Most recent RUL modelling studies have been based on sensor data. However, collecting sensor data is expensive, whereas maintenance data is relatively easy to obtain. This study aims to establish an automobile RUL prediction model with GIS data through a data-driven approach. First, because the maintenance data and GIS data differ in data type and sampling rate, a data integration scheme was developed. Second, the Cox proportional hazard model (Cox PHM) was introduced to construct a health index (HI) from the integrated data. Then, a deep learning structure called the M-LSTM (Merged Long Short-Term Memory) network was designed to model the HI from the integrated data, which contains both sequential data and ordinary numeric data. Finally, the RUL was mapped from the predicted HI using the Cox PHM. An experimental study on a sizable real-world fleet maintenance dataset provided by a UK fleet company revealed the effectiveness of the proposed approach and the impact of GIS factors on the automobiles under investigation.
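The abstract does not state how the health index is derived from the Cox PHM, or how predicted HI values are mapped back to RUL. The sketch below therefore only shows the standard proportional-hazards relations, h(t | x) = h0(t) exp(βᵀx) and S(t | x) = S0(t)^exp(βᵀx), plus one plausible mapping: the first time the survival probability drops below a threshold. The coefficients, the baseline survival table, the covariates and the threshold rule are all hypothetical assumptions.

```python
import math
from typing import List, Sequence, Tuple


def relative_risk(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Cox PHM relative risk exp(beta^T x); it scales the baseline hazard h0(t)."""
    return math.exp(sum(b * xi for b, xi in zip(coefs, x)))


def survival_curve(baseline: Sequence[Tuple[float, float]],
                   risk: float) -> List[Tuple[float, float]]:
    """Under proportional hazards, S(t | x) = S0(t) ** exp(beta^T x).
    `baseline` holds (time, S0(time)) pairs sorted by time."""
    return [(t, s0 ** risk) for t, s0 in baseline]


def time_to_threshold(curve: Sequence[Tuple[float, float]],
                      threshold: float = 0.5) -> float:
    """Assumed mapping: first time at which survival falls to or below
    `threshold` (0.5 gives a median-lifetime estimate); inf if never reached."""
    for t, s in curve:
        if s <= threshold:
            return t
    return math.inf


if __name__ == "__main__":
    # Hypothetical baseline survival table: (time, S0(time)).
    baseline = [(1000.0, 0.9), (2000.0, 0.7), (3000.0, 0.5), (4000.0, 0.3)]
    coefs = [0.8, -0.2]  # hypothetical coefficients for two GIS covariates
    risk = relative_risk(coefs, [1.0, 0.0])
    print(time_to_threshold(survival_curve(baseline, risk)))  # 2000.0
```

A higher relative risk raises S0(t) to a larger power and so pulls the threshold crossing earlier. Under this assumed rule, a remaining life would be the crossing time minus the vehicle's current age.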

    Process Mining Workshops

    This open access book constitutes revised selected papers from the International Workshops held at the Third International Conference on Process Mining, ICPM 2021, which took place in Eindhoven, The Netherlands, during October 31–November 4, 2021. The conference focuses on the area of process mining research and practice, including theory, algorithmic challenges, and applications. The co-located workshops provided a forum for novel research ideas. The 28 papers included in this volume were carefully reviewed and selected from 65 submissions. They stem from the following workshops: the 2nd International Workshop on Event Data and Behavioral Analytics (EDBA); the 2nd International Workshop on Leveraging Machine Learning in Process Mining (ML4PM); the 2nd International Workshop on Streaming Analytics for Process Mining (SA4PM); the 6th International Workshop on Process Querying, Manipulation, and Intelligence (PQMI); the 4th International Workshop on Process-Oriented Data Science for Healthcare (PODS4H); and the 2nd International Workshop on Trust, Privacy, and Security in Process Analytics (TPSA). One survey paper on the results of the XES 2.0 Workshop is also included.

    Machine learning and data-parallel processing for viral metagenomics

    More than 2 million cancer cases around the world each year are caused by viruses. In addition, there are epidemiological indications that other cancer-associated viruses may also exist. However, identifying highly divergent and as-yet-unknown viruses in human biospecimens is one of the biggest challenges in bioinformatics. Modern Next Generation Sequencing (NGS) technologies can directly sequence biospecimens from clinical cohorts with unprecedented speed and depth. These technologies can generate billions of bases at rapidly decreasing cost, but current bioinformatics tools are too inefficient to process these massive datasets effectively. Thus, the objective of this thesis was to facilitate both the detection of highly divergent viruses among generated sequences and the large-scale analysis of human metagenomic datasets. We re-analyzed human sample-derived sequences that conventional alignment-based methods had classified as being of “unknown” origin. For this, we used a methodology based on profile Hidden Markov Models (HMMs), which can capture evolutionary changes by using multiple sequence alignments. We thereby identified 510 sequences that were classified as distantly related to viruses. Many of these sequences were homologs of large viruses such as Herpesviridae and Mimiviridae, but some were related to small circular viruses such as Circoviridae. We found that bioinformatics analysis using viral profile HMMs can extend the classification of previously unknown sequences and, consequently, the detection of viruses in human biospecimens. Different organisms use synonymous codons differently to encode the same amino acids. To investigate whether codon usage bias could predict the presence of viruses in metagenomic sequencing data originating from human samples, we trained Random Forest and Artificial Neural Network models on Relative Synonymous Codon Usage (RSCU) frequencies.
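RSCU for codon j of amino acid i is its observed count divided by the mean count over the n_i synonymous codons of that amino acid, so a value of 1 means no usage bias. The sketch below computes it for a hypothetical subset of the standard genetic code and reads codons in a single reading frame. Both simplifications are assumptions for illustration and are not the thesis' feature pipeline.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical subset of the standard genetic code, for illustration only;
# a real feature vector would cover every synonymous codon family.
SYNONYMOUS_CODONS: Dict[str, List[str]] = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(sequence: str) -> Dict[str, float]:
    """Relative Synonymous Codon Usage: observed codon count divided by the
    count expected if all synonymous codons of that amino acid were used equally."""
    seq = sequence.upper()
    # Count complete codons in reading frame 0 only (a simplifying assumption).
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values: Dict[str, float] = {}
    for codons in SYNONYMOUS_CODONS.values():
        expected = sum(counts[c] for c in codons) / len(codons)
        for c in codons:
            # Assumed convention: families absent from the read get 0.0.
            values[c] = counts[c] / expected if expected > 0 else 0.0
    return values


if __name__ == "__main__":
    # Codons read: TTT, TTT, TTC, AAA
    print(rscu("TTTTTTTTCAAA"))
```

The resulting per-codon values form a fixed-length feature vector, which is the kind of input a Random Forest or a small neural network can be trained on.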
Our analysis showed that machine learning techniques based on RSCU could identify putative viral sequences with an area under the ROC curve of 0.79 and provide important information for taxonomic classification. To identify viral genomes among raw metagenomic sequences, we developed ViraMiner, a deep learning-based tool that uses Convolutional Neural Networks with two convolutional branches. On 300 base-pair sequences, ViraMiner achieved an area under the ROC curve of 0.923, a considerable improvement over previous machine learning methods for virus sequence classification. To the best of our knowledge, the proposed architecture is the first deep learning tool that can detect viral genomes in raw metagenomic sequences originating from a variety of human samples. To enable large-scale analysis of massive metagenomic sequencing data, we used Apache Hadoop and Apache Spark to develop ViraPipe, a scalable parallel bioinformatics pipeline for viral metagenomics. ViraPipe executed on 23 nodes was 11 times faster in metagenome analysis than the sequential pipeline executed on a single node. The new distributed workflow contains several standard bioinformatics tools and can scale to terabytes of data by drawing on more computing power from the nodes. To analyze terabytes of RNA-seq data originating from head and neck squamous cell carcinoma samples, we used ViraPipe together with the most recent version of the HPV sequence database. We detected transcription of HPV viral oncogenes in 92 of 500 cancers. HPV 16 was the most common HPV type, followed by HPV 33. If these cancers are indeed caused by HPV, we estimated that vaccination might prevent about 36 000 head and neck cancer cases in the United States every year.
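The AUC values quoted above (0.79 and 0.923) have a direct probabilistic reading. AUC is the probability that a randomly chosen positive (viral) sequence receives a higher score than a randomly chosen negative one, with ties counting half, which is the Mann-Whitney U statistic divided by the number of positive/negative pairs. The sketch below computes it by brute force over all pairs. The scores are hypothetical, and in practice a library routine such as scikit-learn's roc_auc_score would be used instead.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counting 0.5.
    Compares every positive/negative pair, so it is O(n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    viral = [0.9, 0.8]       # hypothetical classifier scores for viral reads
    non_viral = [0.7, 0.85]  # hypothetical scores for non-viral reads
    print(roc_auc(viral, non_viral))  # 0.75
```

Read this way, an AUC of 0.923 means that a randomly chosen viral read outscores a randomly chosen non-viral read about 92% of the time.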
In conclusion, the work in this thesis improves the prospects for biomedical researchers to classify the sequence contents of ultra-deep datasets, conduct large-scale metagenome studies, and detect the presence of viral genomes in human biospecimens. Hopefully, this work will contribute to our understanding of the biodiversity of viruses in humans, which in turn can help in exploring infectious causes of human disease.

    Remaining Useful Life Estimation of Bearings: Meta-Analysis of Experimental Procedure

    In the domain of predictive maintenance, attempts to replicate and compare research on remaining useful life (RUL) estimation revealed several inconsistencies and errors in the experimental methodology used by various researchers. This makes replicating and comparing results difficult, severely hindering both progress in this research domain and its practical application in industry. We survey the literature to evaluate the experimental procedures that were used, and identify the most common errors and omissions in both experimental procedures and reporting. A total of 70 papers on RUL were audited. From this meta-analysis, we estimate that approximately 11% of the papers present work that allows for replication and comparison. Surprisingly, only about 24.3% (17 of the 70 articles) compared their results with previous work. Another 41.4% generated and compared several models of their own and, somewhat unsettlingly, 31.4% made no comparison whatsoever. The remaining 2.9% did not use the same data set for comparisons. The results of this study were also aggregated into 3 categories: problem class selection, model fitting best practices, and evaluation best practices. We conclude that model evaluation is the most problematic of these. The main contribution of the article is a proposed experimental protocol and several recommendations that specifically target model evaluation. Adherence to this protocol should substantially facilitate the research and application of RUL prediction models. The goals are to promote collaboration between scholars and practitioners alike and to advance research in this domain.
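The four comparison categories above should together account for all 70 audited papers. A quick arithmetic check converts each reported percentage back to a whole number of papers and then re-derives the percentages. This confirms the figures are mutually consistent (17 + 29 + 22 + 2 = 70). Only the count of 17 is quoted in the abstract. The other counts are inferred from the rounded percentages, and the category labels are paraphrases.

```python
TOTAL = 70  # audited RUL papers

# Reported shares (%) for how each paper compared its results.
reported = {
    "compared with previous work": 24.3,
    "compared only their own models": 41.4,
    "no comparison": 31.4,
    "different data set for comparison": 2.9,
}

# Back out the whole number of papers implied by each rounded percentage.
counts = {k: round(p / 100 * TOTAL) for k, p in reported.items()}
# Re-derive the percentages from those counts, rounded to one decimal.
recomputed = {k: round(100 * c / TOTAL, 1) for k, c in counts.items()}

print(counts)                  # 17, 29, 22 and 2 papers
print(sum(counts.values()))    # 70
print(recomputed == reported)  # True
```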
