2 research outputs found

    Machine-tool condition monitoring with Gaussian mixture models-based dynamic probabilistic clustering

    The combination of artificial intelligence with data, computing power, and new algorithms can provide important tools for solving engineering problems, such as machine-tool condition monitoring. However, many of these problems require algorithms that can perform in highly dynamic scenarios where the data streams have extremely high sampling rates from different types of variables. The unsupervised learning algorithm based on Gaussian mixture models called Gaussian-based dynamic probabilistic clustering (GDPC) is one of these tools. However, this algorithm may have major limitations if a large number of concept drifts associated with transients occur within the data stream. GDPC becomes unstable under these conditions, so we propose a new algorithm called GDPC+ to increase its robustness. GDPC+ represents an important improvement because we introduce: (a) automatic selection of the number of mixture components based on the Bayesian information criterion (BIC), and (b) concept drift transition stabilization based on Cauchy–Schwarz divergence integrated with the Dickey–Fuller test. Thus, GDPC+ can perform better than GDPC in highly dynamic scenarios in terms of the number of false positives. The behavior of GDPC+ was investigated using random synthetic data streams and a real condition-monitoring data stream obtained from a machine tool that produces engine crankshafts at high speed. We found that the initial temporal window size can be used to adapt the algorithm to different analytical requirements. The clustering results were also investigated by inducing rules with the repeated incremental pruning to produce error reduction (RIPPER) algorithm in order to provide insights into the underlying monitored process and its associated concept drifts.
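The abstract names the Cauchy–Schwarz divergence as the measure used to stabilize concept drift transitions but does not spell out how it is computed. As a minimal sketch only, and not the GDPC+ implementation, the following shows the closed-form Cauchy–Schwarz divergence between two one-dimensional Gaussian mixtures. The function names, the `(weight, mean, variance)` tuple layout, and the example mixtures are all assumptions made for illustration; the Dickey–Fuller step is not covered.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def overlap(p, q):
    """Closed-form integral of p(x) * q(x) for two 1-D Gaussian mixtures.

    Each mixture is a list of (weight, mean, variance) tuples (assumed layout).
    Uses the identity: integral of N(x; m1, v1) * N(x; m2, v2) dx = N(m1; m2, v1 + v2).
    """
    return sum(
        w1 * w2 * gauss_pdf(m1, m2, v1 + v2)
        for w1, m1, v1 in p
        for w2, m2, v2 in q
    )


def cs_divergence(p, q):
    """Cauchy-Schwarz divergence: -log(<p, q> / sqrt(<p, p> * <q, q>))."""
    return -math.log(overlap(p, q) / math.sqrt(overlap(p, p) * overlap(q, q)))


# Hypothetical mixtures fitted on consecutive windows of a data stream.
before = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
slightly_moved = [(0.5, 0.2, 1.0), (0.5, 5.1, 1.0)]
far_moved = [(0.5, 10.0, 1.0), (0.5, 15.0, 1.0)]

print(cs_divergence(before, slightly_moved))  # small: the model barely changed
print(cs_divergence(before, far_moved))       # large: candidate concept drift
```

The divergence is zero for identical mixtures and grows as the two models separate, which is why a sequence of such values over time could plausibly be fed to a stationarity test.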

    A proposição de um framework de Data Analytics para o estudo do desempenho da inovação [A proposal for a data analytics framework to study innovation performance]

    The aim of this study is to propose a data analytics framework that classifies economic sectors into levels of innovation, on a scale from highly to less innovative, using a database of innovation indicators. The problem is to understand how innovation performance behaves in these sectors, given the number of innovative companies they contain and the characteristics they present, and it is formulated as a classification problem. The framework combines methods for data normalization, determination of the number of classes (levels of innovation) found in the data, handling of imbalanced classes, feature selection (the sectors' innovation indicators), classification, and estimation of innovation performance (companies that innovate in the sector relative to the total sample). To this end, different approaches are tested. The Random Forest, Extreme Gradient Boosting, and Support Vector Machine models are used in the observation classification, feature selection, and output estimation steps. To determine the number of classes, managerial and quartile approaches are tried. Synthetic Minority Oversampling Technique variants are tested for balancing samples across classes. The analytical approach to studying companies' innovation data helps to understand which factors affect the sectors' innovation performance and will support decision making about fostering actions.
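The abstract mentions a quartile approach for determining the innovation classes without giving details. A minimal sketch of one way such a split could work is shown below, assuming the classes are cut on each sector's share of innovating companies; the sector names, the numbers, and the `quartile_levels` helper are invented for illustration and are not taken from the study.

```python
import statistics


def quartile_levels(rates):
    """Map each rate to a level from 1 (lowest quartile) to 4 (highest quartile)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    cuts = statistics.quantiles(rates, n=4)
    # The level is 1 plus the number of cut points the rate exceeds.
    return [1 + sum(rate > cut for cut in cuts) for rate in rates]


# Hypothetical share of innovating companies per sector (invented numbers).
sector_rates = {
    "sector_a": 0.05,
    "sector_b": 0.12,
    "sector_c": 0.20,
    "sector_d": 0.28,
    "sector_e": 0.35,
    "sector_f": 0.47,
    "sector_g": 0.61,
    "sector_h": 0.80,
}

levels = quartile_levels(list(sector_rates.values()))
for sector, level in zip(sector_rates, levels):
    print(sector, level)
```

Quartile cuts give roughly equal-sized classes by construction, whereas a managerial split with expert-chosen thresholds may not, which is presumably where the class-balancing step becomes relevant.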