    Efficient Algorithms for Discovering Concept Drift in Business Processes.

    Process mining is a relatively new research area, but it is already used in practice. Every company and organization runs business processes that are supported by information systems and that leave event logs as they execute. By analyzing those logs one can build a process model that reflects how the process operates in reality. Existing algorithms assume that the analyzed process is in a steady state; in practice it can be altered by seasonality, a new law, or an external event such as a financial crisis. In that case we have to deal with concept drift. Concept drifts can be sudden, when the change is abrupt, or gradual, when one concept fades while the other takes over. In this work we propose five novel approaches for detecting concept drift in process mining. All of them improve or extend the algorithm proposed by Bose et al. [1]. Increasing the step size speeds up the algorithm by leaving out some intermediate steps. Automatic change point detection extracts the concept drift points without the need to analyze the plot manually. The adaptive windows algorithm (ADWIN) relaxes the original algorithm's dependency on a fixed population size, thus reducing the number of false positives and false negatives. The algorithm with non-continuous populations makes it possible to deal with gradual drifts. Finally, defining the population sizes in terms of time periods instead of numbers of traces makes it possible to detect micro-level and macro-level drifts in logs with multi-order dynamics, where process changes can happen at multiple levels of granularity. The algorithms were implemented in the Concept Drift plug-in of the ProM framework. To assess the quality of the algorithms, we propose a way to generate logs with different concept drift characteristics using CPN Tools, together with a quality evaluation framework similar to the one used in information retrieval, which counts true positives, false positives, and false negatives and computes derived metrics. The algorithms were successfully tested on both simulated and real-life data.
    [1] Bose, R.P.J.C., van der Aalst, W.M.P., Žliobaitė, I., Pechenizkiy, M.: Handling Concept Drift in Process Mining. In: CAiSE. LNCS, vol. 6741, pp. 391–405. Springer, Berlin (2011)
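The evaluation idea in this abstract (counting true positives, false positives, and false negatives for detected drift points, then deriving metrics) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the names `DriftScore` and `score_change_points`, the `tolerance` parameter, the greedy nearest-match rule, and the choice of precision, recall, and F1 as the derived metrics.

```python
from dataclasses import dataclass


@dataclass
class DriftScore:
    """Counts of matched and unmatched change points (hypothetical helper)."""
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        detected = self.true_positives + self.false_positives
        return self.true_positives / detected if detected else 0.0

    @property
    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def score_change_points(actual, detected, tolerance):
    """Greedily match each detected point to at most one actual drift point
    that lies within `tolerance` traces, then count TP / FP / FN."""
    unmatched = sorted(actual)
    tp = 0
    for point in sorted(detected):
        # Actual drift points that are still unmatched and close enough.
        candidates = [a for a in unmatched if abs(a - point) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda a: abs(a - point))
            unmatched.remove(best)  # each actual drift is matched only once
            tp += 1
    fp = len(detected) - tp   # detections with no actual drift nearby
    fn = len(unmatched)       # actual drifts that were never detected
    return DriftScore(tp, fp, fn)


if __name__ == "__main__":
    # Simulated log with drifts at traces 1000, 2000, 3000 (made-up values).
    score = score_change_points([1000, 2000, 3000], [990, 2050, 2500], tolerance=60)
    print(score, score.precision, score.recall, score.f1)
```

The tolerance is needed because a detector rarely reports the exact trace index of a drift; how wide it should be depends on the window size used by the detector.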

    CONDA-PM -- A Systematic Review and Framework for Concept Drift Analysis in Process Mining

    Business processes evolve over time to adapt to changing business environments. This requires continuous monitoring of business processes to gain insights into whether they conform to the intended design or deviate from it. The situation in which a business process changes while being analysed is called concept drift. Its analysis is concerned with studying how a business process changes, in terms of detecting and localising changes and studying their effects. Concept drift analysis is crucial to enable early detection and management of changes, that is, deciding whether to promote a change to become part of an improved process, or to reject the change and take steps to mitigate its effects. Despite its importance, there exists no comprehensive framework for analysing concept drift types, affected process perspectives, and granularity levels of a business process. This article proposes the CONcept Drift Analysis in Process Mining (CONDA-PM) framework, which describes the phases and requirements of a concept drift analysis approach. CONDA-PM was derived from a Systematic Literature Review (SLR) of current approaches to analysing concept drift. We apply the CONDA-PM framework to current approaches to concept drift analysis and evaluate their maturity. Applying the CONDA-PM framework highlights areas where research is needed to complement existing efforts. Comment: 45 pages, 11 tables, 13 figures

    An agent-based implementation of hidden Markov models for gas turbine condition monitoring

    This paper considers the use of a multi-agent system (MAS) incorporating hidden Markov models (HMMs) for the condition monitoring of gas turbine (GT) engines. Hidden Markov models with Gaussian emission distributions are proposed as an anomaly detection tool for gas turbine components. The technique is shown to allow the dynamics of GTs to be modeled despite a lack of high-frequency data. This allows the early detection of developing faults and avoids costly outages due to asset failure. These models are implemented as part of a MAS, using a proposed extension of an established power system ontology, for fault detection in gas turbines. The applicability of the multi-agent system is shown through a case study and a comparison with an existing system, using historic data from a combined-cycle gas turbine plant provided by an industrial partner.
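A rough sketch of how a Gaussian HMM can act as an anomaly detector: score a window of sensor readings with the forward algorithm and flag the window when its average log-likelihood falls below a threshold. This is an illustration under stated assumptions, not the paper's implementation. The two-state model, its parameter values, the threshold, and the function names `log_likelihood` and `is_anomalous` are all hypothetical, and the multi-agent and ontology parts are not represented.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def log_likelihood(observations, start, transition, means, stds):
    """Scaled forward algorithm for an HMM with 1-D Gaussian emissions."""
    n_states = len(start)
    # Initial step: start probability times emission density.
    alpha = [start[i] * gaussian_pdf(observations[0], means[i], stds[i])
             for i in range(n_states)]
    total_log = 0.0
    for t in range(len(observations)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * transition[i][j] for i in range(n_states))
                * gaussian_pdf(observations[t], means[j], stds[j])
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # observation impossible under the model
        total_log += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total_log


def is_anomalous(window, model, threshold):
    """Flag a window whose per-sample log-likelihood is below `threshold`."""
    return log_likelihood(window, **model) / len(window) < threshold


if __name__ == "__main__":
    # Hypothetical two-regime model (e.g. low and high load); values made up.
    model = {
        "start": [0.5, 0.5],
        "transition": [[0.9, 0.1], [0.1, 0.9]],
        "means": [0.0, 5.0],
        "stds": [1.0, 1.0],
    }
    print(is_anomalous([0.1, -0.2, 0.3, 5.1, 4.8], model, threshold=-5.0))
    print(is_anomalous([12.0, 12.5, 11.8], model, threshold=-5.0))
```

In practice the model parameters would be fitted to historic healthy-operation data, and the threshold chosen from the distribution of scores on that data.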

    Data-driven Soft Sensors in the Process Industry

    In the last two decades Soft Sensors have established themselves as a valuable alternative to the traditional means of acquiring critical process variables, monitoring processes, and performing other tasks related to process control. This paper discusses characteristics of process industry data that are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical industry, bioprocess industry, and steel industry. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness, and large, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.

    An adaptive, fault-tolerant system for road network traffic prediction using machine learning

    This thesis addresses the design and development of an integrated system for real-time traffic forecasting based on machine learning methods. Although traffic prediction has been the driving motivation, a large part of the proposed ideas and scientific contributions are generic enough to be applied to any other problem that can, ideally, be defined as the flow of information in a graph-like structure. Such application is of special interest in environments susceptible to changes in the underlying data generation process. Moreover, the modular architecture of the proposed solution facilitates small changes to the components that allow it to be adapted to a broader range of problems. On the other hand, certain specific parts of this thesis are strongly tied to traffic flow theory. The focus is on a macroscopic perspective of the traffic flow, where the individual road traffic flows are correlated to the underlying traffic demand. The short-term forecasts include the road network characterization in terms of the corresponding traffic measurements (traffic flow, density and/or speed), the traffic state (whether a road is congested or not, and its severity), and anomalous road conditions (incidents or other non-recurrent events). The main traffic data used in this thesis comes from detectors installed along the road networks. Nevertheless, other kinds of traffic data sources could be equally suitable with the appropriate preprocessing. This thesis has been developed in the context of Aimsun Live, a simulation-based solution for real-time traffic prediction developed by Aimsun. The methods proposed here are planned to be linked to it in a mutually beneficial relationship where they cooperate and assist each other. For example, when an incident or non-recurrent event is detected with the methods proposed in this thesis, the simulation-based forecasting module can simulate different strategies to measure their impact. Part of this thesis has also been developed in the context of the EU research project "SETA" (H2020-ICT-2015). The main motivation that has guided the development of this thesis is to address the weak points and limitations previously identified in Aimsun Live, on which the research found in the literature has not been especially extensive. These include:
    • Autonomy, both in the preparation and real-time stages.
    • Adaptation to gradual or abrupt changes in traffic demand or supply.
    • Informativeness about anomalous road conditions.
    • Forecasting accuracy, improved with respect to the previous methodology at Aimsun and a typical forecasting baseline.
    • Robustness, to deal with faulty or missing data in real time.
    • Interpretability, adopting modelling choices towards a more transparent reasoning and understanding of the underlying data-driven decisions.
    • Scalability, using a modular architecture with emphasis on a parallelizable exploitation of large amounts of data.
    The result of this thesis is an integrated system, Adarules, for real-time forecasting that makes the best of the available historical data while also leveraging the theoretically unbounded size of data in a continuously streaming scenario. This is achieved through online learning and change detection, along with the automatic finding and maintenance of patterns in the network graph. In addition to the Adarules system, another result is a probabilistic model that characterizes a set of interpretable latent variables related to the traffic state, based on the traffic data provided by the sensors along with optional prior knowledge provided by the traffic expert, following a Bayesian approach. On top of this traffic state model, a probabilistic spatiotemporal model is built that learns the dynamics of the transition of traffic states in the network, and whose objectives include automatic incident detection.
    Postprint (published version)
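To make the idea of change detection on a data stream concrete, here is a minimal sketch of a one-sided Page-Hinkley test applied to a stream of values such as forecast errors. The abstract does not say which change detector Adarules uses, so Page-Hinkley is a stand-in chosen for illustration; the class name, the `delta` and `threshold` values, and the synthetic stream are all assumptions.

```python
class PageHinkley:
    """One-sided Page-Hinkley test: signals an upward shift in the mean."""

    def __init__(self, delta=0.005, threshold=10.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level for the test statistic
        self.reset()

    def reset(self):
        self.count = 0
        self.mean = 0.0
        self.cumulative = 0.0
        self.minimum = 0.0

    def update(self, value):
        """Feed one value; return True if a change is signalled."""
        self.count += 1
        # Running mean of all values seen since the last reset.
        self.mean += (value - self.mean) / self.count
        # Cumulative deviation from the running mean, minus the tolerance.
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.cumulative - self.minimum > self.threshold:
            self.reset()  # start fresh so the model can adapt after the change
            return True
        return False


def detect_changes(stream, **kwargs):
    """Return the indices at which the detector signals a change."""
    detector = PageHinkley(**kwargs)
    return [i for i, value in enumerate(stream) if detector.update(value)]


if __name__ == "__main__":
    # Synthetic error stream: stable level, then an abrupt upward shift.
    stream = [1.0] * 200 + [5.0] * 200
    print(detect_changes(stream, delta=0.005, threshold=10.0))
```

In an online forecasting setting, a signalled change would typically trigger retraining or replacement of the affected model; a lower `threshold` reacts faster but raises more false alarms.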