
    Goal-driven Command Recommendations for Analysts

    Data analytics software has become an integral part of analysts' decision-making. The users of these applications generate vast amounts of unstructured log data. These logs contain clues to the user's goals, which traditional recommender systems may find difficult to model implicitly from the log data. Based on this observation, we aim to assist a user's analytics process through command recommendations. We categorize the commands into software and data categories based on their purpose in fulfilling the task at hand. On the premise that the sequence of commands leading up to a data command is a good predictor of the latter, we design, develop, and validate various sequence modeling techniques. In this paper, we propose a framework that provides goal-driven data command recommendations to the user by leveraging unstructured logs. We use the log data of a web-based analytics software to train our neural network models and quantify their performance against relevant and competitive baselines. We propose a custom loss function that tailors the recommended data commands to goal information provided exogenously, and an evaluation metric that captures the degree of goal orientation of the recommendations. We demonstrate the promise of our approach through offline evaluation, showing that the models score well on the proposed metric and remain robust on adversarial examples in which the user activity is misaligned with the selected goal.
    Comment: 14th ACM Conference on Recommender Systems (RecSys 2020)
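As a rough illustration of how an exogenous goal could be folded into a recommendation loss, the sketch below re-weights a standard cross-entropy term by a per-command goal-alignment weight. The function names, the weights, and the weighting scheme itself are hypothetical; the paper's actual loss function is not reproduced here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw command scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def goal_weighted_loss(scores, target_idx, goal_weights):
    """Cross-entropy over predicted command scores, re-weighted so that
    goal-aligned commands contribute more to the loss.
    goal_weights[i] > 1 boosts goal-aligned commands (illustrative scheme)."""
    probs = softmax(scores)
    # Standard cross-entropy for the command the user actually issued next...
    ce = -math.log(probs[target_idx])
    # ...scaled by how well that command matches the exogenous goal.
    return goal_weights[target_idx] * ce

# Toy example: three candidate data commands; the second was issued next
# and is strongly aligned with the stated goal (weight 2.0).
loss = goal_weighted_loss([2.0, 1.0, 0.1], target_idx=1,
                          goal_weights=[1.0, 2.0, 0.5])
```

With a uniform weight vector the function reduces to plain cross-entropy, so the goal term can be switched off for comparison against a goal-agnostic baseline.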

    Resource-aware business process management: analysis and support


    Data-Driven Methods for Data Center Operations Support

    During the last decade, cloud technologies have evolved at an impressive pace, such that we now live in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load balancing, monitoring, etc. On-demand access to a diverse set of configurable virtualized resources makes it possible to build more elastic, flexible, and highly resilient distributed applications. Behind the scenes, cloud providers sustain the heavy burden of maintaining the underlying infrastructures: large-scale distributed systems, partitioned and replicated among many geographically dispersed data centers to guarantee scalability, robustness to failures, high availability, and low latency. The larger the scale, the more cloud providers have to deal with complex interactions among the various components, such that monitoring, diagnosing, and troubleshooting issues become incredibly daunting tasks. To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automations that make releasing new software, and responding to unforeseen issues, faster and more sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automations can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages (and the related costs), it is crucial to provide DevOps teams with suitable tools that enable a proactive approach to data center operations. This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures.
These environments are characterized by a very large availability of diverse data, collected at every level of the stack: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance-issue propagation networks); and text (e.g., source code, system logs, version control history, code review feedback). Such data are also typically updated at relatively high frequency and are subject to distribution drifts caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from robust data-driven methods to support their decisions based on historical information. For instance, effective anomaly detection capabilities may enable more precise and efficient root-cause analysis, and accurate forecasting and intelligent control strategies would improve resource management. Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for realizing the aforementioned support tools. On the other hand, because of their complexity, such models often require huge processing power, and suitable hardware, to be operated effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations: automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
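As one deliberately simple example of the kind of anomaly detection support discussed above, the sketch below flags points in a host-metric trace whose rolling z-score exceeds a threshold. The thesis concerns far richer deep-learning detectors; the window size, threshold, and trace values here are illustrative assumptions only.

```python
import statistics

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Flag indices whose z-score w.r.t. the preceding `window` samples
    exceeds `threshold` -- a simple stand-in for a learned detector."""
    anomalies = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        mean = statistics.fmean(hist)
        std = statistics.pstdev(hist)
        if std == 0:
            continue  # flat history: z-score undefined, skip
        if abs(series[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies

# A roughly flat CPU-utilisation trace with one sudden spike at index 8.
trace = [40, 41, 39, 40, 42, 41, 40, 39, 95, 40]
spikes = rolling_zscore_anomalies(trace, window=5, threshold=3.0)
```

A detector like this is purely reactive; the proactive tooling the abstract argues for would pair such detection with forecasting so that operators are warned before the spike occurs.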

    Conformance checking and diagnosis in process mining

    In the last decades, the capability of information systems to generate and record overwhelming amounts of event data has grown exponentially in several domains, and in particular in industrial scenarios. Devices connected to the internet (Internet of Things), social interaction, mobile computing, and cloud computing provide new sources of event data, and this trend will continue in the next decades. The omnipresence of large amounts of event data stored in logs is an important enabler for process mining, a novel discipline for addressing challenges related to business process management, process modeling, and business intelligence. Process mining techniques can be used to discover, analyze, and improve real processes by extracting models from observed behavior. The capability of these models to represent reality determines the quality of the results obtained from them, and conditions their usefulness. Conformance checking is the aim of this thesis: modeled and observed behavior are analyzed to determine whether a model is a faithful representation of the behavior observed in the log. Most efforts in conformance checking have focused on measuring and ensuring that models capture all the behavior in the log, i.e., fitness. Other properties, such as ensuring a precise model (one not including unnecessary behavior), have been disregarded. The first part of the thesis focuses on analyzing and measuring the precision dimension of conformance, where models describing reality precisely are preferred to overly general models. The thesis includes a novel technique based on detecting escaping arcs, i.e., points where the modeled behavior deviates from the behavior reflected in the log. The detected escaping arcs are used to determine, in terms of a metric, the precision between log and model, and to locate possible actuation points towards a more precise model.
    The thesis also presents a confidence interval on the provided precision metric, and a multi-factor measure to assess the severity of the detected imprecisions. Checking conformance can be time-consuming for real-life scenarios, and understanding the reasons behind conformance mismatches can be a demanding task. The second part of the thesis shifts the focus from the precision dimension to the fitness dimension, and proposes the use of decomposition techniques to aid in checking and diagnosing fitness. The proposed approach decomposes the model into single-entry single-exit components. The resulting fragments represent subprocesses within the main process, each with a simple interface to the rest of the model. Fitness checking per component provides well-localized conformance information, aiding in the diagnosis of the causes behind the problems. Moreover, the relations between components can be exploited to improve the diagnostic capabilities of the analysis, identifying areas with a high degree of mismatches, or providing a hierarchy for a zoom-in/zoom-out analysis. Finally, the thesis proposes two main applications of the decomposed approach. First, the proposed theory is extended to incorporate data information into decomposed fitness checking. Second, a real-time event-based framework is presented for monitoring fitness.
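The escaping-arcs idea behind the precision metric can be sketched as follows: at each state reached in the log, activities the model enables but the log never actually follows are "escaping arcs", and precision is the observed fraction of enabled behavior. The prefix-keyed dictionaries below are a simplifying assumption for illustration, not the thesis's actual data structures or algorithm.

```python
def precision_by_escaping_arcs(log_next, model_enabled):
    """ETC-style precision sketch. Both arguments map a trace prefix
    (tuple of activities) to the set of next activities: those observed
    in the log, and those enabled by the model, respectively.
    Model-enabled activities never observed are escaping arcs."""
    observed_total = 0
    enabled_total = 0
    for prefix, observed_next in log_next.items():
        enabled = model_enabled.get(prefix, set())
        # Escaping arcs at this state would be: enabled - observed_next
        observed_total += len(enabled & observed_next)
        enabled_total += len(enabled)
    # Precision = observed fraction of the behavior the model allows.
    return observed_total / enabled_total if enabled_total else 1.0

# Toy example: after 'a' the log only ever continues with 'b',
# while the model also allows 'c' and 'd' (two escaping arcs).
log_next = {('a',): {'b'}}
model_enabled = {('a',): {'b', 'c', 'd'}}
prec = precision_by_escaping_arcs(log_next, model_enabled)  # 1 of 3 enabled
```

A precision of 1.0 would mean the model enables nothing beyond what the log exhibits; low values point to overly general models, and the escaping states locate where to tighten them.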

    Tackling Different Business Process Perspectives

    Business Process Management (BPM) has emerged as a discipline to design, control, analyze, and optimize business operations. Conceptual models lie at the core of BPM. In particular, business process models have been taken up by organizations as a means to describe the main activities that are performed to achieve a specific business goal. Process models generally cover different perspectives that underlie separate yet interrelated representations for analyzing and presenting process information. Being primarily driven by process improvement objectives, traditional business process modeling languages focus on capturing the control-flow perspective of business processes, that is, the temporal and logical coordination of activities. Such approaches are usually characterized as "activity-centric". Nowadays, activity-centric process modeling languages, such as the Business Process Model and Notation (BPMN) standard, are still the most used in practice and benefit from industrial tool support. Nevertheless, evidence shows that such process modeling languages still lack support for modeling non-control-flow perspectives, such as the temporal, informational, and decision perspectives, among others. This thesis centres on the BPMN standard and addresses the modeling of the temporal, informational, and decision perspectives of process models, with particular attention to processes enacted in healthcare domains. Although partially interrelated, the main contributions of this thesis may be partitioned according to the modeling perspective they concern. The temporal perspective deals with the specification, management, and formal verification of temporal constraints. In this thesis, we address the specification and run-time management of temporal constraints in BPMN by taking advantage of process modularity and of the event handling mechanisms included in the standard.
    Then, we propose three different mappings from BPMN to formal models, to validate the behavior of the proposed process models and to check whether they are dynamically controllable. The informational perspective represents the information entities consumed, produced, or manipulated by a process. This thesis focuses on the conceptual connection between processes and data, borrowing concepts from the database domain to enable the representation of which part of a database schema is accessed by a certain process activity. This novel conceptual view is then employed to detect potential data inconsistencies arising when the same data are accessed erroneously by different process activities. The decision perspective encompasses the modeling of the decision-making related to a process, considering where decisions are made in the process and how decision outcomes affect process execution. In this thesis, we investigate the use of the Decision Model and Notation (DMN) standard in conjunction with BPMN, starting from a pattern-based approach to ease the derivation of DMN decision models from the data represented in BPMN processes. Besides, we propose a methodology that focuses on the integrated use of BPMN and DMN for modeling decision-intensive care pathways in a real-world application domain.
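A pattern-based derivation of DMN decision models ultimately yields decision tables evaluated over data produced by the BPMN process. The toy evaluator below shows the shape of such a table under a first-hit policy; the rules, the healthcare values, and the evaluator interface are invented for illustration and do not reproduce the thesis's methodology (DMN itself defines richer hit policies and an expression language, FEEL).

```python
def evaluate_decision_table(rules, inputs):
    """First-hit evaluation of a DMN-style decision table: each rule pairs
    a predicate over the input data with an outcome; the first matching
    rule wins. Returns None if no rule matches."""
    for condition, outcome in rules:
        if condition(inputs):
            return outcome
    return None

# Hypothetical triage decision fed by data collected earlier in a
# BPMN care-pathway process.
rules = [
    (lambda d: d["temperature"] >= 39.0, "urgent-visit"),
    (lambda d: d["temperature"] >= 37.5, "scheduled-visit"),
    (lambda d: True, "self-care"),  # default rule, always matches last
]
outcome = evaluate_decision_table(rules, {"temperature": 38.2})
```

Keeping the decision logic in a table like this, rather than as nested gateways in the BPMN diagram, is precisely the separation of concerns that motivates using DMN alongside BPMN.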