Goal-driven Command Recommendations for Analysts
Recent times have seen data analytics software applications become an
integral part of the decision-making process of analysts. The users of these
software applications generate a vast amount of unstructured log data. These
logs contain clues to the user's goals, which traditional recommender systems
may find difficult to model implicitly from the log data. With this assumption,
we would like to assist the analytics process of a user through command
recommendations. We categorize the commands into software and data categories
based on their purpose to fulfill the task at hand. On the premise that the
sequence of commands leading up to a data command is a good predictor of the
latter, we design, develop, and validate various sequence modeling techniques.
In this paper, we propose a framework to provide goal-driven data command
recommendations to the user by leveraging unstructured logs. We use the log
data of a web-based analytics software to train our neural network models and
quantify their performance, in comparison to relevant and competitive
baselines. We propose a custom loss function to tailor the recommended data
commands according to the goal information provided exogenously. We also
propose an evaluation metric that captures the degree of goal orientation of
the recommendations. We demonstrate the promise of our approach by evaluating
the models with the proposed metric and showcasing the robustness of our models
in the case of adversarial examples, where the user activity is misaligned with
the selected goal, through offline evaluation. Comment: 14th ACM Conference on Recommender Systems (RecSys 2020)
Data-Driven Methods for Data Center Operations Support
During the last decade, cloud technologies have been evolving at
an impressive pace, such that we are now living in a cloud-native
era where developers can leverage an unprecedented landscape
of (possibly managed) services for orchestration, compute, storage,
load-balancing, monitoring, etc. The possibility to have on-demand
access to a diverse set of configurable virtualized resources allows
for building more elastic, flexible and highly-resilient distributed
applications. Behind the scenes, cloud providers sustain the heavy
burden of maintaining the underlying infrastructures, consisting of
large-scale distributed systems, partitioned and replicated among
many geographically distributed data centers to guarantee scalability,
robustness to failures, high availability and low latency. The larger the
scale, the more cloud providers have to deal with complex interactions
among the various components, such that monitoring, diagnosing and
troubleshooting issues become incredibly daunting tasks.
To keep up with these challenges, development and operations
practices have undergone significant transformations, especially in
terms of improving the automations that make releasing new software,
and responding to unforeseen issues, faster and sustainable at scale.
The resulting paradigm is nowadays referred to as DevOps. However,
while such automations can be very sophisticated, traditional DevOps
practices fundamentally rely on reactive mechanisms that typically
require careful manual tuning and supervision from human experts.
To minimize the risk of outages—and the related costs—it is crucial to
provide DevOps teams with suitable tools that can enable a proactive
approach to data center operations.
This work presents a comprehensive data-driven framework to address
the most relevant problems that can be experienced in large-scale
distributed cloud infrastructures. These environments are indeed characterized
by a very large availability of diverse data, collected at each
level of the stack, such as: time-series (e.g., physical host measurements,
virtual machine or container metrics, networking components
logs, application KPIs); graphs (e.g., network topologies, fault graphs
reporting dependencies among hardware and software components,
performance issues propagation networks); and text (e.g., source code,
system logs, version control system history, code review feedback).
Such data are also typically updated with relatively high frequency,
and subject to distribution drifts caused by continuous configuration
changes to the underlying infrastructure. In such a highly dynamic scenario,
traditional model-driven approaches alone may be inadequate
at capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from having robust
data-driven methods to support their decisions based on historical
information. For instance, effective anomaly detection capabilities may
help in conducting more precise and efficient root-cause analysis.
Likewise, accurate forecasting and intelligent control
strategies would improve resource management.
Given their ability to deal with high-dimensional, complex data,
Deep Learning-based methods are the most straightforward option for
the realization of the aforementioned support tools. On the other hand,
because of their complexity, models of this kind often require huge
processing power, and suitable hardware, to be operated effectively
at scale. These aspects must be carefully addressed when applying
such methods in the context of data center operations. Automated
operations approaches must be dependable and cost-efficient, so as not to
degrade the services they are built to improve.
Conformance checking and diagnosis in process mining
In recent decades, the capability of information systems to generate and record overwhelming amounts of event data has grown exponentially in several domains, and in particular in industrial scenarios. Devices connected to the internet (internet of things), social interaction, mobile computing, and cloud computing provide new sources of event data, and this trend will continue in the coming decades. The omnipresence of large amounts of event data stored in logs is an important enabler for process mining, a novel discipline for addressing challenges related to business process management, process modeling, and business intelligence. Process mining techniques can be used to discover, analyze and improve real processes, by extracting models from observed behavior. The capability of these models to represent reality determines the quality of the results obtained from them, conditioning their usefulness. Conformance checking is the aim of this thesis, where modeled and observed behavior are analyzed to determine if a model is a faithful representation of the behavior observed in the log.
Most of the efforts in conformance checking have focused on measuring and ensuring that models capture all the behavior in the log, i.e., fitness. Other properties, such as ensuring a precise model (one that does not include unnecessary behavior), have been disregarded. The first part of the thesis focuses on analyzing and measuring the precision dimension of conformance, where models that describe reality precisely are preferred to overly general models. The thesis includes a novel technique based on detecting escaping arcs, i.e., points where the modeled behavior deviates from the behavior reflected in the log. The detected escaping arcs are used to determine, in terms of a metric, the precision between log and model, and to locate possible actuation points in order to achieve a more precise model. The thesis also presents a confidence interval on the provided precision metric, and a multi-factor measure to assess the severity of the detected imprecisions.
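The escaping-arc idea can be illustrated with a small sketch. This is not the thesis's implementation: it assumes, for illustration only, that the model is given as a hypothetical dictionary mapping the last executed activity to the set of activities allowed next, that the log fits the model, and that precision is one minus the visit-weighted share of allowed arcs never observed in the log. The names `escaping_arc_precision`, `START`, `log`, and `model` are all assumptions introduced here.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical marker for the empty prefix


def escaping_arc_precision(log, model):
    """Simplified escaping-arc precision (illustrative assumption).

    log:   list of traces, each a list of activity names.
    model: dict mapping an activity (or START) to the set of
           activities the model allows immediately afterwards.
    """
    visits = Counter()           # how often each prefix is reached in the log
    observed = defaultdict(set)  # activities the log shows after each prefix
    for trace in log:
        for i, activity in enumerate(trace):
            prefix = tuple(trace[:i])
            visits[prefix] += 1
            observed[prefix].add(activity)

    allowed_total = 0
    escaping_total = 0
    for prefix, count in visits.items():
        last = prefix[-1] if prefix else START
        allowed = model.get(last, set())
        # Escaping arcs: allowed by the model, never seen in the log here.
        escaping = allowed - observed[prefix]
        allowed_total += count * len(allowed)
        escaping_total += count * len(escaping)

    if allowed_total == 0:
        return 1.0  # nothing allowed, so nothing can escape
    return 1.0 - escaping_total / allowed_total


# Toy example: the model allows "e" after "a", but the log never shows it.
log = [["a", "b", "d"], ["a", "c", "d"]]
model = {START: {"a"}, "a": {"b", "c", "e"}, "b": {"d"}, "c": {"d"}, "e": {"d"}}
print(escaping_arc_precision(log, model))  # about 0.8
```

In this toy case the single escaping arc (from the state reached after "a" to "e") both lowers the score and points to where the model could be made more precise.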
Checking conformance can be time-consuming for real-life scenarios, and understanding the reasons behind conformance mismatches can be an effort-demanding task. The second part of the thesis changes the focus from the precision dimension to the fitness dimension, and proposes the use of decomposed techniques to aid in checking and diagnosing fitness. The proposed approach is based on decomposing the model into single-entry single-exit (SESE) components. The resulting fragments represent subprocesses within the main process with a simple interface with the rest of the model. Fitness checking per component provides well-localized conformance information, aiding in the diagnosis of the causes behind the problems. Moreover, the relations between components can be exploited to improve the diagnosis capabilities of the analysis, identifying areas with a high degree of mismatches, or providing a hierarchy for a zoom-in zoom-out analysis. Finally, the thesis proposes two main applications of the decomposed approach. First, the proposed theory is extended to incorporate data information for fitness checking in a decomposed manner. Second, a real-time event-based framework is presented for monitoring fitness.
Tackling Different Business Process Perspectives
Business Process Management (BPM) has emerged as a discipline to design, control, analyze, and optimize business operations. Conceptual models lie at the core of BPM. In particular, business process models have been taken up by organizations as a means to describe the main activities that are performed to achieve a specific business goal. Process models generally cover different perspectives that underlie separate yet interrelated representations for analyzing and presenting process information. Being primarily driven by process improvement objectives, traditional business process modeling languages focus on capturing the control flow perspective of business processes, that is, the temporal and logical coordination of activities. Such approaches are usually characterized as “activity-centric”. Nowadays, activity-centric process modeling languages, such as the Business Process Model and Notation (BPMN) standard, are still the most used in practice and benefit from industrial tool support. Nevertheless, evidence shows that such process modeling languages still lack support for modeling non-control-flow perspectives, such as the temporal, informational, and decision perspectives, among others. This thesis centres on the BPMN standard and addresses the modeling of the temporal, informational, and decision perspectives of process models, with particular attention to processes enacted in healthcare domains. Despite being partially interrelated, the main contributions of this thesis may be partitioned according to the modeling perspective they concern. The temporal perspective deals with the specification, management, and formal verification of temporal constraints. In this thesis, we address the specification and run-time management of temporal constraints in BPMN, by taking advantage of process modularity and of event handling mechanisms included in the standard.
Then, we propose three different mappings from BPMN to formal models, to validate the behavior of the proposed process models and to check whether they are dynamically controllable. The informational perspective represents the information entities consumed, produced or manipulated by a process. This thesis focuses on the conceptual connection between processes and data, borrowing concepts from the database domain to enable the representation of which part of a database schema is accessed by a certain process activity. This novel conceptual view is then employed to detect potential data inconsistencies arising when the same data are accessed erroneously by different process activities. The decision perspective encompasses the modeling of the decision-making related to a process, considering where decisions are made in the process and how decision outcomes affect process execution. In this thesis, we investigate the use of the Decision Model and Notation (DMN) standard in conjunction with BPMN, starting from a pattern-based approach to ease the derivation of DMN decision models from the data represented in BPMN processes. In addition, we propose a methodology that focuses on the integrated use of BPMN and DMN for modeling decision-intensive care pathways in a real-world application domain.