5 research outputs found
Prognosing the Compliance of Declarative Business Processes Using Event Trace Robustness
Several proposals have studied the compliance of business process trace executions with respect to a set of compliance rules.
Unfortunately, detecting a compliance violation (diagnosis) means
that the observed events have already violated the compliance rules that
describe the model. In contrast, detecting a compliance violation before
it actually occurs would prevent misbehaviour of the business
processes. This functionality is referred to in the literature as proactive management of
compliance violations. However, existing approaches focus
on detecting inconsistencies between the compliance rules, or on monitoring
process instances that are in a violable state. The notion of robustness
can help us anticipate the occurrence of these inconsistent
states and determine, depending on the current execution
state of the process instance, how “close” the execution is to a
possible violation. Besides making it possible to avoid violations, a
robust trace is not sensitive to small changes. In this paper we propose
a way to determine, at runtime, whether a process instance is robust against a set
of compliance rules. Thanks to the use of
constraint programming and the capabilities of super solutions, a robust
trace can be guaranteed.
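The idea of compliance prognosis can be illustrated without the paper's constraint-programming machinery. The sketch below (all names and the rule encoding are hypothetical, not the paper's actual formulation) represents a declarative "response" rule as a predicate over traces and brute-forces whether a partial trace can still be completed into a compliant one within a bounded horizon:

```python
from itertools import product

# Hypothetical "response" rule: every occurrence of `trigger` must
# eventually be followed by `target` before the trace ends.
def response_rule(trigger, target):
    def holds(trace):
        pending = False
        for ev in trace:
            if ev == trigger:
                pending = True
            elif ev == target:
                pending = False
        return not pending
    return holds

def can_still_comply(partial, rules, alphabet, horizon):
    """Brute-force prognosis: can `partial` be extended (by up to
    `horizon` further events, possibly zero) into a compliant trace?"""
    for n in range(horizon + 1):
        for suffix in product(alphabet, repeat=n):
            candidate = list(partial) + list(suffix)
            if all(rule(candidate) for rule in rules):
                return True
    return False

rules = [response_rule("register", "notify")]
alphabet = ["register", "notify", "archive"]

print(can_still_comply(["register"], rules, alphabet, horizon=2))  # True
print(can_still_comply(["register"], rules, alphabet, horizon=0))  # False
```

A robustness measure in the spirit of the abstract would then report how many arbitrary future events the trace can absorb while this check still succeeds; the actual paper obtains such guarantees via super solutions rather than enumeration.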
Computing alignments with constraint programming: the acyclic case
Conformance checking confronts process models with real
process executions to detect and measure deviations between modelled
and observed behaviour. The core technique for conformance checking
is the computation of an alignment. Current approaches to alignment
computation rely on a shortest-path technique over the product of the
state space of a model and the observed trace, and thus suffer from the
well-known state explosion problem. This paper presents a fresh alternative
for alignment computation of acyclic process models that encodes
the alignment problem as a Constraint Satisfaction Problem. Since modern
solvers for this framework are capable of dealing with large instances,
this contribution has clear potential. Remarkably, our prototype implementation
can handle instances that represent a real challenge for current
techniques. The main advantages of the Constraint Programming paradigm
lie in the possibility of adapting parameters such as the maximum search
time or the maximum misalignment allowed. Moreover, the search and
propagation algorithms incorporated in Constraint Programming solvers
make it possible to find solutions for problems unsolvable with other techniques.
Ministerio de Economía y Competitividad TIN2015-63502-C3-2-R; Ministerio de Economía y Competitividad TIN2013-46181-C2-1-
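To see what an alignment cost measures, consider the degenerate acyclic case in which the model accepts a single run: the problem then reduces to an edit distance between the observed trace and that run, with synchronous moves free and log-only or model-only moves penalized. The dynamic-programming sketch below illustrates this special case only; it is not the paper's CSP encoding:

```python
def alignment_cost(observed, model_run, log_move=1, model_move=1):
    # Classic edit-distance DP: a synchronous move (matching labels)
    # costs 0; a log-only or model-only move costs 1 each.
    n, m = len(observed), len(model_run)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i * log_move
    for j in range(m + 1):
        dp[0][j] = j * model_move
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = dp[i - 1][j - 1] if observed[i - 1] == model_run[j - 1] else float("inf")
            dp[i][j] = min(sync, dp[i - 1][j] + log_move, dp[i][j - 1] + model_move)
    return dp[n][m]

# Trace "acd" aligns to model run "abcd" with one model-only move for 'b'.
print(alignment_cost(list("acd"), list("abcd")))  # 1
```

A general acyclic model accepts many runs, which is precisely where a constraint encoding (one variable per move, propagation pruning infeasible combinations) pays off over enumerating runs.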
PAIS-DQ: Extending Process-Aware Information Systems to support Data Quality in PAIS life-cycle
The successful execution of a Business Process
requires data with an adequate level of quality, so that
the output of the processes can be obtained in accordance with
users' requirements. The need to be aware of data quality
in business processes is well known, but the problem is how
incorporating data quality management affects and increases
the complexity of the software development that supports the
business process life-cycle. In order to gain the advantages that data
quality management can provide, organizations need to introduce
mechanisms aimed at checking whether data satisfy the established
data-quality requirements. Desirably, the implementation,
deployment and use of these mechanisms should not interfere
with the regular operation of the business processes. To
enable this independence, we propose the PAIS-DQ framework as
an extension of the classical Process-Aware Information System
(PAIS) proposal. PAIS-DQ addresses the concerns related to
data quality management activities while minimizing the time
required from software developers. In addition, with the aim of
guiding developers in the use of PAIS-DQ, a methodology is
also provided to help organizations deal with complex
concerns. The methodology renders our proposal applicable in
practice, and has been applied to a case study that includes a service
architecture implementing the ISO/IEC 8000-100:2009 standard,
parts 100 to 140.
Ministerio de Ciencia y Tecnología TIN2015-63502; Ministerio de Ciencia y Tecnología TIN2012-37493-C03-0
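The separation the abstract argues for can be sketched in a few lines: quality rules live in a registry outside the business logic, and a wrapper evaluates them before a task runs, so the task itself never changes. Everything below (rule names, entities, the decorator) is a hypothetical illustration, not the PAIS-DQ API:

```python
# Data-quality rules registered independently of the business logic.
dq_rules = {
    "customer": [
        ("email present", lambda r: bool(r.get("email"))),
        ("age in range", lambda r: 0 < r.get("age", -1) < 130),
    ],
}

def with_dq(entity):
    """Decorator: evaluate the registered rules for `entity` before
    executing the task, without touching the task's own code."""
    def wrap(task):
        def run(record):
            failed = [name for name, rule in dq_rules[entity] if not rule(record)]
            if failed:
                raise ValueError(f"DQ violations for {entity}: {failed}")
            return task(record)
        return run
    return wrap

@with_dq("customer")
def register_customer(record):
    # Business logic is unaware of the quality checks around it.
    return f"registered {record['email']}"

print(register_customer({"email": "a@b.com", "age": 30}))  # registered a@b.com
```

Swapping or extending the rule set requires no change to `register_customer`, which is the non-interference property the framework aims for.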
On the enhancement of Big Data Pipelines through Data Preparation, Data Quality, and the distribution of Optimisation Problems
Nowadays, data are fundamental for companies, providing operational support by facilitating daily
transactions. Data have also become the cornerstone of strategic decision-making processes in
businesses. For this purpose, there are numerous techniques that allow knowledge and
value to be extracted from data. For example, optimisation algorithms excel at supporting decision-making
processes to improve the use of resources, time and costs in the organisation. In the current
industrial context, organisations usually rely on business processes to orchestrate their daily
activities while collecting large amounts of information from heterogeneous sources. The
support of Big Data technologies (which are based on distributed environments) is therefore required,
given the volume, variety and velocity of the data. Then, in order to extract value from the data, a set
of techniques or activities is applied in an orderly way and at different stages. This set of
techniques or activities, which facilitate the acquisition, preparation, and analysis of data, is known
in the literature as Big Data pipelines.
In this thesis, the improvement of three stages of Big Data pipelines is tackled: Data
Preparation, Data Quality assessment, and Data Analysis. These improvements can be
addressed from an individual perspective, by focussing on each stage, or from a more complex
and global perspective, which implies coordinating these stages to create data workflows.
The first stage to improve is Data Preparation, by supporting the preparation of data with
complex structures (i.e., data with various levels of nested structures, such as arrays).
Shortcomings have been found in the literature and in current technologies when it comes to
transforming complex data in a simple way. Therefore, this thesis aims to improve the Data Preparation stage through
Domain-Specific Languages (DSLs). Specifically, two DSLs are proposed for different use cases:
one is a general-purpose data transformation language, while the other is aimed
at extracting event logs in a standard format for process mining algorithms.
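A tiny illustration (hypothetical, and not either of the proposed DSLs) of the kind of reshaping such a language automates: flattening nested objects into dotted column names and exploding arrays into one row per element.

```python
def flatten(record, prefix=""):
    """Recursively flatten nested dicts into dotted keys and explode
    lists into one row per element -- the reshaping a data-preparation
    DSL would let users express declaratively."""
    rows = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_rows = flatten(value, prefix=name + ".")
            rows = [dict(r, **s) for r in rows for s in sub_rows]
        elif isinstance(value, list):
            rows = [dict(r, **{name: item}) for r in rows for item in value]
        else:
            rows = [dict(r, **{name: value}) for r in rows]
    return rows

print(flatten({"id": 1, "tags": ["a", "b"], "addr": {"city": "Seville"}}))
```

Each nested level multiplies the rows, which is exactly why doing this by hand for deeply nested data is error-prone and worth a dedicated language.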
The second area for improvement is related to the assessment of Data Quality. Depending on the
type of Data Analysis algorithm, poor-quality data can seriously skew the results. Optimisation
algorithms are a clear example: if the data are not sufficiently accurate and complete, the search
space can be severely affected. Therefore, this thesis formulates a methodology for modelling
Data Quality rules adjusted to the context of use, as well as a tool that facilitates the automation
of their assessment. This makes it possible to discard data that do not meet the quality criteria defined
by the organisation. In addition, the proposal includes a framework that helps to select actions to
improve the usability of the data.
The third and last proposal involves the Data Analysis stage. In this case, the thesis faces the
challenge of supporting the use of optimisation problems in Big Data pipelines. There is a lack of
methodological solutions that allow exhaustive optimisation problems (i.e., those that guarantee
finding an optimal solution by exploring the whole search space) to be computed in distributed
environments. Solving this type of problem in the Big Data
context is computationally complex, and can be NP-complete, for two different
reasons. On the one hand, the search space can grow significantly as the amount of data to
be processed by the optimisation algorithms increases. This challenge is addressed through a
technique to generate and group subproblems with distributed data. On the other hand, processing
optimisation problems with complex models and large search spaces in distributed environments
is not trivial. Therefore, a proposal is presented for a particular case of this type of scenario.
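The idea of generating and grouping subproblems over distributed data can be sketched on a toy exhaustive problem. The choice of problem (0/1 knapsack) and all names are illustrative, not the thesis's actual technique: fixing the first K decision variables turns each partial assignment into an independent subproblem that a worker could solve, after which the partial optima are reduced to the global one.

```python
from itertools import product

items = [(3, 4), (2, 3), (4, 5), (1, 2)]   # (weight, value) per item
CAP, K = 6, 2                              # capacity; variables fixed per partition

def solve_partition(fixed):
    """Exhaustively solve the subproblem where the first K picks are
    fixed -- each call is independent, so it could run on a worker."""
    rest = items[len(fixed):]
    best = (-1, None)
    for tail in product((0, 1), repeat=len(rest)):
        pick = list(fixed) + list(tail)
        w = sum(it[0] for it, x in zip(items, pick) if x)
        if w <= CAP:
            v = sum(it[1] for it, x in zip(items, pick) if x)
            best = max(best, (v, pick))
    return best

# "map" over the generated partitions, then "reduce" to the global optimum.
results = map(solve_partition, product((0, 1), repeat=K))
value, picks = max(results)
print(value)  # 9
```

In a real deployment the `map` step would be an Apache Spark transformation over the partitioned data rather than a local call; the exhaustiveness guarantee survives because the partitions jointly cover the whole search space.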
As a result, this thesis develops methodologies that have been published in scientific journals and
conferences. The methodologies have been implemented in software tools that are integrated with
the Apache Spark data processing engine. The solutions have been validated through tests and use cases with real datasets.
A model-driven proposal for extracting software processes from legacy systems using the temporal perspective
Business Process Management (BPM) is a strategic factor in the information technology (IT) sector, as well as in other productive sectors. IT organisations use legacy systems to manage their business, and their legacy databases store historical states of the execution of all kinds of processes, which is why these databases can be considered a source from which to extract several perspectives or dimensions of those processes: i) time, ii) resources, iii) information, and iv) cases.
Some standards for representing software processes, such as UML AD, BPMN, SPEM and ISO/IEC 24744, are supported by robust meta-models. The Model-Driven Engineering (MDE) paradigm is increasingly accepted, as it offers models and meta-models at several levels of abstraction, together with mechanisms to perform transformations between them. MDE can be used both to extract models by reverse engineering and to generate models from a high-level specification.
This thesis defines a general MDE-based proposal for reverse-engineering legacy databases in order to extract the temporal perspective of IT processes. The definition of BPM dimensions has been analysed over different categories of legacy systems frequently used in IT, concluding that almost every IT organisation structures its activity around projects that it must plan and control. A systematic literature review on the specification of the temporal dimension of processes has led us to contribute a taxonomy of rules covering the typology found in IT as well as in other sectors. This taxonomy has allowed us to evaluate the shortcomings of process languages frequently used in IT and to propose UML and OCL meta-models that formalize these rules, resolving those weaknesses and facilitating the extraction of processes from legacy databases. MS Project (as a legacy system) and BPMN (as a standard for modelling and interchanging serialized processes) are frequently used in the IT sector, which is why we take these systems as a pilot for the solution. The general architecture is specialized for this pilot case by defining: i) a task meta-model for MS Project, ii) an extension of the BPMN meta-model with the temporal dimension, and iii) MDE transformations that automatically extract BPMN processes from projects defined in MS Project.
The solution has been validated in the AQUA-WS technology transfer project between the PAIDI TIC021 IWT2 group and EMASESA, verifying the usefulness of the results obtained, which could be extrapolated to other cases and productive sectors. As future work, we could: i) incorporate other BPM perspectives, such as resources and cases, and ii) generate event logs for use in the field of process mining.