    A relational algebra approach to ETL modeling

    The MAP-i Doctoral Programme in Informatics, of the Universities of Minho, Aveiro and Porto. Information Technology has been one of the drivers of the revolution currently taking place in management decisions in most organizations. The amount of data gathered and processed through computing devices has been growing every day, providing a valuable source of information for decision makers who manage every type of organization, public or private. Gathering the right amount of data in a centralized and unified repository such as a data warehouse is similar to building the foundations for a system that will act as a base to support decision-making processes requiring factual information. Nevertheless, building such a repository is very challenging, as is developing all the components of a data warehousing system. One of the most critical components of a data warehousing system is the Extract-Transform-Load component, ETL for short, which is responsible for gathering data from information sources and for cleaning, transforming and conforming it in order to store it in a data warehouse. Several design methodologies for ETL components have been presented in the last few years, with very little impact on commercial ETL tools. Basically, this was due to a gap between the conceptual design of an ETL system and its corresponding physical implementation. The methodologies proposed ranged from new approaches, with novel notation and diagrams, to the adoption and expansion of current standard modeling notations, like UML or BPMN. However, none of these proposals contains enough detail to be translated automatically into a specific execution platform. 
The use of a standard, well-known notation like Relational Algebra might bridge the gap between the conceptual design and the physical design of an ETL component, mainly due to its formal approach, which is based on a limited set of operators, and also due to its functional characteristics, such as being a procedural language operating over data stored in relational format. The abstraction that Relational Algebra provides over the technological infrastructure might also be an advantage for uncommon execution platforms, like computing grids, which provide the exceptional amount of processing power that is critical for ETL systems. Additionally, data partitioning and task distribution over computing nodes work quite well with a Relational Algebra approach. Extensive research on the use of Relational Algebra in the ETL context was conducted to validate its usage. To complement this, a set of Relational Algebra patterns was also developed to support the most common ETL tasks, like changing data capture, data quality enforcement, data conciliation and integration, slowly changing dimensions and surrogate key pipelining. All these patterns provide a formal approach to these ETL tasks by specifying all the operations needed to accomplish them as a series of Relational Algebra operations. To evaluate the feasibility of the work done in this thesis, we used a real ETL application scenario for the extraction of data from the operational systems of two different social networks, storing hashtag usage information in a specific data mart. The ability to analyze trends in social network usage is a hot topic in today’s media and information coverage. A complete design of the ETL component using the patterns developed previously is also provided, as well as a critical evaluation of its usage. Information Technology has been one of the main catalysts of the revolution taking place in decision making in most organizations. 
The amount of data collected and processed through computing devices has grown daily, becoming a valuable source of information for decision makers who manage all types of organizations, public or private. Concentrating the ideal set of data in a centralized and unified repository, such as a data warehouse, is essential for building a system that will support decision-making processes that require facts. However, the complexity associated with building this repository and all the components that make up a data warehousing system is extremely challenging. One of the most critical components of a data warehousing system is the Extract-Transform-Load (ETL) component, which handles the extraction of data from the sources and cleans, transforms and conciliates the data for integration into the data warehouse. Several design methodologies for the ETL component have been presented in recent years; however, they have not been adopted by commercial ETL tools, mainly because of the gap between conceptual design and physical execution platforms. The proposed design methodologies range from proposals based on new notations and diagrams to proposals that use standard notations such as UML and BPMN, which are then complemented with ETL concepts. However, these conceptual modeling proposals do not contain detailed information that would allow an automatic translation to execution platforms. Using a standard, well-known language such as Relational Algebra can serve as a complement and bridge the gap between the conceptual design and the physical design of the ETL component, mainly because this language rests on a procedural approach with a limited set of operators that act on data stored in a relational format. 
The abstraction that Relational Algebra provides over execution platforms may be an advantage when using less common platforms, such as computing grids. Architectures of this type usually offer great computing power, which is essential for an ETL system. The partitioning and distribution of data and tasks across computing nodes combine relatively well with the Relational Algebra approach. In the course of this work, an extensive study of the properties of Relational Algebra (RA) in an ETL context was carried out in order to assess its usability. As a complement, a set of RA patterns was designed to support the most common ETL activities, such as changing data capture, data quality enforcement, data conciliation and integration, slowly changing dimensions and surrogate key pipelining. These patterns formalize this set of ETL activities, specifying as a series of Relational Algebra operations the steps needed to execute them. To assess the sustainability of the proposal presented in this work, a real ETL scenario was used in which the source data belong to two social networks and the data stored in the data mart describe the use of hashtags by their users. It should be noted that detecting trends and topical subjects on social networks is of vital importance to news companies and to the social networks themselves. Finally, the complete design of the ETL system for the chosen scenario is presented, using the patterns developed in this work, together with an evaluation and critique of their use.
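
To make the idea of expressing an ETL task as a series of Relational Algebra operations more concrete, the following is a minimal sketch of a change data capture step built only from difference, projection and selection. It is not the pattern defined in the thesis; the relation schema (id, hashtag), the snapshot comparison strategy and all function names are assumptions chosen for illustration.

```python
# Relations are modeled as Python sets of tuples.
# Hypothetical schema: (id, hashtag), with id (position 0) as the key.

def difference(r, s):
    """Relational difference: tuples in r that are not in s."""
    return r - s

def project(r, indexes):
    """Relational projection onto the given attribute positions."""
    return {tuple(row[i] for i in indexes) for row in r}

def select(r, predicate):
    """Relational selection: tuples of r that satisfy the predicate."""
    return {row for row in r if predicate(row)}

def capture_changes(previous, current):
    """Compare two snapshots and classify the changed tuples."""
    new_or_changed = difference(current, previous)
    old_or_removed = difference(previous, current)
    previous_keys = project(previous, [0])
    current_keys = project(current, [0])
    # A tuple is an insert when its key did not exist in the old snapshot.
    inserted = select(new_or_changed, lambda row: (row[0],) not in previous_keys)
    # Remaining new tuples share a key with an old tuple, so they are updates.
    updated = difference(new_or_changed, inserted)
    # A tuple is a delete when its key no longer exists in the new snapshot.
    deleted = select(old_or_removed, lambda row: (row[0],) not in current_keys)
    return inserted, updated, deleted

previous = {(1, "#etl"), (2, "#dw"), (3, "#bi")}
current = {(1, "#etl"), (2, "#datawarehouse"), (4, "#grid")}
inserted, updated, deleted = capture_changes(previous, current)
print(inserted, updated, deleted)
```

Because every step is a set-level operator, the same sequence could in principle be applied independently to partitions of the data that share no keys, which is the property the abstract associates with distributed execution.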

    Quality of (Digital) Services in e-Government

    Internet growth in the nineties supported government ambition to provide better services to citizens through solutions based on Information and Communication Technologies. Thanks to the Lisbon conference, which in 2000 covered and investigated this topic, e-government has been recognized as one of the major priorities in the Public Administration innovation process. As a matter of fact, in the last 10 years the number of services provided to citizens through Information and Communication Technologies has increased rapidly. Despite this growth, the access to and usage of digital services do not follow the same trend. Nowadays Public Administrations deliver many electronic services which are seldom used by citizens. Different reasons contribute to this situation. The main assumption of the thesis is that the quality of e-government digital services strongly affects real access to services by citizens. Given the complexity of quality in e-government, one of the main challenges was to define a suitable quality model. To this end, domain-dependent characteristics of service delivery have been investigated. The defined model refers to citizen, technology and service related quality characteristics. Correspondingly, a suitable way to represent, assess, and continuously improve service quality according to such domain requirements has been introduced. Concerning the service related quality aspects, a methodology and a tool that make it possible to formally and automatically assess the quality of a designed service with respect to the quality model have been defined. Starting from a user-friendly notation, both for service and quality requirements, the proposed methodology has been implemented as a user-friendly tool supported by a mapping from user-friendly notations to a formal language. 
The tool makes it possible to verify formally, via model checking, whether the given service satisfies, one by one, the quality requirements addressed by the quality model. Additionally, in some cases a single, unified view on e-government service quality is quite useful. A mathematical model provides a single value for quality starting from the assessment of all the requirements defined in the quality model. It relies on the following activities: homogeneity, interaction and grouping. A set of experiments has been performed in order to validate the work. Services already implemented in a local Public Administration have been considered. Literature review and domain experts' knowledge were the main drivers of this work. It demonstrates the soundness of the quality model, the applicability of formal techniques in a complex field of study such as e-government, and the quality aggregation via the mathematical model. This thesis advances research in e-government by showing that quality-oriented service delivery in Public Administration promotes the use of services by citizens. Further applications of the proposed approaches could be investigated in the areas of practical benchmarking and Service Level Agreement specification

    Similarity of business process models : metrics and evaluation

    It is common for large and complex organizations to maintain repositories of business process models in order to document and to continuously improve their operations. Given such a repository, this paper deals with the problem of retrieving those process models in the repository that most closely resemble a given process model or fragment thereof. The paper presents three similarity metrics that can be used to answer such queries: (i) label matching similarity that compares the labels attached to process model elements; (ii) structural similarity that compares element labels as well as the topology of process models; and (iii) behavioral similarity that compares element labels as well as causal relations captured in the process model. These similarity metrics are experimentally evaluated in terms of precision and recall, and in terms of correlation of the metrics with respect to human judgement. The experimental results show that all three metrics yield comparable results, with structural similarity slightly outperforming the other two metrics. Also, all three metrics outperform traditional search engines when it comes to searching through a repository for similar business process models
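
As an illustration of the first metric, the sketch below scores two process models by comparing only their element labels. It is a simplified stand-in, not the paper's exact definition: the normalized edit-distance score, the greedy one-to-one matching, the `threshold` parameter and all function names are assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def label_similarity(a, b):
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def model_similarity(labels_a, labels_b, threshold=0.5):
    """Greedily pair labels one-to-one and average the matched scores."""
    pairs = sorted(
        ((label_similarity(x, y), i, j)
         for i, x in enumerate(labels_a)
         for j, y in enumerate(labels_b)),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are too dissimilar to count as matches
        if i in used_a or j in used_b:
            continue  # each label may be matched at most once
        used_a.add(i)
        used_b.add(j)
        total += score
    size = len(labels_a) + len(labels_b)
    return 0.0 if size == 0 else 2 * total / size

query = ["Receive order", "Check stock", "Ship goods"]
candidate = ["Receive order", "Check inventory", "Ship goods", "Send invoice"]
print(model_similarity(query, candidate))
```

A greedy matching is used here for brevity; an optimal assignment between the two label sets could give a higher score on models with many near-duplicate labels.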

    An Integrated Formal Task Specification Method for Smart Environments

    This thesis is concerned with the development of interactive systems for smart environments. In such a scenario, different interaction paradigms need to be supported, and appropriate methods and development strategies need to be applied to cover not only explicit interaction (e.g., pressing a button to adjust the light) but also implicit interaction (e.g., walking to the speaker’s desk to give a talk) in order to assist the user appropriately. A task-based modeling approach is introduced that allows the implementation of different interaction paradigms to be based on the same artifact

    Process Mining Handbook

    This is an open access book. This book comprises all the single courses given as part of the First Summer School on Process Mining, PMSS 2022, which was held in Aachen, Germany, during July 4-8, 2022. This volume contains 17 chapters organized into the following topical sections: Introduction; process discovery; conformance checking; data preprocessing; process enhancement and monitoring; assorted process mining topics; industrial perspective and applications; and closing

    Business Process Quality Management

    During the past 25 years, research in the field of business process management as well as the practical adoption of corresponding methods and tools have made substantial progress. In particular, this development was driven by the insight that well-managed business processes enable organizations to better serve their stakeholders, save costs and, ultimately, realize competitive advantage. It is therefore not surprising that improving business processes ranks high on the list of priorities of organizations. In practice, this challenge is currently being addressed through approaches such as benchmarking, industry-specific best practice reference models or process reengineering heuristics. However, no systematic and generic proposition towards managing business process quality has achieved broad acceptance yet. To address this gap, this thesis contributes to the field of business process quality management with the results outlined in the following. First, it defines a concise notion of business process quality based on organizational targets, and applies it to a sample real-world case. This definition is not specific to any particular application field, and thus constitutes a vital first step towards systematic and generic business process quality management. On that basis, an approach is developed to model business objectives in the sense of the requirements that shall be fulfilled by the results of a business process. In turn, this approach enables appraising whether a business process achieves its business objective, as one of the core criteria relevant to business process quality. Further, this thesis proposes extensions to common business process meta-models which enable quality-aware business process modeling, and demonstrates how fundamental quality characteristics can be derived from corresponding models. At this stage, the results achieved have enabled an advanced understanding of business process quality. 
By means of these insights, a model of business process quality attributes with corresponding quality criteria is developed. This model complements and exceeds preceding approaches since, for the first time, it systematically derives relevant quality attributes from a business process management perspective instead of adopting these from related fields. It enables appraising business process quality independently of a particular field of application, and deriving recommendations to improve the processes assessed. To enable practical adoption of the concepts developed, the integration of procedures and functionality relevant to quality in business process management lifecycles and system landscapes is discussed next. To establish the contribution of this thesis beyond the previous state of the art, the proposed quality model is then compared to existing business process reengineering practices as well as propositions in the area of business process quality. Further, quality attributes are employed to improve a substantial real-world business process. This experience report demonstrates how quality management practices can be applied even if quality-aware system landscapes are not in place yet. It thus contributes to bridging the gap between the research results proposed in this thesis and the conditions present in practice today. Finally, remaining limitations with regard to the research objectives pursued are discussed, and challenges for future research are outlined. Addressing the latter will enable further leveraging the potential of business process quality management

    Service substitution : a behavioral approach based on Petri Nets

    Service-Oriented Computing is an emerging computing paradigm that supports the modular design of (software) systems. Complex systems are designed by composing less complex systems, called services. Such a (complex) system is a distributed application often involving several cooperating enterprises. As a system usually changes over time, individual services will be substituted by other services. Substituting one service by another one should not affect the correctness of the overall system. Assuring correctness becomes particularly challenging, as the services rely on each other, and each of the involved enterprises only oversees a part of the overall system. In addition, services communicate asynchronously which makes the analysis even more difficult. For this reason, formal methods to support service substitution are indispensable. In this thesis, we study service substitution at the level of service models. Thereby we restrict ourselves to service behavior. As a formalism to model service behavior, we use Petri nets. The first contribution of this thesis is the definition of several substitutability criteria that are suitable in the context of Service-Oriented Computing. Substituting a service S by a service S' should preserve some behavioral properties of the overall system. For each set of behavioral properties and a given service S, there exists a set of behaviorally compatible services for S. A substitutability criterion defines which of these behaviorally compatible services of S have to be preserved by S'. We relate our substitutability criteria to preorders and equivalences known from process theory. The second contribution of this thesis is to present, for each substitutability criterion, a procedure to decide whether a service S' can substitute a service S. The decision requires the comparison of the in general infinite sets of behaviorally compatible services for the services S and S'. 
Hence, we extend existing work on an abstract representation of all behaviorally compatible services for a given service. For each notion of behavioral compatibility, we present an algorithmic solution to represent all behaviorally compatible services. Based on these representations, we can decide substitutability of a service S by a service S'. The third contribution of this thesis is a method to support the design of a service S' that can substitute a service S according to a substitutability criterion. Our approach is to derive a service S' from the service S by stepwise transformation. To this end, we present several transformation rules. Finally, we formalize and we extend the equivalence notion for services specified in the language WS-BPEL. That way, we demonstrate the applicability of our work

    Bigraphical Languages and their Simulation

    CACIC 2015 : XXI Congreso Argentino de Ciencias de la Computación. Libro de actas

    Proceedings of the XXI Congreso Argentino de Ciencias de la Computación (CACIC 2015), held at the UNNOBA campus in Junín from October 5 to 9, 2015. Red de Universidades con Carreras en Informática (RedUNCI)