    Distributed Model-to-Model Transformation with ATL on MapReduce

    Efficient processing of very large models is a key requirement for the adoption of Model-Driven Engineering (MDE) in some industrial contexts. One of the central operations in MDE is rule-based model transformation (MT), which is used to specify manipulation operations over structured data in the form of model graphs. However, because they rely on computationally expensive operations such as subgraph isomorphism, MT tools face problems with both memory occupancy and execution time as model size and complexity increase. One way to overcome these issues is to exploit the wide availability of distributed clusters in the Cloud for the distributed execution of MT. In this paper, we propose an approach to automatically distribute the execution of model transformations written in a popular MT language, ATL, on top of a well-known distributed programming model, MapReduce. We show how the execution semantics of ATL can be aligned with the MapReduce computation model. We describe the extensions to the ATL transformation engine that enable distribution, and we experimentally demonstrate the scalability of this solution in a reverse-engineering scenario.
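
    As an illustration only, the idea of aligning a rule-based transformation with a map/reduce computation can be sketched in plain Python. Everything in the sketch is an assumption made for the example: the toy `Class`-to-`Table` rule, the element layout, and the split into a local "match and apply" map step and a global "resolve" reduce step. It does not use the ATL engine or any MapReduce framework.

```python
# Toy source model (hypothetical): elements with an id, a type, a name,
# and references to other elements by id.
SOURCE = [
    {"id": "c1", "type": "Class", "name": "Order", "refs": ["c2"]},
    {"id": "c2", "type": "Class", "name": "Item", "refs": []},
    {"id": "c3", "type": "Class", "name": "Customer", "refs": ["c1"]},
]


def split(elements, n_workers):
    """Partition the source elements across workers (round-robin)."""
    return [elements[i::n_workers] for i in range(n_workers)]


def map_phase(chunk):
    """Local step: match the rule on this worker's elements only.

    Each match yields (source id, target element). References are kept as
    unresolved source ids, because the referenced element may have been
    transformed by another worker.
    """
    out = []
    for element in chunk:
        if element["type"] == "Class":  # rule guard
            target = {
                "id": "t_" + element["id"],
                "type": "Table",
                "name": element["name"].lower(),
                "pending": list(element["refs"]),
            }
            out.append((element["id"], target))
    return out


def reduce_phase(mapped):
    """Global step: use all trace links to resolve pending references."""
    trace = {src: tgt["id"] for pairs in mapped for src, tgt in pairs}
    result = []
    for pairs in mapped:
        for _, tgt in pairs:
            result.append({
                "id": tgt["id"],
                "type": tgt["type"],
                "name": tgt["name"],
                "fks": [trace[ref] for ref in tgt["pending"]],
            })
    return sorted(result, key=lambda t: t["id"])


def transform(elements, n_workers=2):
    """Run the map step per partition, then one reduce step over all outputs."""
    return reduce_phase([map_phase(chunk) for chunk in split(elements, n_workers)])


if __name__ == "__main__":
    for table in transform(SOURCE, n_workers=2):
        print(table)
```

    In this sketch the result does not depend on how the elements are partitioned, which is the property a distributed execution needs to preserve.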

    DC4MT: uma abordagem orientada a dados para transformação de modelos (DC4MT: a data-centric approach to model transformation)

    Advisor: Marcos Didonet Del Fabro. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 25/08/2020. Includes references: p. 95-106. Area of concentration: Computer Science.
    Abstract: Model transformations are operations that receive a set of source models as input and produce a set of target models as output, following a specification. A variety of approaches and tools are used to specify different types of model transformation. Most of these approaches adopt a local and sequential execution strategy. However, they are not fully suited to processing models with large numbers of elements. Very Large Models (VLMs) are models with millions of elements. They occur in application domains such as the automotive industry, modernization of legacy systems, the Internet of Things, and social networks, among others. Existing approaches have gaps in supporting the processing of these VLMs, for example in making the transformations executable at all at this scale, or in improving performance. This thesis proposes Dc4MT, an approach that supports the transformation of VLMs by applying and adapting data distribution techniques. Dc4MT is a Data-centric (Dc) approach for Model Driven Engineering (MDE). The approach is specified using a distributed processing framework and defines a set of operations for the fragmentation, extraction, and transformation of models. Fragmentation splits the input models (in XMI or JSON format) into fragments so that they can be distributed and processed in parallel. Extraction processes the fragments of the input model in parallel and translates them into an acyclic graph, assigning a new modeling domain to these fragments. Model transformation in Dc4MT transforms input models into output models (M2M) from the result of the extraction. Transformations can be executed in parallel or distributed mode, with or without intervention in the partitioning method of the framework to improve performance. A set of input models (datasets) and local (parallel transformations) and distributed (distributed transformations) environments are used in the experiments to validate the Dc4MT approach in terms of feasibility, performance, and scalability. The results show that the model fragmentation and extraction operations favor the scalable transformation of VLMs by reconstructing the structure of the fragments in a graph. The extraction operation is executed in parallel/distributed mode. Moreover, aspects such as immutability, lazy evaluation, and implicit parallelism present in Dc4MT allow the parallel/distributed processing of transformation rules on a scalable platform.
    Keywords: Data-centric Approach. Model Driven Engineering. Parallel Model Transformation. Distributed Model Transformation.
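
    The three operations named in the abstract (fragmentation, extraction, transformation) can be pictured with a minimal Python sketch. The JSON layout, the function names, and the `Class`-to-`Table` rule are all hypothetical, and a thread pool stands in for the distributed processing framework, which the abstract does not name.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input model: a containment tree serialized as JSON.
MODEL_JSON = json.dumps({
    "id": "pkg", "kind": "Package", "children": [
        {"id": "a", "kind": "Class", "children": [
            {"id": "a.x", "kind": "Attribute", "children": []}]},
        {"id": "b", "kind": "Class", "children": []},
    ]})


def fragment(model_text):
    """Fragmentation: flatten the tree into independent records.

    Each record keeps only its own data plus the id of its container,
    so records can be shipped to different workers.
    """
    fragments = []
    stack = [(json.loads(model_text), None)]
    while stack:
        node, parent = stack.pop()
        fragments.append({"id": node["id"], "kind": node["kind"], "parent": parent})
        for child in node["children"]:
            stack.append((child, node["id"]))
    return fragments


def extract(fragments):
    """Extraction: rebuild the structure as a graph (vertices and edges).

    Containment edges point from container to contained element, so the
    resulting graph is acyclic.
    """
    vertices = {f["id"]: f["kind"] for f in fragments}
    edges = sorted((f["parent"], f["id"]) for f in fragments if f["parent"] is not None)
    return vertices, edges


def rule(item):
    """Hypothetical transformation rule, written as a pure function."""
    vertex_id, kind = item
    return vertex_id, {"Class": "Table", "Attribute": "Column"}.get(kind, kind)


def transform(vertices, workers=2):
    """Transformation: apply the rule to every vertex in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(rule, vertices.items()))


if __name__ == "__main__":
    vertices, edges = extract(fragment(MODEL_JSON))
    print(edges)
    print(transform(vertices))
```

    Because `rule` is a pure function over immutable input, the order in which workers process vertices does not affect the output, which is one way to read the abstract's remark on immutability and implicit parallelism.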

    Enabling Model-Driven Live Analytics For Cyber-Physical Systems: The Case of Smart Grids

    Advances in software, embedded computing, sensors, and networking technologies will lead to a new generation of smart cyber-physical systems that will far exceed the capabilities of today’s embedded systems. They will be entrusted with increasingly complex tasks like controlling electric grids or autonomously driving cars. These systems have the potential to lay the foundations for tomorrow’s critical infrastructures, to form the basis of emerging and future smart services, and to improve the quality of our everyday lives in many areas. In order to solve their tasks, they have to continuously monitor and collect data from physical processes, analyse this data, and make decisions based on it. Making smart decisions requires a deep understanding of the environment, internal state, and the impacts of actions. Such deep understanding relies on efficient data models to organise the sensed data and on advanced analytics. Considering that cyber-physical systems are controlling physical processes, decisions need to be taken very fast. This makes it necessary to analyse data live, as opposed to conventional batch analytics. However, the complex nature of the data generated by such systems, combined with its massive volume, imposes fundamental challenges. While data in the context of cyber-physical systems shares some characteristics with big data, it holds a particular complexity. This complexity results from the complicated physical phenomena described by this data, which makes it difficult to extract a model able to explain such data and its various multi-layered relationships. Existing solutions fail to provide sustainable mechanisms to analyse such data live. This dissertation presents a novel approach, named model-driven live analytics. The main contribution of this thesis is a multi-dimensional graph data model that brings raw data, domain knowledge, and machine learning together in a single model, which can drive live analytic processes. 
This model is continuously updated with the sensed data and can be leveraged by live analytic processes to support decision-making of cyber-physical systems. The presented approach has been developed in collaboration with an industrial partner and, in the form of a prototype, applied to the domain of smart grids. The addressed challenges are derived from this collaboration as a response to shortcomings in the current state of the art. More specifically, this dissertation provides solutions for the following challenges: First, data handled by cyber-physical systems is usually dynamic (data in motion, as opposed to traditional data at rest) and changes frequently and at different paces. Analysing such data is challenging since data models usually can only represent a snapshot of a system at one specific point in time. A common approach is discretisation, which regularly samples and stores such snapshots at specific timestamps to keep track of the history. Continuously changing data is then represented as a finite sequence of such snapshots. Such a representation is very inefficient to analyse, since it requires mining the snapshots, extracting a relevant dataset, and finally analysing it. For this problem, this thesis presents a temporal graph data model and storage system, which consider time as a first-class property. A time-relative navigation concept makes it possible to analyse frequently changing data very efficiently. Secondly, making sustainable decisions requires anticipating what impacts certain actions would have. In complex cyber-physical systems, situations can arise where hundreds or thousands of such hypothetical actions must be explored before a solid decision can be made. Every action leads to an independent alternative from which a set of other actions can be applied and so forth. 
Finding the sequence of actions that leads to the desired alternative requires efficiently creating, representing, and analysing many different alternatives. Given that every alternative has its own history, this creates a very high combinatorial complexity of alternatives and histories, which is hard to analyse. To tackle this problem, this dissertation introduces a multi-dimensional graph data model (as an extension of the temporal graph data model) that makes it possible to efficiently represent, store, and analyse many different alternatives live. Thirdly, complex cyber-physical systems are often distributed, but to fulfil their tasks these systems typically need to share context information between computational entities. This requires analytic algorithms to reason over distributed data, which is a complex task since it relies on the aggregation and processing of various distributed and constantly changing data. To address this challenge, this dissertation proposes an approach to transparently distribute the presented multi-dimensional graph data model in a peer-to-peer manner and defines a stream processing concept to efficiently handle frequent changes. Fourthly, to meet future needs, cyber-physical systems need to become increasingly intelligent. To make smart decisions, these systems have to continuously refine behavioural models that are known at design time, with what can only be learned from live data. Machine learning algorithms can help to capture this unknown behaviour by extracting commonalities over massive datasets. Nevertheless, searching for a coarse-grained common behaviour model can be very inaccurate for cyber-physical systems, which are composed of completely different entities with very different behaviour. For these systems, fine-grained learning can be significantly more accurate. However, modelling, structuring, and synchronising many fine-grained learning units is challenging. 
To tackle this, this thesis presents an approach to define reusable, chainable, and independently computable fine-grained learning units, which can be modelled together with and on the same level as domain data. This allows machine learning to be woven directly into the presented multi-dimensional graph data model. In summary, this thesis provides an efficient multi-dimensional graph data model to enable live analytics of complex, frequently changing, and distributed data of cyber-physical systems. This model can significantly improve data analytics for such systems and empower cyber-physical systems to make smart decisions live. The presented solutions combine and extend methods from model-driven engineering, models@run.time, data analytics, database systems, and machine learning.
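
    To make the first two challenges concrete, the following Python sketch shows one possible reading of "time as a first-class property" and of alternatives that each carry their own history. The class names `TemporalNode` and `World`, the fallback rule on fork, and the sample values are assumptions for illustration; they are not the data model or API of the thesis.

```python
from bisect import bisect_right


class TemporalNode:
    """A value with a timeline instead of full-system snapshots."""

    def __init__(self):
        self._times = []   # sorted timestamps
        self._values = []  # value valid from the matching timestamp onward

    def set(self, time, value):
        i = bisect_right(self._times, time)
        if i > 0 and self._times[i - 1] == time:
            self._values[i - 1] = value  # overwrite the version at this time
        else:
            self._times.insert(i, time)
            self._values.insert(i, value)

    def at(self, time):
        """Time-relative lookup: the latest version written at or before `time`."""
        i = bisect_right(self._times, time)
        return self._values[i - 1] if i > 0 else None


class World:
    """One alternative; a fork shares its parent's history up to the fork time."""

    def __init__(self, parent=None, fork_time=None):
        self.parent = parent
        self.fork_time = fork_time
        self.nodes = {}

    def set(self, node_id, time, value):
        self.nodes.setdefault(node_id, TemporalNode()).set(time, value)

    def at(self, node_id, time):
        node = self.nodes.get(node_id)
        value = node.at(time) if node is not None else None
        if value is None and self.parent is not None:
            # Nothing written in this alternative yet: read the parent's
            # history, but never past the point where the two diverged.
            # (Simplification: a stored value of None is treated as absent.)
            return self.parent.at(node_id, min(time, self.fork_time))
        return value

    def fork(self, time):
        return World(parent=self, fork_time=time)


if __name__ == "__main__":
    base = World()
    base.set("meter", 10, 100)
    base.set("meter", 20, 150)
    what_if = base.fork(15)       # explore a hypothetical action
    what_if.set("meter", 18, 90)
    print(base.at("meter", 25), what_if.at("meter", 25), what_if.at("meter", 16))
```

    A fork copies nothing: it only records where it diverged, so exploring many hypothetical actions does not duplicate the shared history.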

    Parallel and Distributed Execution of Model Management Programs

    The engineering process of complex systems involves many stakeholders and development artefacts. Model-Driven Engineering (MDE) is an approach to development which aims to help curtail and better manage this complexity by raising the level of abstraction. In MDE, models are first-class artefacts in the development process. Such models can be used to describe artefacts of arbitrary complexity at various levels of abstraction according to the requirements of their prospective stakeholders. These models come in various sizes and formats and can be thought of more broadly as structured data. Since models are the primary artefacts in MDE, and the goal is to enhance the efficiency of the development process, powerful tools are required to work with such models at an appropriate level of abstraction. Model management tasks, such as querying, validation, comparison, transformation and text generation, are often performed using dedicated languages, with declarative constructs used to improve expressiveness. Despite their semantically constrained nature, the execution engines of these languages rarely capitalize on the optimization opportunities afforded to them. Therefore, working with very large models often leads to poor performance when using MDE tools compared to general-purpose programming languages, which has a detrimental effect on productivity. Given the stagnant single-threaded performance of modern CPUs along with the ubiquity of distributed computing, parallelization of these model management programs is a necessity to address some of the scalability concerns surrounding MDE. This thesis demonstrates efficient parallel and distributed execution algorithms for model validation, querying and text generation and evaluates their effectiveness. By fully utilizing the CPUs on 26 hexa-core systems, we were able to improve performance of a complex model validation language by 122x compared to its existing sequential implementation. 
Up to 11x speedup was achieved with 16 cores for model query and model-to-text transformation tasks.
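
    The reported speedups can be put in perspective by dividing each by the number of cores involved. The short Python calculation below assumes one worker per physical core, so that 26 hexa-core machines contribute 26 × 6 cores; the abstract does not state the thread count, so the resulting efficiencies are rough estimates only.

```python
def parallel_efficiency(speedup, cores):
    """Speedup divided by cores used; 1.0 would be ideal linear scaling."""
    return speedup / cores


# Assumption: one worker per physical core on each hexa-core machine.
validation_cores = 26 * 6
validation_eff = parallel_efficiency(122, validation_cores)

# "Up to 11x" is a best case, so this value is an upper bound.
query_eff = parallel_efficiency(11, 16)

print(f"validation: {validation_cores} cores, efficiency {validation_eff:.2f}")
print(f"query and model-to-text: 16 cores, efficiency at most {query_eff:.2f}")
```

    Under that assumption, both figures correspond to sub-linear but substantial scaling, with the distributed validation result retaining a little under four fifths of the ideal speedup.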