    Towards Scalable Model Indexing

    Model-Driven Engineering (MDE) is a software engineering discipline promoting models as first-class artefacts of the software lifecycle. It offers increased productivity, consistency, maintainability and reuse by using these models to generate other necessary products, such as program code or documentation. As such, persisting, accessing, manipulating, transforming and querying such models needs to be efficient in order to maintain the various benefits MDE can offer. Scalability is often identified as a bottleneck for potential adopters of MDE, as large-scale models need to be handled seamlessly, without causing disproportionate losses in performance or limiting the ability of multiple stakeholders to work simultaneously on the same collection of large models. This work identifies the primary scalability concerns of MDE and tackles those related to the querying of large collections of models in collaborative modeling environments; it presents a novel approach whereby information contained in such models can be efficiently retrieved, orthogonally to the formats in which models are persisted. This approach, coined model indexing, leverages file-based version control systems for storing models, while allowing developers to efficiently query models without needing to retrieve them from remote locations or load them into memory beforehand. Empirical evidence gathered during the course of the research project is then detailed, which provides confidence that such novel tools and technologies can mitigate these specific scalability concerns; the results obtained are promising, offering large improvements in the execution time of certain classes of queries, which can be further optimized by use of caching and database indexing techniques. The architecture of the approach is also empirically validated through integration with various state-of-the-art modeling and model management tools, as is the correctness of the various algorithms used in this approach.

    Crossflow: A framework for distributed mining of software repositories

    Large-scale software repository mining typically requires substantial storage and computational resources, and often involves a large number of calls to (rate-limited) APIs such as those of GitHub and StackOverflow. This creates a growing need for distributed execution of repository mining programs to which remote collaborators can contribute computational and storage resources, as well as API quotas (ideally without sharing API access tokens or credentials). In this paper we introduce Crossflow, a novel framework for building distributed repository mining programs. We demonstrate how Crossflow can delegate mining jobs to remote workers and cache their results, and how workers can implement advanced behaviour such as load balancing and rejecting jobs they cannot perform (e.g. due to a lack of space or of credentials for a specific API).

    Stress-Testing Remote Model Querying APIs for Relational and Graph-Based Stores

    Recent research in scalable model-driven engineering now allows very large models to be stored and queried. Due to their size, rather than transferring such models over the network in their entirety, it is typically more efficient to access them remotely using networked services (e.g. model repositories, model indexes). Little attention has been paid so far to the nature of these services, and whether they remain responsive with an increasing number of concurrent clients. This paper extends a previous empirical study on the impact of certain key decisions on the scalability of concurrent model queries in two domains, using an Eclipse Connected Data Objects (CDO) model repository, four configurations of the Hawk model index and a Neo4j-based configuration of the NeoEMF model store. The study evaluates the impact of the network protocol, the API design, the caching layer, the query language and the type of database, and analyses the reasons for their varying levels of performance. The design of the API was shown to make a bigger difference than the network protocol (HTTP/TCP) used. Where available, the query-specific indexed and derived attributes in Hawk outperformed the comprehensive generic caching in CDO. Finally, the results illustrate the still ongoing evolution of graph databases: two tools using different versions of the same backend had very different performance, with one slower than CDO and the other faster than it.

    Identification and Optimisation of Type-Level Model Queries

    The main appeal of task-specific model management languages such as ATL, OCL, Epsilon etc. is that they offer tailored syntaxes for the tasks they target, and provide concise first-class support for recurring activities in these tasks. On the flip side, task-specific model management languages are typically interpreted and are therefore significantly slower than general-purpose programming languages such as Java (which can also be used to query and modify models). While this is not an issue for smaller models, as models grow in size, naive execution of interpreted model management programs against them can become a scalability bottleneck. In this paper, we demonstrate an architecture for optimisation of model management programs written in languages of the Epsilon platform using static analysis and program rewriting techniques. The proposed architecture facilitates optimisation of queries that target models of heterogeneous technologies in an orthogonal way. We demonstrate how the proposed architecture is used to identify and optimise type-level queries against EMF-based models in the context of EOL programs and EVL validation constraints. We also demonstrate the performance benefits that can be delivered by this form of optimisation through a series of experiments on EMF-based models. Our experiments have shown performance improvements of up to 99.56%.

    Selective Traceability for Rule-Based Model-to-Model Transformations

    Model-to-model (M2M) transformation is a key ingredient in a typical Model-Driven Engineering workflow and there are several tailored high-level interpreted languages for capturing and executing such transformations. While these languages enable the specification of concise transformations through task-specific constructs (rules/mappings, bindings), their use can pose scalability challenges when it comes to very large models. In this paper, we present an architecture for optimising the execution of model-to-model transformations written in such a language, by leveraging static analysis and automated program rewriting techniques. We demonstrate how static analysis and dependency information between rules can be used to reduce the size of the transformation trace and to optimise certain classes of transformations. Finally, we detail the performance benefits that can be delivered by this form of optimisation, through a series of benchmarks performed with an existing transformation language (Epsilon Transformation Language, ETL) and EMF-based models. Our experiments have shown considerable performance improvements compared to the existing ETL execution engine, without sacrificing any features of the language.

    Efficiently Querying Large-Scale Heterogeneous Models

    With the increase in the complexity of software systems, the size and the complexity of underlying models also increase. In a low-code system, models can be stored in different backend technologies and can be represented in various formats. Tailored high-level query languages are used to query such heterogeneous models, but typically this has a significant impact on performance. Our main aim is to propose optimization strategies that can help to query large models in various formats efficiently. In this paper, we present an approach based on compile-time static analysis and specific query optimizers/translators to improve the performance of complex queries over large-scale heterogeneous models. The proposed approach aims to reduce query execution time and memory footprint compared to naive query execution on low-code platforms.

    RestMule: Enabling resilient clients for remote APIs

    Mining data from remote repositories, such as GitHub and StackExchange, involves the execution of requests that can easily reach the limitations imposed by the respective APIs to shield their services from overload and abuse. Data mining clients are therefore left to deal with such protective service policies on their own, which usually involves an extensive amount of manual implementation effort. In this work we present RestMule, a framework for handling various service policies, such as a limited number of requests within a period of time and multi-page responses, by generating resilient clients that are able to handle request rate limits, network failures, response caching, and paging in a graceful and transparent manner. As a result, RestMule clients generated from OpenAPI specifications (i.e. standardized REST API descriptors) are suitable for intensive data-fetching scenarios. We evaluate our framework by reproducing an existing repository mining use case and comparing the results produced by employing a popular hand-written client and a RestMule client.

    Towards Scalable Validation of Low-Code System Models: Mapping EVL to VIATRA Patterns

    Adoption of low-code engineering in complex enterprise applications also increases the size of the underlying models. In such cases, with the increasing complexity of the applications and the growing size of the underlying artefacts, various scalability challenges might arise for low-code platforms. Task-specific programming languages, such as OCL and EOL, are tailored to manage the underlying models. Existing model management languages suffer a significant performance impact when it comes to complex queries operating over large-scale models with millions of elements. We propose an approach for automatically mapping expressions in Epsilon validation programs to VIATRA graph patterns to make the validation of large-scale low-code system models scalable by leveraging the incremental execution engine of VIATRA. Finally, we evaluate the performance of the proposed approach on large Java models of the Eclipse source code. Our results show speed-ups of up to 1481x compared to sequential execution in Epsilon.