
    Modeling Big Data Systems by Extending the Palladio Component Model

    ABSTRACT The growing availability of big data has induced new storage and processing techniques implemented in big data systems such as Apache Hadoop and Apache Spark. As organizations increasingly deploy these systems, the requirements regarding performance qualities such as response time, throughput, and resource utilization grow as well, in order to create added value. Guaranteeing these performance requirements, and efficiently planning the needed capacities in advance, is an enormous challenge. Performance models such as the Palladio component model (PCM) allow such problems to be addressed. We therefore propose a meta-model extension for PCM that makes it possible to model typical characteristics of big data systems. The extension consists of two parts. First, the meta-model is extended to support parallel computing by forking an operation multiple times onto a computer cluster, as intended by the single instruction, multiple data (SIMD) architecture. Second, modeling of computer clusters is integrated into the meta-model so that operations can be properly scheduled on the contained computing nodes.
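
    To make the forking semantics concrete, the following is a minimal, hypothetical sketch of the SIMD-style pattern the extension models (it is not PCM itself, which is an Eclipse/EMF-based meta-model, and the cluster size, operation, and function names are invented for illustration): the same operation is forked once per computing node and applied to disjoint partitions of the data.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-ins for the modeled concepts: a "cluster" with a
# fixed number of computing nodes, and one operation forked per node.
NUM_NODES = 4  # size of the modeled computer cluster (assumption)

def operation(partition):
    """The single instruction applied to each data partition (SIMD style)."""
    return sum(x * x for x in partition)

def fork_on_cluster(data, num_nodes=NUM_NODES):
    # Split the data into one partition per node, then fork the same
    # operation onto every node and join the partial results.
    partitions = [data[i::num_nodes] for i in range(num_nodes)]
    with ProcessPoolExecutor(max_workers=num_nodes) as pool:
        return sum(pool.map(operation, partitions))

if __name__ == "__main__":
    print(fork_on_cluster(list(range(1_000_000))))
```

    In the proposed extension, this fork-per-node behavior and the cluster's contained computing nodes would be expressed as model elements rather than executable code, so that a performance solver can predict response time and resource utilization for such a workload.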

    Modeling performances of concurrent big data applications

    Summary Big Data applications are characterized by a non-negligible number of complex parallel transactions on a huge amount of data that continuously varies, generally increasing over time. Because of the amount of resources needed, the ideal runtime scenario for these applications is based on complex cloud computing and storage infrastructures, which provide a scalable degree of parallelism together with isolation between different applications and resource abstraction. However, this additional degree of abstraction also introduces significant complexity into performance modeling and decision making. The potential concurrency of many applications on the same cloud infrastructure has to be evaluated and, simultaneously, the scalability of applications over time has to be studied through proper modeling practices, in order to predict the system behavior as usage patterns evolve and the load increases. For this purpose, in this paper we propose an analytic modeling technique based on Markovian Agents and Mean Field Analysis that allows the effective description of different concurrent Big Data applications on the same multi-site cloud infrastructure, accounting for their mutual interactions, in order to support a careful evaluation of the real costs, risks, and benefits involved in correctly dimensioning and allocating resources and in verifying the existing service level agreements.
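
    As a rough illustration of the mean-field idea behind this approach (the paper's actual model is more elaborate; the three agent states and the rate matrix below are invented purely for the sketch), a large population of identical Markovian agents can be summarized by the fraction of agents in each state, whose evolution follows deterministic ordinary differential equations driven by the transition rates.

```python
import numpy as np

# Hypothetical 3-state Markovian agent (idle -> busy -> congested).
# The infinitesimal generator Q is invented for illustration; each row
# sums to zero, as required of a continuous-time Markov chain.
Q = np.array([
    [-0.5,  0.5,  0.0],   # idle: starts work at rate 0.5
    [ 0.3, -0.7,  0.4],   # busy: finishes at 0.3, congests at 0.4
    [ 0.6,  0.0, -0.6],   # congested: recovers at rate 0.6
])

def mean_field(x0, Q, t_end=20.0, dt=0.01):
    """Integrate dx/dt = x Q with forward Euler. x[i] is the expected
    fraction of agents in state i: the mean-field approximation of the
    full stochastic agent population."""
    x = np.asarray(x0, dtype=float)
    for _ in range(int(t_end / dt)):
        x = x + dt * (x @ Q)
    return x

# Starting with all agents idle, estimate the long-run state occupancy.
print(mean_field([1.0, 0.0, 0.0], Q))
```

    The appeal of the mean-field approximation, and presumably the reason it is adopted here, is that its cost is independent of the number of agents: thousands of concurrent application instances reduce to one small ODE system per agent class, which keeps the evaluation of multi-site deployments tractable.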