189 research outputs found

    ATOM: model-driven autoscaling for microservices

    Get PDF
    Microservice-based architectures are increasingly widespread in the cloud software industry. Still, there is a shortage of auto-scaling methods designed to leverage the unique features of these architectures, such as the ability to independently scale a subset of microservices, as well as the ease of monitoring their state and reciprocal calls. We propose to address this shortage with ATOM, a model-driven autoscaling controller for microservices. ATOM instantiates and solves at run-time a layered queueing network model of the application. Computational optimization is used to dynamically control the number of replicas for each microservice and its associated container CPU share, overall achieving fine-grained control of the application capacity at run-time. Experimental results indicate that for heavy workloads ATOM offers around 30%-37% higher throughput than baseline model-agnostic controllers based on simple static rules. We also find that model-driven reasoning reduces the number of actions needed to scale the system, as it reduces the number of bottleneck shifts that we observe with model-agnostic controllers.
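    ATOM's actual layered queueing network solver is not shown in this record, so the following sketch only illustrates the flavor of model-driven (as opposed to rule-based) scaling: each microservice's replica count is derived from a simple utilization model rather than from per-service threshold rules. The service names, rates, and target utilization are assumptions made up for the example.

```python
import math

# Illustrative sketch only: a queueing-informed autoscaler in the spirit of
# model-driven control (NOT ATOM's layered queueing network solver).
# All names and parameters here are assumptions for the example.

def replicas_needed(arrival_rate: float, service_rate: float,
                    target_utilization: float = 0.7) -> int:
    """Minimum replicas so that per-replica utilization stays below target.

    arrival_rate:  requests/sec observed for this microservice
    service_rate:  requests/sec one replica can serve (measured offline)
    """
    if arrival_rate <= 0:
        return 1
    return max(1, math.ceil(arrival_rate / (service_rate * target_utilization)))

# One control step over the whole application: scale every service from the
# model instead of reacting to per-service CPU thresholds, which is what
# reduces the bottleneck shifts mentioned in the abstract.
demand = {"frontend": 120.0, "cart": 45.0, "payment": 12.0}   # req/s
capacity = {"frontend": 30.0, "cart": 25.0, "payment": 10.0}  # req/s per replica

plan = {svc: replicas_needed(demand[svc], capacity[svc]) for svc in demand}
print(plan)  # e.g. {'frontend': 6, 'cart': 3, 'payment': 2}
```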

    Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY

    Get PDF
    Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that generates a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and the mechanisms put in place to effectively meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the proposed architecture. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. Spark-DIY is then analyzed in terms of performance by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment. This work was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Grant TIN2016-79637-P (Toward Unification of HPC and Big Data Paradigms), in part by the Spanish Ministry of Education under Grant FPU15/00422 (Training Program for Academic and Teaching Staff), in part by the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the DOE under Agreement DE-DC000122495, Program Manager Laura Biven.
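    As a hedged illustration of the abstract data collection and unified interface described above, the sketch below defines a backend-agnostic collection whose operations are executed by an interchangeable backend object. The class and method names are hypothetical, invented for this example; Spark-DIY itself preserves the real Apache Spark API and executes kernels through DIY/MPI.

```python
# Illustrative sketch of the paper's idea of one data-collection interface
# over interchangeable BDA/HPC backends. Class and backend names are
# hypothetical; Spark-DIY itself keeps the real Apache Spark API and runs
# kernels over DIY/MPI.
from typing import Callable, Iterable, List

class LocalBackend:
    """Stand-in for a BDA task scheduler or an HPC kernel runner."""
    def run_map(self, func: Callable, partitions: List[list]) -> List[list]:
        return [[func(x) for x in part] for part in partitions]

class Collection:
    """Unified, backend-agnostic data collection."""
    def __init__(self, data: Iterable, backend=None, npartitions: int = 4):
        items = list(data)
        step = max(1, len(items) // npartitions)
        self._parts = [items[i:i + step] for i in range(0, len(items), step)]
        self._backend = backend or LocalBackend()

    def map(self, func: Callable) -> "Collection":
        out = Collection([], backend=self._backend)
        out._parts = self._backend.run_map(func, self._parts)
        return out

    def collect(self) -> list:
        return [x for part in self._parts for x in part]

# Same application code regardless of which backend executes the kernels.
squares = Collection(range(10)).map(lambda x: x * x).collect()
print(squares)
```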

    Self-management for large-scale distributed systems

    Get PDF
    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce the obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems, motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottlenecks, and scale. Niche implements the autonomic computing architecture proposed by IBM in a fully decentralized way. It supports a network-transparent view of the system architecture, simplifying the design of distributed self-management, and provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application, defining design steps that include the partitioning of management functions and the orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set, and our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store, built on a peer-to-peer network, that supports multiple consistency levels and enables a trade-off between high availability and data consistency. Using majority quorums avoids potential drawbacks of master-based consistency control, namely a single point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems, focusing on elasticity control using elements of control theory and machine learning. We have studied a number of different designs of an elasticity controller, including a state-space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that enables trading off performance for cost, and we outline the steps in designing such a controller. We conclude by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
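    As a minimal sketch of the combined feedforward and feedback control described for ElastMan (with made-up gains, capacities, and function names rather than the thesis's actual design), the feedforward term sizes the store directly from the measured workload while the feedback term corrects the model's error using the observed latency:

```python
# Minimal sketch of feedforward + feedback elasticity control in the spirit
# of ElastMan. Gains, capacities, and names are illustrative assumptions.

def feedforward_nodes(request_rate: float, per_node_capacity: float) -> float:
    """Feedforward term: size the cluster directly from measured workload."""
    return request_rate / per_node_capacity

def feedback_correction(latency: float, latency_slo: float,
                        current_nodes: float, gain: float = 0.5) -> float:
    """Feedback term: correct model error from the observed SLO violation."""
    error = (latency - latency_slo) / latency_slo   # relative SLO error
    return gain * error * current_nodes

def control_step(request_rate, per_node_capacity, latency, latency_slo,
                 current_nodes):
    target = feedforward_nodes(request_rate, per_node_capacity)
    target += feedback_correction(latency, latency_slo, current_nodes)
    return max(1, round(target))

# Example: feedforward alone asks for 8 nodes, but latency is 20% over the
# SLO, so feedback adds capacity on top of the model's estimate -> 9 nodes.
print(control_step(request_rate=4000, per_node_capacity=500,
                   latency=120.0, latency_slo=100.0, current_nodes=8))
```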

    Secure Virtual Machine Migration in Cloud Data Centers

    Get PDF
    While elasticity represents a valuable asset in cloud computing environments, it may bring critical security issues. In the cloud, virtual machines (VMs) are dynamically and frequently migrated across data centers from one host to another. This frequent modification of the topology requires constant reconfiguration of security mechanisms, particularly firewall, intrusion detection/prevention, and IPsec policies. However, manually managing complex security rules is time-consuming and error-prone. Furthermore, the scale and complexity of data centers are continually increasing, which makes it difficult to rely on the cloud provider's administrators to update and validate the security mechanisms. In this thesis, we propose a security verification framework, with a particular interest in the aforementioned security mechanisms, to address the issue of security policy preservation in the highly dynamic context of cloud computing. This framework enables us to verify that the global security policy after a migration is consistently preserved with respect to the initial one. Thus, we propose a systematic procedure to verify the security compliance of firewall policies, intrusion detection/prevention, and IPsec configurations after VM migration. First, we develop a process algebra called cloud calculus, which allows specifying network topology and security configurations, as well as virtual machine migration along with the associated security policies. Then, the distributed firewall configurations in the involved data centers are defined according to the network topology expressed using cloud calculus. We show how our verification problem can be reduced to a constraint satisfaction problem that, once solved, allows reasoning about the preservation of firewall traffic filtering. Similarly, we present our approach to verifying the preservation of intrusion detection monitoring and of IPsec traffic protection as constraint satisfaction problems. We derive a set of constraints that compare security configurations before and after migration. The obtained constraints are formulated as constraint satisfaction problems and submitted to a SAT solver, namely Sugar, in order to verify the security preservation properties and to pinpoint configuration errors, if any, before the actual migration of the security context and the virtual machine. In addition, we present case studies for the given security mechanisms to show the applicability and usefulness of our framework and to demonstrate the scalability of our approach.
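    The framework itself compiles preservation checks into constraints for the Sugar SAT solver; as a rough, purely illustrative stand-in, the sketch below checks the property those constraints encode for firewalls, namely that every flow receives the same filtering decision before and after migration (the rule format and helper names are assumptions):

```python
# Rough illustration of the preservation property the thesis encodes as a
# constraint satisfaction problem (the real framework compiles constraints
# for the Sugar SAT solver; rule format and helper names are assumptions).
from itertools import product

def decision(rules, src, dst, port):
    """First-match firewall semantics: return 'allow'/'deny' for a flow."""
    for (r_src, r_dst, r_port, action) in rules:
        if (r_src in (src, "*") and r_dst in (dst, "*")
                and r_port in (port, "*")):
            return action
    return "deny"  # default-deny

def preserved(rules_before, rules_after, vms, ports):
    """Filtering is preserved iff every flow gets the same decision."""
    violations = []
    for src, dst, port in product(vms, vms, ports):
        before = decision(rules_before, src, dst, port)
        after = decision(rules_after, src, dst, port)
        if before != after:
            violations.append((src, dst, port, before, after))
    return violations

before = [("vm1", "vm2", 443, "allow"), ("*", "*", "*", "deny")]
after = [("*", "*", "*", "deny")]  # migration lost the allow rule
print(preserved(before, after, vms=["vm1", "vm2"], ports=[443]))
# -> [('vm1', 'vm2', 443, 'allow', 'deny')]
```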

    A Survey on the Evolution of Stream Processing Systems

    Full text link
    Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems. Comment: 34 pages, 15 figures, 5 tables
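    As one concrete example of the out-of-order data management area surveyed above (not taken from the survey itself; the window size and events are invented), modern systems typically track low watermarks to decide when an event-time window can safely fire despite late arrivals:

```python
# Hedged illustration of one functional area the survey covers:
# out-of-order data management via low watermarks. Not taken from the
# survey itself; window size and events are made-up examples.

WINDOW = 10  # event-time window length in time units

def watermark(max_event_time: int, allowed_lateness: int = 3) -> int:
    """Heuristic low watermark: no event older than this is expected."""
    return max_event_time - allowed_lateness

events = [(1, "a"), (4, "b"), (2, "c"), (12, "d"), (9, "e"), (15, "f")]

windows, max_ts = {}, 0
for ts, payload in events:             # events arrive out of order
    max_ts = max(max_ts, ts)
    windows.setdefault(ts // WINDOW, []).append(payload)
    # A window [k*W, (k+1)*W) may fire once the watermark passes its end.
    for k in sorted(list(windows)):
        if watermark(max_ts) >= (k + 1) * WINDOW:
            print(f"window {k} fires with {windows.pop(k)}")
# The late event at time 9 is still included before window 0 fires.
```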

    Benchmarking Resource Management For Serverless Computing

    Get PDF
    Serverless computing is a way in which users or companies can build and run applications and services without having to worry about acquiring or maintaining servers and their software stacks. This technology is a significant innovation because server management incurs a large amount of overhead and can be complex and difficult to work with. The serverless model also allows for fine-grained billing and on-demand resource allocation, enabling better scalability and cost reduction. Academic researchers and industry practitioners agree on the promise of serverless computing, but it introduces new challenges. The algorithms and protocols currently deployed for virtual server optimization in traditional cloud computing environments are not able to simultaneously achieve low latency, high throughput, and fine-grained scalability while maintaining low cost for the cloud service providers. Furthermore, in the serverless computing paradigm, computation units (i.e., functions) are stateless. Applications, specified through function workflows, have no control over specific states or their scheduling and placement, which can lead to significant latency increases and missed opportunities to optimize the usage of physical servers. Overcoming these challenges highlights the tension between giving programmers control and allowing providers to optimize automatically. This research identifies some of the challenges in exploring new resource management approaches for serverless computing (more specifically, FaaS) and tackles one of them. Our experimental approach includes the deployment of an open-source serverless function framework, OpenFaaS. We focus on faasd, a more lightweight variant of OpenFaaS, chosen over standard OpenFaaS because it avoids the complexity and cost of Kubernetes. As researchers in academia and industry develop new approaches for optimizing the usage of CPU, memory, and I/O on serverless platforms, the community needs to establish benchmark workloads for evaluating the proposed methods. Several research groups have proposed benchmark suites in the last two years, and many others are still in development. A commonality among these benchmark tools is their complexity; for junior researchers without experience in the deployment of distributed systems, a lot of time and effort goes into deploying the benchmarks, hindering their progress in evaluating newly proposed ideas. In our work, we demonstrate that even well-regarded proposals still introduce deficiencies and deployment challenges, and we propose that a simplified, constrained benchmark can be useful in preparing execution environments for the experimental evaluation of serverless services.
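    A deliberately simplified harness in this spirit can be very small; the sketch below measures invocation latency against a locally deployed faasd gateway. The gateway URL and function name are assumptions about a local deployment (OpenFaaS exposes deployed functions under /function/<name> on the gateway):

```python
# Minimal latency-benchmark sketch against an OpenFaaS/faasd gateway.
# The gateway URL and function name are local-deployment assumptions.
import statistics
import time
import urllib.request

GATEWAY = "http://127.0.0.1:8080"      # default local faasd gateway (assumption)
FUNCTION = "echo"                      # hypothetical deployed function

def invoke_once(payload: bytes) -> float:
    """Invoke the function once and return latency in milliseconds."""
    req = urllib.request.Request(f"{GATEWAY}/function/{FUNCTION}",
                                 data=payload, method="POST")
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000.0

latencies = sorted(invoke_once(b"hello") for _ in range(100))
print(f"p50={statistics.median(latencies):.1f} ms "
      f"p99={latencies[int(0.99 * len(latencies)) - 1]:.1f} ms")
```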

    Scalability Benchmarking of Cloud-Native Applications Applied to Event-Driven Microservices

    Get PDF
    Cloud-native applications constitute a recent trend for designing large-scale software systems. This thesis introduces the Theodolite benchmarking method, allowing researchers and practitioners to conduct empirical scalability evaluations of cloud-native applications, their frameworks, configurations, and deployments. The benchmarking method is applied to event-driven microservices, a specific type of cloud-native application that employs distributed stream processing frameworks to scale with massive data volumes. Extensive experimental evaluations benchmark and compare the scalability of various stream processing frameworks under different configurations and deployments, including different public and private cloud environments. These experiments show that the presented benchmarking method provides statistically sound results in an adequate amount of time. In addition, three case studies demonstrate that the Theodolite benchmarking method can be applied to a wide range of applications beyond stream processing.
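    As a hedged sketch of a demand-style scalability measurement in the spirit of such benchmarking methods (the SLO results below are fabricated example data, not Theodolite's implementation), the benchmark searches, for each load intensity, for the fewest instances whose SLO check passes:

```python
# Sketch of a demand-style scalability metric: for each load intensity,
# find the fewest instances whose SLO check passed in the experiments.
# The slo_met results below are fabricated example data.

# (load, instances) -> did the deployment meet its SLO in the experiment?
slo_met = {
    (10_000, 1): True,  (10_000, 2): True,
    (20_000, 1): False, (20_000, 2): True,
    (40_000, 1): False, (40_000, 2): False, (40_000, 3): True,
}

def demand(load: int, instance_options: list[int]) -> int | None:
    """Minimum number of instances that satisfies the SLO at this load."""
    for n in sorted(instance_options):
        if slo_met.get((load, n)):
            return n
    return None  # SLO unreachable with the tested resource amounts

for load in (10_000, 20_000, 40_000):
    print(load, "->", demand(load, [1, 2, 3]))
# 10000 -> 1, 20000 -> 2, 40000 -> 3: the demand curve over load intensity
```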

    Complex event processing as a service in multi-cloud environments (Processamento de eventos complexos como serviço em ambientes multi-nuvem)

    Get PDF
    Advisors: Luiz Fernando Bittencourt, Miriam Akemi Manabe Capretz. Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação. The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, has created a new Big Data world in which the volume and velocity of data generation have reached an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has often been associated with Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are either costly and inflexible or too low-level and hard to operate. To address these problems, this research proposes the creation of a CEP system that can be offered as a service and used over the Internet. Such a CEP as a Service (CEPaaS) system would give its users CEP functionalities together with the advantages of the services model, such as no up-front investment and low maintenance cost. Nevertheless, creating such a service involves challenges that are not addressed by current CEP systems. This research proposes solutions for three open problems that exist in this context. First, to address the problem of understanding and reusing existing CEP management procedures, this research introduces the Attributed Graph Rewriting for Complex Event Processing Management (AGeCEP) formalism as a technology- and language-agnostic representation of queries and their reconfigurations. Second, to address the problem of evaluating CEP query management and processing strategies, this research introduces CEPSim, a simulator of cloud-based CEP systems. Finally, this research also introduces a CEPaaS system based on a multi-cloud architecture, container management systems, and an AGeCEP-based multi-tenant design. To demonstrate its feasibility, AGeCEP was used to design an autonomic manager and a selected set of self-management policies. Moreover, CEPSim was thoroughly evaluated by experiments showing that it can simulate existing systems with accuracy and low execution overhead. Finally, additional experiments validated the CEPaaS system and demonstrated that it achieves the goal of offering CEP functionalities as a scalable and fault-tolerant service. Together, these results confirm that this research significantly advances the CEP state of the art and provides novel tools and methodologies that can be applied to CEP research. Doctorate in Computer Science (Doutorado em Ciência da Computação), grant 140920/2012-9, CNPq.
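    As a toy illustration of AGeCEP's central idea of representing a CEP query as an attributed operator graph and a reconfiguration as a graph rewrite (the operator names and attributes are invented for this example; AGeCEP's actual formalism is richer), consider:

```python
# Toy illustration of AGeCEP's idea: a CEP query as an attributed operator
# graph, and a reconfiguration expressed as a graph rewrite. Operator names
# and attributes are invented for the example.

query = {
    "nodes": {
        "src":  {"type": "stream-source"},
        "f1":   {"type": "filter", "predicate": "temp > 30"},
        "win":  {"type": "window", "size_s": 60},
        "sink": {"type": "notification-sink"},
    },
    "edges": [("src", "f1"), ("f1", "win"), ("win", "sink")],
}

def rewrite_filter(graph: dict, node: str, new_predicate: str) -> dict:
    """A (trivial) rewrite rule: update a filter's predicate attribute.
    Real rewrites can also add/remove operators and redirect edges."""
    assert graph["nodes"][node]["type"] == "filter", "rule matches filters only"
    graph["nodes"][node]["predicate"] = new_predicate
    return graph

# A self-management policy can apply the rewrite at run time, without the
# tenant rewriting the query in a vendor-specific language.
rewrite_filter(query, "f1", "temp > 35")
print(query["nodes"]["f1"])  # {'type': 'filter', 'predicate': 'temp > 35'}
```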

    Reliable Actors with Retry Orchestration

    Full text link
    Enterprise cloud developers have to build applications that are resilient to failures and interruptions. We advocate for, formalize, implement, and evaluate a simple, albeit effective, fault-tolerant programming model for the cloud based on actors, reliable message delivery, and retry orchestration. Our model guarantees that (1) failed actor invocations are retried until success, (2) in a distributed chain of invocations only the last one may be retried, and (3) pending synchronous invocations with a failed caller are automatically cancelled. These guarantees make it possible to productively develop fault-tolerant distributed applications ranging from classic problems of concurrency theory to complex enterprise applications. Built as a service mesh, our runtime system can interface with application components written in any programming language and scale with the application. We measure overhead relative to reliable message queues. Using an application inspired by a typical enterprise scenario, we assess fault tolerance and the impact of fault recovery on application performance. Comment: 14 pages, 6 figures
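    As a hedged, single-process stand-in for guarantee (1) above (the paper's runtime enforces it through reliable message delivery in a service mesh; the decorator, backoff policy, and class names here are illustrative assumptions), retry-until-success can be sketched as:

```python
# Hedged sketch of guarantee (1): failed actor invocations are retried
# until success. A plain Python decorator stands in for the runtime, so the
# names and backoff policy are illustrative assumptions.
import time
from functools import wraps

def retried(max_backoff_s: float = 5.0):
    """Retry an actor method until it succeeds (so the method should be
    idempotent, or protected by reliable delivery as in the paper)."""
    def decorate(method):
        @wraps(method)
        def call(*args, **kwargs):
            delay = 0.1
            while True:
                try:
                    return method(*args, **kwargs)
                except Exception as err:
                    print(f"invocation failed ({err!r}); retrying in {delay}s")
                    time.sleep(delay)
                    delay = min(delay * 2, max_backoff_s)
        return call
    return decorate

class FlakyActor:
    def __init__(self):
        self.calls = 0

    @retried()
    def handle(self, msg: str) -> str:
        self.calls += 1
        if self.calls < 3:                 # fail the first two attempts
            raise RuntimeError("transient failure")
        return f"processed {msg} after {self.calls} attempts"

print(FlakyActor().handle("order-42"))  # processed order-42 after 3 attempts
```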

    A comprehensive meta-analysis of cryptographic security mechanisms for cloud computing

    Get PDF
    The concept of cloud computing offers measurable computational or information resources as a service over the Internet. The major motivation behind the cloud setup is economic benefit, because it promises a reduction in operational and infrastructural expenditure. To turn this into reality, several impediments and hurdles must be tackled, the most profound of which are security, privacy, and reliability issues. As user data is revealed to the cloud, it leaves the protection sphere of the data owner, which brings partly new security and privacy concerns. This work focuses on these issues across the various cloud service and deployment models by spotlighting their major challenges. While classical cryptography is an ancient discipline, modern cryptography, developed mostly in the last few decades, is the body of techniques that must be applied to ensure strong security and privacy mechanisms in today’s real-world scenarios. The technological solutions and the short- and long-term research goals of cloud security are described and addressed using classical as well as modern cryptographic mechanisms. This work explores new directions in cloud computing security while highlighting the correct selection of these fundamental technologies from a cryptographic point of view.
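    As a small example of one classical mechanism in scope here, client-side symmetric encryption keeps data that leaves the owner's protection sphere unreadable to the provider. The sketch uses the third-party Python cryptography package, and the upload step is a hypothetical placeholder:

```python
# Hedged sketch of client-side (symmetric) encryption so data leaving the
# owner's "protection sphere" reaches the cloud only as ciphertext.
# Requires the third-party package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # stays with the data owner, never uploaded
cipher = Fernet(key)

plaintext = b"customer-records-2024.csv contents"
ciphertext = cipher.encrypt(plaintext)   # authenticated encryption

# upload_to_cloud(ciphertext)  # hypothetical: provider sees ciphertext only

# Later, after downloading, only the key holder can recover the data.
assert cipher.decrypt(ciphertext) == plaintext
```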