    Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters

    To improve customer experience, datacenter operators offer support for simplifying application and resource management. For example, running workloads of workflows on behalf of customers is desirable, but requires increasingly sophisticated autoscaling policies, that is, policies that dynamically provision resources for the customer. Although selecting and tuning autoscaling policies is a challenging task for datacenter operators, relatively few studies have so far investigated the performance of autoscaling for workloads of workflows. Complementing previous knowledge, in this work we present the first comprehensive performance study in the field. Using trace-based simulation, we compare state-of-the-art autoscaling policies across multiple application domains, workload arrival patterns (e.g., burstiness), and system utilization levels. We further investigate the interplay between autoscaling and regular allocation policies, and the complexity cost of autoscaling. Our quantitative study focuses not only on traditional performance metrics and on state-of-the-art elasticity metrics, but also on time- and memory-related autoscaling-complexity metrics. Our main results give strong, quantitative evidence of previously unreported operational behavior, for example, that autoscaling policies perform differently across application domains, and by how much they differ.
    Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters"

    Active Data: A Data-Centric Approach to Data Life-Cycle Management

    Data-intensive science offers new opportunities for innovation and discoveries, provided that large datasets can be handled efficiently. Data management for data-intensive science applications is challenging: it requires support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to tens of sites and petabytes of data. In this paper, we argue that data management for data-intensive science applications requires a fundamentally different approach from the current ad hoc, task-centric one. We propose Active Data, a fundamentally novel paradigm for data life cycle management. Active Data follows two principles: it is data-centric and event-driven. We report on the Active Data programming model and its preliminary implementation, and discuss the benefits and limitations of the approach on well-known, challenging data-intensive science use cases.