27 research outputs found

    Resource Provisioning Exploiting Cost and Performance Diversity within IaaS Cloud Providers

    Get PDF
    IaaS platforms such as Amazon EC2 allow clients access to massive computational power in the form of instances. Amazon hosts three different instance purchasing options, each with its own SLA covering pricing and availability. Amazon also offers access to a number of geographical regions, zones, and instance types to select from. In this thesis, the problem of utilizing Spot and On-Demand instances is analyzed and two approaches are presented in order to exploit the cost and performance diversity among different instance types and availability zones, and among the Spot markets they represent. We first develop RAMP, a framework designed to calculate the expected profit of using a specific Spot or On-Demand instance through an evaluation of instance reliability. RAMP is extended to develop RAMC-DC, a framework designed to allocate the most cost effective instance through strategies that facilitate interchangeability of instances among short jobs, reliability of instances among long jobs, and a comparison of the estimated costs of possible allocations. RAMC-DC achieves fault tolerance through comparisons of the price dynamics across instance types and availability zones, and through an examination of three basic checkpointing methods. Evaluations demonstrate that both frameworks take a large step toward low-volatility, high cost-efficiency resource provisioning. While achieving early-termination rates as low as 2.2%, RAMP can completely offset the total cost when charging the user just 17.5% of the On-Demand price. Moreover, the increases in profit resulting from relatively small additional charges to users are notably high, i.e., 100% profit compared to the resource provisioning cost with 35% of the equivalent On-Demand price. RAMC-DC can maintain deadline breaches below 1.8% of all jobs, achieve both early-termination and deadline breach rates as low as 0.5% of all jobs, and lowers total costs by between 80% and 87% compared to using only On-Demand instances

    Resource Provisioning Exploiting Cost and Performance Diversity within IaaS Cloud Providers

    Get PDF
    IaaS platforms such as Amazon EC2 allow clients access to massive computational power in the form of instances. Amazon hosts three different instance purchasing options, each with its own SLA covering pricing and availability. Amazon also offers access to a number of geographical regions, zones, and instance types to select from. In this thesis, the problem of utilizing Spot and On-Demand instances is analyzed and two approaches are presented in order to exploit the cost and performance diversity among different instance types and availability zones, and among the Spot markets they represent. We first develop RAMP, a framework designed to calculate the expected profit of using a specific Spot or On-Demand instance through an evaluation of instance reliability. RAMP is extended to develop RAMC-DC, a framework designed to allocate the most cost effective instance through strategies that facilitate interchangeability of instances among short jobs, reliability of instances among long jobs, and a comparison of the estimated costs of possible allocations. RAMC-DC achieves fault tolerance through comparisons of the price dynamics across instance types and availability zones, and through an examination of three basic checkpointing methods. Evaluations demonstrate that both frameworks take a large step toward low-volatility, high cost-efficiency resource provisioning. While achieving early-termination rates as low as 2.2%, RAMP can completely offset the total cost when charging the user just 17.5% of the On-Demand price. Moreover, the increases in profit resulting from relatively small additional charges to users are notably high, i.e., 100% profit compared to the resource provisioning cost with 35% of the equivalent On-Demand price. RAMC-DC can maintain deadline breaches below 1.8% of all jobs, achieve both early-termination and deadline breach rates as low as 0.5% of all jobs, and lowers total costs by between 80% and 87% compared to using only On-Demand instances

    GWpilot: Enabling multi-level scheduling in distributed infrastructures with GridWay and pilot jobs

    Get PDF
    Current systems based on pilot jobs are not exploiting all the scheduling advantages that the technique offers, or they lack compatibility or adaptability. To overcome the limitations or drawbacks in existing approaches, this study presents a different general-purpose pilot system, GWpilot. This system provides individual users or institutions with a more easy-to-use, easy-toinstall, scalable, extendable, flexible and adjustable framework to efficiently run legacy applications. The framework is based on the GridWay meta-scheduler and incorporates the powerful features of this system, such as standard interfaces, fair-share policies, ranking, migration, accounting and compatibility with diverse infrastructures. GWpilot goes beyond establishing simple network overlays to overcome the waiting times in remote queues or to improve the reliability in task production. It properly tackles the characterisation problem in current infrastructures, allowing users to arbitrarily incorporate customised monitoring of resources and their running applications into the system. This functionality allows the new framework to implement innovative scheduling algorithms that accomplish the computational needs of a wide range of calculations faster and more efficiently. The system can also be easily stacked under other software layers, such as self-schedulers. The advanced techniques included by default in the framework result in significant performance improvements even when very short tasks are scheduled

    Optimization techniques for adaptability in MPI application

    Get PDF
    The first version of MPI (Message Passing Interface) was released in 1994. At that time, scientific applications for HPC (High Performance Computing) were characterized by a static execution environment. These applications usually had regular computation and communication patterns, operated on dense data structures accessed with good data locality, and ran on homogeneous computing platforms. For these reasons, MPI has become the de facto standard for developing scientific parallel applications for HPC during the last decades. In recent years scientific applications have evolved in order to cope with several challenges posed by different fields of engineering, economics and medicine among others. These challenges include large amounts of data stored in irregular and sparse data structures with poor data locality to be processed in parallel (big data), algorithms with irregular computation and communication patterns, and heterogeneous computing platforms (grid, cloud and heterogeneous cluster). On the other hand, over the last years MPI has introduced relevant improvements and new features in order to meet the requirements of dynamic execution environments. Some of them include asynchronous non-blocking communications, collective I/O routines and the dynamic process management interface introduced in MPI 2.0. The dynamic process management interface allows the application to spawn new processes at runtime and enable communication with them. However, this feature has some technical limitations that make the implementation of malleable MPI applications still a challenge. This thesis proposes FLEX-MPI, a runtime system that extends the functionalities of the MPI standard library and features optimization techniques for adaptability of MPI applications to dynamic execution environments. These techniques can significantly improve the performance and scalability of scientific applications and the overall efficiency of the HPC system on which they run. Specifically, FLEX-MPI focuses on dynamic load balancing and performance-aware malleability for parallel applications. The main goal of the design and implementation of the adaptability techniques is to efficiently execute MPI applications on a wide range of HPC platforms ranging from small to large-scale systems. Dynamic load balancing allows FLEX-MPI to adapt the workload assignments at runtime to the performance of the computing elements that execute the parallel application. On the other hand, performance-aware malleability leverages the dynamic process management interface of MPI to change the number of processes of the application at runtime. This feature allows to improve the performance of applications that exhibit irregular computation patterns and execute in computing systems with dynamic availability of resources. One of the main features of these techniques is that they do not require user intervention nor prior knowledge of the underlying hardware. We have validated and evaluated the performance of the adaptability techniques with three parallel MPI benchmarks and different execution environments with homogeneous and heterogeneous cluster configurations. The results show that FLEXMPI significantly improves the performance of applications when running with the support of dynamic load balancing and malleability, along with a substantial enhancement of their scalability and an improvement of the overall system efficiency.La primera versión de MPI (Message Passing Interface) fue publicada en 1994, cuando la base común de las aplicaciones científicas para HPC (High Performance Computing) se caracterizaba por un entorno de ejecución estático. Dichas aplicaciones presentaban generalmente patrones regulares de cómputo y comunicaciones, accesos a estructuras de datos densas con alta localidad, y ejecución sobre plataformas de computación homogéneas. Esto ha hecho que MPI haya sido la alternativa más adecuada para la implementación de aplicaciones científicas para HPC durante más de 20 años. Sin embargo, en los últimos años las aplicaciones científicas han evolucionado para adaptarse a diferentes retos propuestos por diferentes campos de la ingeniería, la economía o la medicina entre otros. Estos nuevos retos destacan por características como grandes cantidades de datos almacenados en estructuras de datos irregulares con baja localidad para el análisis en paralelo (big data), algoritmos con patrones irregulares de cómputo y comunicaciones, e infraestructuras de computación heterogéneas (cluster heterogéneos, grid y cloud). Por otra parte, MPI ha evolucionado significativamente en cada una de sus sucesivas versiones, siendo algunas de las mejoras más destacables presentadas hasta la reciente versión 3.0 las operaciones de comunicación asíncronas no bloqueantes, rutinas de E/S colectiva, y la interfaz de procesos dinámicos presentada en MPI 2.0. Esta última proporciona un procedimiento para la creación de procesos en tiempo de ejecución de la aplicación. Sin embargo, la implementación de la interfaz de procesos dinámicos por parte de las diferentes distribuciones de MPI aún presenta numerosas limitaciones que condicionan el desarrollo de aplicaciones maleables en MPI. Esta tesis propone FLEX-MPI, un sistema que extiende las funcionalidades de la librería MPI y proporciona técnicas de optimización para la adaptación de aplicaciones MPI a entornos de ejecución dinámicos. Las técnicas integradas en FLEX-MPI permiten mejorar el rendimiento y escalabilidad de las aplicaciones científicas y la eficiencia de las plataformas sobre las que se ejecutan. Entre estas técnicas destacan el balanceo de carga dinámico y maleabilidad para aplicaciones MPI. El diseño e implementación de estas técnicas está dirigido a plataformas de cómputo HPC de pequeña a gran escala. El balanceo de carga dinámico permite a las aplicaciones adaptar de forma eficiente su carga de trabajo a las características y rendimiento de los elementos de procesamiento sobre los que se ejecutan. Por otro lado, la técnica de maleabilidad aprovecha la interfaz de procesos dinámicos de MPI para modificar el número de procesos de la aplicación en tiempo de ejecución, una funcionalidad que permite mejorar el rendimiento de aplicaciones con patrones irregulares o que se ejecutan sobre plataformas de cómputo con disponibilidad dinámica de recursos. Una de las principales características de estas técnicas es que no requieren intervención del usuario ni conocimiento previo de la arquitectura sobre la que se ejecuta la aplicación. Hemos llevado a cabo un proceso de validación y evaluación de rendimiento de las técnicas de adaptabilidad con tres diferentes aplicaciones basadas en MPI, bajo diferentes escenarios de computación homogéneos y heterogéneos. Los resultados demuestran que FLEX-MPI permite obtener un significativo incremento del rendimiento de las aplicaciones, unido a una mejora sustancial de la escalabilidad y un aumento de la eficiencia global del sistema.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: Francisco Fernández Rivera.- Secretario: Florín Daniel Isaila.- Vocal: María Santos Pérez Hernánde

    Parallel Processes in HPX: Designing an Infrastructure for Adaptive Resource Management

    Get PDF
    Advancement in cutting edge technologies have enabled better energy efficiency as well as scaling computational power for the latest High Performance Computing(HPC) systems. However, complexity, due to hybrid architectures as well as emerging classes of applications, have shown poor computational scalability using conventional execution models. Thus alternative means of computation, that addresses the bottlenecks in computation, is warranted. More precisely, dynamic adaptive resource management feature, both from systems as well as application\u27s perspective, is essential for better computational scalability and efficiency. This research presents and expands the notion of Parallel Processes as a placeholder for procedure definitions, targeted at one or more synchronous domains, meta data for computation and resource management as well as infrastructure for dynamic policy deployment. In addition to this, the research presents additional guidelines for a framework for resource management in HPX runtime system. Further, this research also lists design principles for scalability of Active Global Address Space (AGAS), a necessary feature for Parallel Processes. Also, to verify the usefulness of Parallel Processes, a preliminary performance evaluation of different task scheduling policies is carried out using two different applications. The applications used are: Unbalanced Tree Search, a reference dynamic graph application, implemented by this research in HPX and MiniGhost, a reference stencil based application using bulk synchronous parallel model. The results show that different scheduling policies provide better performance for different classes of applications; and for the same application class, in certain instances, one policy fared better than the others, while vice versa in other instances, hence supporting the hypothesis of the need of dynamic adaptive resource management infrastructure, for deploying different policies and task granularities, for scalable distributed computing

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Get PDF
    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn

    Cloud Workload Allocation Approaches for Quality of Service Guarantee and Cybersecurity Risk Management

    Get PDF
    It has become a dominant trend in industry to adopt cloud computing --thanks to its unique advantages in flexibility, scalability, elasticity and cost efficiency -- for providing online cloud services over the Internet using large-scale data centers. In the meantime, the relentless increase in demand for affordable and high-quality cloud-based services, for individuals and businesses, has led to tremendously high power consumption and operating expense and thus has posed pressing challenges on cloud service providers in finding efficient resource allocation policies. Allowing several services or Virtual Machines (VMs) to commonly share the cloud\u27s infrastructure enables cloud providers to optimize resource usage, power consumption, and operating expense. However, servers sharing among users and VMs causes performance degradation and results in cybersecurity risks. Consequently, how to develop efficient and effective resource management policies to make the appropriate decisions to optimize the trade-offs among resource usage, service quality, and cybersecurity loss plays a vital role in the sustainable future of cloud computing. In this dissertation, we focus on cloud workload allocation problems for resource optimization subject to Quality of Service (QoS) guarantee and cybersecurity risk constraints. To facilitate our research, we first develop a cloud computing prototype that we utilize to empirically validate the performance of different proposed cloud resource management schemes under a close to practical, but also isolated and well-controlled, environment. We then focus our research on the resource management policies for real-time cloud services with QoS guarantee. Based on queuing model with reneging, we establish and formally prove a series of fundamental principles, between service timing characteristics and their resource demands, and based on which we develop several novel resource management algorithms that statically guarantee the QoS requirements for cloud users. We then study the problem of mitigating cybersecurity risk and loss in cloud data centers via cloud resource management. We employ game theory to model the VM-to-VM interdependent cybersecurity risks in cloud clusters. We then conduct a thorough analysis based on our game-theory-based model and develop several algorithms for cybersecurity risk management. Specifically, we start our cybersecurity research from a simple case with only two types of VMs and next extend it to a more general case with an arbitrary number of VM types. Our intensive numerical and experimental results show that our proposed algorithms can significantly outperform the existing methodologies for large-scale cloud data centers in terms of resource usage, cybersecurity loss, and computational effectiveness

    Scalable Task Schedulers for Many-Core Architectures

    Get PDF
    This thesis develops schedulers for many-cores with different optimization objectives. The proposed schedulers are designed to be scale up as the number of cores in many-cores increase while continuing to provide guarantees on the quality of the schedule

    Performance-aware task scheduling in multi-core computers

    Get PDF
    Multi-core systems become more and more popular as they can satisfy the increasing computation capacity requirements of complex applications. Task scheduling strategy plays a key role in this vision and ensures that the task processing is both Quality-of-Service (QoS, in this thesis, refers to deadline) satisfied and energy-efficient. In this thesis, we develop task scheduling strategies for multi-core computing systems. We start by looking at two objectives of a multi-core computing system. The first objective aims at ensuring all tasks can satisfy their time constraints (i.e. deadline), while the second strives to minimize the overall energy consumption of the platform. We develop three power-aware scheduling strategies in virtualized systems managed by Xen. Comparing with the original scheduling strategy in Xen, these scheduling algorithms are able to reduce energy consumption without reducing the performance for the jobs. Then, we find that modelling the makespan of a task (before execution) accurately is very important for making scheduling decisions. Our studies show that the discrepancy between the assumption of (commonly used) sequential execution and the reality of time sharing execution may lead to inaccurate calculation of the task makespan. Thus, we investigate the impact of the time sharing execution on the task makespan, and propose the method to model and determine the makespan with the time-sharing execution. Thereafter, we extend our work to a more complex scenario: scheduling DAG applications for time sharing systems. Based on our time-sharing makespan model, we further develop the scheduling strategies for DAG jobs in time-sharing execution, which achieves more effective at task execution. Finally, as the resource interference also makes a big difference to the performance of co-running tasks in multi-core computers (which may further influence the scheduling decision making), we investigate the influential factors that impact on the performance when the tasks ii are co-running on a multicore computer and propose the machine learning-based prediction frameworks to predict the performance of the co-running tasks. The experimental results show that the techniques proposed in this thesis is effective
    corecore