
    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of on-premise and cloud resources: steady (and sensitive) workloads can run on on-premise resources, and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, ranging from how to extract the best performance from an unknown underlying platform to which services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-growing wave of new HPC applications coming from big data and artificial intelligence. Comment: 29 pages, 5 figures, published in ACM Computing Surveys (CSUR)

    A Survey on Meta-Heuristic Scheduling Optimization Techniques in Cloud Computing Environment

    It is becoming evident that the future of the cloud industry relies on interconnected cloud systems in which resources are likely to be provided by various cloud service suppliers. Clouds are also seen as multifaceted: if a user requires only computing capacity and wishes to customize it to their requirements, infrastructure cloud suppliers can provide this as virtual machines. Many optimized meta-heuristic scheduling techniques have been introduced for scheduling bag-of-tasks applications in heterogeneous cloud frameworks. The overall analysis demonstrates that using different meta-heuristic techniques can offer noteworthy benefits in terms of speed and performance.

    Dynamic energy-aware scheduling for parallel task-based application in cloud computing

    Green Computing is a recent trend in computer science that tries to reduce the energy consumption and carbon footprint produced by computers on distributed platforms such as clusters, grids, and clouds. Traditional scheduling solutions attempt to minimize processing times without taking into account the energy cost. One method for reducing energy consumption is to provide scheduling policies that allocate tasks to specific resources, which affects both processing times and energy consumption. In this paper, we propose a real-time dynamic scheduling system to efficiently execute task-based applications on distributed computing platforms in order to minimize energy consumption. Because scheduling tasks on multiprocessors is a well-known NP-hard problem for which computing optimal solutions is not feasible, we present a polynomial-time algorithm that combines a set of heuristic rules and a resource allocation technique to obtain good solutions on an affordable time scale. The proposed algorithm minimizes a multi-objective function that combines energy consumption and execution time according to the energy-performance importance factor provided by the resource provider or user, also taking into account sequence-dependent setup times between tasks, setup times and down times for virtual machines (VMs), and energy profiles for different architectures. A prototype implementation of the scheduler has been tested with different kinds of randomly generated DAGs as well as on real task-based COMPSs applications. We have tested the system with instances of different sizes and different importance factors, and we have evaluated which combination provides a better solution and energy savings.
    Moreover, we have also evaluated the overhead introduced by measuring the time needed to obtain scheduling solutions for different numbers of tasks, kinds of DAG, and resources, concluding that our method is suitable for run-time scheduling. This work has been supported by the Spanish Government (contracts TIN2015-65316-P, TIN2012-34557, CSD2007-00050, CAC2007-00052 and SEV-2011-00067), by Generalitat de Catalunya (contract 2014-SGR-1051), by the European Commission (Euroserver project, contract 610456) and by Consejo Nacional de Ciencia y Tecnología of Mexico (special program for postdoctoral position BSC-CNS-CONACYT contract 290790, grant number 265937). Peer Reviewed. Award-winning. Postprint (published version).
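
The abstract above describes a multi-objective function that trades energy against execution time through an importance factor. A minimal sketch of that idea follows. It is not the authors' algorithm: the normalization, the `Resource` fields, the `alpha` parameter name, and the single-task greedy choice are all assumptions made for illustration, and setup times, down times and DAG dependencies are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource model, not taken from the paper."""
    name: str
    speed: float  # work units processed per second (assumed)
    power: float  # watts drawn while busy (assumed)


def weighted_cost(work, res, alpha, t_ref, e_ref):
    """Blend normalized energy and time; alpha=1 is energy only, alpha=0 is time only."""
    t = work / res.speed          # execution time in seconds
    e = t * res.power             # energy in joules
    return alpha * (e / e_ref) + (1 - alpha) * (t / t_ref)


def pick_resource(work, resources, alpha):
    """Choose the resource with the lowest blended cost for one task."""
    # Normalize by the worst case so both terms lie in (0, 1].
    t_ref = max(work / r.speed for r in resources)
    e_ref = max(work / r.speed * r.power for r in resources)
    return min(resources, key=lambda r: weighted_cost(work, r, alpha, t_ref, e_ref))


fast = Resource("fast", speed=4.0, power=200.0)
slow = Resource("slow", speed=1.0, power=20.0)

for alpha in (0.0, 0.5, 1.0):
    print(alpha, pick_resource(8.0, [fast, slow], alpha).name)
```

With these made-up numbers, the fast node finishes sooner but uses more energy, so the choice flips to the slow node only when the importance factor leans strongly toward energy.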

    Power Bounded Computing on Current & Emerging HPC Systems

    Power has become a critical constraint for the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. This constraint spans almost every level of computing technology, from IC chips all the way up to data centers, for physical, technical, and economic reasons. To cope with this reality, it is necessary to understand how available or permissible power impacts the design and performance of emergent computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets. We have multiple research objectives in this dissertation. They center on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across nodes and components based on application characteristics and the power budget of a cluster. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling to maximize system throughput under given power budgets for node-sharing systems. Fourth, we extend to GPU-accelerated systems and workloads and develop an online dynamic performance and power approach to meet both performance requirements and power efficiency. Power bounded computing improves performance scalability and power efficiency and decreases the operating costs of HPC systems and data centers. This dissertation opens up several new avenues for research in power bounded computing to address the power challenges in HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.

    Load Balancing in Distributed Cloud Computing: A Reinforcement Learning Algorithms in Heterogeneous Environment

    Load balancing in cloud-based systems plays a vital role in sharing load between different types of resources, such as virtual machines hosted on servers, storage in the form of hard drives, and servers. Reinforcement learning approaches can be adopted in cloud computing to achieve quality-of-service goals such as minimized cost and response time, increased throughput, fault tolerance, and utilization of all available resources in the network, thus increasing system performance. Reinforcement learning (RL) based approaches make resource utilization effective by selecting the most suitable processor for task execution with minimum makespan. Earlier related work on load sharing includes only a limited number of reinforcement learning based approaches. This paper therefore focuses on the importance of RL based approaches for achieving balanced load in distributed cloud computing. A reinforcement learning framework is proposed and implemented for the execution of tasks in heterogeneous environments, in particular Least Load Balancing (LLB) and Booster Reinforcement Controller (BRC) Load Balancing. With the help of reinforcement learning approaches, an optimal result is achieved for load sharing and task allocation. In this RL based framework, processor workload is taken as an input. The results of the proposed RL based approaches are evaluated for cost and makespan and compared with existing load balancing techniques for task execution and resource utilization.
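
The abstract above takes processor workload as input and evaluates makespan. The sketch below shows a generic least-loaded greedy assignment, which is a common non-learning baseline. It is not the paper's LLB or BRC method, and the function name and the task costs are hypothetical.

```python
import heapq


def least_loaded_assign(task_costs, n_procs):
    """Assign each task, in order, to the processor with the smallest current load.

    Returns (assignment, makespan), where assignment[i] is the processor index
    chosen for task i and makespan is the largest final processor load.
    """
    # Min-heap of (current_load, processor_index); ties go to the lower index.
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = []
    for cost in task_costs:
        load, proc = heapq.heappop(heap)
        assignment.append(proc)
        heapq.heappush(heap, (load + cost, proc))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


assignment, makespan = least_loaded_assign([4, 3, 2, 1], n_procs=2)
print(assignment, makespan)  # [0, 1, 1, 0] 5.0
```

A learning-based balancer would replace the fixed "pick the smallest load" rule with a policy trained from observed cost or makespan; this baseline only illustrates the input and the metric.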

    Energy-aware Graph Job Allocation in Software Defined Air-Ground Integrated Vehicular Networks

    Software defined air-ground integrated vehicular (SD-AGV) networks have emerged as a promising paradigm that enables flexible on-ground resource sharing to support innovative applications for UAVs with heavy computational overhead. In this paper, we investigate a vehicular cloud-assisted graph job allocation problem in SD-AGV networks, where the computation-intensive jobs carried by UAVs and the vehicular cloud are modeled as graphs. To map each component of the graph jobs to a feasible vehicle while achieving a trade-off among minimizing UAVs' job completion time, energy consumption, and the data exchange cost among vehicles, we formulate the problem as a mixed-integer non-linear programming problem, which is NP-hard. Moreover, the constraint associated with preserving job structures requires addressing the subgraph isomorphism problem, which further complicates the algorithm design. Motivated by this, we propose an efficient decoupled approach that separates the template (feasible mappings between components and vehicles) search from the transmission power allocation. For the former, we present an efficient algorithm that searches for all the subgraph isomorphisms with low computational complexity. For the latter, we introduce a power allocation algorithm based on convex optimization techniques. Extensive simulations demonstrate that the proposed approach outperforms the benchmark methods across various problem sizes. Comment: 14 pages, 7 figures

    HTC Scientific Computing in a Distributed Cloud Environment

    This paper describes the use of a distributed cloud computing system for high-throughput computing (HTC) scientific applications. The distributed cloud computing system is composed of a number of separate Infrastructure-as-a-Service (IaaS) clouds that are utilized in a unified infrastructure. The distributed cloud has been in production-quality operation for two years with approximately 500,000 completed jobs, where a typical workload has 500 simultaneous embarrassingly parallel jobs that run for approximately 12 hours. We review the design and implementation of the system, which is based on pre-existing components and a number of custom components. We discuss the operation of the system and describe our plans for expansion to more sites and increased computing capacity.
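
The workload figures in the abstract above can be put side by side with a back-of-envelope calculation. The sketch assumes, purely for illustration, that the typical values (500 concurrent jobs, 12 hours each) held constantly for the full two years; the abstract does not claim this, so the result is an upper bound under that assumption and not a reported measurement.

```python
HOURS_PER_YEAR = 365 * 24


def jobs_at_full_occupancy(slots, job_hours, years):
    """Jobs completed if every slot ran back-to-back jobs for the whole period."""
    return slots * years * HOURS_PER_YEAR / job_hours


# Typical figures quoted in the abstract, treated here as constants (an assumption).
capacity = jobs_at_full_occupancy(slots=500, job_hours=12, years=2)

# Ratio of the reported job count to that hypothetical ceiling.
occupancy = 500_000 / capacity

print(capacity)             # 730000.0
print(round(occupancy, 3))  # 0.685
```

Under that assumption the reported 500,000 jobs are of the same order as the ceiling, which suggests the quoted numbers are mutually consistent.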