
    HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-premise resources and peak demand can leverage remote resources in a pay-as-you-go manner. Nevertheless, there are plenty of questions to be answered in HPC cloud, which range from how to extract the best performance from an unknown underlying platform to what services are essential to make its usage easier. Moreover, the discussion on the right pricing and contractual models to fit small and large users is relevant for the sustainability of HPC clouds. This paper presents a survey and taxonomy of efforts in HPC cloud and a vision of what we believe is ahead of us, including a set of research challenges that, once tackled, can help advance businesses and scientific discoveries. This becomes particularly relevant due to the fast-increasing wave of new HPC applications coming from big data and artificial intelligence. Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR

    3E: Energy-Efficient Elastic Scheduling for Independent Tasks in Heterogeneous Computing Systems

    Reducing energy consumption is a major design constraint for modern heterogeneous computing systems, in order to minimize electricity cost, improve system reliability, and protect the environment. Conventional energy-efficient scheduling strategies developed for these systems do not sufficiently exploit system elasticity and adaptability for maximum energy savings, and do not simultaneously account for user-expected finish times. In this paper, we develop a novel scheduling strategy named energy-efficient elastic (3E) scheduling for aperiodic, independent, and non-real-time tasks with user-expected finish times on DVFS-enabled heterogeneous computing systems. The 3E strategy adjusts processors' supply voltages and frequencies according to the system workload, and makes trade-offs between energy consumption and user-expected finish times. Compared with other energy-efficient strategies, 3E significantly improves the scheduling quality and effectively enhances the system elasticity.
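
The trade-off between energy and finish time under DVFS can be illustrated with a minimal sketch. This is not the 3E algorithm itself, whose details the abstract does not give. It assumes a textbook model in which dynamic power grows roughly with the cube of frequency, so the energy of a task with a fixed cycle count grows with the square of frequency. The names `pick_frequency` and `dynamic_energy` and the constant `k` are hypothetical.

```python
def pick_frequency(cycles, finish_time_s, freqs_hz):
    """Return the lowest available frequency that still meets the expected finish time.

    Falls back to the highest frequency when no setting can meet it.
    """
    for f in sorted(freqs_hz):
        if cycles / f <= finish_time_s:
            return f
    return max(freqs_hz)


def dynamic_energy(cycles, freq_hz, k=1e-27):
    """Energy under the assumed model: power = k * f^3, time = cycles / f."""
    return k * freq_hz ** 2 * cycles


if __name__ == "__main__":
    freqs = [1e9, 2e9, 3e9]          # hypothetical DVFS levels in Hz
    f = pick_frequency(2e9, 1.5, freqs)
    print(f, dynamic_energy(2e9, f))
```

Under this assumed model, running at the lowest feasible frequency uses the least dynamic energy, which is why slack in the expected finish time translates into savings.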

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so that the overall time budget is dictated by the tail of the replies' latency distribution; replies see latency variations both in the network and compute. Previous work proposes to achieve load-proportional energy by slowing down the computation at lower datacenter loads based directly on response times (i.e., at lower loads, the proposal exploits the average slack in the time budget provisioned for the peak load). In contrast, we propose TimeTrader to reduce energy by exploiting the latency slack in the sub-critical replies which arrive before the deadline (e.g., 80% of replies are 3-4x faster than the tail). This slack is present at all loads and subsumes the previous work's load-related slack. While the previous work shifts the leaves' response time distribution to consume the slack at lower loads, TimeTrader reshapes the distribution at all loads by slowing down individual sub-critical nodes without increasing missed deadlines. TimeTrader exploits slack in both the network and compute budgets. Further, TimeTrader leverages Earliest Deadline First scheduling to largely decouple critical requests from the queuing delays of sub-critical requests, which can then be slowed down without hurting critical requests. A combination of real-system measurements and at-scale simulations shows that without adding to missed deadlines, TimeTrader saves 15-19% and 41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512 nodes, whereas previous work saves 0% and 31-37%. Comment: 13 page
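
Earliest Deadline First ordering, which the abstract names, can be sketched with a priority queue keyed on deadline. This is a generic illustration, not TimeTrader's implementation. The class `EDFQueue` and the helper `slack` are hypothetical names, and the slack formula (deadline minus current time minus expected service time) is an assumption.

```python
import heapq
import itertools


class EDFQueue:
    """Serve the request with the earliest deadline first."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # keeps insertion order for equal deadlines

    def push(self, deadline, request):
        heapq.heappush(self._heap, (deadline, next(self._tie), request))

    def pop(self):
        deadline, _, request = heapq.heappop(self._heap)
        return deadline, request

    def __len__(self):
        return len(self._heap)


def slack(deadline, now, expected_service_time):
    """Time a request could be delayed or slowed without missing its deadline."""
    return deadline - now - expected_service_time


if __name__ == "__main__":
    q = EDFQueue()
    q.push(30, "sub-critical A")
    q.push(10, "critical B")
    q.push(20, "sub-critical C")
    while q:
        print(q.pop())
```

Because the critical request is always dequeued first, requests with later deadlines absorb the queuing delay, which is the decoupling effect the abstract describes.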

    Many-Task Computing and Blue Waters

    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.

    Power Bounded Computing on Current & Emerging HPC Systems

    Power has become a critical constraint for the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. This constraint spans almost every level of computing technology, from IC chips all the way up to data centers, for physical, technical, and economic reasons. To cope with this reality, it is necessary to understand how available or permissible power impacts the design and performance of emerging computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets. We have multiple research objectives in this dissertation. They center on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across nodes and components based on application characteristics and the power budget of a cluster. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling to maximize system throughput under given power budgets for node-sharing systems. Fourth, we extend to GPU-accelerated systems and workloads and develop an online dynamic performance and power approach to meet both performance requirements and power efficiency. Power bounded computing improves performance scalability and power efficiency and decreases operation costs of HPC systems and data centers. This dissertation opens up several new avenues for research in power bounded computing to address the power challenges in HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.

    A Framework for Approximate Optimization of BoT Application Deployment in Hybrid Cloud Environment

    We adopt a systematic approach to investigate the efficiency of near-optimal deployment of large-scale CPU-intensive Bag-of-Tasks (BoT) applications running on cloud resources with non-proportional cost-to-performance ratios. Our analytical solutions apply whether the running time of the given application is known or unknown. They aim to optimize users' utility by choosing the most desirable trade-off between the makespan and the total incurred expense. We propose a schema to provide a near-optimal deployment of a BoT application according to users' preferences. Our approach is to provide the user with a set of Pareto-optimal solutions, from which she may select one of the possible scheduling points based on her internal utility function. Our framework can also cope with uncertainty in the tasks' execution times, using two methods. First, an estimation method based on Monte Carlo sampling, called the AA algorithm, is presented. It uses the minimum possible number of samples to predict the average task running time. Second, assuming that we have access to code analyzers, code profiling, or estimation tools, a hybrid method is presented that evaluates the accuracy of each estimation tool over certain time intervals to improve resource allocation decisions. We propose approximate deployment strategies that run on a hybrid cloud. In essence, the proposed strategies first determine either an estimated or an exact optimal schema based on the information provided by the user and on environmental parameters. Then, we exploit dynamic methods to assign tasks to resources so as to approach the optimal schema as closely as possible, using two methods: a fast yet simple method based on the First Fit Decreasing algorithm, and a more complex approach based on an approximate solution of the problem transformed into a subset sum problem. Extensive experimental results obtained on a hybrid cloud platform confirm that our framework can deliver a near-optimal solution respecting the user's utility function.
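
The First Fit Decreasing heuristic mentioned in the abstract can be sketched as follows. This is the standard bin-packing form of the algorithm, not the paper's deployment strategy. Treating each bin as one resource with a fixed time budget (`capacity`) and each item as a task running time is an assumption made for illustration, and `first_fit_decreasing` is a hypothetical name.

```python
def first_fit_decreasing(task_times, capacity):
    """Pack task running times into as few resources as the heuristic finds.

    Tasks are sorted longest first; each task goes to the first resource
    whose remaining time budget can hold it, or to a new resource.
    """
    bins = []   # task times assigned to each resource
    loads = []  # total time already assigned to each resource
    for t in sorted(task_times, reverse=True):
        if t > capacity:
            raise ValueError(f"task of length {t} exceeds capacity {capacity}")
        for i, load in enumerate(loads):
            if load + t <= capacity:
                bins[i].append(t)
                loads[i] += t
                break
        else:
            # No existing resource has room, so open a new one.
            bins.append([t])
            loads.append(t)
    return bins


if __name__ == "__main__":
    print(first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10))
```

The heuristic is fast because it makes a single pass over the sorted tasks, at the cost of not guaranteeing the minimum number of resources.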

    Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters

    To improve customer experience, datacenter operators offer support for simplifying application and resource management. For example, running workloads of workflows on behalf of customers is desirable, but requires increasingly sophisticated autoscaling policies, that is, policies that dynamically provision resources for the customer. Although selecting and tuning autoscaling policies is a challenging task for datacenter operators, so far relatively few studies investigate the performance of autoscaling for workloads of workflows. Complementing previous knowledge, in this work we propose the first comprehensive performance study in the field. Using trace-based simulation, we compare state-of-the-art autoscaling policies across multiple application domains, workload arrival patterns (e.g., burstiness), and system utilization levels. We further investigate the interplay between autoscaling and regular allocation policies, and the complexity cost of autoscaling. Our quantitative study focuses not only on traditional performance metrics and on state-of-the-art elasticity metrics, but also on time- and memory-related autoscaling-complexity metrics. Our main results give strong and quantitative evidence about previously unreported operational behavior, for example, that autoscaling policies perform differently across application domains and by how much they differ. Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters"