
    Holistic VM Placement for Distributed Parallel Applications in Heterogeneous Clusters

    In a heterogeneous cluster, virtual machine (VM) placement for a distributed parallel application is challenging because of the numerous possible ways of placing the application and the complexity of estimating its performance. This study investigates a holistic VM placement technique for distributed parallel applications in a heterogeneous cluster, aiming to maximize the efficiency of the cluster and consequently reduce costs for service providers and users. The proposed technique accounts for the various factors that affect performance in a combined manner. First, we analyze the effects of resource heterogeneity, different VM configurations, and interference between VMs on the performance of distributed parallel applications with a wide diversity of characteristics, including scientific and big data analytics applications. We then propose a placement technique that uses a machine learning algorithm to estimate the runtime of a distributed parallel application. To train the performance estimation model, each distributed parallel application is profiled against synthetic workloads that mostly utilize the application's dominant resource, which strongly affects its performance; this reduces the profiling space dramatically. Through experimental and simulation studies, we show that the proposed placement technique can find good VM placement configurations for various workloads.
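The core loop described above (a trained model predicts runtime, and the placement search keeps the configuration with the best prediction) can be sketched as follows. This is not the paper's implementation: the k-nearest-neighbour regressor stands in for the unspecified machine learning algorithm, and the two features (relative host speed and pressure on the dominant resource), the toy `PROFILE` values, the fixed co-location penalty, the exhaustive search, and the assumption that the slowest VM determines application runtime are all illustrative. `knn_runtime` and `best_placement` are hypothetical names.

```python
from collections import Counter
from itertools import product
from math import dist

# Toy profiling samples (hypothetical values, not measurements):
# (relative host speed, pressure on the dominant resource) -> runtime in seconds.
PROFILE = [
    ((1.0, 0.0), 100.0),
    ((1.0, 0.5), 130.0),
    ((2.0, 0.0), 55.0),
    ((2.0, 0.5), 75.0),
]


def knn_runtime(features, k=2):
    """Predict runtime as the mean runtime of the k nearest profiled samples."""
    nearest = sorted(PROFILE, key=lambda sample: dist(sample[0], features))[:k]
    return sum(runtime for _, runtime in nearest) / len(nearest)


def best_placement(hosts, n_vms, colocation_penalty=0.25):
    """Try every VM-to-host assignment and keep the lowest predicted runtime.

    hosts maps a host name to (relative speed, VM capacity, background pressure).
    Each extra VM of the application on the same host adds colocation_penalty
    to that host's pressure (an illustrative interference model).
    The application is assumed to finish when its slowest VM finishes.
    """
    best = None
    for assignment in product(sorted(hosts), repeat=n_vms):
        load = Counter(assignment)
        if any(load[host] > hosts[host][1] for host in load):
            continue  # exceeds a host's VM capacity
        predicted = max(
            knn_runtime((hosts[h][0], hosts[h][2] + colocation_penalty * (load[h] - 1)))
            for h in assignment
        )
        if best is None or predicted < best[1]:
            best = (assignment, predicted)
    return best
```

With these toy values, `best_placement({"fast": (2.0, 2, 0.0), "slow": (1.0, 4, 0.0)}, 2)` returns `(("fast", "fast"), 65.0)`. Exhaustive search grows as hosts^VMs, which is why the number of possible placements matters; a practical system would prune the space or use heuristics.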

    Colocation-aware Resource Management for Distributed Parallel Applications in Consolidated Clusters

    Department of Computer Science and Engineering
    Consolidated clusters, which run various distributed parallel applications such as big data frameworks, machine learning applications, and scientific applications to solve complex problems in a wide range of fields, are already in common use. Resource providers allow applications with different characteristics to execute together so that their resources are used efficiently. Scheduling applications onto these shared resources raises several important issues. When applications share the same resources, interference between them affects their performance. Depending on the characteristics of the applications and resources, an application's performance can improve or degrade based on which resources execute it. The characteristics and resource requirements of applications can also constrain their placement, and these constraints can extend to constraints between applications. These issues must be considered to manage resources efficiently and improve application performance. In this thesis, we study how to manage resources efficiently while scheduling distributed parallel applications in consolidated clusters. First, we present a holistic VM placement technique for distributed parallel applications in a heterogeneous virtual cluster, aiming to maximize the efficiency of the cluster and consequently reduce costs for service providers and users. We analyze the effects of resource heterogeneity, different VM configurations, and interference between VMs on the performance of distributed parallel applications, and we propose a placement technique that uses a machine learning algorithm to estimate the runtime of a distributed parallel application. Second, we present a two-level scheduling algorithm that distributes applications to platforms and then maps tasks to each node. We analyze the platform and co-runner affinities of loosely coupled applications and use them in scheduling decisions.
Third, we study constraint-aware VM placement in heterogeneous clusters. We present a model of VM placement constraints and constraint-aware VM placement algorithms. We analyze the effect of VM placement constraints and evaluate the performance of the algorithms over various settings with simulations and experiments in a small cluster. Finally, we propose an interference-aware resource management system for CNN models in a GPU cluster. We analyze the effect of interference between CNN models. We then propose techniques to mitigate the slowdown caused by interference for a target model and to predict the performance of CNN models when they are co-located. We propose a heuristic algorithm to schedule CNN models and evaluate the techniques and the algorithm through experiments in a GPU cluster.
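To make the idea of placement constraints concrete, here is a small backtracking sketch. It is a hypothetical simplification, not the thesis's constraint model or algorithms: it supports only pairwise `affinity` (same host) and `anti-affinity` (different hosts) constraints plus a per-host VM slot limit, and it returns the first feasible assignment rather than optimizing performance. The function name `place` and the tuple encoding of constraints are assumptions.

```python
def place(vms, capacity, constraints):
    """Assign each VM to a host so that host slot limits and pairwise
    affinity / anti-affinity constraints all hold (hypothetical model).

    vms: list of VM names, placed in this order.
    capacity: dict host -> maximum number of VMs on that host.
    constraints: list of (kind, vm_a, vm_b), kind in {"affinity", "anti-affinity"}.
    Returns a dict vm -> host, or None if no feasible placement exists.
    """
    assignment = {}
    load = {host: 0 for host in capacity}

    def allowed(vm, host):
        if load[host] >= capacity[host]:
            return False  # host has no free slot
        for kind, a, b in constraints:
            if vm not in (a, b):
                continue
            other = b if vm == a else a
            if other not in assignment:
                continue  # re-checked when the other VM is placed
            same_host = assignment[other] == host
            if kind == "affinity" and not same_host:
                return False
            if kind == "anti-affinity" and same_host:
                return False
        return True

    def solve(i):
        if i == len(vms):
            return True
        vm = vms[i]
        for host in capacity:
            if allowed(vm, host):
                assignment[vm] = host
                load[host] += 1
                if solve(i + 1):
                    return True
                del assignment[vm]  # undo and try the next host
                load[host] -= 1
        return False

    return dict(assignment) if solve(0) else None
```

For example, `place(["db1", "db2", "web"], {"h1": 2, "h2": 2}, [("anti-affinity", "db1", "db2"), ("affinity", "web", "db1")])` returns `{"db1": "h1", "db2": "h2", "web": "h1"}`, and it returns `None` when the constraints cannot all hold. Backtracking is exponential in the worst case, so larger clusters generally call for heuristics.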

    Compilation of Abstracts for SC12 Conference Proceedings

    1. A Breakthrough in Rotorcraft Prediction Accuracy Using Detached Eddy Simulation
    2. Adjoint-Based Design for Complex Aerospace Configurations
    3. Simulating Hypersonic Turbulent Combustion for Future Aircraft
    4. From a Roar to a Whisper: Making Modern Aircraft Quieter
    5. Modeling of Extended Formation Flight on High-Performance Computers
    6. Supersonic Retropropulsion for Mars Entry
    7. Validating Water Spray Simulation Models for the SLS Launch Environment
    8. Simulating Moving Valves for Space Launch System Liquid Engines
    9. Innovative Simulations for Modeling the SLS Solid Rocket Booster Ignition
    10. Solid Rocket Booster Ignition Overpressure Simulations for the Space Launch System
    11. CFD Simulations to Support the Next Generation of Launch Pads
    12. Modeling and Simulation Support for NASA's Next-Generation Space Launch System
    13. Simulating Planetary Entry Environments for Space Exploration Vehicles
    14. NASA Center for Climate Simulation Highlights
    15. Ultrascale Climate Data Visualization and Analysis
    16. NASA Climate Simulations and Observations for the IPCC and Beyond
    17. Next-Generation Climate Data Services: MERRA Analytics
    18. Recent Advances in High-Resolution Global Atmospheric Modeling
    19. Causes and Consequences of Turbulence in the Earth's Protective Shield
    20. NASA Earth Exchange (NEX): A Collaborative Supercomputing Platform
    21. Powering Deep Space Missions: Thermoelectric Properties of Complex Materials
    22. Meeting NASA's High-End Computing Goals Through Innovation
    23. Continuous Enhancements to the Pleiades Supercomputer for Maximum Uptime
    24. Live Demonstrations of 100-Gbps File Transfers Across LANs and WANs
    25. Untangling the Computing Landscape for Climate Simulations
    26. Simulating Galaxies and the Universe
    27. The Mysterious Origin of Stellar Masses
    28. Hot-Plasma Geysers on the Sun
    29. Turbulent Life of Kepler Stars
    30. Modeling Weather on the Sun
    31. Weather on Mars: The Meteorology of Gale Crater
    32. Enhancing Performance of NASA's High-End Computing Applications
    33. Designing Curiosity's Perfect Landing on Mars
    34. The Search Continues: Kepler's Quest for Habitable Earth-Sized Planets

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers grow with increasing computing demand, scalable and efficient management becomes crucial. However, data center management is challenging because of the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that improving the robustness and efficiency of large-scale computing systems requires significantly more automated support than today's systems provide, and that this automation should leverage the data continuously collected from various system layers. In support of this claim, we propose novel methodologies that automatically diagnose the root causes of performance and configuration problems and that improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic validation of the correctness of software configurations.
This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per watt compared to the state of the art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.
    2019-07-02T00:00:00
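Hop-bytes weights each communication by the number of network links it traverses: the sum, over communicating task pairs, of bytes exchanged times the hop distance between the nodes hosting them. A minimal sketch of the metric follows. The two-level tree topology, the traffic volumes, and the helper names `build_adjacency`, `hop_counts`, and `hop_bytes` are illustrative assumptions, and the code only scores placements; it does not reproduce the thesis's placement policy.

```python
from collections import deque


def build_adjacency(links):
    """Turn undirected (u, v) links into an adjacency list."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def hop_counts(adjacency, src):
    """Breadth-first search: number of links from src to every reachable vertex."""
    hops = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


def hop_bytes(adjacency, placement, traffic):
    """Sum of bytes * hops; traffic maps (task_a, task_b) -> bytes exchanged."""
    total = 0
    for (a, b), nbytes in traffic.items():
        total += nbytes * hop_counts(adjacency, placement[a])[placement[b]]
    return total


# Hypothetical tree: compute nodes n0..n3 under two edge switches joined by a core switch.
TREE = build_adjacency([("n0", "s0"), ("n1", "s0"), ("n2", "s1"),
                        ("n3", "s1"), ("s0", "c"), ("s1", "c")])
# Hypothetical traffic volumes between tasks t0..t3.
TRAFFIC = {("t0", "t1"): 100, ("t2", "t3"): 100, ("t0", "t2"): 10}
```

Keeping the heavily communicating pairs under the same edge switch, `{"t0": "n0", "t1": "n1", "t2": "n2", "t3": "n3"}`, scores 100*2 + 100*2 + 10*4 = 440 hop-bytes, while swapping t1 and t2 scores 820. A placement policy aims to pick the lower-scoring mapping.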

    Evolutionary Game Theoretic Multi-Objective Optimization Algorithms and Their Applications

    Multi-objective optimization problems require multiple objective functions to be optimized simultaneously. They are widely applied in many fields, including engineering, economics, and logistics, where optimal decisions must be made in the presence of trade-offs between two or more conflicting objectives. Most real-world multi-objective optimization problems are NP-hard. Finding an exact solution may be too computationally costly, but sometimes a near-optimal solution is sufficient. In these cases, Multi-Objective Evolutionary Algorithms (MOEAs) provide good approximate solutions to problems that cannot easily be solved with other techniques. However, evolutionary algorithms are not stable: because of their random nature, they may produce very different results on each run. This dissertation proposes an algorithm based on an Evolutionary Game Theory (EGT) framework (EGTMOA) that provides optimality and stability at the same time. EGTMOA combines the notion of stability from EGT with the optimality of MOEAs to form a novel and promising algorithm for solving multi-objective optimization problems. This dissertation studies three multi-objective optimization applications, Cloud Virtual Machine Placement, Body Sensor Networks, and Multi-Hub Molecular Communication, along with the EGTMOA-based algorithms proposed for each. Experimental results show that the EGTMOAs outperform many well-known multi-objective evolutionary algorithms in stability, performance, and runtime.
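Many MOEAs compare candidate solutions by Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one, and the non-dominated solutions form the set of trade-offs described above. The sketch below shows only this standard building block, not EGTMOA itself. It assumes all objectives are minimized, and the objective vectors in `CANDIDATES` (imagined as, say, power draw and resource wastage of candidate VM placements) are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at
    least one. All objectives are assumed to be minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (the non-dominated set).
    A point never dominates itself, so comparing against all points is safe."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors, e.g. (power, resource wastage) per candidate placement.
CANDIDATES = [(3, 1), (1, 3), (2, 2), (3, 3), (2, 4)]
```

Here `pareto_front(CANDIDATES)` keeps `[(3, 1), (1, 3), (2, 2)]`: (3, 3) is dominated by (2, 2) and (2, 4) by (1, 3). This naive filter is quadratic in the number of candidates; MOEAs typically use faster non-dominated sorting on large populations.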

    Putting the User at the Centre of the Grid: Simplifying Usability and Resource Selection for High Performance Computing

    Computer simulation is finding a role in an increasing number of scientific disciplines, concomitant with the rise in available computing power. Realizing this inevitably requires access to computational power beyond the desktop, making use of clusters, supercomputers, data repositories, networks, and distributed aggregations of these resources. Accessing one such resource entails a number of usability and security problems; when multiple geographically distributed resources are involved, the difficulty is compounded. However, usability is an all too often neglected aspect of computing on e-infrastructures, although it is one of the principal factors militating against the widespread uptake of distributed computing. The usability problems are twofold: users need to know how to execute the applications they use on a particular resource, and they need access to suitable resources to run their workloads when they need them. In this thesis we present our solutions to these two problems. First, we propose a new model of e-infrastructure resource interaction, which we call the user–application interaction model, designed to simplify executing applications on high performance computing resources. We describe the implementation of this model in the Application Hosting Environment, which provides a Software as a Service layer on top of distributed e-infrastructure resources. We compare the usability of our system with commonly deployed middleware tools using five usability metrics. Our middleware and security solutions are judged to be more usable than other commonly deployed middleware tools. We go on to describe the requirements for a resource trading platform that allows users to purchase access to resources within a distributed e-infrastructure.
We present the implementation of this Resource Allocation Market Place as a distributed multi-agent system and show how it provides a highly flexible, efficient tool for scheduling workflows across high performance computing resources.