1,234 research outputs found
Collocation Games and Their Application to Distributed Resource Management
We introduce Collocation Games as the basis of a general framework for modeling, analyzing, and facilitating the interactions between the various stakeholders in distributed systems in general, and in cloud computing environments in particular. Cloud computing enables fixed-capacity (processing, communication, and storage) resources to be offered by infrastructure providers as commodities for sale at a fixed cost in an open marketplace to independent, rational parties (players) interested in setting up their own applications over the Internet. Virtualization technologies enable the partitioning of such fixed-capacity resources so as to allow each player to dynamically acquire appropriate fractions of the resources for unencumbered use. In such a paradigm, the resource management problem reduces to that of partitioning the entire set of applications (players) into subsets, each of which is assigned to fixed-capacity cloud resources. If the infrastructure and the various applications are under a single administrative domain, this partitioning reduces to an optimization problem whose objective is to minimize the overall deployment cost. In a marketplace, in which the infrastructure provider is interested in maximizing its own profit, and in which each player is interested in minimizing its own cost, it should be evident that a global optimization is precisely the wrong framework. Rather, in this paper we use a game-theoretic framework in which the assignment of players to fixed-capacity resources is the outcome of a strategic "Collocation Game". Although we show that determining the existence of an equilibrium for collocation games in general is NP-hard, we present a number of simplified, practically-motivated variants of the collocation game for which we establish convergence to a Nash Equilibrium, and for which we derive convergence and price of anarchy bounds. In addition to these analytical results, we present an experimental evaluation of implementations of some of these variants for cloud infrastructures consisting of a collection of multidimensional resources of homogeneous or heterogeneous capacities. Experimental results using trace-driven simulations and synthetically generated datasets corroborate our analytical results and also illustrate how collocation games offer a feasible distributed resource management alternative for autonomic/self-organizing systems, in which the adoption of a global optimization approach (centralized or distributed) would be neither practical nor justifiable.NSF (CCF-0820138, CSR-0720604, EFRI-0735974, CNS-0524477, CNS-052016, CCR-0635102); Universidad Pontificia Bolivariana; COLCIENCIAS–Instituto Colombiano para el Desarrollo de la Ciencia y la Tecnología "Francisco José de Caldas
Bandwidth-Aware On-Line Scheduling in SMT Multicores
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.The memory hierarchy plays a critical role on the performance of current chip multiprocessors. Main memory is shared by all the running processes, which can cause important bandwidth contention. In addition, when the processor implements SMT cores, the L1 bandwidth becomes shared among the threads running on each core. In such a case, bandwidth-aware schedulers emerge as an interesting approach to mitigate the contention. This work investigates the performance degradation that the processes suffer due to memory bandwidth constraints. Experiments show that main memory and L1 bandwidth contention negatively impact the process performance; in both cases, performance degradation can grow up to 40 percent for some of applications. To deal with contention, we devise a scheduling algorithm that consists of two policies guided by the bandwidth consumption gathered at runtime. The process selection policy balances the number of memory requests over the execution time to address main memory bandwidth contention. The process allocation policy tackles L1 bandwidth contention by balancing the L1 accesses among the L1 caches. The proposal is evaluated on a Xeon E5645 platform using a wide set of multiprogrammed workloads, achieving performance benefits up to 6.7 percent with respect to the Linux scheduler.This work was supported by the Spanish Ministerio de Economia y Competitividad (MINECO) and by FEDER funds under Grant TIN2012-38341-C04-01, and by the Intel Early Career Faculty Honor Program Award.Feliu-Pérez, J.; Sahuquillo Borrás, J.; Petit Martí, SV.; Duato Marín, JF. (2016). Bandwidth-Aware On-Line Scheduling in SMT Multicores. IEEE Transactions on Computers. 65(2):422-434. https://doi.org/10.1109/TC.2015.2428694S42243465
Task Activity Vectors: A Novel Metric for Temperature-Aware and Energy-Efficient Scheduling
This thesis introduces the abstraction of the task activity vector to characterize applications by the processor resources they utilize. Based on activity vectors, the thesis introduces scheduling policies for improving the temperature distribution on the processor chip and for increasing energy efficiency by reducing the contention for shared resources of multicore and multithreaded processors
Recommended from our members
Final Report: Performance Modeling Activities in PERC2
Progress in Performance Modeling for PERC2 resulted in: • Automated modeling tools that are robust, able to characterize large applications running at scale while simultaneously simulating the memory hierarchies of mul-tiple machines in parallel. • Porting of the requisite tracer tools to multiple platforms. • Improved performance models by using higher resolution memory models that ever before. • Adding control-flow and data dependency analysis to the tracers used in perform-ance tools. • Exploring and developing several new modeling methodologies. • Using modeling tools to develop performance models for strategic codes. • Application of modeling methodology to make a large number of “blind” per-formance predictions on certain mission partner applications, targeting most cur-rently available system architectures. • Error analysis to correct some systematic biases encountered as part of the large-scale blind prediction exercises. • Addition of instrumentation capabilities for communication libraries other than MPI. • Dissemination the tools and modeling methods to several mission partners, in-cluding DoD HPCMO and two DARPA HPCS vendors (Cray and IBM), as well as to the wider HPC community via a series of tutorials
Blox: A Modular Toolkit for Deep Learning Schedulers
Deep Learning (DL) workloads have rapidly increased in popularity in
enterprise clusters and several new cluster schedulers have been proposed in
recent years to support these workloads. With rapidly evolving DL workloads, it
is challenging to quickly prototype and compare scheduling policies across
workloads. Further, as prior systems target different aspects of scheduling
(resource allocation, placement, elasticity etc.), it is also challenging to
combine these techniques and understand the overall benefits. To address these
challenges we propose Blox, a modular toolkit which allows developers to
compose individual components and realize diverse scheduling frameworks. We
identify a set of core abstractions for DL scheduling, implement several
existing schedulers using these abstractions, and verify the fidelity of these
implementations by reproducing results from prior research. We also highlight
how we can evaluate and compare existing schedulers in new settings: different
workload traces, higher cluster load, change in DNN workloads and deployment
characteristics. Finally, we showcase Blox's extensibility by composing
policies from different schedulers, and implementing novel policies with
minimal code changes. Blox is available at
\url{https://github.com/msr-fiddle/blox}.Comment: To be presented at Eurosys'2
Hybrid scheduling algorithms in cloud computing: a review
Cloud computing is one of the emerging fields in computer science due to its several advancements like on-demand processing, resource sharing, and pay per use. There are several cloud computing issues like security, quality of service (QoS) management, data center energy consumption, and scaling. Scheduling is one of the several challenging problems in cloud computing, where several tasks need to be assigned to resources to optimize the quality of service parameters. Scheduling is a well-known NP-hard problem in cloud computing. This will require a suitable scheduling algorithm. Several heuristics and meta-heuristics algorithms were proposed for scheduling the user's task to the resources available in cloud computing in an optimal way. Hybrid scheduling algorithms have become popular in cloud computing. In this paper, we reviewed the hybrid algorithms, which are the combinations of two or more algorithms, used for scheduling in cloud computing. The basic idea behind the hybridization of the algorithm is to take useful features of the used algorithms. This article also classifies the hybrid algorithms and analyzes their objectives, quality of service (QoS) parameters, and future directions for hybrid scheduling algorithms
L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors
© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Improving the utilization of shared resources is a
key issue to increase performance in SMT processors. Recent
work has focused on resource sharing policies to enhance the
processor performance, but their proposals mainly concentrate on
novel hardware mechanisms that adapt to the dynamic resource
requirements of the running threads.
This work addresses the L1 cache bandwidth problem in SMT
processors experimentally on real hardware. Unlike previous
work, this paper concentrates on thread allocation, by selecting
the proper pair of co-runners to be launched to the same
core. The relation between L1 bandwidth requirements of each
benchmark and its performance (IPC) is analyzed. We found that
for individual benchmarks, performance is strongly connected to
L1 bandwidth consumption, and this observation remains valid
when several co-runners are launched to the same SMT core.
Based on these findings we propose two L1 bandwidth
aware thread to core (t2c) allocation policies, namely Static and
Dynamic t2c allocation, respectively. The aim of these policies is
to properly balance L1 bandwidth requirements of the running
threads among the processor cores. Experiments on a Xeon E5645
processor show that the proposed policies significantly improve
the performance of the Linux OS kernel regardless the number
of cores considered.This work was supported by the Spanish Ministerio de
Econom´ıa y Competitividad (MINECO) and by FEDER funds
under Grant TIN2012-38341-C04-01; and by Programa de
Apoyo a la Investigacion y Desarrollo (PAID-05-12) of the ´
Universitat Politecnica de Val ` encia under Grant SP20120748Feliu Pérez, J.; Sahuquillo Borrás, J.; Petit Martí, SV.; Duato Marín, JF. (2013). L1-Bandwidth Aware Thread Allocation in Multicore SMT Processors. IEEE. https://doi.org/10.1109/PACT.2013.6618810
- …