6 research outputs found

    An approach to resource-aware coscheduling for cmps.

    We develop real-time scheduling techniques for improving performance and energy for multiprogrammed workloads that scale nonuniformly with increasing thread counts. Multithreaded programs generally deliver higher throughput than single-threaded programs on chip multiprocessors (CMPs), but the performance gains from adding threads diminish when there is contention for shared resources. We use analytic metrics to derive local search heuristics for creating efficient multiprogrammed, multithreaded workload schedules. Programs are allocated fewer cores than requested and scheduled to space-share the CMP to improve global throughput. Our holistic approach attempts to co-schedule programs that complement each other with respect to shared resource consumption. We find that co-scheduling applications for performance and energy in a resource-aware manner achieves better results than solely targeting total throughput or concurrently co-scheduling all programs. Our schedulers improve the overall energy-delay product (E*D) by a factor of 1.5 over time-multiplexed gang scheduling.
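    The E*D figure quoted above is the product of the energy consumed and the time taken, so a lower value is better. The sketch below shows how such an improvement factor could be computed. The function names and the sample numbers are illustrative assumptions, not values from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product (E*D): lower is better."""
    return energy_joules * delay_seconds


def improvement_factor(baseline: tuple[float, float],
                       candidate: tuple[float, float]) -> float:
    """Ratio of baseline E*D to candidate E*D; values above 1 mean the
    candidate schedule is better. Each argument is (energy, delay)."""
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Made-up measurements, chosen only to illustrate the arithmetic:
# a gang-scheduled baseline versus a hypothetical co-scheduled run.
gang_scheduled = (300.0, 10.0)   # 300 J over 10 s
co_scheduled = (250.0, 8.0)      # 250 J over 8 s

factor = improvement_factor(gang_scheduled, co_scheduled)
print(f"E*D improvement factor: {factor:.2f}")
```

    Because E*D multiplies the two quantities, a schedule can improve it by saving energy, by finishing sooner, or by a combination of both.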

    Adaptive space-time sharing with SCOJO.

    Coscheduling is a technique used to improve the performance of parallel applications under time sharing, i.e., to provide better response times than standard time sharing or space sharing. Dynamic coscheduling and gang scheduling are the two main forms of coscheduling. In SCOJO (Share-based Job Coscheduling), we introduced our own framework that employs loosely coordinated dynamic coscheduling and a dynamic directory service to support the scheduling of cross-site jobs in grid scheduling. SCOJO guarantees effective CPU shares by taking coscheduling effects into consideration and supports both time and CPU share reservation for cross-site jobs. However, coscheduling leads to high memory pressure and still involves problems such as fragmentation and context-switch overhead, especially at higher multiprogramming levels. As the main part of this thesis, we employ gang scheduling as a more directly suitable approach for combined space-time sharing and extend SCOJO for clusters to incorporate adaptive space sharing into gang scheduling. We focus on taking advantage of the moldable and malleable characteristics of realistic job mixes to adapt dynamically to varying system workloads and to reduce fragmentation flexibly. In addition, our adaptive scheduling approach applies standard job-scheduling techniques such as a priority and aging system, backfilling, or EASY backfilling. We demonstrate through a discrete-event simulation that this dynamic adaptive space-time sharing approach can deliver better response times and bounded relative response times even with a lower multiprogramming level than traditional gang scheduling. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .H825. Source: Masters Abstracts International, Volume: 43-01, page: 0237. Adviser: A. Sodan. Thesis (M.Sc.)--University of Windsor (Canada), 2004.

    Simulation techniques in an artificial society model

    Artificial society refers to a generic class of agent-based simulation models used to discover global social structures and collective behavior produced by simple local rules and interaction mechanisms. Artificial society models are applicable in a variety of disciplines, including the modeling of chemical and biological processes, natural phenomena, and complex adaptive systems. We focus on the underlying simulation techniques used in artificial society discrete-event simulation models, including model time evolution and computational performance. Although synchronous time evolution is the correct modeling approach for some applications, many others are better represented using asynchronous time evolution. We claim that asynchronous time evolution can eliminate potential simulation artifacts produced by synchronous time evolution. Using an adaptation of a popular artificial society model, we show that very different output can result based solely on the choice of asynchronous or synchronous time evolution. Depending on the event list implementation chosen, the use of discrete-event simulation to incorporate asynchronous time evolution can incur a substantial loss in computational performance. Accordingly, we evaluate select event list implementations within the artificial society simulation model and demonstrate that acceptable performance can be achieved. In addition to the artificial society model, we show that transforming from a synchronous to an asynchronous system proves beneficial for scheduling resources in a parallel system. We focus on non-FCFS job scheduling policies that permit jobs to backfill, i.e., to move ahead in the queue, provided that they do not delay certain previously submitted jobs. Instead of using a single queue of jobs, we propose a simple backfilling scheduling policy that separates short from long jobs by using multiple queues. By monitoring system performance, our policy adapts its configuration parameters in response to severe changes in the job arrival pattern and/or resource demands. Detailed performance comparisons via simulation using actual parallel workload traces indicate that our proposed policy consistently outperforms traditional backfilling in a variety of contexts.
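    The backfilling condition described above (a job may move ahead only if it does not delay certain previously submitted jobs) can be sketched as follows. This is a minimal illustration of the conventional single-queue, EASY-style check in which only the job at the head of the queue holds a reservation. It is not the multi-queue policy proposed in the thesis, and the `Job` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int      # processors requested
    runtime: float  # user-supplied runtime estimate


def head_reservation(now, free, running, head):
    """Earliest time the head job can start, plus the processors left over
    at that time. `running` is a list of (end_time, procs) pairs."""
    avail = free
    if avail >= head.procs:
        return now, avail - head.procs
    for end_time, procs in sorted(running):
        avail += procs  # processors released when this job ends
        if avail >= head.procs:
            return end_time, avail - head.procs
    raise ValueError("head job requests more processors than the machine has")


def can_backfill(now, free, running, head, candidate):
    """True if `candidate` can start now without delaying the head job."""
    if candidate.procs > free:
        return False
    start, extra = head_reservation(now, free, running, head)
    finishes_in_time = now + candidate.runtime <= start
    fits_beside_head = candidate.procs <= extra
    return finishes_in_time or fits_beside_head


# Hypothetical state: 2 processors idle, two running jobs, head job needs 6.
running = [(10.0, 4), (20.0, 2)]
head = Job("head", procs=6, runtime=30.0)
print(can_backfill(0.0, 2, running, head, Job("short", 2, 5.0)))
print(can_backfill(0.0, 2, running, head, Job("long", 2, 15.0)))
```

    A candidate is admitted either because it finishes before the head job's reserved start time or because it uses only processors the head job will not need. A multi-queue variant would apply a check of this kind per queue, with jobs assigned to queues by estimated runtime.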

    Improving Gang Scheduling through Job Performance Analysis and Malleability

    The OpenMP programming model gives parallel applications a very important feature: job malleability. Job malleability is the capacity of an application to dynamically adapt its parallelism to the number of processors allocated to it. We believe that job malleability gives applications the flexibility that a system needs to achieve its maximum performance. We also argue that a system must base its decisions not only on user requirements but also on run-time performance measurements to ensure the efficient use of resources. Job malleability is the application characteristic that makes run-time performance analysis possible. Without malleability, applications would not be able to adapt their parallelism to the system's decisions. To support these ideas, we present two new approaches to the two main problems of Gang Scheduling: the excessive number of time slots and fragmentation. Our first proposal is to apply a scheduling policy inside each time slot of Gang Scheduling that distributes processors among applications according to their efficiency, calculated from run-time measurements. We call this policy Performance-Driven Gang Scheduling. Our second approach is a new re-packing algorithm, Compress&Join, that exploits job malleability. This algorithm modifies the processor allocation of running applications to adapt it to the needs of the system and to minimize fragmentation and the number of time slots. These proposals have been implemented on an SGI Origin 2000 with 64 processors. Results show the validity and convenience of both using job performance analysis calculated at run time to decide the processor allocation and using a flexible programming model that adapts applications to system decisions.

    Systeme für Hochleistungsrechnen. Seminar SS 2003

    Systems for high-performance computing are parallel computers that are used when the computing power of conventional single-processor systems is insufficient. Following the trend toward global networking, the tightly coupled multiprocessor systems used in the past are increasingly being replaced by less expensive, loosely coupled clusters built from standard compute nodes and mass storage. Loose coupling creates a variety of new challenges in coordination, both between the compute nodes and within each node, so that the resources of the cluster can be used efficiently. This concerns the coordinated allocation of processors and memory to processes as well as the self-organizing coordination of communication between nodes, taking the cluster topology into account. A variety of currently discussed approaches, ranging from the hardware layer through the operating system to the application layer, are presented and discussed in a series of contributions prepared for the seminar "Systeme für Hochleistungsrechnen" (Systems for High-Performance Computing) in the summer semester of 2003.