7 research outputs found

    Expanding symmetric multiprocessor capability through gang scheduling


    Coarse-grain time sharing with advantageous overhead minimization for parallel job scheduling

    Parallel job scheduling on cluster computers involves several strategies to maximize both hardware utilization and the throughput at which jobs are processed. Another consideration is response time, i.e., how quickly a job finishes after submission. One possible approach toward these goals is preemption. Preemptive scheduling techniques incur an overhead cost, typically associated with swapping jobs in and out of memory; as memory and data sets grow, this overhead grows with them. This work presents a technique for reducing the overhead incurred by swapping jobs in and out of memory as a result of preemption, in the context of the Scojo-PECT preemptive scheduler. Additionally, a design for extending the existing Cluster Simulator to support analysis of scheduling overhead in preemptive scheduling techniques is presented. A reduction in preemption overhead, achieved by applying standard fitting algorithms in a multi-state job allocation heuristic, is demonstrated.
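The idea of cutting swap overhead by fitting time-shared jobs so they stay co-resident in memory can be illustrated with a minimal sketch. The job sizes, the first-fit choice, and the cost model below are assumptions for illustration only, not the Scojo-PECT heuristic itself:

```python
# Hypothetical sketch: preempted jobs that remain co-resident in memory need
# no swapping. A standard first-fit packing decides which jobs fit together;
# sizes, capacities, and the cost model are made-up illustrative values.

def first_fit(demands, capacity):
    """Greedily pack memory demands into bins of fixed capacity.
    Returns a list of bins, each a list of demands."""
    bins = []
    for demand in demands:
        for b in bins:
            if sum(b) + demand <= capacity:
                b.append(demand)
                break
        else:
            bins.append([demand])
    return bins

def swap_cost(demands, capacity, cost_per_swap=1):
    """Jobs in the first (resident) bin stay in memory across time slices;
    every job beyond it is swapped out and back in once per slice."""
    bins = first_fit(demands, capacity)
    nonresident = sum(len(b) for b in bins[1:])
    return 2 * nonresident * cost_per_swap  # swap out + swap in

# Four jobs (memory in GB) on a 16 GB node: all co-resident, zero swap cost.
print(swap_cost([4, 6, 3, 2], capacity=16))  # 0
# Same jobs on an 8 GB node: two jobs cannot stay resident.
print(swap_cost([4, 6, 3, 2], capacity=8))   # 4
```

The point of the sketch is only that better fitting shrinks the set of jobs that must be swapped on each preemption, which is where the overhead reduction comes from.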

    Systeme für Hochleistungsrechnen. Seminar SS 2003

    High-performance computing systems are parallel computers deployed when the computing power of conventional single-processor systems is insufficient. Following the trend toward global networking, the tightly coupled multiprocessor systems used in the past are increasingly being replaced by cheaper, loosely coupled clusters built from commodity compute nodes and mass storage. This loose coupling raises many new challenges in coordination between the compute nodes, as well as within each node, in order to use the combined resources efficiently. This concerns the coordinated allocation of processors and memory to processes, as well as the self-organizing coordination of communication between nodes with the cluster topology taken into account. A wide range of currently discussed approaches, from the hardware layer through the operating system up to the application layer, are presented and discussed in a series of contributions written for the seminar "Systeme für Hochleistungsrechnen" in the summer semester of 2003.

    Performance Characteristics of Gang Scheduling in Multiprogrammed Environments

    Gang scheduling provides both space-slicing and time-slicing of computer resources for parallel programs. Each thread of execution from a parallel job is concurrently scheduled on an independent processor in order to achieve an optimal level of program performance. Time-slicing of parallel jobs provides better overall system responsiveness and utilization than otherwise possible. Lawrence Livermore National Laboratory has deployed three generations of its gang scheduler on a variety of computing platforms. Results indicate the potential benefits of this technology to parallel processing are no less significant than time-sharing was in the 1960s. Keywords: gang scheduling, multiprogramming, parallel system, scheduling, space-slicing, time-slicing. Introduction: Interest in parallel computers has been propelled by both the economics of commodity-priced microprocessors and a growth rate in computational requirements exceeding processor speed increases. The symmetric multiprocess…
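Gang scheduling is often described with an Ousterhout matrix: rows are time slots, columns are processors, and all threads of a job occupy one row so they run concurrently. The following is a minimal sketch of that placement; the job names, thread counts, and first-fit row choice are illustrative assumptions, not LLNL's scheduler:

```python
# Minimal Ousterhout-matrix sketch: space-slicing within a time slot (a job's
# threads take several processors of one row) and time-slicing across slots
# (rows run round-robin). Jobs and sizes are made-up illustrative values.

def build_matrix(jobs, processors):
    """jobs: dict of name -> thread count. Place each job's threads together
    in the first time slot with enough free processors, opening a new slot
    when none fits, so every thread of a job is scheduled concurrently."""
    slots = []  # each slot: list of job names, one entry per occupied CPU
    for name, threads in jobs.items():
        if threads > processors:
            raise ValueError(f"{name} needs more processors than exist")
        for slot in slots:
            if processors - len(slot) >= threads:
                slot.extend([name] * threads)
                break
        else:
            slots.append([name] * threads)
    return slots

matrix = build_matrix({"A": 4, "B": 3, "C": 1}, processors=4)
for i, slot in enumerate(matrix):
    print(f"slot {i}: {slot}")
# slot 0: ['A', 'A', 'A', 'A']
# slot 1: ['B', 'B', 'B', 'C']
```

Cycling through the rows gives each job a full time slice with all of its threads co-scheduled, which is the property the abstract's performance results rest on.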

    Scratchpad Memory Management For Multicore Real-Time Embedded Systems

    Get PDF
    Multicore systems will continue to spread in the domain of real-time embedded systems due to the increasing need for high-performance applications. This research discusses some of the challenges associated with employing multicore systems for safety-critical real-time applications. Mainly, this work is concerned with providing: 1) efficient inter-core timing isolation for independent tasks, and 2) predictable task communication for communicating tasks. Principally, we introduce a new task execution model, based on the 3-phase execution model, that exploits the Direct Memory Access (DMA) controllers available in modern embedded platforms along with ScratchPad Memories (SPMs) to enforce strong timing isolation between tasks. The DMA and the SPMs are explicitly managed to pre-load tasks from main memory into the local (private) scratchpad memories. Tasks are then executed from the local SPMs without accessing main memory. This model allows CPU execution to be overlapped with DMA loading/unloading operations from and to main memory. We show that by co-scheduling task execution on CPUs and using DMA to access memory and I/O, we can efficiently hide access latency to physical resources. In turn, this leads to significant improvements in system schedulability, compared both to unregulated contention for access to physical resources and to previous cache and SPM management techniques for real-time systems. The presented SPM-centric scheduling algorithms and analyses cover single-core, partitioned, and global real-time systems. The proposed scheme is also extended to support large tasks that do not fit entirely into the local SPM. Moreover, the schedulability analysis considers the case of recovering from transient soft errors (bit flips caused by a single event upset) in several levels of memory that cannot be automatically corrected in hardware by the ECC unit.
The proposed SPM-centric scheduling is integrated at the OS level; thus it is transparent to applications. The proposed scheme is implemented and evaluated on an FPGA platform and a Commercial-Off-The-Shelf (COTS) platform. Regarding real-time task communication, two types of communication are considered: 1) asynchronous inter-task communication, between either sequential tasks (single-threaded) or parallel tasks (multi-threaded), and 2) intra-task communication, where parallel threads of the same application exchange data. A new task scheduling model for parallel tasks (Bundled Scheduling) is proposed to facilitate intra-task communication and reduce synchronization overheads. We show that the proposed bundled scheduling model can be applied to several parallel programming models, such as fork-join and DAG-based applications, leading to improved system schedulability. Finally, intra-task communication is governed by a predictable inter-core communication platform. Specifically, we propose HopliteRT, a lean and predictable Network-on-Chip that connects the private SPMs.
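The benefit of overlapping DMA transfers with CPU execution, as in the 3-phase model above, can be shown with a small timing sketch. The task phase durations below are made-up illustrative numbers, and the model (double-buffered SPM partitions, next task's load prefetched during the current task's execution) is a simplification of the scheme the abstract describes:

```python
# Hedged sketch: with double-buffered SPM partitions, the DMA load of the
# next task can proceed while the CPU executes the current task from its
# local SPM, hiding memory access latency. Timings are illustrative only.

def makespan(tasks, overlap):
    """tasks: list of (dma_load, cpu_exec) durations per task.
    Sequential model sums every phase; overlapped model hides each
    DMA load behind the preceding task's execution where possible."""
    if not overlap:
        return sum(load + execute for load, execute in tasks)
    total = tasks[0][0]  # the first DMA load cannot be hidden
    for i, (_, execute) in enumerate(tasks):
        next_load = tasks[i + 1][0] if i + 1 < len(tasks) else 0
        total += max(execute, next_load)  # prefetch during execution
    return total

tasks = [(2, 5), (3, 4), (2, 6)]
print(makespan(tasks, overlap=False))  # 22
print(makespan(tasks, overlap=True))   # 17
```

Whenever execution phases are at least as long as the DMA loads they overlap, memory latency disappears from the schedule entirely, which is the intuition behind the schedulability improvements the abstract reports.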

    Applications Development for the Computational Grid
