300 research outputs found

    Space sharing job scheduling policies for parallel computers

    Get PDF
    The distinguishing characteristic of space sharing parallel job scheduling policies is that applications are allocated non-overlapping processor subsets. The interference among jobs is reduced, the synchronization delays and message latencies can be predictable, and distinct processors may be allocated to cooperating processes so as to avoid the overhead of context switches associated with traditional time-multiplexing;The processor allocation strategy, the job selection criteria, and workload characteristics are fundamental factors that influence system performance under space sharing. Allocation can be static or dynamic. The processor subset allocated to an application is fixed under static space sharing, whereas it can change during execution under dynamic space sharing. Static allocation can produce more predictable run times, permits a wide range of compiler optimizations (e.g., static data distribution and binding), and avoids the processor releases and reallocations associated with dynamic allocation. Its major problem is that it can induce high processor fragmentation;In this dissertation, alternative static and dynamic space sharing policies that differ in the allocation discipline and the job selection criteria are studied. The results show that significantly superior performance can be achieved under static space sharing if applications can be folded (i.e., allocated fewer processors than they requested). Folding typically increases program efficiency and can reduce processor fragmentation. Policies that increase folding with the system load are proposed and compared to schemes that use unconstrained folding, no folding, and fixed maximum folding factors. The adaptive policies produced higher and more stable system utilization, significantly shorter mean response times, and good fairness curves. However, unconstrained folding resulted in considerably more severe processor fragmentation than no folding. Its advantage is that it exploits the efficiency improvement that typically results when an application is allocated fewer processors. Consequently, it can produce shorter mean response times than no folding under medium to heavy loads;Also because of this efficiency improvement, dynamic policies that reduce waiting times by executing a large number of jobs simultaneously are more promising than schemes that limit the number of active jobs. However, limiting the number of active applications can be the superior approach when folding does not improve application efficiency

    Concurrent Access Algorithms for Different Data Structures: A Research Review

    Get PDF
    Algorithms for concurrent data structure have gained attention in recent years as multi-core processors have become ubiquitous. Several features of shared-memory multiprocessors make concurrent data structures significantly more difficult to design and to verify as correct than their sequential counterparts. The primary source of this additional difficulty is concurrency. This paper provides an overview of the some concurrent access algorithms for different data structures

    Cache-affinity scheduling for fine grain multithreading

    Get PDF
    Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ‘batching’, for reducing the negative impact of fine grain multithreading on cache performance. Prototype schedulers running on uniprocessors and shared memory multiprocessors are described, and finally experimental results which illustrate the improvements observed after applying our techniques are presented.peer-reviewe

    Scheduling support for concurrency and parallelism in the Mach operating system

    Full text link

    Accelerating sequential programs using FastFlow and self-offloading

    Full text link
    FastFlow is a programming environment specifically targeting cache-coherent shared-memory multi-cores. FastFlow is implemented as a stack of C++ template libraries built on top of lock-free (fence-free) synchronization mechanisms. In this paper we present a further evolution of FastFlow enabling programmers to offload part of their workload on a dynamically created software accelerator running on unused CPUs. The offloaded function can be easily derived from pre-existing sequential code. We emphasize in particular the effective trade-off between human productivity and execution efficiency of the approach.Comment: 17 pages + cove

    "Virtual malleability" applied to MPI jobs to improve their execution in a multiprogrammed environment"

    Get PDF
    This work focuses on scheduling of MPI jobs when executing in shared-memory multiprocessors (SMPs). The objective was to obtain the best performance in response time in multiprogrammed multiprocessors systems using batch systems, assuming all the jobs have the same priority. To achieve that purpose, the benefits of supporting malleability on MPI jobs to reduce fragmentation and consequently improve the performance of the system were studied. The contributions made in this work can be summarized as follows:· Virtual malleability: A mechanism where a job is assigned a dynamic processor partition, where the number of processes is greater than the number of processors. The partition size is modified at runtime, according to external requirements such as the load of the system, by varying the multiprogramming level, making the job contend for resources with itself. In addition to this, a mechanism which decides at runtime if applying local or global process queues to an application depending on the load balancing between processes of it. · A job scheduling policy, that takes decisions such as how many processes to start with and the maximum multiprogramming degree based on the type and number of applications running and queued. Moreover, as soon as a job finishes execution and where there are queued jobs, this algorithm analyzes whether it is better to start execution of another job immediately or just wait until there are more resources available. · A new alternative to backfilling strategies for the problema of window execution time expiring. Virtual malleability is applied to the backfilled job, reducing its partition size but without aborting or suspending it as in traditional backfilling. The evaluation of this thesis has been done using a practical approach. All the proposals were implemented, modifying the three scheduling levels: queuing system, processor scheduler and runtime library. The impact of the contributions were studied under several types of workloads, varying machine utilization, communication and, balance degree of the applications, multiprogramming level, and job size. Results showed that it is possible to offer malleability over MPI jobs. An application obtained better performance when contending for the resources with itself than with other applications, especially in workloads with high machine utilization. Load imbalance was taken into account obtaining better performance if applying the right queue type to each application independently.The job scheduling policy proposed exploited virtual malleability by choosing at the beginning of execution some parameters like the number of processes and maximum multiprogramming level. It performed well under bursty workloads with low to medium machine utilizations. However as the load increases, virtual malleability was not enough. That is because, when the machine is heavily loaded, the jobs, once shrunk are not able to expand, so they must be executed all the time with a partition smaller than the job size, thus degrading performance. Thus, at this point the job scheduling policy concentrated just in moldability.Fragmentation was alleviated also by applying backfilling techniques to the job scheduling algorithm. Virtual malleability showed to be an interesting improvement in the window expiring problem. Backfilled jobs even on a smaller partition, can continue execution reducing memory swapping generated by aborts/suspensions In this way the queueing system is prevented from reinserting the backfilled job in the queue and re-executing it in the future.Postprint (published version

    Lock-free Concurrent Data Structures

    Full text link
    Concurrent data structures are the data sharing side of parallel programming. Data structures give the means to the program to store data, but also provide operations to the program to access and manipulate these data. These operations are implemented through algorithms that have to be efficient. In the sequential setting, data structures are crucially important for the performance of the respective computation. In the parallel programming setting, their importance becomes more crucial because of the increased use of data and resource sharing for utilizing parallelism. The first and main goal of this chapter is to provide a sufficient background and intuition to help the interested reader to navigate in the complex research area of lock-free data structures. The second goal is to offer the programmer familiarity to the subject that will allow her to use truly concurrent methods.Comment: To appear in "Programming Multi-core and Many-core Computing Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computin
    corecore