653 research outputs found

    Evaluation of a local strategy for high performance memory management

    Get PDF
    Conventional operating systems, like Silicon Graphics' IRIX and IBM's AIX, adopt a single Memory Management algorithm. The choice of this algorithm is usually based on its good performance in relation to the set of programs executed in the computer. Some approximation of LRU (least­recently used) is usually adopted. This choice can take to certain situations in that the computer presents a bad performance due to its bad behavior for certain programs. A possible solution for such cases is to enable each program to have a specific Management algorithm (local strategy) that is adapted to its Memory access pattern. For example, programs with sequential access pattern, such as SOR, should be managed by the algorithm MRU (most­recently used) because its bad performance when managed by LRU. In this strategy it is very important to decide the Memory partitioning strategy among the programs in execution in a multiprogramming environment. Our strategy named CAPR (Compiler­Aided Page Replacement) analyze the pattern of Memory references from the source program of an application and communicate these characteristics to the operating system that will make the choice of the best Management algorithm and Memory partitioning strategy. This paper evaluates the influence of the Management algorithms and Memory partitioning strategy in the global system performance and in the individual performance of each program. It is also presented a comparison of this local strategy with the classic global strategy and the viability of the strategy is analyzed. The obtained results showed a difference of at least an order of magnitude in the number of page faults among the algorithms LRU and MRU in the global strategy. After that, starting from the analysis of the intrinsic behavior of each application in relation to its Memory access pattern and of the number of page faults, an optimization procedure of Memory system performance was developed for multiprogramming environments. This procedure allows to decide system performance parameters, such as Memory partitioning strategy among the programs and the appropriate Management algorithm for each program. The results showed that, with the local Management strategy, it was obtained a reduction of at least an order of magnitude in the number of page faults and a reduction in the mean Memory usage of about 3 to 4 times in relation to the global strategy. This performance improvement shows the viability of our strategy. It is also presented some implementation aspects of this strategy in traditional operating systems.Sistemas Distribuidos - Redes ConcurrenciaRed de Universidades con Carreras en Informática (RedUNCI

    The PISCES 2 parallel programming environment

    Get PDF
    PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included

    Enabling preemptive multiprogramming on GPUs

    Get PDF
    GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems are usually running multiple applications, from one or several users. However GPUs do not provide the support for resource sharing traditionally expected in these scenarios. Thus, such systems are unable to provide key multiprogrammed workload requirements, such as responsiveness, fairness or quality of service. In this paper, we propose a set of hardware extensions that allow GPUs to efficiently support multiprogrammed GPU workloads. We argue for preemptive multitasking and design two preemption mechanisms that can be used to implement GPU scheduling policies. We extend the architecture to allow concurrent execution of GPU kernels from different user processes and implement a scheduling policy that dynamically distributes the GPU cores among concurrently running kernels, according to their priorities. We extend the NVIDIA GK110 (Kepler) like GPU architecture with our proposals and evaluate them on a set of multiprogrammed workloads with up to eight concurrent processes. Our proposals improve execution time of high-priority processes by 15.6x, the average application turnaround time between 1.5x to 2x, and system fairness up to 3.4x.We would like to thank the anonymous reviewers, Alexan- der Veidenbaum, Carlos Villavieja, Lluis Vilanova, Lluc Al- varez, and Marc Jorda on their comments and help improving our work and this paper. This work is supported by Euro- pean Commission through TERAFLUX (FP7-249013), Mont- Blanc (FP7-288777), and RoMoL (GA-321253) projects, NVIDIA through the CUDA Center of Excellence program, Spanish Government through Programa Severo Ochoa (SEV-2011-0067) and Spanish Ministry of Science and Technology through TIN2007-60625 and TIN2012-34557 projects.Peer ReviewedPostprint (author’s final draft

    "Virtual malleability" applied to MPI jobs to improve their execution in a multiprogrammed environment"

    Get PDF
    This work focuses on scheduling of MPI jobs when executing in shared-memory multiprocessors (SMPs). The objective was to obtain the best performance in response time in multiprogrammed multiprocessors systems using batch systems, assuming all the jobs have the same priority. To achieve that purpose, the benefits of supporting malleability on MPI jobs to reduce fragmentation and consequently improve the performance of the system were studied. The contributions made in this work can be summarized as follows:· Virtual malleability: A mechanism where a job is assigned a dynamic processor partition, where the number of processes is greater than the number of processors. The partition size is modified at runtime, according to external requirements such as the load of the system, by varying the multiprogramming level, making the job contend for resources with itself. In addition to this, a mechanism which decides at runtime if applying local or global process queues to an application depending on the load balancing between processes of it. · A job scheduling policy, that takes decisions such as how many processes to start with and the maximum multiprogramming degree based on the type and number of applications running and queued. Moreover, as soon as a job finishes execution and where there are queued jobs, this algorithm analyzes whether it is better to start execution of another job immediately or just wait until there are more resources available. · A new alternative to backfilling strategies for the problema of window execution time expiring. Virtual malleability is applied to the backfilled job, reducing its partition size but without aborting or suspending it as in traditional backfilling. The evaluation of this thesis has been done using a practical approach. All the proposals were implemented, modifying the three scheduling levels: queuing system, processor scheduler and runtime library. The impact of the contributions were studied under several types of workloads, varying machine utilization, communication and, balance degree of the applications, multiprogramming level, and job size. Results showed that it is possible to offer malleability over MPI jobs. An application obtained better performance when contending for the resources with itself than with other applications, especially in workloads with high machine utilization. Load imbalance was taken into account obtaining better performance if applying the right queue type to each application independently.The job scheduling policy proposed exploited virtual malleability by choosing at the beginning of execution some parameters like the number of processes and maximum multiprogramming level. It performed well under bursty workloads with low to medium machine utilizations. However as the load increases, virtual malleability was not enough. That is because, when the machine is heavily loaded, the jobs, once shrunk are not able to expand, so they must be executed all the time with a partition smaller than the job size, thus degrading performance. Thus, at this point the job scheduling policy concentrated just in moldability.Fragmentation was alleviated also by applying backfilling techniques to the job scheduling algorithm. Virtual malleability showed to be an interesting improvement in the window expiring problem. Backfilled jobs even on a smaller partition, can continue execution reducing memory swapping generated by aborts/suspensions In this way the queueing system is prevented from reinserting the backfilled job in the queue and re-executing it in the future.Postprint (published version

    Major Trends in Operating Systems Development

    Get PDF
    Operating systems have changed in nature in response to demands of users, and in response to advances in hardware and software technology. The purpose of this paper is to trace the development of major themes in operating system design from their beginnings through the present. This is not an exhaustive history of operating systems, but instead is intended to give the reader the flavor of the dif ferent periods in operating systems\u27 development. To this end, the paper will be organized by topic in approximate order of development. Each chapter will start with an introduction to the factors behind the rise of the period. This will be fol lowed by a survey of the state-of-the-art systems, and the conditions influencing them. The chapters close with a summation of the significant hardware and software contributions from the period

    Adaptive space-time sharing with SCOJO.

    Get PDF
    Coscheduling is a technique used to improve the performance of parallel computer applications under time sharing, i.e., to provide better response times than standard time sharing or space sharing. Dynamic coscheduling and gang scheduling are two main forms of coscheduling. In SCOJO (Share-based Job Coscheduling), we have introduced our own original framework to employ loosely coordinated dynamic coscheduling and a dynamic directory service in support of scheduling cross-site jobs in grid scheduling. SCOJO guarantees effective CPU shares by taking coscheduling effects into consideration and supports both time and CPU share reservation for cross-site job. However, coscheduling leads to high memory pressure and still involves problems like fragmentation and context-switch overhead, especially when applying higher multiprogramming levels. As main part of this thesis, we employ gang scheduling as more directly suitable approach for combined space-time sharing and extend SCOJO for clusters to incorporate adaptive space sharing into gang scheduling. We focus on taking advantage of moldable and malleable characteristics of realistic job mixes to dynamically adapt to varying system workloads and flexibly reduce fragmentation. In addition, our adaptive scheduling approach applies standard job-scheduling techniques like a priority and aging system, backfilling or easy backfilling. We demonstrate by the results of a discrete-event simulation that this dynamic adaptive space-time sharing approach can deliver better response times and bounded relative response times even with a lower multiprogramming level than traditional gang scheduling.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .H825. Source: Masters Abstracts International, Volume: 43-01, page: 0237. Adviser: A. Sodan. Thesis (M.Sc.)--University of Windsor (Canada), 2004
    • …
    corecore