158 research outputs found

    A modelling approach to the evaluation of computer system performance

    Imperial Users only

    Studies of some feedback control mechanisms in operating systems

    PhD Thesis
    The possibility of enhancing the effectiveness of an operating system by the introduction of appropriate feedback controls is explored by examining some resource allocation problems. The allocation of core, CPU and I/O processors in a multiprogramming, demand paging environment is studied in terms of feedback control. A major part of this study is devoted to the application of feedback control concepts to core allocation, both to prevent thrashing and to develop algorithms of practical value. To aid this study a simulator is developed which uses probability distributions to represent program behaviour. Successful algorithms are developed employing a two-stage page replacement function which first selects a process, from which a page is then chosen to be replaced. Improving the performance of these algorithms by using a 'drain process' to aid the dynamic determination of the current locality of a process is also discussed. The complexity of the overall resource allocation problem is dealt with by employing a hierarchy of individual resource allocation policies, which control scheduling, core allocation and dispatching. By considering the levels of the hierarchy as separate feedback control systems, the restrictions which must be placed upon the individual levels are derived. The extension of these results to further levels is also discussed.
    Science Research Council
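    As an illustration of the two-stage replacement function described in this abstract, the sketch below first picks a victim process and then evicts one of its pages. The fault-rate heuristic, the LRU second stage and all names are illustrative assumptions, not the thesis's actual algorithm.

        # Toy sketch of two-stage page replacement: stage 1 picks a victim
        # process, stage 2 picks a page within it. Heuristics are assumptions.

        class Process:
            def __init__(self, pid):
                self.pid = pid
                self.pages = []      # resident pages, least recently used first
                self.faults = 0      # page faults in the current interval
                self.cpu_time = 1    # CPU time used in the current interval

            def fault_rate(self):
                return self.faults / self.cpu_time

        def select_victim_process(processes):
            # Stage 1: take the process with the lowest recent fault rate; it
            # is most likely holding pages outside its current locality.
            candidates = [p for p in processes if p.pages]
            return min(candidates, key=Process.fault_rate)

        def select_victim_page(victim):
            # Stage 2: evict that process's least recently used page.
            return victim.pages.pop(0)

        # Usage: one replacement decision.
        procs = [Process(1), Process(2)]
        procs[0].pages = ["a", "b"]; procs[0].faults = 5
        procs[1].pages = ["c"];      procs[1].faults = 1
        evicted = select_victim_page(select_victim_process(procs))
        print("evicted page:", evicted)   # -> "c"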

    Performance measurement and evaluation of time-shared operating systems

    Time-shared, virtual memory systems are very complex, and changes in their performance may be caused by many factors: by variations in the workload as well as by changes in system configuration. The evaluation of these systems can thus best be carried out by linking results obtained from a planned programme of measurements, taken on the system, to some model of it. Such a programme of measurements is best carried out under conditions in which all the parameters likely to affect the system's performance are reproducible and under the control of the experimenter. For this to be possible, the workload used must be simulated and presented to the target system through some form of automatic workload driver. A case study of such a methodology is presented in which the system (in this case the Edinburgh Multi-Access System) is monitored during a controlled experiment (designed and analysed using standard techniques in common use in many other branches of experimental science), and the results so obtained are used to calibrate and validate a simple simulation model of the system. This model is then used to investigate further the effect of certain system parameters upon system performance. The factors covered by this exercise include the effect of varying main memory size, the process loading algorithm and secondary memory characteristics.
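    The core of the methodology is a reproducible workload presented through an automatic workload driver. The sketch below shows the idea under illustrative assumptions: target_system is a hypothetical stand-in for the machine under test, and a fixed random seed keeps each experimental run identical.

        # Minimal sketch of an automatic workload driver: a scripted,
        # reproducible workload is driven against the target and response
        # times recorded for later model calibration. Names are illustrative.

        import random
        import time

        def target_system(command):
            # Hypothetical stand-in for the system under test (the study
            # used the Edinburgh Multi-Access System itself).
            time.sleep(random.expovariate(200))    # pretend service time
            return "done"

        def drive_workload(script, think_time, seed=1):
            # A fixed seed makes the simulated user reproducible, so every
            # run presents the same workload to the system.
            random.seed(seed)
            measurements = []
            for command in script:
                start = time.perf_counter()
                target_system(command)
                measurements.append((command, time.perf_counter() - start))
                time.sleep(think_time)             # simulated "think time"
            return measurements

        for cmd, rt in drive_workload(["edit", "compile", "run"], think_time=0.01):
            print(f"{cmd}: {rt * 1000:.2f} ms")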

    The evaluation of computer performance by means of state-dependent queueing network models

    Imperial Users only

    Intra-node Memory Safe GPU Co-Scheduling

    GPUs in High-Performance Computing systems remain under-utilised due to the unavailability of schedulers that can safely schedule multiple applications to share the same GPU. The research reported in this paper is motivated by improving the utilisation of GPUs: we propose a framework, referred to as schedGPU, to facilitate intra-node GPU co-scheduling such that a GPU can be safely shared among multiple applications by taking memory constraints into account. Two approaches are explored, namely a client-server and a shared-memory approach, of which the shared-memory approach proves more suitable owing to its lower overheads. Four policies are proposed in schedGPU to handle applications that are waiting to access the GPU, two of which account for priorities. The feasibility of schedGPU is validated on three real-world applications, and clear performance gains are observed. For single applications, GPU utilisation and GPU memory utilisation improve by over 10 times. For workloads comprising multiple applications, a speed-up of up to 5x in total execution time is noted, while average GPU utilisation and average GPU memory utilisation increase by 5 and 12 times, respectively.
    This work was funded by Generalitat Valenciana under grant PROMETEO/2017/77.
    Reaño González, C.; Silla Jiménez, F.; Nikolopoulos, D. S.; Varghese, B. (2018). Intra-node Memory Safe GPU Co-Scheduling. IEEE Transactions on Parallel and Distributed Systems, 29(5), 1089-1102. https://doi.org/10.1109/TPDS.2017.2784428
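    A minimal sketch of the memory-constraint idea behind schedGPU: each application reserves GPU memory before launching work and blocks until its reservation fits. This condition-variable version merely stands in for the paper's client-server and shared-memory implementations, and all names are illustrative.

        # Memory-safe co-scheduling sketch: callers reserve GPU memory up
        # front and wait until enough is free (a simple "wait" policy; the
        # paper also proposes priority-aware variants).

        import threading

        class GpuMemoryArbiter:
            def __init__(self, total_bytes):
                self.free = total_bytes
                self.lock = threading.Lock()
                self.freed = threading.Condition(self.lock)

            def acquire(self, nbytes):
                # Block the caller until the reservation fits.
                with self.freed:
                    while nbytes > self.free:
                        self.freed.wait()
                    self.free -= nbytes

            def release(self, nbytes):
                with self.freed:
                    self.free += nbytes
                    self.freed.notify_all()

        # Usage: two "applications" safely sharing a 4 GiB GPU.
        arbiter = GpuMemoryArbiter(4 << 30)

        def app(name, need):
            arbiter.acquire(need)
            print(name, "running with", need >> 20, "MiB reserved")
            arbiter.release(need)

        threads = [threading.Thread(target=app, args=(f"app{i}", 3 << 30))
                   for i in range(2)]
        for t in threads: t.start()
        for t in threads: t.join()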

    GPRM: a high performance programming framework for manycore processors

    Processors with large numbers of cores are becoming commonplace. In order to utilise the available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread and task scheduling in such a multiprogrammed, multithreaded environment is a significant challenge. In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks and their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads. We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, achieved simply by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations. We use OpenMP, the most popular model for shared-memory parallel programming, as the main GPRM competitor for solving three well-known problems on both platforms: LU factorisation of Sparse Matrices, Image Convolution, and Linked List Processing. We focus on proposing solutions that best fit GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU Factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the Image Convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for Linked List Processing and performs better than OpenMP implementations on the Xeon Phi. The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
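    The granularity point above, combining smaller tasks into larger ones to amortise per-task overhead, can be illustrated with generic chunking, sketched below. This is a plain Python analogue under stated assumptions, not GPRM's runtime or API.

        # Task-coarsening sketch: many tiny work items become fewer, larger
        # tasks, trading parallel slack for lower scheduling overhead.

        from concurrent.futures import ThreadPoolExecutor

        def tiny_work(i):
            return i * i                 # stands in for one short computation

        def chunked_task(chunk):
            # One scheduled task now covers a whole chunk of tiny items.
            return [tiny_work(i) for i in chunk]

        def run(items, chunk_size, workers=4):
            chunks = [items[i:i + chunk_size]
                      for i in range(0, len(items), chunk_size)]
            with ThreadPoolExecutor(max_workers=workers) as pool:
                results = list(pool.map(chunked_task, chunks))
            return [r for chunk in results for r in chunk]

        # With chunk_size=1 every tiny item is its own task (high overhead);
        # larger chunks amortise the per-task scheduling cost.
        print(sum(run(list(range(10000)), chunk_size=256)))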

    Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance

    No full text
    Asymmetric coherency is a new optimisation method for coherency policies to support non-uniform workloads in multicore processors. Asymmetric coherency assists in load balancing a workload, and is applicable to SoC multicores where the applications are not evenly spread among the processors and customisation of the coherency is possible. Asymmetric coherency is a policy change, and consequently our designs require little or no additional hardware over an existing system. We explore two different types of asymmetric coherency policy. Our bus-based asymmetric coherency policy generated a 60% reduction in coherency cost (the latencies due to coherency messages) for non-shared data. Our directory-based asymmetric coherency policy showed up to a 5.8% execution-time improvement and up to a 22% improvement in average memory latency for the parallel benchmark Sha, using statically allocated asymmetry. Dynamically allocated asymmetry was found to generate further improvements in access latency, increasing the effectiveness of asymmetric coherency by up to 73.8% when compared to the static asymmetric solution.
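    The sketch below gives a toy model of the statically allocated asymmetry described above: lines classified as private to one core skip the directory's invalidation traffic, while shared lines pay the full coherency cost. The structure and numbers are illustrative only, not the paper's design.

        # Toy directory with a static asymmetric policy: coherency messages
        # are only generated for lines classified as shared.

        PRIVATE, SHARED = "private", "shared"

        class Directory:
            def __init__(self):
                self.classification = {}   # line address -> PRIVATE or SHARED
                self.messages = 0          # coherency messages sent

            def classify(self, addr, kind):
                # Static asymmetry: the allocation is fixed ahead of time.
                self.classification[addr] = kind

            def write(self, addr, sharers):
                if self.classification.get(addr, SHARED) == PRIVATE:
                    return                 # no invalidations for private data
                self.messages += len(sharers)   # invalidate every sharer

        d = Directory()
        d.classify(0x100, PRIVATE)
        d.write(0x100, sharers=[1, 2, 3])   # free: classified private
        d.write(0x200, sharers=[1, 2, 3])   # costs 3 invalidation messages
        print("coherency messages:", d.messages)   # -> 3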

    A comparative study of the performance of concurrency control algorithms in a centralised database

    Abstract unavailable. Please refer to the PDF.

    Adaptive Real-Time Scheduling for Legacy Multimedia Applications

    Multimedia applications are often executed on standard personal computers. The absence of established standards has hindered the adoption of real-time scheduling solutions for this class of applications. Developers have adopted a wide range of heuristic approaches to achieve acceptable timing behaviour, but the result is often unreliable. We propose a mechanism to extend the benefits of real-time scheduling to legacy applications, based on the combination of two techniques: 1) a real-time monitor that observes and infers the activation period of the application, and 2) a feedback mechanism that adapts the scheduling parameters to improve its real-time performance.
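    A small sketch of the two techniques, under illustrative assumptions: the monitor infers the activation period from observed wake-up times, and a feedback rule adapts a CPU budget to the observed demand. The median estimator and proportional adjustment are stand-ins, not the paper's exact design.

        # (1) infer the activation period of a legacy application from its
        # observed wake-ups; (2) feed the estimate into a scheduling
        # parameter (here a per-period CPU budget). All values are examples.

        from statistics import median

        def infer_period(activation_times):
            # The median inter-activation gap is robust to the occasional
            # late or skipped wake-up of a legacy application.
            gaps = [b - a for a, b in zip(activation_times, activation_times[1:])]
            return median(gaps)

        def adapt_budget(budget, used, period, gain=0.5):
            # Feedback: grow the budget when the job overruns it, shrink it
            # when there is slack, clamped to the inferred period.
            budget += gain * (used - budget)
            return min(max(budget, 0.0), period)

        times = [0.0, 0.033, 0.067, 0.100, 0.134]   # ~30 Hz video decoder
        period = infer_period(times)                # about 0.033 s
        budget = 0.010
        for used in [0.012, 0.014, 0.011]:          # observed CPU time per job
            budget = adapt_budget(budget, used, period)
        print(f"period ~ {period:.3f}s, budget ~ {budget:.4f}s")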