158 research outputs found
A modelling approach to the evaluation of computer system performance
Studies of some feedback control mechanisms in operating systems
PhD Thesis
The possibility of enhancing the effectiveness of an
operating system by the introduction of appropriate
feedback controls is explored by examining some
resource allocation problems. The allocation of
core, CPU and I/O processors in a multiprogramming
demand paging environment is studied in terms of
feedback control.
A major part of this study is devoted to the application
of feedback control concepts to core allocation to
prevent thrashing and develop algorithms of practical
value. To aid this study a simulator is developed
which uses probability distributions to represent
program behaviour. Successful algorithms are developed
employing a two stage page replacement function which
selects a process from which a page is then chosen to
be replaced. Improving the performance of these
algorithms by using a 'drain process' to aid the
dynamic determination of the current locality of a
process is also discussed.
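The two-stage replacement function described above can be sketched as follows. This is an illustrative reconstruction: the stage-one heuristic (evict from the process with the largest resident set) and the stage-two heuristic (least-recently-used page within that process) are assumptions for the sketch, not necessarily the thesis's exact policies.

```python
from collections import OrderedDict

class TwoStagePageReplacer:
    """Two-stage page replacement: first select a victim process,
    then select a page within that process to replace."""

    def __init__(self):
        # process id -> LRU-ordered set of resident page numbers
        self.resident = {}

    def touch(self, pid, page):
        """Record a reference to a page, making it most recently used."""
        pages = self.resident.setdefault(pid, OrderedDict())
        pages.pop(page, None)
        pages[page] = True  # most recently used pages sit at the end

    def evict(self):
        # Stage 1: choose the process with the largest resident set.
        pid = max(self.resident, key=lambda p: len(self.resident[p]))
        # Stage 2: within that process, replace its least-recently-used page.
        page, _ = self.resident[pid].popitem(last=False)
        if not self.resident[pid]:
            del self.resident[pid]
        return pid, page
```

Separating the two decisions lets the process-level choice embody the feedback policy (e.g. penalising a process that is over-allocated) while the page-level choice remains a conventional replacement heuristic.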
The complexity of the overall resource allocation
problem is dealt with by employing a hierarchy of
individual resource allocation policies. These
control scheduling, core allocation and dispatching. By considering the levels of the hierarchy as separate
feedback control systems the restrictions which must
be placed upon the individual levels are derived. The
extension of these results to further levels is also
discussed.
Science Research Council
Performance measurement and evaluation of time-shared operating systems
Time-shared, virtual memory systems
are very complex and changes in their performance may
be caused by many factors - by variations in the
workload as well as changes in system configuration.
The evaluation of these systems can thus best be
carried out by linking results obtained from a
planned programme of measurements, taken on the
system, to some model of it. Such a programme of
measurements is best carried out under conditions in
which all the parameters likely to affect the system's
performance are reproducible, and under the control of
the experimenter. In order that this be possible the
workload used must be simulated and presented to the
target system through some form of automatic
workload driver. A case study of such a methodology
is presented in which the system (in this case the
Edinburgh Multi-Access System) is monitored during a
controlled experiment (designed and analysed using
standard techniques in common use in many other branches
of experimental science) and the results so obtained
used to calibrate and validate a simple simulation
model of the system. This model is then used in
further investigation of the effect of certain system parameters upon the system performance. The
factors covered by this exercise include the effect
of varying: main memory size, process loading
algorithm and secondary memory characteristics.
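A controlled experiment of the kind described, in which every factor is reproducible and under the experimenter's control, is commonly laid out as a full-factorial design. The sketch below is illustrative only: the factor names and levels are hypothetical, not the values used in the EMAS study.

```python
from itertools import product

# Hypothetical factor levels for a controlled performance experiment.
factors = {
    "main_memory_kw": [64, 128, 256],        # main memory size
    "loading_algorithm": ["demand", "prepage"],  # process loading policy
    "drum_latency_ms": [10, 20],             # secondary memory characteristic
}

# Full-factorial design: every combination of factor levels is one run,
# each executed under the same reproducible simulated workload.
design = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(design))  # 3 * 2 * 2 = 12 runs
```

Replicating each run under an automatic workload driver is what makes the measurements comparable, and standard analysis-of-variance techniques can then separate the effect of each factor.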
The evaluation of computer performance by means of state-dependent queueing network models
Intra-node Memory Safe GPU Co-Scheduling
GPUs in High-Performance Computing systems remain under-utilised because schedulers that can safely schedule multiple applications to share the same GPU are not available. The research reported in this paper is motivated by improving the utilisation of GPUs: we propose a framework, referred to as schedGPU, to facilitate intra-node GPU co-scheduling so that a GPU can be safely shared among multiple applications by taking memory constraints into account. Two approaches are explored, a client-server approach and a shared-memory approach; the shared-memory approach proves more suitable owing to its lower overheads. Four policies are proposed in schedGPU to handle applications waiting to access the GPU, two of which account for priorities. The feasibility of schedGPU is validated on three real-world applications. The key observation is that a performance gain is achieved: for single applications, a gain of over 10 times, as measured by GPU utilisation and GPU memory utilisation, is obtained; for workloads comprising multiple applications, a speed-up of up to 5x in total execution time is noted. Moreover, average GPU utilisation and average GPU memory utilisation are increased by 5 and 12 times, respectively.
This work was funded by Generalitat Valenciana under grant PROMETEO/2017/77.
Reaño González, C.; Silla Jiménez, F.; Nikolopoulos, D. S.; Varghese, B. (2018). Intra-node Memory Safe GPU Co-Scheduling. IEEE Transactions on Parallel and Distributed Systems, 29(5), 1089-1102. https://doi.org/10.1109/TPDS.2017.2784428
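The memory-constraint admission idea behind schedGPU can be sketched as follows. This is a minimal in-process illustration, not schedGPU's implementation: schedGPU coordinates separate processes through a shared-memory segment and offers four waiting policies (two priority-aware), whereas this sketch uses in-process locks and a simple blocking policy; all names here are ours.

```python
import threading

class GpuMemoryArbiter:
    """Admit an application onto the GPU only if its declared memory
    demand fits in what remains; otherwise make it wait."""

    def __init__(self, total_bytes):
        self.free = total_bytes
        self.cond = threading.Condition()

    def acquire(self, demand):
        """Block until 'demand' bytes of GPU memory can be reserved."""
        with self.cond:
            while demand > self.free:   # admitting now would overflow GPU memory
                self.cond.wait()
            self.free -= demand

    def release(self, demand):
        """Return reserved bytes and wake waiters to re-check."""
        with self.cond:
            self.free += demand
            self.cond.notify_all()
```

Because each application declares its demand before launching kernels, no co-scheduled set can ever exceed the device's memory, which is what makes the sharing "memory safe".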
GPRM: a high performance programming framework for manycore processors
Processors with large numbers of cores are becoming commonplace. In order to utilise the
available resources in such systems, the programming paradigm has to move towards increased parallelism. However, increased parallelism does not necessarily lead to better performance. Parallel programming models have to provide not only flexible ways of defining
parallel tasks, but also efficient methods to manage the created tasks. Moreover, in a general-purpose system, applications residing in the system compete for the shared resources. Thread
and task scheduling in such a multiprogrammed multithreaded environment is a significant challenge.
In this thesis, we introduce a new task-based parallel reduction model, called the Glasgow Parallel Reduction Machine (GPRM). Our main objective is to provide high performance while maintaining ease of programming. GPRM supports native parallelism; it provides a modular way of expressing parallel tasks and the communication patterns between them. Compiling a GPRM program results in an Intermediate Representation (IR) containing useful information about tasks, their dependencies, as well as the initial mapping information. This compile-time information helps reduce the overhead of runtime task scheduling and is key to high performance. Generally speaking, the granularity and the number of tasks are major factors in achieving high performance. These factors are even more important in the case of GPRM, as it is highly dependent on tasks, rather than threads.
We use three basic benchmarks to provide a detailed comparison of GPRM with Intel OpenMP, Cilk Plus, and Threading Building Blocks (TBB) on the Intel Xeon Phi, and with GNU OpenMP on the Tilera TILEPro64. GPRM shows superior performance in almost all cases, simply by controlling the number of tasks. GPRM also provides a low-overhead mechanism, called “Global Sharing”, which improves performance in multiprogramming situations.
We use OpenMP, the most popular model for shared-memory parallel programming, as the main competitor to GPRM for solving three well-known problems on both platforms: LU factorisation of sparse matrices, image convolution, and linked list processing. We focus on proposing solutions that best fit GPRM’s model of execution. GPRM outperforms OpenMP in all cases on the TILEPro64. On the Xeon Phi, our solution for the LU factorisation results in notable performance improvement for sparse matrices with large numbers of small blocks. We investigate the overhead of GPRM’s task creation and distribution for very short computations using the image convolution benchmark. We show that this overhead can be mitigated by combining smaller tasks into larger ones. As a result, GPRM can outperform OpenMP for convolving large 2D matrices on the Xeon Phi. Finally, we demonstrate that our parallel worksharing construct provides an efficient solution for linked list processing and performs better than OpenMP implementations on the Xeon Phi.
The results are very promising, as they verify that our parallel programming framework for manycore processors is flexible and scalable, and can provide high performance without sacrificing productivity.
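The task-coarsening idea above, merging many tiny tasks into fewer larger ones so per-task creation and distribution overhead is amortised, can be sketched as follows. The function name and the one-row-per-task framing are ours for illustration, not part of the GPRM API; the chunk size is the tuning parameter, as in the convolution experiments.

```python
def coarsen(tasks, chunk):
    """Group a flat list of fine-grained tasks into coarser chunks,
    each of which becomes a single schedulable task."""
    return [tasks[i:i + chunk] for i in range(0, len(tasks), chunk)]

rows = list(range(1024))       # 1024 fine-grained tasks, e.g. one image row each
coarse = coarsen(rows, 64)     # 16 coarse tasks of 64 rows each
```

With a fixed per-task overhead, halving the number of tasks roughly halves the scheduling cost, so for very short computations the chunk size largely determines whether the framework overhead dominates the useful work.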
Asymmetric Cache Coherency: Policy Modifications to Improve Multicore Performance
Asymmetric coherency is a new optimisation method for coherency policies to support non-uniform workloads in multicore processors. Asymmetric coherency assists in load balancing a workload, and this is applicable to SoC multicores where the applications are not evenly spread among the processors and customization of the coherency is possible. Asymmetric coherency is a policy change, and consequently our designs require little or no additional hardware over an existing system. We explore two different types of asymmetric coherency policies. Our bus-based asymmetric coherency policy generated a 60% coherency cost reduction (reduction of latencies due to coherency messages) for non-shared data. Our directory-based asymmetric coherency policy showed up to a 5.8% execution time improvement and up to a 22% improvement in average memory latency for the parallel benchmark Sha, using a statically allocated asymmetry. Dynamically allocated asymmetry was found to generate further improvements in access latency, increasing the effectiveness of asymmetric coherency by up to 73.8% when compared to the static asymmetric solution.
A comparative study of the performance of concurrency control algorithms in a centralised database
Abstract unavailable. Please refer to the PDF.
Adaptive Real-Time Scheduling for Legacy Multimedia Applications
Multimedia applications are often executed on standard personal computers. The absence of established standards has hindered the adoption of real-time scheduling solutions in this class of applications. Developers have adopted a wide range of heuristic approaches to achieve acceptable timing behaviour, but the result is often unreliable. We propose a mechanism to extend the benefits of real-time scheduling to legacy applications based on the combination of two techniques: 1) a real-time monitor that observes and infers the activation period of the application, and 2) a feedback mechanism that adapts the scheduling parameters to improve its real-time performance.
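The two techniques can be sketched together as follows. The median-of-gaps period estimator and the fixed utilisation target used for the feedback step are illustrative assumptions, not the paper's exact mechanism, and all names are ours.

```python
import statistics

class PeriodEstimator:
    """(1) Infer the activation period of a legacy task from observed
    activation times; (2) feed the estimate back into a scheduling
    parameter, here a per-period CPU budget."""

    def __init__(self):
        self.activations = []   # observed activation timestamps (seconds)

    def observe(self, t):
        self.activations.append(t)

    def period(self):
        # Median of inter-activation gaps is robust to occasional jitter.
        gaps = [b - a for a, b in zip(self.activations, self.activations[1:])]
        return statistics.median(gaps)

    def budget(self, utilisation=0.5):
        # Feedback step: reserve a fraction of each inferred period.
        return utilisation * self.period()
```

Because the monitor needs no cooperation from the application, only its observable activation pattern, the mechanism applies to unmodified legacy binaries.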