58 research outputs found

    Automated progress identification and feedback in large experimental laboratories

    In this work we describe a novel web-based system whose aim is to enhance the learning environment within experimental laboratories, and report on its deployment in undergraduate computer architecture modules. Student progress is tracked and recorded throughout the practical work, and supervisory facilities are provided, including the visualisation of the progress of everyone in the laboratory on a management console. The system delivers information concerning the practical work to be undertaken, and uses carefully designed sets of questions based on the observations to be made by students in the laboratory. The responses made in this system are used to feed back further specific information to the student to aid their individual progress

    Predicting the cache miss ratio of loop-nested array references

    The time a program takes to execute can be massively affected by the efficiency with which it utilizes cache memory. Moreover the cache-miss behavior of a program can be highly unpredictable, in that small changes to input parameters can cause large changes in the number of misses. In this paper we present novel analytical models of the cache behavior of programs consisting mainly of array operations inside nested loops, for direct-mapped caches. The models are used to predict the miss-ratios of three example loop nests; the results are shown to be largely within ten percent of simulated values. A significant advantage is that the calculation time is proportional to the number of array references in the program, typically several orders of magnitude faster than traditional cache simulation methods

    Efficient load balancing techniques for image analysis on an M-SIMD machine

    The computational requirements for the real-time processing of image sequences are sufficiently high that some form of parallel hardware is essential. In the analysis of a sequence of images the areas of interest are moving objects, which usually occupy only small distinct areas within the full field of view. A single instruction multiple data (SIMD) machine has considerable advantages for these types of operations, where there is a high requirement for data-parallel processing. However, on conventional SIMD machines, only the processors onto which the moving objects are mapped have a significant workload. The remaining processors are idle during most of the processing period, resulting in significant load imbalance and poor utilisation. We describe here load balancing techniques for a Multiple-SIMD (M-SIMD) machine, consisting of a number of small conventional SIMD arrays (patches) connected together to form a larger M-SIMD array. Each SIMD patch can perform independent computations. Using the M-SIMD configuration, idle processors can be re-allocated to process active regions of other images from an image sequence or from multiple sensors, significantly increasing the throughput and flexibility of the system. A "voting" algorithm is presented for the calculation of the minimum number of patches the object is mapped onto, along with a heuristic (near optimum) patch allocation process

    A layered approach to parallel software performance prediction : a case study

    An approach to the characterisation of parallel systems using a structured layered methodology is described here. The aim is to produce accurate performance predictions which may be used to influence the choice of machines and investigate implementation trade-offs. The methodology enables the characterisations of the application and of the parallel machine to be developed independently and then integrated through an intermediary layer encompassing mapping and parallelisation techniques. The layered approach enables characterisations which are modular, re-usable, and can be evaluated using analytical techniques. The approach is based upon methods introduced in Software Performance Engineering (SPE) and structural model decomposition but, due to its modular nature, takes less time for development. A case study in image synthesis is considered in which factors from both the application and parallel system are investigated, including the accuracy of predictions, the parallelisation strategy, and scaling behaviour

    Analytical modeling of set-associative cache behaviour

    Cache behavior is complex and inherently unstable, yet is a critical factor affecting program performance. A method of evaluating cache performance is required, both to give quantitative predictions of miss-ratio, and information to guide optimization of cache use. Traditional cache simulation gives accurate predictions of miss-ratio, but little to direct optimization. Also, the simulation time is usually far greater than the program execution time. Several analytical models have been developed, but concentrate mainly on direct-mapped caches, often for specific types of algorithm, or to give qualitative predictions. In this work novel analytical models of cache phenomena are presented, applicable to numerical codes consisting mostly of array operations in looping constructs. Set-associative caches are considered, through an extensive hierarchy of cache reuse and interference effects, including numerous forms of temporal and spatial locality. Models of each effect are given, which, when combined, predict the overall miss-ratio. An advantage is that the models also indicate sources of cache interference. The accuracy of the models is validated through example program fragments. The predicted miss-ratios are compared with simulations, and shown typically to be within fifteen percent. The evaluation time of the models is shown to be independent of the problem size, typically several orders of magnitude faster than simulation

    Performance optimisation of a lossless compression algorithm using the PACE toolkit

    With the large increase in interest and research in distributed systems, the need for performance prediction and modelling of such systems has become important to decrease the system's complexity. One such prediction technique is PACE, a series of performance modelling tools developed at Warwick, which allows a user to create very accurate performance models of sequential and distributed systems. Such a model can be used in an application steering method, where a performance model is added to the overall system such that the application can be optimised to match a particular constraint. To describe this method, a lossless compression technique, also developed at Warwick, is optimised to fit a distributed system where a time constraint is necessary. It is shown that the compression ratio gained by the compression technique grows, as expected, while the time constraint given to the application becomes more relaxed

    An analysis of processor resource models for use in performance prediction

    With the increasing sophistication of both software and hardware systems, methodologies to analyse and predict system performance are a topic of vital interest. This is particularly true for parallel systems, where there is currently a wide choice of both architectural and parallelisation options, and where the costs are likely to be high. Performance data is vital to a diverse range of users, including system developers, application programmers and tuning experts. However, the level of sophistication and accuracy required by each of these users is substantially different. In this paper a characterisation-based technique is described (considering both application and hardware resources) which addresses these issues directly. Initially, a framework is described for characterisation-based approaches. A classification scheme is used to illustrate differences in the level of sophistication and detail in which the underlying resource models are specified. Finally, verification is provided of the characterisation techniques applied to several application kernels on a MIMD system. The performance predictions and error bounds resulting from the level of resource specification are also discussed

    An introduction to the CHIP3S language for characterising parallel systems in performance studies

    A characterisation toolset, Characterisation Instrumentation for Performance Prediction of Parallel Systems (CHIP3S), for predicting the performance of parallel systems is presented in this report. In this toolset expert knowledge about the performance evaluation techniques is not required as a prerequisite for the user. Instead a declarative approach to the performance study is taken by describing the application in a way that is both intuitive to the user and usable to obtain performance results. The underlying performance related characterisation models and their evaluation processes are hidden from the user. This document describes the special purpose language, and the evaluation system, that form the core of the CHIP3S toolset. Amongst the aims of the toolset are support for characterisation model reusability, ease of experimentation, provision of different levels of prediction accuracy, and support for different levels of characterisation model abstraction

    Theory and operation of the Warwick multiprocessor scheduling (MS) system

    This paper is concerned with the application of performance prediction techniques to the optimisation of parallel systems, and, in particular, the use of these techniques on-the-fly for optimising performance at run-time. In contrast to other performance tools, performance prediction results are made available very rapidly, which allows their use in real-time environments. When applied to program optimisation, this allows consideration of run-time variables such as input data and resource availability that are not, in general, available during the traditional (ahead-of-time) performance tuning stage. The main contribution of this work is the application of predictive performance data to the scheduling of a number of parallel tasks across a large heterogeneous distributed computing system. This is achieved through use of just-in-time performance prediction coupled with iterative heuristic algorithms for optimisation of the meta-schedule. The paper describes the main theoretical considerations for development of such a scheduling system, and then describes a prototype implementation, the MS scheduling system, together with some results obtained from this system when operated over a medium-sized (campus-wide) distributed computing network

    Characterising computational kernels : a case study

    We describe the characterisation of an application kernel on a parallel system of two processors. The application kernel is a one-dimensional Fast Fourier Transform (FFT) and the processors used are two T800 transputers. Analytical expressions for the "execution time" on one and on two processors are discussed and used to obtain the performance measure