    Parallel Astronomical Data Processing with Python: Recipes for multicore machines

    High performance computing has been used across many fields of astrophysical research, but most of it runs on massively parallel systems (supercomputers) or clusters of graphical processing units. With the advent of multicore processors in the last decade, many serial codes have been re-implemented in parallel to exploit the full potential of these processors. In this paper, we propose parallel processing recipes for multicore machines for astronomical data processing. The target audience is astronomers who use Python as their preferred scripting language and who may be using PyRAF/IRAF for data processing. Three problems of varied complexity were benchmarked on three different types of multicore processors to demonstrate the benefits, in terms of execution time, of parallelizing data processing tasks. The native multiprocessing module available in Python makes it a relatively trivial task to implement the parallel code. We also compare three multiprocessing approaches: Pool/Map, Process/Queue, and Parallel Python. Our test codes are freely available and can be downloaded from our website.
    Comment: 15 pages, 7 figures, 1 table; for associated test code, see http://astro.nuigalway.ie/staff/navtejs. Accepted for publication in Astronomy and Computing.
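
    To make the Pool/Map and Process/Queue recipes concrete, here is a minimal sketch of both patterns applied to a toy CPU-bound task. The workload function and job sizes are illustrative placeholders, not the authors' benchmark code.

        import math
        from multiprocessing import Pool, Process, Queue

        def reduce_frame(n):
            # Stand-in for a CPU-bound task such as calibrating one image.
            return sum(math.sqrt(i) for i in range(n))

        def pool_map(items, workers=4):
            # Recipe 1: Pool/Map -- simplest when results map 1:1 to inputs.
            with Pool(processes=workers) as pool:
                return pool.map(reduce_frame, items)

        def worker(in_q, out_q):
            for item in iter(in_q.get, None):   # None is the stop sentinel
                out_q.put(reduce_frame(item))

        def process_queue(items, workers=4):
            # Recipe 2: Process/Queue -- explicit worker processes; note
            # that result order is not guaranteed to match input order.
            in_q, out_q = Queue(), Queue()
            procs = [Process(target=worker, args=(in_q, out_q))
                     for _ in range(workers)]
            for p in procs:
                p.start()
            for item in items:
                in_q.put(item)
            for _ in procs:
                in_q.put(None)                  # one sentinel per worker
            results = [out_q.get() for _ in items]
            for p in procs:
                p.join()
            return results

        if __name__ == "__main__":
            jobs = [200_000] * 8
            print(pool_map(jobs)[:2])
            print(process_queue(jobs)[:2])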

    East Lancashire Research 2007


    Predictive analysis and optimisation of pipelined wavefront applications using reusable analytic models

    Pipelined wavefront computations are a ubiquitous class of high performance parallel algorithms used for the solution of many scientific and engineering applications. In order to aid the design and optimisation of these applications, and to ensure that during procurement the platforms chosen are best suited to these codes, there has been considerable research into analysing and evaluating their operational performance. Wavefront codes exhibit complex computation, communication, and synchronisation patterns, and as a result there exists a large variety of such codes and possible optimisations. The problem is compounded by each new generation of high performance computing system, which has often introduced a previously unexplored architectural trait, requiring previous performance models to be rewritten and re-evaluated. In this thesis, we address the performance modelling and optimisation of this class of application as a whole. This differs from previous studies, in which bespoke models are applied to specific applications. The analytic performance models are generalised and reusable, and we demonstrate their application to the predictive analysis and optimisation of pipelined wavefront computations running on modern high performance computing systems. The performance model is based on the LogGP parameterisation and uses a small number of input parameters to specify the particular behaviour of most wavefront codes. The new parameters and model equations capture the key structural and behavioural differences among wavefront application codes, providing a succinct summary of the operations of each application and insights into alternative wavefront application designs. The models are applied to three industry-strength wavefront codes and are validated on several systems, including a Cray XT3/XT4 and an InfiniBand commodity cluster. Model predictions show high quantitative accuracy (less than 20% error) for all high performance configurations and excellent qualitative accuracy. The thesis presents applications, projections and insights for optimisations using the model, which show the utility of reusable analytic models for the performance engineering of high performance computing codes. In particular, we demonstrate the use of the model for: (1) evaluating application configurations and the resulting performance; (2) evaluating hardware platform issues, including platform sizing and configuration; (3) exploring hardware platform design alternatives and system procurement; and (4) considering possible code and algorithmic optimisations.
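
    As a rough illustration of what a LogGP-parameterised wavefront model looks like, the sketch below estimates sweep time as pipeline fill plus steady-state steps. The equation form and all parameter values are generic textbook approximations assumed for illustration, not the thesis's validated model.

        def wavefront_time(px, py, tiles, w_tile, L, o, G, msg_bytes):
            """Estimate one wavefront sweep on a px-by-py processor grid.

            px, py    : processor grid dimensions
            tiles     : wavefront steps each processor executes
            w_tile    : compute time per tile (seconds)
            L, o, G   : LogGP latency, per-message overhead, per-byte gap
            msg_bytes : boundary message size sent to each neighbour
            """
            t_comm = L + 2 * o + G * msg_bytes   # one neighbour exchange
            t_step = w_tile + t_comm             # one pipeline stage
            fill = (px + py - 2) * t_step        # wave reaches the last
                                                 # processor after this
            return fill + tiles * t_step         # fill + steady-state work

        # Example: 16x16 grid, 100 steps per processor, 5 ms tiles,
        # illustrative interconnect parameters.
        print(wavefront_time(16, 16, 100, 5e-3, 2e-6, 1e-6, 1e-9, 8192))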

    A proactive fault tolerance framework for high performance computing (HPC) systems in the cloud

    High Performance Computing (HPC) systems have been widely used by scientists and researchers, in both industry and university laboratories, to solve advanced computation problems. Most advanced computation problems are either data-intensive or computation-intensive, and may take hours, days or even weeks to complete execution. For example, some traditional HPC computations run on 100,000 processors for weeks. Consequently, traditional HPC systems often require huge capital investments, and scientists and researchers sometimes have to wait in long queues to access these shared, expensive systems. Cloud computing, on the other hand, offers new computing paradigms, capacity, and flexible solutions for both business and HPC applications. Some of the computation-intensive applications usually executed on traditional HPC systems can now be executed in the cloud, and the cloud computing pricing model eliminates huge capital investments. However, even for cloud-based HPC systems, fault tolerance remains an issue of growing concern. The large number of virtual machines and electronic components, as well as software complexity and overall system reliability, availability and serviceability (RAS), are factors with which HPC systems in the cloud must contend. The reactive fault tolerance approach of checkpoint/restart, which is commonly used in HPC systems, does not scale well in the cloud due to resource sharing and distributed system networks. Hence, the need for reliable fault-tolerant HPC systems is even greater in a cloud environment. In this thesis we present a proactive fault tolerance approach for HPC systems in the cloud that reduces wall-clock execution time, as well as dollar cost, in the presence of hardware failure. We have developed a generic fault tolerance algorithm for HPC systems in the cloud, and a cost model for executing computation-intensive applications on such systems. Our experimental results, obtained from a real cloud execution environment, show that the wall-clock execution time and cost of running computation-intensive applications in the cloud can be considerably reduced compared to the checkpoint and redundancy techniques used in traditional HPC systems.
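
    The thesis's own algorithm is not reproduced in the abstract, but the general shape of proactive fault tolerance can be sketched as follows: poll per-VM health signals, flag VMs whose estimated failure risk crosses a threshold, and migrate their work before the failure occurs. The health fields, scoring rule and thresholds below are hypothetical placeholders.

        from dataclasses import dataclass

        @dataclass
        class VMHealth:
            vm_id: str
            cpu_temp_c: float      # hardware temperature sensor reading
            ecc_errors: int        # correctable memory errors since boot
            disk_realloc: int      # reallocated disk sectors

        def failure_risk(h: VMHealth) -> float:
            # Toy scoring rule; a real system would use a trained predictor.
            score = 0.0
            score += 0.5 if h.cpu_temp_c > 85 else 0.0
            score += 0.3 if h.ecc_errors > 100 else 0.0
            score += 0.2 if h.disk_realloc > 10 else 0.0
            return score

        def proactive_step(healths, spares, threshold=0.5):
            # Migrate work off any VM predicted to fail, while spares last.
            migrations = []
            for h in healths:
                if failure_risk(h) >= threshold and spares:
                    migrations.append((h.vm_id, spares.pop()))
            return migrations

        healths = [VMHealth("vm-1", 90.0, 250, 2),
                   VMHealth("vm-2", 60.0, 0, 0)]
        print(proactive_step(healths, spares=["vm-spare-1"]))
        # -> [('vm-1', 'vm-spare-1')]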

    Developing graph-based co-scheduling algorithms on multicore computers

    It is common for multiple cores to reside on the same chip and share the on-chip cache. As a result, resource sharing can cause performance degradation of co-running jobs. Job co-scheduling is a technique that can effectively alleviate this contention, and many co-schedulers have been reported in the literature. Most solutions, however, do not aim to find the optimal co-scheduling solution, yet being able to determine the optimal solution is critical for evaluating co-scheduling systems. Moreover, most co-schedulers only consider serial jobs, while real-world systems often contain both parallel and serial jobs. In this paper a graph-based method is developed to find the optimal co-scheduling solution for serial jobs; the method is then extended to incorporate parallel jobs, including multi-process and multi-threaded parallel jobs. A number of optimization measures are also developed to accelerate the solving process, and a flexible approximation technique is proposed to strike a balance between solving speed and solution quality. Extensive experiments are conducted to evaluate the effectiveness of the proposed co-scheduling algorithms. The results show that the proposed algorithms can find the optimal co-scheduling solution for both serial and parallel jobs, and that the approximation technique is flexible in the sense that the solving speed can be controlled by setting the requirement for solution quality.
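
    One common way to phrase optimal co-scheduling of serial jobs on dual-core chips is as a minimum-cost pairing: choose job pairs so that the sum of pairwise degradation costs is minimal. The brute-force search below illustrates that objective only; it is an assumed formulation for dual-core chips, and the paper's graph-based method solves the problem far more efficiently.

        from itertools import permutations

        def best_pairing(jobs, degradation):
            """jobs        : list of job ids (even length)
               degradation : dict mapping frozenset({a, b}) -> combined
                             slowdown when a and b share a chip's cache"""
            best_cost, best = float("inf"), None
            first = jobs[0]              # fix one job to avoid duplicates
            for perm in permutations(jobs[1:]):
                ordered = [first] + list(perm)
                pairs = [frozenset(ordered[i:i + 2])
                         for i in range(0, len(ordered), 2)]
                cost = sum(degradation[p] for p in pairs)
                if cost < best_cost:
                    best_cost, best = cost, pairs
            return best_cost, best

        deg = {frozenset({"A", "B"}): 1.4, frozenset({"A", "C"}): 1.1,
               frozenset({"A", "D"}): 1.3, frozenset({"B", "C"}): 1.2,
               frozenset({"B", "D"}): 1.0, frozenset({"C", "D"}): 1.5}
        print(best_pairing(["A", "B", "C", "D"], deg))
        # -> (2.1, [frozenset({'A', 'C'}), frozenset({'B', 'D'})])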

    Feasibility of a low-cost computing cluster in comparison to a high-performance computing cluster: A developing country perspective

    In recent years, many organisations have used high-performance computing clusters to perform, within a few days, complex simulations and calculations that would otherwise have taken years, even lifetimes, on a single computer. However, these high-performance computing clusters can be very expensive to purchase and maintain. For developing countries, these costs are barriers that slow the development of the computing platforms needed to solve complex, real-world problems. From previous studies, it was unclear whether an off-the-shelf personal computer (single computer) and low-cost computing clusters are feasible alternatives to high-performance computing clusters for smaller scientific problems. The aim of this study was to investigate this gap in the literature since, to our knowledge, this kind of study has not been conducted before. The study used the High Performance Linpack (HPL) benchmark to collect quantitative data comparing the time-to-complete, operational costs and computational efficiency of a single computer, a low-cost computing cluster and a high-performance cluster. The benchmark used the HPL main algorithm, with matrix sizes for the n x n dense linear system ranging from 10 000 to 60 000. The cost of the low-cost computing cluster was kept to a minimum (USD 4,000) and the cluster was constructed from locally available computer hardware components. For the cases we studied, we found that a low-cost computing cluster is a viable alternative to a high-performance cluster if the environment requires that costs be kept to a minimum. We concluded that for smaller scientific problems, both the single computer and the low-cost computing cluster were better alternatives to a high-performance cluster. However, for large scientific problems, where performance and time matter more than cost, a high-performance cluster is still the best solution, offering the best efficiency in both theoretical energy consumption and computation.
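
    To show how HPL results of this kind are compared across platforms, the sketch below converts a measured solve time into sustained GFLOPS using the standard HPL operation count for an n x n dense solve, 2/3 n^3 + 2 n^2, and then computes efficiency against theoretical peak. The example numbers are illustrative, not the study's measurements.

        def hpl_gflops(n: int, seconds: float) -> float:
            # Standard HPL flop count for solving an n x n dense system.
            flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
            return flops / seconds / 1e9

        def efficiency(sustained_gflops: float, peak_gflops: float) -> float:
            # Fraction of theoretical peak actually sustained (Rmax/Rpeak).
            return sustained_gflops / peak_gflops

        # Illustrative: n = 20 000 solved in 1800 s on a machine with a
        # 50 GFLOPS theoretical peak.
        g = hpl_gflops(20_000, 1800.0)
        print(f"{g:.2f} GFLOPS, {efficiency(g, 50.0):.1%} of peak")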