4 research outputs found

    64-bit architechtures and compute clusters for high performance simulations

    Get PDF
    Simulation of large complex systems remains one of the most demanding of high performance computer systems both in terms of raw compute performance and efficient memory management. Recent availability of 64-bit architectures has opened up the possibilities of commodity computers accessing more than the 4 Gigabyte memory limit previously enforced by 32-bit addressing. We report on some performance measurements we have made on two 64-bit architectures and their consequences for some high performance simulations. We discuss performance of our codes for simulations of artificial life models; computational physics models of point particles on lattices; and with interacting clusters of particles. We have summarised pertinent features of these codes into benchmark kernels which we discuss in the context of wellknown benchmark kernels of the 32-bit era. We report on how these these findings were useful in the context of designing 64-bit compute clusters for high-performance simulations

    Performance modelling, analysis and prediction of Spark jobs in Hadoop cluster : a thesis by publications presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, School of Mathematical & Computational Sciences, Massey University, Auckland, New Zealand

    Get PDF
    Big Data frameworks have received tremendous attention from the industry and from academic research over the past decade. The advent of distributed computing frameworks such as Hadoop MapReduce and Spark are powerful frameworks that offer an efficient solution for analysing large-scale datasets running under the Hadoop cluster. Spark has been established as one of the most popular large-scale data processing engines because of its speed, low latency in-memory computation, and advanced analytics. Spark computational performance heavily depends on the selection of suitable parameters, and the configuration of these parameters is a challenging task. Although Spark has default parameters and can deploy applications without much effort, a significant drawback of default parameter selection is that it is not always the best for cluster performance. A major limitation for Spark performance prediction using existing models is that it requires either large input data or system configuration that is time-consuming. Therefore, an analytical model could be a better solution for performance prediction and for establishing appropriate job configurations. This thesis proposes two distinct parallelisation models for performance prediction: the 2D-Plate model and the Fully-Connected Node model. Both models were constructed based on serial boundaries for a certain arrangement of executors and size of the data. In order to evaluate the cluster performance, various HiBench workloads were used, and workload’s empirical data were fitted with the models for performance prediction analysis. The developed models were benchmarked with the existing models such as Amdahl’s, Gustafson, ERNEST, and machine learning. Our experimental results show that the two proposed models can quickly and accurately predict performance in terms of runtime, and they can outperform the accuracy of machine learning models when extrapolating predictions

    Performance Characteristics of a Cost-Effective Medium-Sized Beowulf Cluster Supercomputer

    Get PDF
    This paper presents some performance results obtained from a new Beowulf cluster, the Helix, built at Massey University, Auckland funded by the Allan Wilson Center for Evolutionary Ecology. Issues concerning network latency and the e#ect of the switching fabric and network topology on performance are discussed. In order to assess how the system performed using the message passing interface (MPI), two test suites (mpptest and jumpshot) were used to provide a comprehensive network performance analysis. The performance of an older fast-ethernet/single processor based cluster is compared to the new Gigabit/SMP cluster. The Linpack performance of Helix is investigated. The Linpack Rmax rating of 234.8 Gflops puts the cluster at third place in the Australia/ New Zealand sublist of the Top500 supercomputers, an extremely good performance considering the commodity parts and its low cost (US$125000

    Smart Distributed Processing Technologies For Hedge Fund Management

    Get PDF
    Distributed processing cluster design using commodity hardware and software has proven to be a technological breakthrough in the field of parallel and distributed computing. The research presented herein is the original investigation on distributed processing using hybrid processing clusters to improve the calculation efficiency of the compute-intensive applications. This has opened a new frontier in affordable supercomputing that can be utilised by businesses and industries at various levels. Distributed processing that uses commodity computer clusters has become extremely popular over recent years, particularly among university research groups and research organisations. The research work discussed herein addresses a bespoke-oriented design and implementation of highly specific and different types of distributed processing clusters with applied load balancing techniques that are well suited for particular business requirements. The research was performed in four phases, which are cohesively interconnected, to find a suitable solution using a new type of distributed processing approaches. The first phase is an implementation of a bespoke-type distributed processing cluster using an existing network of workstations as a calculation cluster based on a loosely coupled distributed process system design that has improved calculation efficiency of certain legacy applications. This approach has demonstrated how to design an innovative, cost-effective, and efficient way to utilise a workstation cluster for distributed processing. The second phase is to improve the calculation efficiency of the distributed processing system; a new type of load balancing system is designed to incorporate multiple processing devices. The load balancing system incorporates hardware, software and application related parameters to assigned calculation tasks to each processing devices accordingly. Three types of load balancing methods are tested, static, dynamic and hybrid, which each of them has their own advantages, and all three of them have further improved the calculation efficiency of the distributed processing system.   The third phase is to facilitate the company to improve the batch processing application calculation time, and two separate dedicated calculation clusters are built using small form factor (SFF) computers and PCs as separate peer-to-peer (P2P) network based calculation clusters. Multiple batch processing applications were tested on theses clusters, and the results have shown consistent calculation time improvement across all the applications tested. In addition, dedicated clusters are built using SFF computers with reduced power consumption, small cluster size, and comparatively low cost to suit particular business needs. The fourth phase incorporates all the processing devices available in the company as a hybrid calculation cluster utilises various type of servers, workstations, and SFF computers to form a high-throughput distributed processing system that consolidates multiple calculations clusters. These clusters can be utilised as multiple mutually exclusive multiple clusters or combined as a single cluster depending on the applications used. The test results show considerable calculation time improvements by using consolidated calculation cluster in conjunction with rule-based load balancing techniques. The main design concept of the system is based on the original design that uses first principle methods and utilises existing LAN and separate P2P network infrastructures, hardware, and software. Tests and investigations conducted show promising results where the company’s legacy applications can be modified and implemented with different types of distributed processing clusters to achieve calculation and processing efficiency for various applications within the company. The test results have confirmed the expected calculation time improvements in controlled environments and show that it is feasible to design and develop a bespoke-type dedicated distributed processing cluster using existing hardware, software, and low-cost SFF computers. Furthermore, a combination of bespoke distributed processing system with appropriate load balancing algorithms has shown considerable calculation time improvements for various legacy and bespoke applications. Hence, the bespoke design is better suited to provide a solution for the calculation of time improvements for critical problems currently faced by the sponsoring company
    corecore