220,486 research outputs found

    Architecture independent parallel selection with applications to parallel priority queues

    We present a randomized selection algorithm whose performance is analyzed in an architecture-independent way on the bulk-synchronous parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely parallel priority queues. We show that our algorithms improve on previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, describing their performance in terms of the BSP parameters is somewhat involved; the main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model.
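
    For orientation, selection means finding the k-th smallest element of a collection. The sketch below is a plain sequential randomized quickselect, not the BSP algorithm of the paper; the function name `quickselect` and its parameters are assumptions made for illustration only.

```python
import random


def quickselect(items, k, rng=random):
    """Return the k-th smallest element of items (k is 0-based).

    Sequential illustration only: the paper's algorithm distributes
    this kind of work across BSP processors, which is not shown here.
    """
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    pool = list(items)
    while True:
        # A random pivot gives expected linear time on any input order.
        pivot = rng.choice(pool)
        lower = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if k < len(lower):
            pool = lower                      # answer lies below the pivot
        elif k < len(lower) + len(equal):
            return pivot                      # answer is the pivot itself
        else:
            k -= len(lower) + len(equal)      # skip everything <= pivot
            pool = [x for x in pool if x > pivot]
```

    Calling `quickselect(data, len(data) // 2)` returns a median of `data` without fully sorting it.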

    Parametric micro-level performance models for parallel computing and parallel implementation of hydrostatic MM5

    This dissertation presents parametric micro-level performance models and a parallel implementation of the hydrostatic version of MM5. Parametric micro-level (PM) performance models are introduced to address the important issue of how to realistically model parallel performance. These models can be used to predict execution times and identify performance bottlenecks. The accurate prediction and analysis of execution times is achieved by incorporating precise details of interprocessor communication, memory operations, auxiliary instructions, and the effects of communication and computation schedules. The parameters provide the flexibility to study various algorithmic and architectural issues. The development and verification process, the parameters, and the scope of applicability of these models are discussed. A coherent view of performance is obtained from the execution profiles generated by PM models. The models are targeted at a large class of numerical algorithms commonly implemented on both SIMD and MIMD machines. Specific models are presented for matrix multiplication, LU decomposition, and FFT on a 2-D processor array with distributed memory. A case study includes a comparison of parallel machines and of parallel algorithms. In the comparison of parallel machines, PM models are used to analyze execution times so as to relate the performance to the architectural attributes of a machine. In the comparison of parallel algorithms, PM models are used to study the performance of two LU decomposition algorithms, non-blocked and blocked, and to identify the tradeoffs between them. This analysis is useful for determining an optimum block size for the blocked algorithm. The case study is done on MasPar MP-1 and MP-2 machines. The dissertation also describes the parallel implementation of the hydrostatic version of MM5 (the fifth generation of the Mesoscale Model), which has been widely used for climate studies. The model was parallelized in a machine-independent manner using the Runtime System Library (RSL), a runtime library for handling message passing and index transformation. The dissertation discusses validation of the parallel implementation of MM5 using field data and presents performance results. The parallel model was tested on the IBM SP1, a distributed-memory parallel computer.

    Throughput enhancement with parallel redundancy in multi-product flow line system

    We develop a new analytic approximation method to replace a set of parallel machines by an equivalent machine in a series-parallel flow line with finite buffers. The method is based on a discrete-state Markov chain. The proposed technique replaces a set of parallel machines at a work centre by an equivalent machine in order to obtain a traditional flow line with machines in series separated by intermediate buffers. We derive equations for the parameters of the equivalent machine when it operates in isolation as well as in a flow line. The existing analytic methods for series-parallel systems can treat only lines with a maximum of two machines in series and a buffer between them. The method we propose in this thesis can be used in conjunction with an approximation method or simulation to solve flow lines of any length. We also model and evaluate the performance of series-parallel systems manufacturing more than one product type with a predefined sequence and lot size. We address this issue for considerably longer flow line systems with finite buffers, which are common in industry. We consider the set-up time of the machines as the product type changes, deterministic processing times, and operation-dependent failures of the machines. We analyze the effects of buffers and of the number of machines in parallel on the performance of series-parallel systems.

    Multiprocessing the Sieve of Eratosthenes

    In recent years the Sieve of Eratosthenes for finding prime numbers has seen much use as a benchmark algorithm for serial computers, while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple, which means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but also as an interesting challenge for the programmer.
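
    One way to see where the parallelism comes from is a blocked sieve: once the primes up to the square root of the limit are known, each block of the range can be sieved independently. The sketch below runs the blocks one after another in a single process; it is not the Flex/32 implementation, and the names `base_primes`, `sieve_block`, `blocked_sieve`, and the `block_size` parameter are assumptions chosen for illustration.

```python
from math import isqrt


def base_primes(limit):
    """Plain sequential sieve: all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = 0
    return [i for i, flag in enumerate(flags) if flag]


def sieve_block(lo, hi, primes):
    """Primes in [lo, hi), given every prime <= sqrt(hi - 1).

    A block touches only its own flags, so separate blocks could be
    handed to separate workers (not done in this sketch).
    """
    flags = bytearray([1]) * (hi - lo)
    for p in primes:
        # First multiple of p inside the block, never below p * p.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for multiple in range(start, hi, p):
            flags[multiple - lo] = 0
    return [lo + i for i, flag in enumerate(flags) if flag]


def blocked_sieve(n, block_size=1024):
    """All primes <= n, computed block by block."""
    if n < 2:
        return []
    primes = base_primes(isqrt(n))
    result = []
    for lo in range(2, n + 1, block_size):
        hi = min(lo + block_size, n + 1)
        result.extend(sieve_block(lo, hi, primes))
    return result
```

    Blocks near the start of the range are struck by more multiples than later ones, which hints at why the static division of work described in the abstract benefits from load balancing.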

    Analytical response time estimation in parallel relational database systems

    Techniques for performance estimation in parallel database systems are well established for parameters such as throughput, bottlenecks and resource utilisation. However, response time is difficult to predict, and its estimation has attracted research for a number of years. Simulation is one option for predicting response time, but it is a costly process. Analytical modelling is a less expensive option, but it requires approximations and assumptions about the queueing networks built up in real parallel database machines, which are often questionable, and few of the papers on analytical approaches are backed by results from validation against real machines. This paper describes a new analytical approach for response time estimation that is based on a detailed study of different approaches and assumptions. The approach has been validated against two commercial parallel DBMSs running on actual parallel machines and is shown to produce acceptable accuracy.

    Optimizing simulation on shared-memory platforms: The smart cities case

    Modern advancements in computing architectures have been accompanied by new emergent paradigms to run Parallel Discrete Event Simulation models efficiently. Indeed, many new paradigms to effectively use the available underlying hardware have been proposed in the literature. Among these, the Share-Everything paradigm targets massively parallel shared-memory machines in order to support speculative simulation, taking into account the limits and benefits of this family of architectures. Previous results have shown that this paradigm outperforms traditional speculative strategies (such as data-separated Time Warp systems) whenever the granularity of executed events is small. In this paper, we show the performance implications of this simulation-engine organization when the simulation models have a variable granularity. To this end, we have selected a traffic model tailored for smart-cities-oriented simulation. Our assessment illustrates the effects of the various tuning parameters related to the approach, leading to a better understanding of this innovative paradigm.

    Optimal Design of IPM Motors With Different Cooling Systems and Winding Configurations

    Performance improvement of permanent magnet (PM) motors through optimization techniques has been widely investigated in the literature. Oftentimes the practice of design optimization leads to the derivation and interpretation of optimal scaling rules of PM motors for a particular loading condition. This paper demonstrates how these derivations vary with respect to the machine ampere loading and ferrous core saturation level. A parallel sensitivity analysis using a second-order response surface methodology, followed by a large-scale design optimization based on evolutionary algorithms, is pursued in order to establish how the relationships between the main design parameters and the performance characteristics vary with the ampere loading and magnetic core saturation levels prevalent in naturally cooled, fan-cooled, and liquid-cooled machines. For this purpose, a finite-element-based platform with a full account of complex geometry, magnetic core nonlinearities, and stator and rotor losses is used. Four main performance metrics, namely active material cost, power losses, torque ripple, and rotor PM demagnetization, are investigated for two generic industrial PM motors with distributed and concentrated windings, and conclusions are drawn from the results.

    Performance analysis and optimization of parallel Best-First Search algorithms on multicore and cluster of multicore

    The contribution of the thesis is the development of two parallel Best-First Search algorithms, one suitable for execution on shared-memory machines (multicore) and another suitable for execution on distributed-memory machines (cluster). The former is based on the adaptation of the HDA* (Hash Distributed A*) algorithm for multicore machines proposed by Burns et al. (2010), while the latter is based on the HDA* algorithm proposed by Kishimoto et al. (2013). The implemented algorithms incorporate parameters and/or techniques that improve their performance with respect to the original algorithms proposed by the authors mentioned above. This is a revision of: http://sedici.unlp.edu.ar/handle/10915/44478. Summary of the thesis presented by the author to obtain the degree of Doctor in Computer Science (UNLP, 2015). Facultad de Informátic
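
    The core idea behind HDA* is that a hash of each search state decides which worker owns it, so every worker keeps its own open and closed lists and receives the states that hash to it. The sketch below shows only that ownership and routing step under stated assumptions: `owner_of` and `route` are hypothetical names, states are assumed to have a stable `repr`, and CRC32 stands in for whatever hash function the thesis actually uses.

```python
import zlib


def owner_of(state, num_workers):
    """Map a state to the index of the worker that owns it.

    CRC32 over repr(state) is an illustrative choice: it is stable
    across processes, unlike Python's built-in hash() for strings.
    """
    key = repr(state).encode("utf-8")
    return zlib.crc32(key) % num_workers


def route(successors, num_workers):
    """Group newly generated states into one outbox per owning worker."""
    outboxes = [[] for _ in range(num_workers)]
    for state in successors:
        outboxes[owner_of(state, num_workers)].append(state)
    return outboxes
```

    Because the same state always hashes to the same worker, duplicate detection stays local to that worker; the cost is that most generated states must be sent to another worker.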