5 research outputs found

    A comparative study on different parallel solvers for nonlinear analysis of complex structures

    The parallelization of the 2D/3D software SAPTIS for nonlinear analysis of complex structures is discussed, and a comparative study is made of different parallel solvers. The numerical models are presented, including hydration, water cooling, modulus, creep, and autogenous deformation models. Using these models, a finite element simulation is made of the whole process of dam excavation and pouring. The numerical results agree well with the measured ones. To improve computing efficiency, four parallel solvers are employed: (1) a parallel preconditioned conjugate gradient (PCG) solver based on OpenMP, (2) a parallel preconditioned Krylov subspace solver based on MPI, (3) a parallel sparse equation solver based on OpenMP, and (4) a parallel GPU equation solver. The parallel solvers run either in a shared-memory environment (OpenMP) or in a distributed-memory environment (MPI). The comparison shows that parallelization makes SAPTIS more efficient, powerful, and adaptable.
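
For readers unfamiliar with the first solver type named in the abstract, the following is a minimal serial sketch of a preconditioned conjugate gradient iteration. It is not SAPTIS code: the function name `pcg`, the dense list-of-lists matrix, and the Jacobi (diagonal) preconditioner are all assumptions chosen for illustration, since the abstract does not say which preconditioner or storage format is used.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Hypothetical illustration: A is a dense list of lists and the
    preconditioner is Jacobi (M = diag(A)); neither choice comes
    from the abstract.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [r[i] / A[i][i] for i in range(n)]        # z = M^{-1} r
    p = list(z)                                   # first search direction
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        # Stop once the residual norm is below the tolerance.
        if sum(v * v for v in r) ** 0.5 < tol:
            break
        # Matrix-vector product: each row is independent, so this is the
        # kind of loop a shared-memory solver would split across threads.
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    # Small symmetric positive-definite test system.
    print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The dot products and the matrix-vector product dominate the cost per iteration, which is why they are the usual targets when such a solver is parallelized.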

    A performance focused, development friendly and model aided parallelization strategy for scientific applications

    Advances in high performance computing platforms have provided unprecedented computing power through the evolution of multi-core CPUs, massively parallel architectures such as General Purpose Graphics Processing Units (GPGPUs), and Many Integrated Core (MIC) architectures such as Intel's Xeon Phi coprocessor. However, leveraging the capabilities of such advanced supercomputing hardware is a great challenge, as it requires efficient and effective parallelization of scientific applications. This task is difficult mainly because of the complexity of scientific algorithms, coupled with the variety of available hardware and disparate programming models. To address these challenges, this thesis presents a parallelization strategy for accelerating scientific applications that maximizes the opportunities for speedup while minimizing development effort. Parallelization is a three-step process: (1) choose a compatible combination of architecture and parallel programming language, (2) translate the base code/algorithm to a parallel language, and (3) optimize and tune the application. In this research, a quantitative comparison of run times for various implementations of the k-means algorithm is used to establish that native languages (OpenMP, MPI, CUDA) perform better on their respective architectures than vendor-neutral languages such as OpenCL. A qualitative model is used to select an optimal architecture for a given application by aligning the capabilities of accelerators with the characteristics of the application. Once the optimal architecture is chosen, the corresponding native language is employed. This approach provides the best performance, predicting a fitting combination with reasonable accuracy (78%), while eliminating the need to explore different architectures individually. It reduces the required development effort considerably, as the application need not be rewritten in multiple languages. The focus can be solely on optimization and tuning to achieve the best performance on available architectures with minimal investment of cost and effort. To verify the prediction accuracy of the qualitative model, the OpenDwarfs benchmark suite, which implements Berkeley's dwarfs in OpenCL, is used. A dwarf is an algorithmic method that captures a pattern of computation and communication. For the purpose of this research, the focus is on 9 applications from various algorithmic domains that cover the seven dwarfs of symbolic computation, which were identified by Phillip Colella as omnipresent in scientific and engineering applications. To validate the parallelization strategy as a whole, a case study is undertaken. This case study involves parallelization of the Lower Upper Decomposition for the Gaussian Elimination algorithm from the linear algebra domain, using conventional trial-and-error methods as well as the proposed 'Architecture First, Language Later' strategy. The development effort incurred is contrasted for both methods. The proposed strategy is observed to reduce development effort by an average of 50%.

    Models for Type I X-Ray Bursts Nucleosynthesis with Parallelisation and Improved Nuclear Physics

    Type I X-ray bursts (XRBs) are thermonuclear flashes on the surface of neutron stars (NS) associated with mass accretion from a companion star. Models of type I XRBs and their associated nucleosynthesis are physically complicated and demand huge computational power to model the physical processes involved with the precision required to be truly representative. Until recently, because of these computational limitations, studies of XRB nucleosynthesis have been performed using limited nuclear reaction networks. To overcome this hurdle, parallel computing has emerged as the main enabling factor for more precise and computationally intensive simulations, as it offers the potential to concentrate computational resources on computationally intensive problems. In this work, we present a parallelisation of two different applications: a one-zone (i.e. parameterized) nucleosynthesis code, and a one-dimensional (spherically symmetric) hydrodynamic code in Lagrangian formulation (hereafter SHIVA code), built originally to model classical nova outbursts (José 1996; José & Hernanz 1998). The codes have been parallelised using the MPICH2 implementation of the Message Passing Interface (MPI) specification for the design of parallel applications using clusters of distributed workstations. As an example, to execute a hydrodynamic simulation over 200k time-steps, the SHIVA code requires (in its sequential, single-node version) about 147 hours (6.1 days) when using a reduced nuclear network with 324 isotopes and 1392 nuclear reactions, and 688 hours (28.6 days) when using a network with 606 nuclides and 3551 nuclear reactions for the same number of time-steps. The post-processing nucleosynthesis code is a time-step loosely synchronous application with a very small problem size (limited by the number of isotopes of the nuclear network). As shown by the performance tests, this results in the worst possible scenario for parallelisation: the performance of the parallel application is much worse than that of the sequential, single-node version of the code. Our results show that it is therefore not possible to parallelise a post-processing nucleosynthesis code efficiently, and efforts in this regard should be avoided. By contrast, the parallelised version of the SHIVA code yields excellent performance results. A speed-up factor of 26 is achieved in a simulation with a reduced network consisting of 324 isotopes and 1392 nuclear reactions when 42 processors are used in parallel to execute the application over 200k time-steps. Similarly, an excellent speed-up factor of 35 is achieved in a simulation with a reaction network of up to 606 nuclides and 3551 nuclear reactions. Maximum speed-ups of ~41 and ~85 are predicted by the performance models when using 200 processors, for the reduced and extended simulations respectively. Our results will not only improve the quality of the simulations (and hence publications) in terms of better numerical approaches, finer approximations, and a considerably shorter time-to-publication, but will also allow taking advantage, if desired, of parallel supercomputing facilities like the Mare Nostrum at the Supercomputing Centre in Barcelona (BSC).
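
The speed-up figures quoted in this abstract can be turned into parallel efficiency and an estimated wall-clock time with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each measured speed-up applies directly to the sequential run time quoted for the same network, which the abstract implies but does not state outright.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors


def parallel_hours(sequential_hours, speedup):
    """Estimated wall-clock time of the parallel run, in hours."""
    return sequential_hours / speedup


if __name__ == "__main__":
    # Reduced network: 147 h sequential, speed-up 26 on 42 processors.
    print(parallel_efficiency(26, 42))
    print(parallel_hours(147, 26))
    # Extended network: 688 h sequential, speed-up 35
    # (assumed to pair with the quoted sequential time).
    print(parallel_hours(688, 35))
```

Under these assumptions, the reduced-network run uses about 62% of the ideal 42-fold speed-up and drops from 147 hours to under 6 hours, while the extended-network run drops from 688 hours to just under 20 hours.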

    Anwendung von Prädiktivreglern in Verbrennungsmotorsteuerungen (Application of Predictive Controllers in Internal Combustion Engine Control Units)

    The aim of this thesis is to make the numerically demanding model predictive control concept applicable within modern internal combustion engine control units. Simulation studies demonstrate the suitability of model predictive control for reference tracking and disturbance rejection of engine torque and engine speed. Practical applicability is shown and discussed on the basis of an implementation in a production engine control unit and a subsequent drive in the vehicle on a test track.