5,502 research outputs found

    A Semicoarsening Multigrid Algorithm for SIMD Machines

    Get PDF
    A semicoarsening multigrid algorithm suitable for use on single instruction multiple data (SIMD) architectures has been implemented on the CM-2. The method performs well for strongly anisotropic problems and for problems with coefficients that jump by orders of magnitude across internal interfaces. The parallel efficiency of the method is analyzed, and its actual performance on the CM-2 is compared with its performance on some other machines, both parallel and nonparallel.
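    As a rough illustration of the semicoarsening idea described above (not the paper's CM-2 implementation), the sketch below shows the one-directional restriction step that distinguishes semicoarsening from standard coarsening: the grid is coarsened in y only, which is what makes the hierarchy robust for strongly anisotropic coefficients. Names and grid sizes are illustrative assumptions.
        # Hypothetical sketch: full-weighting restriction in the y-direction only.
        import numpy as np

        def semicoarsen_y(fine):
            """Restrict a 2D grid function in y only; the x resolution is kept."""
            ny, nx = fine.shape                      # ny is assumed odd
            coarse = np.empty(((ny + 1) // 2, nx))
            coarse[0, :], coarse[-1, :] = fine[0, :], fine[-1, :]   # keep boundaries
            # each interior coarse point averages its fine y-neighbours (1/4, 1/2, 1/4)
            coarse[1:-1, :] = (0.25 * fine[1:-3:2, :]
                               + 0.50 * fine[2:-2:2, :]
                               + 0.25 * fine[3:-1:2, :])
            return coarse

        u = np.random.rand(9, 8)                     # 9 points in y, 8 in x
        print(semicoarsen_y(u).shape)                # -> (5, 8): coarsened in y only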

    A multi-level preconditioned Krylov method for the efficient solution of algebraic tomographic reconstruction problems

    Full text link
    Classical iterative methods for tomographic reconstruction include the class of Algebraic Reconstruction Techniques (ART). Convergence of these stationary linear iterative methods is, however, notably slow. In this paper we propose the use of Krylov solvers for tomographic linear inversion problems. These advanced iterative methods feature fast convergence at the expense of a higher computational cost per iteration, which makes them generally uncompetitive without a suitable preconditioner. Combining elements from standard multigrid (MG) solvers and the theory of wavelets, a novel wavelet-based multi-level (WMG) preconditioner is introduced and shown to significantly speed up Krylov convergence. The performance of the WMG-preconditioned Krylov method is analyzed through a spectral analysis, and the approach is compared to existing methods such as the classical Simultaneous Iterative Reconstruction Technique (SIRT) and unpreconditioned Krylov methods on a 2D tomographic benchmark problem. Numerical experiments are promising, showing the method to be competitive with the classical Algebraic Reconstruction Techniques in terms of convergence speed and overall performance (CPU time) as well as precision of the reconstruction. Comment: Journal of Computational and Applied Mathematics (2014), 26 pages, 13 figures, 3 tables.
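    The overall solver structure described above can be sketched as a preconditioned Krylov iteration on the normal equations. The snippet below is only a minimal, hypothetical illustration: the wavelet-based multi-level (WMG) preconditioner itself is the paper's contribution and is not reproduced here, so apply_prec is a placeholder standing in for it.
        import numpy as np
        from scipy.sparse.linalg import LinearOperator, cg

        def reconstruct(A, b, apply_prec=None, maxiter=200):
            """Solve min ||Ax - b|| via CG on A^T A x = A^T b, optionally preconditioned."""
            n = A.shape[1]
            normal_op = LinearOperator((n, n), matvec=lambda x: A.T @ (A @ x))
            M = LinearOperator((n, n), matvec=apply_prec) if apply_prec else None
            x, _ = cg(normal_op, A.T @ b, M=M, maxiter=maxiter)
            return x

        rng = np.random.default_rng(0)
        A = rng.random((60, 40))                     # toy projection matrix
        x_true = rng.random(40)
        x = reconstruct(A, A @ x_true)               # unpreconditioned toy run
        print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))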

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    Get PDF
    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relationship to theory, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, as well as recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.

    Comparison of data-driven uncertainty quantification methods for a carbon dioxide storage benchmark scenario

    Full text link
    A variety of methods is available to quantify uncertainties arising within the modeling of flow and transport in carbon dioxide storage, but there is a lack of thorough comparisons. Usually, raw data from such storage sites can hardly be described by theoretical statistical distributions since only very limited data is available. Hence, exact information on distribution shapes for all uncertain parameters is very rare in realistic applications. We discuss and compare four different methods tested for data-driven uncertainty quantification based on a benchmark scenario of carbon dioxide storage. In the benchmark, for which we provide data and code, carbon dioxide is injected into a saline aquifer modeled by the nonlinear capillarity-free fractional flow formulation for two incompressible fluid phases, namely carbon dioxide and brine. To cover different aspects of uncertainty quantification, we incorporate various sources of uncertainty such as uncertainty of boundary conditions, of conceptual model definitions and of material properties. We consider recent versions of the following non-intrusive and intrusive uncertainty quantification methods: arbitrary polynomial chaos, spatially adaptive sparse grids, kernel-based greedy interpolation and hybrid stochastic Galerkin. The performance of each approach is demonstrated by assessing the expectation value and standard deviation of the carbon dioxide saturation against a reference statistic based on Monte Carlo sampling. We compare the convergence of all methods, reporting on accuracy with respect to the number of model runs and the resolution. Finally, we offer suggestions about the methods' advantages and disadvantages that can guide the modeler for uncertainty quantification in carbon dioxide storage and beyond.
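    The Monte Carlo reference statistic mentioned above amounts to estimating the expectation and standard deviation of a model output from repeated runs with sampled uncertain parameters. The sketch below illustrates only that generic pattern; run_model and its parameter ranges are hypothetical stand-ins for the actual two-phase CO2/brine simulator and benchmark data.
        import numpy as np

        def run_model(porosity, permeability, injection_rate):
            # placeholder for the CO2 saturation returned by the flow simulator
            return injection_rate / (permeability * porosity + 1e-12)

        def monte_carlo_uq(n_samples=1000, seed=0):
            rng = np.random.default_rng(seed)
            outputs = np.array([
                run_model(porosity=rng.uniform(0.1, 0.3),
                          permeability=rng.uniform(1e-13, 1e-12),
                          injection_rate=rng.normal(1.0, 0.1))
                for _ in range(n_samples)
            ])
            return outputs.mean(), outputs.std(ddof=1)   # E[Q] and std[Q]

        mean, std = monte_carlo_uq()
        print(f"expectation ~ {mean:.3g}, standard deviation ~ {std:.3g}")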

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies, and we compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
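    The hashing motif argued for above shows up in its simplest form as k-mer counting with a hash table, a pattern underlying profiling and assembly. The sketch below is a serial illustration only; the parallel versions discussed in the paper rely on distributed hash tables with asynchronous updates.
        from collections import Counter

        def count_kmers(read, k=4):
            """Hash-table count of every length-k substring of a read."""
            return Counter(read[i:i + k] for i in range(len(read) - k + 1))

        print(count_kmers("ACGTACGTAC").most_common(3))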

    Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio

    Get PDF
    Many problems in fluid modelling require the efficient solution of highly anisotropic elliptic partial differential equations (PDEs) in "flat" domains. For example, in numerical weather and climate prediction an elliptic PDE for the pressure correction has to be solved at every time step in a thin spherical shell representing the global atmosphere. This elliptic solve can be one of the computationally most demanding components in semi-implicit semi-Lagrangian time stepping methods, which are very popular as they allow for larger model time steps and better overall performance. With increasing model resolution, algorithmically efficient and scalable algorithms are essential to run the code under tight operational time constraints. We discuss the theory and practical application of bespoke geometric multigrid preconditioners for equations of this type. The algorithms deal with the strong anisotropy in the vertical direction by using the tensor-product approach originally analysed by Börm and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219-234]. We extend the analysis to three dimensions under slightly weakened assumptions, and numerically demonstrate its efficiency for the solution of the elliptic PDE for the global pressure correction in atmospheric forecast models. For this we compare the performance of different multigrid preconditioners on a tensor-product grid with a semi-structured and quasi-uniform horizontal mesh and a one-dimensional vertical grid. The code is implemented in the Distributed and Unified Numerics Environment (DUNE), which provides an easy-to-use and scalable environment for algorithms operating on tensor-product grids. Parallel scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR supercomputer. Comment: 22 pages, 6 figures, 2 tables.
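    The key ingredient of the tensor-product approach referred to above is that the strong vertical anisotropy is treated by exact solves in each vertical column. The sketch below shows a generic vertical line (block-Jacobi) relaxation for a model anisotropic problem; it is an illustration of the idea only, not the DUNE-based preconditioners of the paper, and the model equation and grid are invented for the example.
        import numpy as np
        from scipy.linalg import solve_banded

        def vertical_line_relaxation(u, f, eps, h):
            """One sweep for -eps*u_xx - u_zz = f: tridiagonal solve per column."""
            nz, nx = u.shape
            ab = np.zeros((3, nz))                   # banded vertical operator
            ab[0, 1:] = -1.0 / h**2                  # super-diagonal
            ab[1, :] = (2.0 + 2.0 * eps) / h**2      # diagonal
            ab[2, :-1] = -1.0 / h**2                 # sub-diagonal
            u_new = u.copy()
            for i in range(1, nx - 1):               # horizontal coupling goes to the RHS
                rhs = f[:, i] + eps / h**2 * (u[:, i - 1] + u[:, i + 1])
                u_new[:, i] = solve_banded((1, 1), ab, rhs)
            return u_new

        u = np.zeros((64, 32)); f = np.ones((64, 32))
        u = vertical_line_relaxation(u, f, eps=1e-3, h=1.0 / 64)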

    Lecture 12: Recent Advances in Time Integration Methods and How They Can Enable Exascale Simulations

    Get PDF
    To prepare for exascale systems, scientific simulations are growing in physical realism and thus complexity. This increase often results in additional and changing time scales. Time integration methods are critical to the efficient solution of these multiphysics systems. Yet many large-scale applications have not fully embraced modern time integration methods nor efficient software implementations, and achieving temporal accuracy with new and complex simulations has proved challenging. We will give an overview of recent advances in time integration methods, including additive IMEX methods, multirate methods, and parallel-in-time approaches, which are expected to help realize the potential of exascale systems on multiphysics simulations. Efficient execution of these methods relies, in turn, on efficient algebraic solvers, and we will discuss the relationships between integrators and solvers. In addition, an effective time integration approach is not complete without efficient software, and we will discuss effective software design approaches for time integrators and their uses in application codes. Lastly, examples demonstrating some of these new methods and their implementations will be presented. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-819501.
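    As a concrete, if simplistic, example of the additive IMEX idea mentioned above: a first-order IMEX Euler step advances the stiff part implicitly and the nonstiff part explicitly. The test equation below is an invented toy problem, not one of the lecture's applications.
        import numpy as np

        def imex_euler(y, dt, f_explicit, solve_implicit):
            """y_{n+1} = y_n + dt*f_E(y_n) + dt*f_I(y_{n+1}); caller supplies the implicit solve."""
            return solve_implicit(y + dt * f_explicit(y), dt)

        lam = -1000.0                                      # stiff linear term, treated implicitly
        f_E = lambda y: np.sin(y)                          # mild nonlinear term, treated explicitly
        solve_I = lambda rhs, dt: rhs / (1.0 - dt * lam)   # solves (1 - dt*lam) y = rhs

        y, dt = 1.0, 0.01
        for _ in range(100):
            y = imex_euler(y, dt, f_E, solve_I)
        print(y)                                           # stays bounded despite dt*|lam| = 10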

    Evaluation of Distributed Programming Models and Extensions to Task-based Runtime Systems

    Get PDF
    High Performance Computing (HPC) has always been a key foundation for scientific simulation and discovery. More recently, the training of deep learning models has further accelerated the demand for computational power and lower-precision arithmetic. In this era following the end of Dennard scaling, when Moore's Law seemingly still holds true to a lesser extent, it is not a coincidence that HPC systems are equipped with multi-core CPUs and a variety of hardware accelerators that are all massively parallel. Coupled with interconnect network speeds improving more slowly than computational power, the current state of HPC systems is heterogeneous and extremely complex. This has been heralded as a great challenge to software stacks and their ability to extract performance from these systems, but also as a great opportunity to innovate at the programming-model level, to explore different approaches, and to propose new solutions. With usability, portability, and performance as the main factors to consider, this dissertation first evaluates the ability of some widely used parallel programming models (MPI, MPI+OpenMP, and task-based runtime systems) to manage the load imbalance among the processes computing the LU factorization of a large dense matrix stored in the Block Low-Rank (BLR) format. Next, I propose a number of optimizations and implement them in PaRSEC's Dynamic Task Discovery (DTD) model, including user-level graph trimming and direct Application Programming Interface (API) calls to perform the data broadcast operation, to further extend the limits of the STF model. On the other hand, the Parameterized Task Graph (PTG) approach in PaRSEC is the most scalable approach for many different applications; for it, I explore the possibility of combining the algorithmic benefits of Communication-Avoiding (CA) methods with the communication-computation overlap provided by runtime systems, using a 2D five-point stencil as the test case. This broad evaluation and extension of programming models highlights the ability of task-based runtime systems to achieve scalable performance and portability on contemporary heterogeneous HPC systems. Finally, I summarize the profiling capability of the PaRSEC runtime system and demonstrate, with a use case, its important role in identifying the performance bottlenecks that lead to optimizations.
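    For reference, the 2D five-point stencil used as the test case above reduces, in its plain serial form, to the update sketched below. The dissertation's contribution lies in how runtime systems such as PaRSEC (PTG/DTD) schedule a tiled version of this update, overlap the halo exchanges, and combine it with communication-avoiding variants; none of that machinery is reproduced in this hypothetical snippet.
        import numpy as np

        def five_point_sweep(u, steps=1):
            """Jacobi-style five-point stencil updates on the interior of u."""
            for _ in range(steps):
                u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                        u[1:-1, :-2] + u[1:-1, 2:])
            return u

        grid = np.zeros((128, 128)); grid[0, :] = 1.0   # hot top boundary
        print(five_point_sweep(grid, steps=50)[1, 64])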

    Modeling, Simulation and Control of Doubly-Fed Induction Machine Controlled by Back-to-Back converter

    Get PDF
    This Thesis studies a complex multidomain system, the Flywheel Energy Storage System, including the control objective specification, modeling, control design, simulation, experimental setup assembly, and experimental validation stages. The port interconnection and control of electromechanical systems is studied. The port-Hamiltonian formalism is presented in general and then particularized for generalized electromechanical systems, including variable structure systems (VSS). Interconnection and damping assignment passivity-based control (IDA-PBC) is a well-known technique for port-Hamiltonian systems (PCHS). In this Thesis we point out the kind of problems that can appear in the closed-loop structure obtained by IDA-PBC methods for relative degree one outputs when nominal values are used in a system with uncertain parameters. To correct this, we introduce an integral control, which can be cast into the Hamiltonian framework. This Thesis also presents two new approaches which improve the range of applicability of the IDA-PBC technique. First, we show that the standard two-stage procedure used in IDA-PBC, consisting of splitting the control action into the sum of energy-shaping and damping-injection terms, is not without loss of generality and effectively reduces the set of systems that can be stabilized with IDA-PBC. To overcome this problem we suggest carrying out both stages simultaneously and refer to this variation of the method as SIDA-PBC. Secondly, we present an improvement of the IDA-PBC technique. The IDA-PBC method requires knowledge of the full energy (or Hamiltonian) function. This is a problem because, in general, the equilibrium point which is to be regulated depends on uncertain parameters. We show how to select the target port-Hamiltonian structure so that this dependence is reduced. This new approach improves robustness for higher relative degree outputs.
    The Flywheel Energy Storage System consists of a doubly-fed induction machine (DFIM), controlled through the rotor voltage by a power electronics subsystem (a back-to-back AC/AC converter (B2B)), and coupled to a flywheel. The control objective is to optimally regulate the power flow between the DFIM and a local load connected to the grid, and this is achieved by commuting between different steady-state regimes. A power management policy based on the optimal speed of the DFIM is proposed. In this Thesis we propose a new control scheme for the DFIM that offers significant advantages over, and is considerably simpler than, the classical vector control method. This controller allows an easy decomposition of the active and reactive powers on the stator side and their regulation, acting on the rotor voltage, via stator current control. This design was obtained by applying the new robust IDA-PBC procedure. Other controllers are also designed in the dissertation. The classical vector control is studied. We also apply the classic IDA-PBC technique; it is shown that the partial differential equation that appears in this method can be circumvented by fixing the desired closed-loop total energy and adding new terms to the interconnection structure. Furthermore, to obtain a globally defined control law we introduce a state-dependent damping term that has the nice interpretation of effectively decoupling the electrical and mechanical parts of the system. This results in a globally convergent controller parameterized by two degrees of freedom. Finally, we also prove that with SIDA-PBC we can shape the total energy of the full (electrical and mechanical) dynamics of the DFIM. These different controllers (vector control, IDA-PBC, SIDA-PBC and robust IDA-PBC) are simulated and compared. The robust IDA-PBC controller is also experimentally tested and shown to work satisfactorily.
    A controller able to achieve bidirectional power flow for the B2B converter is presented. Standard techniques cannot be used since it is shown that no single output yields a stable zero dynamics for power flowing both ways. The controller is computed using standard IDA-PBC techniques for a suitable generalized state space averaging (GSSA) truncation of the system, which transforms the control objectives, namely a constant dc-bus output voltage and unity input power factor, into a regulation problem. Simulation and experimental results for the full system confirm the correctness of the simplifications introduced to obtain the controller. The proposed and tested controllers for the DFIM and the B2B are used to implement the power management policy. The results show good performance of the flywheel energy storage system and also validate the IDA-PBC technique with the proposed improvements.
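    For reference, the IDA-PBC design equations that this abstract refers to can be stated compactly as follows; this is the standard formulation found in the IDA-PBC literature, written here in LaTeX, and the notation may differ from the one used in the thesis.
        % open-loop port-Hamiltonian (PCHS) model
        \dot{x} = \bigl[J(x) - R(x)\bigr]\,\nabla H(x) + g(x)\,u
        % target closed-loop dynamics with shaped energy H_d
        \dot{x} = \bigl[J_d(x) - R_d(x)\bigr]\,\nabla H_d(x)
        % matching PDE for H_d, with g^{\perp} a full-rank left annihilator of g
        g^{\perp}(x)\bigl[J(x) - R(x)\bigr]\nabla H(x)
          = g^{\perp}(x)\bigl[J_d(x) - R_d(x)\bigr]\nabla H_d(x)
        % resulting control law
        u = \bigl(g^{\top}g\bigr)^{-1} g^{\top}
            \Bigl(\bigl[J_d - R_d\bigr]\nabla H_d - \bigl[J - R\bigr]\nabla H\Bigr)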