996 research outputs found

    Sparse Automatic Differentiation for Large-Scale Computations Using Abstract Elementary Algebra

    Full text link
    Most numerical solvers and libraries nowadays are implemented to use mathematical models created with language-specific built-in data types (e.g. real in Fortran or double in C) and their respective elementary algebra implementations. However, built-in elementary algebra typically has limited functionality and often restricts flexibility of mathematical models and analysis types that can be applied to those models. To overcome this limitation, a number of domain-specific languages with more feature-rich built-in data types have been proposed. In this paper, we argue that if numerical libraries and solvers are designed to use abstract elementary algebra rather than language-specific built-in algebra, modern mainstream languages can be as effective as any domain-specific language. We illustrate our ideas using the example of sparse Jacobian matrix computation. We implement an automatic differentiation method that takes advantage of sparse system structures and is straightforward to parallelize in MPI setting. Furthermore, we show that the computational cost scales linearly with the size of the system.Comment: Submitted to ACM Transactions on Mathematical Softwar

    Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis

    Full text link
    Sympiler is a domain-specific code generator that optimizes sparse matrix computations by decoupling the symbolic analysis phase from the numerical manipulation stage in sparse codes. The computation patterns in sparse numerical methods are guided by the input sparsity structure and the sparse algorithm itself. In many real-world simulations, the sparsity pattern changes little or not at all. Sympiler takes advantage of these properties to symbolically analyze sparse codes at compile-time and to apply inspector-guided transformations that enable applying low-level transformations to sparse codes. As a result, the Sympiler-generated code outperforms highly-optimized matrix factorization codes from commonly-used specialized libraries, obtaining average speedups over Eigen and CHOLMOD of 3.8X and 1.5X respectively.Comment: 12 page

    Sparsing in Real Time Simulation

    Get PDF
    Modelling of mechatronical systems often leads to large DAEs with stiff components. In real time simulation neither implicit nor explicit methods can cope with such systems in an efficient way: explicit methods have to employ too small steps and implicit methods have to solve too large systems of equations. A solution of this general problem is to use a method that allows manipulations of the Jacobian by computing only those parts that are necessary for the stability of the method. Specifically, manipulation by sparsing aims at zeroing out certain elements of the Jacobian leading t a structure that can be exploited using sparse matrix techniques. The elements to be neglected are chosen by an a priori analysis phase that can be accomplished before the real-time simulaton starts. In this article a sparsing criterion for the linearly implicit Euler method is derived that is based on block diagnonalization and matrix perturbation theory

    Non-intrusive parallelization of multibody system dynamic simulations

    Get PDF
    [Abstract] This paper evaluates two non-intrusive parallelization techniques for multibody system dynamics: parallel sparse linear equation solvers and OpenMP. Both techniques can be applied to existing simulation software with minimal changes in the code structure; this is a major advantage over Message Passing Interface, the standard parallelization method in multibody dynamics. Both techniques have been applied to parallelize a starting sequential implementation of a global index-3 augmented Lagrangian formulation combined with the trapezoidal rule as numerical integrator, in order to solve the forward dynamics of a variable-loop four-bar mechanism. Numerical experiments have been performed to measure the efficiency as a function of problem size and matrix filling. Results show that the best parallel solver (Pardiso) performs better than the best sequential solver (CHOLMOD) for multibody problems of large and medium sizes leading to matrix fillings above 10. OpenMP also proved to be advantageous even for problems of small sizes. Both techniques delivered speedups above 70% of the maximum theoretical values for a wide range of multibody problems

    SCALABLE INTEGRATED CIRCUIT SIMULATION ALGORITHMS FOR ENERGY-EFFICIENT TERAFLOP HETEROGENEOUS PARALLEL COMPUTING PLATFORMS

    Get PDF
    Integrated circuit technology has gone through several decades of aggressive scaling.It is increasingly challenging to analyze growing design complexity. Post-layout SPICE simulation can be computationally prohibitive due to the huge amount of parasitic elements, which can easily boost the computation and memory cost. As the decrease in device size, the circuits become more vulnerable to process variations. Designers need to statistically simulate the probability that a circuit does not meet the performance metric, which requires millions times of simulations to capture rare failure events. Recent, multiprocessors with heterogeneous architecture have emerged as mainstream computing platforms. The heterogeneous computing platform can achieve highthroughput energy efficient computing. However, the application of such platform is not trivial and needs to reinvent existing algorithms to fully utilize the computing resources. This dissertation presents several new algorithms to address those aforementioned two significant and challenging issues on the heterogeneous platform. Harmonic Balance (HB) analysis is essential for efficient verification of large postlayout RF and microwave integrated circuits (ICs). However, existing methods either suffer from excessively long simulation time and prohibitively large memory consumption or exhibit poor stability. This dissertation introduces a novel transient-simulation guided graph sparsification technique, as well as an efficient runtime performance modeling approach tailored for heterogeneous manycore CPU-GPU computing system to build nearly-optimal subgraph preconditioners that can lead to minimum HB simulation runtime. Additionally, we propose a novel heterogeneous parallel sparse block matrix algorithm by taking advantages of the structure of HB Jacobian matrices as well as GPU’s streaming multiprocessors to achieve optimal workload balancing during the preconditioning phase of HB analysis. We also show how the proposed preconditioned iterative algorithm can efficiently adapt to heterogeneous computing systems with different CPU and GPU computing capabilities. Extensive experimental results show that our HB solver can achieve up to 20X speedups and 5X memory reduction when compared with the state-of-the-art direct solver highly optimized for twelve-core CPUs. In nowadays variation-aware IC designs, cell characterizations and SRAM memory yield analysis require many thousands or even millions of repeated SPICE simulations for relatively small nonlinear circuits. In this dissertation, for the first time, we present a massively parallel SPICE simulator on GPU, TinySPICE, for efficiently analyzing small nonlinear circuits. TinySPICE integrates a highly-optimized shared-memory based matrix solver and fast parametric three-dimensional (3D) LUTs based device evaluation method. A novel circuit clustering method is also proposed to improve the stability and efficiency of the matrix solver. Compared with CPU-based SPICE simulator, TinySPICE achieves up to 264X speedups for parametric SRAM yield analysis without loss of accuracy

    Software and hardware code generation for predictive control using splitting methods

    Get PDF
    This paper presents SPLIT, a C code generation tool for Model Predictive Control (MPC) based on operator splitting methods. In contrast to existing code generation packages, SPLIT is capable of generating both software and hardware-oriented C code to allow quick prototyping of optimization algorithms on conventional CPUs and field-programmable gate arrays (FPGAs). A Matlab interface is provided for compatibility with existing commercial and open-source software packages. A numerical study compares software, hardware and heterogeneous implementations of splitting methods and investigates MPC design trade-offs. For the considered testcases the reported speedup of hardware implementations over software realizations is 3x to 11x

    Stochastic simulation and robust design optimization of integrated photonic filters

    Get PDF
    Manufacturing variations are becoming an unavoidable issue in modern fabrication processes; therefore, it is crucial to be able to include stochastic uncertainties in the design phase. In this paper, integrated photonic coupled ring resonator filters are considered as an example of significant interest. The sparsity structure in photonic circuits is exploited to construct a sparse combined generalized polynomial chaos model, which is then used to analyze related statistics and perform robust design optimization. Simulation results show that the optimized circuits are more robust to fabrication process variations and achieve a reduction of 11%–35% in the mean square errors of the 3 dB bandwidth compared to unoptimized nominal designs.MIT Skoltech InitiativeProgetto Roberto Rocca (Seed Funds)National Science Foundation (U.S.) (AIM Photonics Center. Contract 1227020-EEC)Semiconductor Research Corporatio
    corecore