    Parallel algorithms for inductance extraction

    In VLSI circuits, signal delays play an important role in design, timing verification and signal integrity checks. These delays are attributed to the presence of parasitic resistance, capacitance and inductance. With increasing clock speeds and shrinking feature sizes, these delays will be dominated by parasitic inductance. In next-generation VLSI circuits, with millions of components and interconnect segments, fast and accurate inductance estimation becomes a crucial step. A generalized approach to inductance extraction requires the solution of a large, dense, complex linear system that models mutual inductive effects among circuit elements. Iterative methods are used to solve the system without explicitly computing the system matrix. Fast hierarchical techniques compute approximate matrix-vector products with the dense system matrix in a matrix-free way. Because the system matrix is not available, constructing a preconditioner to accelerate the convergence of the iterative method becomes a challenging task. This work presents a class of parallel algorithms for fast and accurate inductance extraction of VLSI circuits. We use the solenoidal basis approach, which converts the linear system into a reduced system. The reduced system of equations is solved by a preconditioned iterative solver that uses fast hierarchical methods to compute products with the dense coefficient matrix. A Green's function based preconditioner is proposed that achieves near-optimal convergence rates in several cases. By formulating the preconditioner as a dense matrix similar to the coefficient matrix, we are able to use fast hierarchical methods for the preconditioning step as well. Experiments on a number of benchmark problems demonstrate the efficiency of the preconditioning scheme and its advantages over FastHenry. To further reduce the solution time, we have developed a parallel implementation.
The parallel software package can analyze interconnect configurations involving several conductors within a reasonable time. A two-tier parallelization scheme enables mixed-mode parallelization using both OpenMP directives and MPI. The parallel performance of the software is demonstrated through experiments on the IBM p690 and AMD Linux clusters. These experiments highlight the portability and efficiency of the software on multiprocessors with shared, distributed, and distributed-shared memory architectures.
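
The matrix-free idea above (solving the system while touching the coefficient matrix only through matrix-vector products) can be sketched as a preconditioned conjugate gradient loop that receives the product and the preconditioner as callables. This is a minimal sketch under simplifying assumptions: a small real symmetric positive definite toy kernel stands in for the complex inductance system, products are evaluated directly in O(n²) rather than by a fast hierarchical method, and a simple Jacobi (diagonal) preconditioner is swapped in for the Green's function based preconditioner described above. Conjugate gradient is used only because the toy matrix is SPD; a complex non-Hermitian system would typically call for a method such as GMRES. The names `pcg`, `kernel`, `matvec` and `jacobi` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def pcg(matvec: Callable[[Vector], Vector],
        precond: Callable[[Vector], Vector],
        b: Vector, tol: float = 1e-10, max_iter: int = 500) -> Vector:
    """Preconditioned conjugate gradient; the matrix is accessed only through matvec."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol * b_norm:   # relative residual test
            break
        Ap = matvec(p)                        # the only access to the matrix
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Usage with a hypothetical dense SPD kernel that is evaluated on the fly, never stored.
n = 50


def kernel(i: int, j: int) -> float:
    # Cauchy-type kernel plus an identity shift: symmetric positive definite.
    return 1.0 / (1.0 + (i - j) ** 2) + (1.0 if i == j else 0.0)


def matvec(v: Vector) -> Vector:
    return [sum(kernel(i, j) * v[j] for j in range(n)) for i in range(n)]


def jacobi(r: Vector) -> Vector:
    return [ri / kernel(i, i) for i, ri in enumerate(r)]


x = pcg(matvec, jacobi, [1.0] * n)
res = max(abs(yi - 1.0) for yi in matvec(x))  # max residual of A x = b with b = ones
```

Because `pcg` never indexes the matrix, replacing `matvec` with a hierarchical approximation, or `jacobi` with a stronger preconditioner, requires no change to the solver loop.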

    Distribution independent parallel algorithms and software for hierarchical methods with applications to computational electromagnetics

    Octrees are tree data structures used to represent multidimensional points in space. They are widely used to support hierarchical methods in scientific applications such as the N-body problem, molecular dynamics and smoothed particle hydrodynamics. The size of an octree depends on the spatial distribution of points in the computational domain, not just on the number of points. For this reason, the run-time of an octree-based algorithm whose cost depends on the size of the octree cannot be determined from the number of points alone for arbitrary distributions. In this thesis, we present the design and implementation of parallel algorithms for constructing compressed octrees and for the queries typically used by hierarchical methods. Our parallel algorithms and implementation strategies perform well irrespective of the spatial distribution of data, are communication efficient, and require no explicit load balancing. We also developed a software library that provides parallel tree construction and various queries on compressed octrees. The purpose of the library is to enable rapid application development and to let application developers use efficient parallel algorithms without needing detailed knowledge of the algorithms or having to implement them. To demonstrate the performance of our algorithms and the effectiveness of the library, we developed a complete end-to-end parallel electromagnetics code for computing the scattered electromagnetic fields from a perfectly electrically conducting (PEC) surface. We used the functions provided by the library to develop a Fast Multipole Method based solution to this problem. Experimental results show that our algorithms scale well and have bounded communication irrespective of the shape of the scatterer.
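
Parallel construction of compressed octrees is commonly built on space-filling-curve (Morton, or Z-order) keys: interleaving the bits of integer coordinates orders the points so that every octree cell covers a contiguous range of keys. The sketch below is sequential and assumes points already quantized to a 2^bits grid per axis; the function names are hypothetical, and it illustrates the general technique rather than the parallel algorithms of the thesis. It relies on the observation that each internal node of a compressed octree is the smallest cell containing some pair of consecutive points in Morton order.

```python
from typing import List, Tuple


def spread_bits(v: int, bits: int) -> int:
    """Move bit i of v to bit 3*i, leaving two zero bits between consecutive bits."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(x: int, y: int, z: int, bits: int) -> int:
    """Interleave x, y, z bits into a single Z-order key."""
    return spread_bits(x, bits) | (spread_bits(y, bits) << 1) | (spread_bits(z, bits) << 2)


def smallest_common_cell(k1: int, k2: int, bits: int) -> Tuple[int, int]:
    """Return (level, key prefix) of the smallest cell containing both keys (level 0 = root)."""
    level = bits
    while level > 0 and (k1 >> (3 * (bits - level))) != (k2 >> (3 * (bits - level))):
        level -= 1  # drop one 3-bit group (one tree level) at a time
    return level, k1 >> (3 * (bits - level))


# Usage: four hypothetical points on a 4 x 4 x 4 grid (bits = 2).
bits = 2
points = [(0, 0, 0), (1, 0, 0), (3, 3, 3), (2, 3, 3)]
keys = sorted(morton_key(x, y, z, bits) for x, y, z in points)  # [0, 1, 62, 63]
internal: List[Tuple[int, int]] = sorted(
    {smallest_common_cell(a, b, bits) for a, b in zip(keys, keys[1:])}
)
# internal == [(0, 0), (1, 0), (1, 7)]: the root plus two level-1 cells
```

Once points are sorted by key, each adjacent pair can be examined independently, which is what makes this formulation attractive as a starting point for parallel construction.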

    Applications on emerging paradigms in parallel computing

    Parallelism is increasingly being incorporated in computing at various levels: from vector processing units following Single Instruction Multiple Data (SIMD) processing at the lowest level, through Simultaneous Multi-threading (SMT) architectures and multi/many-cores with thread-level shared memory and SIMT parallelism, to distributed memory parallelism in supercomputers and clusters at the higher levels, and on to large distributed systems such as server farms and clouds. Altogether these form a large hierarchy of parallelism. Developing high-performance parallel algorithms and efficient software tools that make use of the available parallelism is essential to harness the raw computational power these emerging systems offer. In this thesis, we develop architecture-aware parallel techniques for such emerging paradigms, specifically the parallelism offered by emerging multi- and many-core architectures and by the emerging area of cloud computing, targeting large scientific applications. First, we develop efficient parallel algorithms to compute optimal pairwise alignments of genomic sequences on heterogeneous multi-core processors, and demonstrate them on the IBM Cell Broadband Engine. Then, we develop parallel techniques for scheduling all-pairs computations on heterogeneous systems, including clusters of Cell processors and NVIDIA graphics processors. We compare the performance of our strategies on Cell, GPU and Intel Nehalem multi-core processors.
Further, we apply our algorithms to specific applications from systems biology, fluid dynamics and materials science: pairwise Mutual Information computations for reconstructing gene regulatory networks; pairwise Lp-norm distance computations for coherent structure discovery in the design of flapping-wing Micro Air Vehicles; and construction of stochastic models for a set of properties of heterogeneous materials. Lastly, in the area of cloud computing, we propose and develop an abstract framework for parallel computations on large tree structures, to facilitate the development of a class of tree-based scientific applications. Our framework, in the style of Google's MapReduce paradigm, is based on two generic user-defined functions through which a user writes an application. We implement the framework as a generic programming library for a large cluster of homogeneous multi-core processors, and demonstrate its applicability through two applications: all-k-nearest-neighbors computations and Fast Multipole Method (FMM) based simulations.
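
The all-pairs computations mentioned above share one pattern: a kernel applied to every unordered pair of items. A common way to distribute such work is to split the items into tiles and hand out tile pairs from the upper triangle of the pair matrix, so that each item pair is computed exactly once. The sketch assumes a simple static round-robin assignment, simulates the workers sequentially, and uses the Lp-norm distance as the kernel; `tile_schedule` and `compute_tile` are hypothetical names, and the scheduling strategies the thesis develops for Cell and GPU platforms are not reproduced here.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def lp_distance(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """Lp-norm distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def tile_schedule(n_items: int, tile: int, n_workers: int) -> Dict[int, List[Pair]]:
    """Assign upper-triangular tile pairs (I, J) with I <= J to workers round-robin."""
    n_tiles = (n_items + tile - 1) // tile
    pairs = list(combinations_with_replacement(range(n_tiles), 2))
    return {w: pairs[w::n_workers] for w in range(n_workers)}


def compute_tile(data: Sequence[Sequence[float]], I: int, J: int,
                 tile: int, p: float) -> Dict[Pair, float]:
    """Evaluate the kernel for every item pair i < j with i in tile I and j in tile J."""
    out: Dict[Pair, float] = {}
    for i in range(I * tile, min((I + 1) * tile, len(data))):
        for j in range(J * tile, min((J + 1) * tile, len(data))):
            if i < j:  # skips the diagonal and lower half of diagonal tiles
                out[(i, j)] = lp_distance(data[i], data[j], p)
    return out


# Usage: 10 hypothetical 3-D items, tiles of 3 items, 4 simulated workers, p = 2.
data = [[float(i), float(i * i % 7), float(3 * i % 5)] for i in range(10)]
schedule = tile_schedule(len(data), tile=3, n_workers=4)
results: Dict[Pair, float] = {}
for worker, tiles in schedule.items():  # in practice each worker would run in parallel
    for I, J in tiles:
        results.update(compute_tile(data, I, J, 3, 2.0))
```

Each tile pair needs only two tiles of input, a property that matters on processors with small local memories; the tile size and worker count here are arbitrary.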

    High-performance parallel hierarchical algorithms for N-body problems

    This thesis focuses on the so-called fast multipole method, which solves the N-body problem hierarchically with linear complexity for any desired accuracy. For the Laplace equation, we aim to handle efficiently all the particle distributions encountered in astrophysics and molecular dynamics. We first study two distinct expressions of the main operator (multipole-to-local) together with the associated error bounds. For both expressions, we present a matrix formulation whose implementation with BLAS (Basic Linear Algebra Subprograms) routines greatly improves computational efficiency. In the range of accuracies of interest to us, this approach outperforms the existing improvements (FFT, rotations and plane waves), for both uniform and non-uniform distributions. Besides a new data structure for the underlying octree and algorithmic contributions to the adaptive version, we have also efficiently parallelized our method in shared and distributed memory. Finally, comparisons with dedicated codes demonstrate the value of our code for astrophysical simulations.
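
The central point above is that the multipole-to-local (M2L) operator can be written as a dense matrix, so that translations become BLAS-style matrix products. A minimal two-dimensional sketch illustrates the idea, assuming the classical complex-variable expansions for the 2-D Laplace kernel log(z - z_i) rather than the three-dimensional expansions the thesis studies. For a multipole expansion centered at `z0` and a local expansion about the origin, the local coefficients b_1..b_p are a matrix `T`, which depends only on `z0`, applied to the multipole coefficients a_1..a_p, plus a term in a_0. The nested lists stand in for what would be a BLAS call, and all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Coeffs = List[complex]


def p2m(sources: Sequence[Tuple[complex, float]], center: complex, p: int) -> Coeffs:
    """Multipole coefficients a_0..a_p of sum_i q_i log(z - z_i) about `center`."""
    a: Coeffs = [0j] * (p + 1)
    for zi, q in sources:
        a[0] += q
        d = zi - center
        for k in range(1, p + 1):
            a[k] -= q * d ** k / k
    return a


def m2l_matrix(z0: complex, p: int) -> List[List[complex]]:
    """T[l-1][k-1] = (-1)^k C(l+k-1, k-1) / z0^(l+k) for l, k = 1..p."""
    return [[(-1) ** k * math.comb(l + k - 1, k - 1) / z0 ** (l + k)
             for k in range(1, p + 1)]
            for l in range(1, p + 1)]


def m2l(a: Coeffs, z0: complex, p: int) -> Coeffs:
    """Local coefficients b_0..b_p about the origin for a multipole expansion at z0."""
    T = m2l_matrix(z0, p)
    b: Coeffs = [0j] * (p + 1)
    b[0] = a[0] * cmath.log(-z0) + sum((-1) ** k * a[k] / z0 ** k for k in range(1, p + 1))
    for l in range(1, p + 1):
        # Row of T times the coefficient vector: the dense product BLAS would perform.
        b[l] = -a[0] / (l * z0 ** l) + sum(T[l - 1][k - 1] * a[k] for k in range(1, p + 1))
    return b


def eval_local(b: Coeffs, z: complex) -> complex:
    return sum(bl * z ** l for l, bl in enumerate(b))


# Usage: hypothetical charges near z0 = 10+2j, target near the local center 0.
sources = [(10.3 + 2.1j, 1.0), (9.8 + 1.7j, -0.5), (10.1 + 2.4j, 2.0)]
z0, p = 10 + 2j, 20
b = m2l(p2m(sources, z0, p), z0, p)
target = 0.4 - 0.3j
direct = sum(q * math.log(abs(target - zi)) for zi, q in sources)
approx = eval_local(b, target).real  # real part = potential sum_i q_i log|z - z_i|
```

Because `T` depends only on `z0`, all cell pairs sharing the same relative position can reuse it: stacking their multipole vectors as columns turns many matrix-vector products into one matrix-matrix product, the kind of operation where level-3 BLAS routines are most efficient.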