52 research outputs found

    Iterative and doubling algorithms for Riccati-type matrix equations: a comparative introduction

    Full text link
    We review a family of algorithms for Lyapunov- and Riccati-type equations which are all related to each other by the idea of \emph{doubling}: they construct the iterate Qk=X2kQ_k = X_{2^k} of another naturally-arising fixed-point iteration (Xh)(X_h) via a sort of repeated squaring. The equations we consider are Stein equations X−A∗XA=QX - A^*XA=Q, Lyapunov equations A∗X+XA+Q=0A^*X+XA+Q=0, discrete-time algebraic Riccati equations X=Q+A∗X(I+GX)−1AX=Q+A^*X(I+GX)^{-1}A, continuous-time algebraic Riccati equations Q+A∗X+XA−XGX=0Q+A^*X+XA-XGX=0, palindromic quadratic matrix equations A+QY+A∗Y2=0A+QY+A^*Y^2=0, and nonlinear matrix equations X+A∗X−1A=QX+A^*X^{-1}A=Q. We draw comparisons among these algorithms, highlight the connections between them and to other algorithms such as subspace iteration, and discuss open issues in their theory.Comment: Review article for GAMM Mitteilunge

    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science

    Get PDF
    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized

    Predictive and distributed routing balancing (PR-DRB) : high speed interconnection networks

    Get PDF
    Current parallel applications running on clusters require the use of an interconnection network to perform communications among all computing nodes available. Imbalance of communications can produce network congestion, reducing throughput and increasing latency, degrading the overall system performance. On the other hand, parallel applications running on these networks posses representative stages which allow their characterization, as well as repetitive behavior that can be identified on the basis of this characterization. This work presents the Predictive and Distributed Routing Balancing (PR-DRB), a new method developed to gradually control network congestion, based on paths expansion, traffic distribution and effective traffic load, in order to maintain low latency values. PR-DRB monitors messages latencies on intermediate routers, makes decisions about alternative paths and record communication pattern information encountered during congestion situation. Based on the concept of applications repetitiveness, best solution recorded are reapplied when saved communication pattern re-appears. Traffic congestion experiments were conducted in order to evaluate the performance of the method, and improvements were observed.Les aplicacions paral·leles actuals en els Clústers requereixen l'ús d'una xarxa d'interconnexió per comunicar a tots els nodes de còmput disponibles. El desequilibri en la càrrega de comunicacions pot congestionar la xarxa, incrementant la latència i disminuint el throughput, degradant el rendiment total del sistema. D'altra banda, les aplicacions paral·leles que s'executen sobre aquestes xarxes contenen etapes representatives durant la seva execució les quals permeten caracteritzar-les, a més d'extraure un comportament repetitiu que pot ser identificat en base a aquesta caracterització. Aquest treball presenta el Balanceig Predictiu de Encaminament Distribuït (PR-DRB), un nou mètode desenvolupat per controlar la congestió a la xarxa en forma gradual, basat en l'expansió de camins, la distribució de trànsit i càrrega efectiva actual per tal de mantenir una latència baixa. PR-DRB monitoritza la latència dels missatges en els encaminadors, pren decisions sobre els camins alternatius a utilitzar i registra la informació de la congestió sobre la base del patró de comunicacions detectat, utilitzant com a concepte base la repetitivitat de les aplicacions per després tornar a aplicar la millor solució quan aquest patró es repeteixi. Experiments de trànsit amb congestió van ser portats a terme per avaluar el rendiment del mètode, els quals van mostrar la bondat del mateix.Las aplicaciones paralelas actuales en los Clústeres requieren el uso de una red de interconexión para comunicar a todos los nodos de cómputo disponibles. El desbalance en la carga de comunicaciones puede congestionar la red, incrementando la latencia y disminuyendo el throughput, degradando el rendimiento total del sistema. Por otro lado, las aplicaciones paralelas que corren sobre estas redes contienen etapas representativas durante su ejecución las cuales permiten caracterizarlas, además de un comportamiento repetitivo que puede ser identificado en base a dicha caracterización. Este trabajo presenta el Balanceo Predictivo de Encaminamiento Distribuido (PR-DRB), un nuevo método desarrollado para controlar la congestión en la red en forma gradual; basado en la expansión de caminos, la distribución de tráfico y carga efectiva actual, a fin de mantener una latencia baja. PR-DRB monitorea la latencia de los mensajes en los encaminadores, toma decisiones sobre los caminos alternativos a utilizar y registra la información de la congestión en base al patrón de comunicaciones detectado, usando como concepto base la repetitividad de las aplicaciones para luego volver a aplicar la mejor solución cuando dicho patrón se repita. Experimentos de tráfico con congestión fueron llevados a cabo para evaluar el rendimiento del método, los cuales mostraron la bondad del mismo

    Semiannual report, 1 October 1990 - 31 March 1991

    Get PDF
    Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized

    Memory-efficient array redistribution through portable collective communication

    Full text link
    Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations using collective operations has also been implemented in the context of the XLA SPMD partitioner, a production-grade tool for partitioning programs across accelerator systems. We evaluate our approach against the XLA implementation and find that our approach delivers a geometric mean speedup of 1.22×1.22\times, with maximum speedups as a high as 5.7×5.7\times, while offering provable memory guarantees, making our system particularly appealing for large-scale models.Comment: minor errata fixe
    • …
    corecore