52 research outputs found

Iterative and doubling algorithms for Riccati-type matrix equations: a comparative introduction

Author: Poloni Federico
Publication venue
Publication date: 01/01/2020
Field of study

We review a family of algorithms for Lyapunov- and Riccati-type equations which are all related to each other by the idea of \emph{doubling}: they construct the iterate $Q_k = X_{2^k}$ of another naturally-arising fixed-point iteration $(X_h)$ via a sort of repeated squaring. The equations we consider are Stein equations $X - A^*XA=Q$, Lyapunov equations $A^*X+XA+Q=0$, discrete-time algebraic Riccati equations $X=Q+A^*X(I+GX)^{-1}A$, continuous-time algebraic Riccati equations $Q+A^*X+XA-XGX=0$, palindromic quadratic matrix equations $A+QY+A^*Y^2=0$, and nonlinear matrix equations $X+A^*X^{-1}A=Q$. We draw comparisons among these algorithms, highlight the connections between them and to other algorithms such as subspace iteration, and discuss open issues in their theory.

Comment: Review article for GAMM Mitteilungen
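To make the doubling idea concrete for the simplest case listed in the abstract, the Stein equation $X - A^*XA = Q$, here is a minimal NumPy sketch. It assumes the fixed-point iteration is $X_{h+1} = Q + A^*X_hA$ with $X_1 = Q$, and that the spectral radius of $A$ is below 1 so the iteration converges. The function names are hypothetical and the code is an illustration of the general technique, not taken from the article.

```python
import numpy as np


def stein_fixed_point(A, Q, steps):
    """Plain iteration X_{h+1} = Q + A^* X_h A, started from X_1 = Q.

    Returns X_steps; one matrix update per step.
    """
    X = Q.copy()
    for _ in range(steps - 1):
        X = Q + A.conj().T @ X @ A
    return X


def stein_doubling(A, Q, k):
    """Return Q_k = X_{2^k} using only k updates (repeated squaring).

    Invariant: A_k = A^(2^k) and Q_k = X_{2^k}, so
    Q_{k+1} = Q_k + A_k^* Q_k A_k and A_{k+1} = A_k^2.
    """
    A_k, Q_k = A.copy(), Q.copy()
    for _ in range(k):
        Q_k = Q_k + A_k.conj().T @ Q_k @ A_k
        A_k = A_k @ A_k
    return Q_k
```

Under these assumptions, `stein_doubling(A, Q, 5)` should agree with `stein_fixed_point(A, Q, 32)` up to rounding, while performing 5 updates instead of 31.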

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science

Author
Publication venue
Publication date
Field of study

Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized

NASA Technical Reports Server

Predictive and distributed routing balancing (PR-DRB) : high speed interconnection networks

Author: Franco Puntes Daniel
Núñez Carlos
Universitat Autònoma de Barcelona. Departament d'Arquitectura de Computadors i Sistemes Operatius
Universitat Autònoma de Barcelona. Escola d'Enginyeria
Publication venue
Publication date: 01/01/2010
Field of study

Current parallel applications running on clusters require an interconnection network to communicate among all available computing nodes. Imbalanced communication can produce network congestion, reducing throughput and increasing latency, and so degrading overall system performance. On the other hand, parallel applications running on these networks possess representative stages that allow their characterization, as well as repetitive behavior that can be identified on the basis of this characterization. This work presents Predictive and Distributed Routing Balancing (PR-DRB), a new method developed to control network congestion gradually, based on path expansion, traffic distribution, and effective traffic load, in order to maintain low latency. PR-DRB monitors message latencies at intermediate routers, makes decisions about alternative paths, and records the communication pattern information encountered during congestion. Based on the concept of application repetitiveness, the best recorded solutions are reapplied when a saved communication pattern reappears. Traffic congestion experiments were conducted to evaluate the performance of the method, and improvements were observed.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Diposit Digital de Documents de la UAB

Large Periodic Lyapunov Equations: Algorithms and Applications

Author: Kressner Daniel
Publication venue
Publication date: 05/05/2011
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Semiannual report, 1 October 1990 - 31 March 1991

Author
Publication venue
Publication date
Field of study

Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized

NASA Technical Reports Server

Load Balancing Content-Based Publish/Subscribe Systems

Author: Adler M.
Aleksy M.
Alex King Yeung Cheung
Altinel M.
Banavar G.
Barth T.
Berman F.
Cao F.
Casalicchio E.
Castelli S.
Chen Y.
Cheung A. K. Y.
Dias D. M.
Fidler E.
Gupta A.
Hans-Arno Jacobsen
Ho K. S.
Li G.
Litzkow M. J.
Opyrchal L.
Pallickara S.
Pereira J.
Pietzuch P. R.
Riabov A.
Rose I.
Schuler C.
Segall B.
Tam D.
Tatbul N.
Triantafillou P.
Voulgaris S.
Zajcew R.
Zhang C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Memory-efficient array redistribution through portable collective communication

Author: Paszke Adam
Rink Norman A.
Schmid Georg Stefan
Vytiniotis Dimitrios
Publication venue
Publication date: 28/11/2022
Field of study

Modern large-scale deep learning workloads highlight the need for parallel execution across many devices in order to fit model data into hardware accelerator memories. In these settings, array redistribution may be required during a computation, but can also become a bottleneck if not done efficiently. In this paper we address the problem of redistributing multi-dimensional array data in SPMD computations, the most prevalent form of parallelism in deep learning. We present a type-directed approach to synthesizing array redistributions as sequences of MPI-style collective operations. We prove formally that our synthesized redistributions are memory-efficient and perform no excessive data transfers. Array redistribution for SPMD computations using collective operations has also been implemented in the context of the XLA SPMD partitioner, a production-grade tool for partitioning programs across accelerator systems. We evaluate our approach against the XLA implementation and find that our approach delivers a geometric mean speedup of $1.22\times$, with maximum speedups as high as $5.7\times$, while offering provable memory guarantees, making our system particularly appealing for large-scale models.

Comment: minor errata fixed
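As a rough illustration of what redistributing an array with a collective operation can look like, the following sketch simulates devices as a Python list of NumPy arrays and converts a row-sharded 2-D array into a column-sharded one with a single all-to-all exchange. The function names are hypothetical, the simulation assumes the array dimensions divide evenly by the device count, and it is not the type-directed synthesis method of the paper or the XLA implementation.

```python
import numpy as np


def shard_rows(X, n):
    """Split X into n equal row shards, one per simulated device."""
    return np.split(X, n, axis=0)


def all_to_all_rows_to_cols(row_shards):
    """Simulated all-to-all: turn row shards into column shards.

    Each device cuts its row shard into n column blocks and sends
    block j to device j. Device j stacks the blocks it receives,
    in sender order, to obtain its column shard.
    """
    n = len(row_shards)
    send = [np.split(s, n, axis=1) for s in row_shards]
    return [
        np.concatenate([send[i][j] for i in range(n)], axis=0)
        for j in range(n)
    ]
```

In this simulation each device ends up holding exactly 1/n of the array and never materializes the full array, which is the kind of property that gathering everything first and then slicing would not have.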

arXiv.org e-Print Archive

Algorithm Based Fault Tolerance in Massively Parallel Systems

Author: Lerner Mark D.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1988
Field of study

A complex computer system consists of billions of transistors, miles of wires, and many interactions with an unpredictable environment. Correct results must be produced despite faults that dynamically occur in some of these components. Many techniques have been developed for fault-tolerant computation. General-purpose methods are independent of the application, yet incur an overhead cost which may be unacceptable for massively parallel systems. Algorithm-specific methods, which can operate at lower cost, are a developing alternative [1, 72]. This paper first reviews the general-purpose approach and then focuses on the algorithm-specific method, with an eye toward massively parallel processors. Algorithm-based fault tolerance has the attraction of low overhead; furthermore, it addresses both the detection and the correction problems. The principle is to build low-cost checking and correcting mechanisms based exclusively on the redundancies inherent in the system.
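A common textbook illustration of algorithm-based fault tolerance is checksum-protected matrix multiplication, sketched below in NumPy. It is offered as an assumption about the kind of mechanism the abstract describes, not as the method of this report. The function names and the tolerance `tol` are hypothetical, and the sketch handles only a single corrupted entry in the data part of the product.

```python
import numpy as np


def checksum_matmul(A, B):
    """Multiply A (with an extra column-sum row) by B (with an extra row-sum column).

    The result holds C = A @ B in its top-left block, the row sums of C
    in its last column, and the column sums of C in its last row.
    """
    Ac = np.vstack([A, A.sum(axis=0, keepdims=True)])
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])
    return Ac @ Br


def detect_and_correct(Cf, tol=1e-9):
    """Check the checksums of Cf; fix a single corrupted data entry if found.

    Returns (C, location), where location is None when no error is seen.
    """
    C = Cf[:-1, :-1].copy()
    row_err = C.sum(axis=1) - Cf[:-1, -1]
    col_err = C.sum(axis=0) - Cf[-1, :-1]
    bad_rows = np.flatnonzero(np.abs(row_err) > tol)
    bad_cols = np.flatnonzero(np.abs(col_err) > tol)
    if bad_rows.size == 0 and bad_cols.size == 0:
        return C, None
    if bad_rows.size == 1 and bad_cols.size == 1:
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] -= row_err[i]  # the checksum mismatch equals the injected error
        return C, (int(i), int(j))
    raise ValueError("error pattern is not a single corrupted data entry")
```

The intersection of the one inconsistent row and the one inconsistent column locates the fault, and the size of the mismatch gives the correction, so detection and correction both come from redundancy carried through the computation itself.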

Columbia University Academic Commons