Iterative and doubling algorithms for Riccati-type matrix equations: a comparative introduction
We review a family of algorithms for Lyapunov- and Riccati-type equations
which are all related to each other by the idea of \emph{doubling}: they
construct the iterate of another naturally-arising fixed-point
iteration via a sort of repeated squaring.
The equations we consider are Stein equations, Lyapunov
equations, discrete-time algebraic Riccati equations,
continuous-time algebraic Riccati equations,
palindromic quadratic matrix equations, and
nonlinear matrix equations. We draw comparisons among these
algorithms, highlight the connections between them and to other algorithms such
as subspace iteration, and discuss open issues in their theory.
Comment: Review article for GAMM Mitteilungen
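The doubling idea in this abstract can be illustrated on the simplest case. The sketch below is not taken from the review; it assumes a Stein equation written as X = Q + AᵀXA (the abstract's own formula did not survive extraction) and uses hypothetical helper names. The plain fixed-point iteration X_{h+1} = Q + AᵀX_hA, started from zero, needs 2^k steps to reach what the squared recurrence reaches in k steps.

```python
# Minimal pure-Python sketch of doubling for an assumed Stein equation
# X = Q + A^T X A. Matrices are lists of lists; all names are illustrative.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def stein_fixed_point(A, Q, steps):
    """Plain iteration X_{h+1} = Q + A^T X_h A, starting from X_0 = 0."""
    n = len(Q)
    X = [[0.0] * n for _ in range(n)]
    At = transpose(A)
    for _ in range(steps):
        X = add(Q, matmul(matmul(At, X), A))
    return X

def stein_doubling(A, Q, k):
    """Repeated squaring: after k steps Qk equals the iterate X_{2^k} above."""
    Ak, Qk = A, Q
    for _ in range(k):
        Qk = add(Qk, matmul(matmul(transpose(Ak), Qk), Ak))  # Q_{k+1} = Q_k + A_k^T Q_k A_k
        Ak = matmul(Ak, Ak)                                  # A_{k+1} = A_k^2
    return Qk
```

When the spectral radius of A is below one, both iterations converge to the solution, and the doubling variant does so with a number of steps that is logarithmic in the number of plain iterations.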
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized.
Predictive and distributed routing balancing (PR-DRB) : high speed interconnection networks
Current parallel applications running on clusters require an interconnection network for communication among all available computing nodes. Imbalanced communication can produce network congestion, which reduces throughput, increases latency, and degrades overall system performance. On the other hand, parallel applications running on these networks possess representative stages that allow them to be characterized, as well as repetitive behavior that can be identified on the basis of this characterization. This work presents Predictive and Distributed Routing Balancing (PR-DRB), a new method developed to control network congestion gradually, based on path expansion, traffic distribution, and effective traffic load, in order to maintain low latency values. PR-DRB monitors message latencies at intermediate routers, makes decisions about alternative paths, and records the communication pattern information encountered during congestion. Based on the concept of application repetitiveness, the best recorded solutions are reapplied when a saved communication pattern reappears. Traffic congestion experiments were conducted to evaluate the performance of the method, and improvements were observed.
Semiannual report, 1 October 1990 - 31 March 1991
Research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, numerical analysis, and computer science is summarized.
Memory-efficient array redistribution through portable collective communication
Modern large-scale deep learning workloads highlight the need for parallel
execution across many devices in order to fit model data into hardware
accelerator memories. In these settings, array redistribution may be required
during a computation, but can also become a bottleneck if not done efficiently.
In this paper we address the problem of redistributing multi-dimensional array
data in SPMD computations, the most prevalent form of parallelism in deep
learning. We present a type-directed approach to synthesizing array
redistributions as sequences of MPI-style collective operations. We prove
formally that our synthesized redistributions are memory-efficient and perform
no excessive data transfers. Array redistribution for SPMD computations using
collective operations has also been implemented in the context of the XLA SPMD
partitioner, a production-grade tool for partitioning programs across
accelerator systems. We evaluate our approach against the XLA implementation
and find that our approach delivers a geometric mean speedup of ,
with maximum speedups as high as , while offering provable memory
guarantees, making our system particularly appealing for large-scale models.
Comment: minor errata fixed
Algorithm Based Fault Tolerance in Massively Parallel Systems
A complex computer system consists of billions of transistors, miles of wires, and many interactions with an unpredictable environment. Correct results must be produced despite faults that occur dynamically in some of these components. Many techniques have been developed for fault-tolerant computation. General-purpose methods are independent of the application, yet incur an overhead cost that may be unacceptable for massively parallel systems. Algorithm-specific methods, which can operate at lower cost, are a developing alternative [1, 72]. This paper first reviews the general-purpose approach and then focuses on the algorithm-specific method, with an eye toward massively parallel processors. Algorithm-based fault tolerance has the attraction of low overhead; furthermore, it addresses both the detection and the correction problems. The principle is to build low-cost checking and correcting mechanisms based exclusively on the redundancies inherent in the system.
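As an illustration of the principle described in this abstract, a common textbook instance of algorithm-based fault tolerance is checksum-protected matrix multiplication. The sketch below is an assumption on our part, not the method of the paper: it appends a column-sum row to A and a row-sum column to B, so that the product carries checksums that allow a single corrupted element to be located and corrected. All function names are hypothetical.

```python
# Minimal sketch of checksum-based matrix multiplication (illustrative only).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def with_column_checksum(A):
    """Append a row holding the sum of each column of A."""
    return A + [[sum(col) for col in zip(*A)]]

def with_row_checksum(B):
    """Append a column holding the sum of each row of B."""
    return [row + [sum(row)] for row in B]

def locate_and_correct(C, tol=1e-9):
    """Given a full-checksum product C, repair one corrupted data element in place.

    Returns the (row, column) of the repaired element, or None if the
    checksums are consistent or the error pattern is not a single data element.
    """
    n, m = len(C) - 1, len(C[0]) - 1  # size of the data part
    bad_rows = [i for i in range(n) if abs(sum(C[i][:m]) - C[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(C[r][j] for r in range(n)) - C[n][j]) > tol]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        others = sum(C[i][:m]) - C[i][j]   # sum of the uncorrupted entries in row i
        C[i][j] = C[i][m] - others         # restore from the row checksum
        return (i, j)
    return None
```

The extra work is one additional row and column per matrix, which is the low overhead the abstract refers to; the scheme as sketched handles only a single faulty data element per product.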
- …