14 research outputs found

Work-preserving emulations of fixed-connection networks

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/1989

Crossref

Work-Preserving Emulations of Fixed-Connection Networks

Publication venue: Defense Technical Information Center (DTIC)

Crossref

Work-preserving real-time emulation of meshes on butterfly networks

Author: Achilles Alf-Christian
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/1991

The emulation of a guest network G on a host network H is work-preserving and real-time if the inefficiency, that is, the ratio W_G/W_H of the amounts of work done in the two networks, and the slowdown of the emulation are both O(1). In this thesis we show that an infinite number of meshes can be emulated on a butterfly in a work-preserving, real-time manner, despite the fact that any emulation of an s × s-node mesh in a butterfly with load 1 has a dilation of Ω(log s). The recursive embedding of a mesh in a butterfly presented by Koch et al. (STOC 1989), which forms the basis for our work, is corrected and generalized by relaxing unnecessary constraints. An algorithm determining the parameter for each stage of the recursion is described, and a rigorous analysis of the resulting emulation shows that it is work-preserving and real-time for an infinite number of meshes. Data obtained from simulated embeddings suggest possible improvements to achieve a truly work-preserving emulation of the class of meshes on the class of butterflies.

Digital Commons @ New Jersey Institute of Technology (NJIT)

The I/O Complexity of Hybrid Algorithms for Square Matrix Multiplication

Author: De Stefani Lorenzo
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th International Symposium on Algorithms and Computation (ISAAC 2019)
Publication date: 01/01/2019

Asymptotically tight lower bounds are derived for the I/O complexity of a general class of hybrid algorithms computing the product of n × n square matrices, combining a "Strassen-like" fast matrix multiplication approach with computational complexity Θ(n^{log_2 7}) and "standard" matrix multiplication algorithms with computational complexity Ω(n^3). We present a novel and tight Ω((n/max{√M, n_0})^{log_2 7} (max{1, n_0/M})^3 M) lower bound for the I/O complexity of a class of "uniform, non-stationary" hybrid algorithms when executed in a two-level storage hierarchy with M words of fast memory, where n_0 denotes the threshold size of sub-problems which are computed using standard algorithms with algebraic complexity Ω(n^3). The lower bound is actually derived for the more general class of "non-uniform, non-stationary" hybrid algorithms, which allow recursive calls to have a different structure even when they refer to the multiplication of matrices of the same size and in the same recursive level, although the quantitative expressions become more involved. Our results are the first I/O lower bounds for these classes of hybrid algorithms. All presented lower bounds apply even if the recomputation of partial results is allowed and are asymptotically tight. The proof technique combines the analysis of Grigoriev's flow of the matrix multiplication function, combinatorial properties of the encoding functions used by fast Strassen-like algorithms, and an application of the Loomis-Whitney geometric theorem for the analysis of standard matrix multiplication algorithms. Extensions of the lower bounds for a parallel model with P processors are also discussed.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Embedding complete binary trees in product graphs

Author: Broadwater A.L.
Efe K.
Fernandez A.
Publication date: 01/01/2000

This paper shows how to embed complete binary trees in products of complete binary trees, products of shuffle-exchange graphs, and products of de Bruijn graphs with small dilation and congestion. In the embedding results presented here the size of the host graph can be fixed to an arbitrary size, while we define no bound on the size of the guest graph. This is motivated by the fact that the host architecture has a fixed number of processors due to its physical design, while the guest graph can grow arbitrarily large depending on the application. The results of this paper widen the class of computations that can be performed on these product graphs which are often cited as being low-cost alternatives for hypercubes. © J.C. Baltzer AG, Science Publishers

Bilkent University Institutional Repository

A Lower Bound Technique for Communication in BSP

Author: Bilardi Gianfranco
Scquizzato Michele
Silvestri Francesco
Publication date: 25/11/2017

Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers

Author: Leiserson Charles E.
Rao Satish
Toledo Sivan
Publication venue: Academic Press
Publication date: 30/04/1997

When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called “out-of-core” algorithm, which moves data between primary and secondary memories, must be used. In this paper, we study out-of-core algorithms for sparse linear relaxation problems in which each iteration of the algorithm updates the state of every vertex in a graph with a linear combination of the states of its neighbors. We give a general method that can save substantially on the I/O traffic for many problems. For example, our technique allows a computer with M words of primary memory to perform T = Ω(M^{1/5}) cycles of a multigrid algorithm for a two-dimensional elliptic solver over an n-point domain using only Θ(nT/M^{1/5}) I/O transfers, as compared with the naive algorithm, which requires Ω(nT) I/O's. Our method depends on the existence of a “blocking” cover of the graph that underlies the linear relaxation. A blocking cover has the property that the subgraphs forming the cover have large diameters once a small number of vertices have been removed. The key idea in our method is to introduce a variable for each removed vertex for each time step of the algorithm. We maintain linear dependences among the removed vertices, thereby allowing each subgraph to be iteratively relaxed without external communication. We give a general theorem relating blocking covers to I/O-efficient relaxation schemes. We also give an automatic method for finding blocking covers for certain classes of graphs, including planar graphs and d-dimensional simplicial graphs with constant aspect ratio (i.e., graphs that arise from dividing d-space into “well-shaped” polyhedra). As a result, we can perform T iterations of linear relaxation on any n-vertex planar graph using only Θ(n + nT lg n/M^{1/4}) I/O's, or on any n-node d-dimensional simplicial graph with constant aspect ratio using only Θ(n + nT lg n/M^{Ω(1/d)}) I/O's.

Elsevier - Publisher Connector

Overlay Networks: An Akamai Perspective

Author: Assad
Dilley
Eriksson
Kontothanassis
Nygren
Stoica
Zhao
Publication venue: Wiley
Publication date: 01/01/2014

The Internet is transforming every aspect of communication in human society by enabling a wide range of applications for business, commerce, entertainment, news, and social interaction. Modern and future distributed applications require high reliability, performance, security, and scalability.

CiteSeerX

Crossref

Work-Preserving Emulations of Fixed-Connection Networks

Author: Koch Richard R.
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/1997

In this paper, we study the problem of emulating T_G steps of an N_G-node guest network, G, on an N_H-node host network, H. We call an emulation work-preserving if the time required by the host, T_H, is O(T_G N_G/N_H), because then both the guest and host networks perform the same total work (i.e., processor-time product), Θ(T_G N_G), to within a constant factor. We say that an emulation occurs in real-time if T_H = O(T_G), because then the host emulates the guest with constant slowdown. In addition to describing several work-preserving and real-time emulations, we also provide a general model in which lower bounds can be proved. Some of the more interesting and diverse consequences of this work include: (1) a proof that a linear array can emulate a (much larger) butterfly in a work-preserving fashion, but that a butterfly cannot emulate an expander (of any size) in a work-preserving fashion, (2) a proof that a butterfly can emulate a shuffle-exchange network in a real-time work-preserving fashion, and vice versa, (3) a proof that a butterfly can emulate a mesh (or an array of higher, but fixed, dimension) in a real-time work-preserving fashion, even though any O(1)-to-1 embedding of an N-node mesh in an N-node butterfly has dilation Ω(log N), and (4) simple O(N^2/log^2 N)-area and O(N^{3/2}/log^{3/2} N)-volume layouts for the N-node shuffle-exchange network.
Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures - parallel processors; C.2.1 [Computer-Communications Networks]: Network Analysis and Design - network topology; F.1.1 [Computation by Abstract Devices]: Models of Computation - networks of machines; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - computations on discrete structures; G.2.1 [Discrete Mathematics]: Combinatorics - combinatorial algorithms; G.2.2 [Discrete Mathematics]: Graph Theory - graph algorithms
General Terms: Algorithms, Design, Theor

ScholarWorks@UMass Amherst