160 research outputs found

    A multipath analysis of biswapped networks.

    Biswapped networks of the form $Bsw(G)$ have recently been proposed as interconnection networks to be implemented as optical transpose interconnection systems. We provide a systematic construction of $\kappa+1$ vertex-disjoint paths joining any two distinct vertices in $Bsw(G)$, where $\kappa \geq 1$ is the connectivity of $G$. In doing so, we obtain an upper bound of $\max\{2\Delta(G)+5, \Delta_\kappa(G)+\Delta(G)+2\}$ on the $(\kappa+1)$-diameter of $Bsw(G)$, where $\Delta(G)$ is the diameter of $G$ and $\Delta_\kappa(G)$ the $\kappa$-diameter. Suppose that we have a deterministic multipath source routing algorithm in an interconnection network $G$ that finds $\kappa$ mutually vertex-disjoint paths in $G$ joining any two distinct vertices and does this in time polynomial in $\Delta_\kappa(G)$, $\Delta(G)$ and $\kappa$ (and independently of the number of vertices of $G$). Our constructions yield an analogous deterministic multipath source routing algorithm in the interconnection network $Bsw(G)$ that finds $\kappa+1$ mutually vertex-disjoint paths joining any two distinct vertices in $Bsw(G)$ so that these paths all have length bounded as above. Moreover, our algorithm has time complexity polynomial in $\Delta_\kappa(G)$, $\Delta(G)$ and $\kappa$. We also show that if $G$ is Hamiltonian then $Bsw(G)$ is Hamiltonian, and that if $G$ is a Cayley graph then $Bsw(G)$ is a Cayley graph.
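
The abstract above does not spell out how $Bsw(G)$ is built, so the following Python sketch assumes the usual definition from the biswapped-network literature: vertices are triples (part, cluster, node) with part in {0, 1}, every cluster is a copy of $G$, and a swap edge joins (0, g, c) to (1, c, g). The helper names `biswapped`, `diameter` and `cycle` are illustrative only and do not come from the paper. The sketch builds the network and measures its ordinary diameter by breadth-first search; it does not construct the $\kappa+1$ disjoint paths.

```python
from collections import deque
from itertools import product


def biswapped(adj):
    """Build Bsw(G) from an adjacency dict {vertex: set_of_neighbours} of G.

    Assumed definition (not stated in the abstract): vertices are triples
    (part, cluster, node) with part in {0, 1}; each cluster is a copy of G,
    and the swap edge joins (0, g, c) to (1, c, g).
    """
    nodes = list(adj)
    bsw = {(p, g, c): set() for p, g, c in product((0, 1), nodes, nodes)}
    for p, g, c in bsw:
        # Intra-cluster edges: a copy of G inside cluster g of part p.
        for d in adj[c]:
            bsw[(p, g, c)].add((p, g, d))
        # Swap edge: exchange cluster and node indices and flip the part.
        bsw[(p, g, c)].add((1 - p, c, g))
    return bsw


def eccentricity(graph, source):
    """Largest BFS distance from source (assumes the graph is connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(graph):
    """Diameter as the maximum eccentricity over all vertices."""
    return max(eccentricity(graph, v) for v in graph)


def cycle(n):
    """Adjacency dict of the cycle on n vertices, used here as a toy G."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


if __name__ == "__main__":
    g = cycle(4)
    b = biswapped(g)
    print("vertices:", len(b))
    print("diameter of G:", diameter(g))
    print("diameter of Bsw(G):", diameter(b))
```

Under this assumed definition every vertex gains exactly one swap edge, so a vertex of degree d in $G$ has degree d + 1 in $Bsw(G)$. This is consistent with the connectivity rising from $\kappa$ to $\kappa+1$ in the abstract.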

    Multiswapped networks and their topological and algorithmic properties

    We generalise the biswapped network $Bsw(G)$ to obtain a multiswapped network $Msw(H;G)$, built around two graphs $G$ and $H$. We show that the network $Msw(H;G)$ lends itself to optoelectronic implementation and examine its topological and algorithmic properties. We derive the length of a shortest path joining any two vertices in $Msw(H;G)$ and consequently a formula for the diameter. We show that if $G$ has connectivity $\kappa \geq 1$ and $H$ has connectivity $\lambda \geq 1$, where $\lambda \leq \kappa$, then $Msw(H;G)$ has connectivity at least $\kappa+\lambda$, and we derive upper bounds on the $(\kappa+\lambda)$-diameter of $Msw(H;G)$. Our analysis yields distributed routing algorithms for a distributed-memory multiprocessor whose underlying topology is $Msw(H;G)$. We also prove that if $G$ and $H$ are Cayley graphs then $Msw(H;G)$ need not be a Cayley graph, but when $H$ is a bipartite Cayley graph then $Msw(H;G)$ is necessarily a Cayley graph.

    Efficient structural outlooks for vertex product networks

    In this thesis, a new classification for a large set of interconnection networks, referred to as "Vertex Product Networks" (VPN), is provided and a number of related issues are discussed, including the design and evaluation of efficient structural outlooks for algorithm development on this class of networks. The importance of studying the VPN can be attributed to two main reasons: first, an unlimited number of new networks can be defined under the umbrella of the VPN, and second, some known networks can be studied and analysed more deeply. Examples of the VPN include the newly proposed arrangement-star and the existing Optical Transpose Interconnection Systems (OTIS-networks). Over the past two decades many interconnection networks have been proposed in the literature, including the star, hyperstar, hypercube, arrangement, and OTIS-networks. Most existing research on these networks has focused on analysing their topological properties. Consequently, there has been relatively little work devoted to designing efficient parallel algorithms for important parallel applications. In an attempt to fill this gap, this research aims to propose efficient structural outlooks for algorithm development. These structural outlooks are based on grid and pipeline views, popular structures that support a vast body of applications encountered in many areas of science and engineering, including matrix computation, divide-and-conquer algorithms, sorting, and Fourier transforms. The proposed structural outlooks are applied to the VPN, notably the arrangement-star and OTIS-networks. In this research, we argue that the proposed arrangement-star is a viable candidate as an underlying topology for future high-speed parallel computers.
Not only does the arrangement-star bring a solution to the scalability limitations from which the existing star graph suffers, but it also enables the development of parallel algorithms based on the proposed structural outlooks, such as matrix computation, linear algebra, divide-and-conquer algorithms, sorting, and Fourier transforms. Results from a performance study conducted in this thesis reveal that the proposed arrangement-star efficiently supports applications based on the grid or pipeline structural outlooks. OTIS-networks are another example of the VPN. This type of network has the important advantage of combining both optical and electronic interconnect technology. A number of studies have recently explored the topological properties of OTIS-networks. Although there has been some work on designing parallel algorithms for image processing and sorting, hardly any work has considered the suitability of these networks for an important class of scientific problems such as matrix computation, sorting, and Fourier transforms. In this study, we present and evaluate two structural outlooks for algorithm development on OTIS-networks. The proposed structural outlooks are general in the sense that no specific factor network or problem domain is assumed. Timing models for measuring the performance of the proposed structural outlooks are provided. Through these models, the performance of various algorithms on OTIS-networks is evaluated and compared with that of their counterparts on conventional electronic interconnection systems. The obtained results reveal that OTIS-networks are an attractive candidate for future parallel computers due to their superior performance characteristics over networks using traditional electronic interconnects.

    Some studies on the multi-mesh architecture.

    In this thesis, we have reported our investigations on interconnection network architectures based on the idea of a recently proposed multi-processor architecture, the Multi-Mesh network. This includes the development of a new interconnection architecture, a study of its topological properties, and a proposal for implementing the Multi-Mesh using optical technology. We have presented a new network topology, called the 3D Multi-Mesh (3D MM), that is an extension of the Multi-Mesh architecture [DDS99]. This network consists of $n^3$ three-dimensional meshes (termed 3D blocks), each having $n^3$ processors, interconnected in a suitable manner so that the resulting topology is 6-regular with $n^6$ processors and a diameter of only $3n$. We have shown that the connectivity of this network is 6. We have explored an algorithm for point-to-point communication on the 3D MM. It is expected that this architecture will enable more efficient algorithm mapping compared to existing architectures. We have also proposed some implementations of the Multi-Mesh that avoid the electronic bottleneck due to long copper wires for communication between some processors. Our implementation considers a number of realistic scenarios based on hybrid (optical and electronic) communication. One unique feature of this investigation is our use of WDM wavelength routing and the protection scheme. We are not aware of any implementation of interconnection networks using these techniques. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .A32. Source: Masters Abstracts International, Volume: 43-03, page: 0868. Adviser: Subir Bandyopadhyay. Thesis (M.Sc.)--University of Windsor (Canada), 2004.

    Interconnection networks for parallel and distributed computing

    Parallel computers are generally either shared-memory machines or distributed-memory machines. There are currently technological limitations on shared-memory architectures and so parallel computers utilizing a large number of processors tend to be distributed-memory machines. We are concerned solely with distributed-memory multiprocessors. In such machines, the dominant factor inhibiting faster global computations is inter-processor communication. Communication is dependent upon the topology of the interconnection network, the routing mechanism, the flow control policy, and the method of switching. We are concerned with issues relating to the topology of the interconnection network. The choice of how we connect processors in a distributed-memory multiprocessor is a fundamental design decision. There are numerous, often conflicting, considerations to bear in mind. However, there does not exist an interconnection network that is optimal on all counts and trade-offs have to be made. A multitude of interconnection networks have been proposed, with each of these networks having some good (topological) properties and some not so good. Existing noteworthy networks include trees, fat-trees, meshes, cube-connected cycles, butterflies, Möbius cubes, hypercubes, augmented cubes, $k$-ary $n$-cubes, twisted cubes, $n$-star graphs, $(n, k)$-star graphs, alternating group graphs, de Bruijn networks, and bubble-sort graphs, to name but a few. We will mainly focus on $k$-ary $n$-cubes and $(n, k)$-star graphs in this thesis. Meanwhile, we propose a new interconnection network called augmented $k$-ary $n$-cubes. The following results are given in the thesis.
    1. Let $k \geq 4$ be even and let $n \geq 2$. Consider a faulty $k$-ary $n$-cube $Q_n^k$ in which the number of node faults $f_n$ and the number of link faults $f_e$ are such that $f_n + f_e \leq 2n - 2$. We prove that given any two healthy nodes $s$ and $e$ of $Q_n^k$, there is a path from $s$ to $e$ of length at least $k^n - 2f_n - 1$ (resp. $k^n - 2f_n - 2$) if the nodes $s$ and $e$ have different (resp. the same) parities (the parity of a node in $Q_n^k$ is the sum modulo 2 of the elements in the $n$-tuple over $0, 1, \ldots, k - 1$ representing the node). Our result is optimal in the sense that there are pairs of nodes and fault configurations for which these bounds cannot be improved, and it answers questions recently posed by Yang, Tan and Hsu, and by Fu. Furthermore, we extend known results, obtained by Kim and Park, for the case when $n = 2$.
    2. We give precise solutions to problems posed by Wang, An, Pan, Wang and Qu and by Hsieh, Lin and Huang. In particular, we show that $Q_n^k$ is bi-panconnected and edge-bipancyclic when $k \geq 3$ and $n \geq 2$, and we also show that when $k$ is odd, $Q_n^k$ is $m$-panconnected, for $m = \frac{n(k - 1) + 2k - 6}{2}$, and $(k - 1)$-pancyclic (these bounds are optimal). We introduce a path-shortening technique, called progressive shortening, and strengthen existing results, showing that when paths are formed using progressive shortening then these paths can be efficiently constructed and used to solve a problem relating to the distributed simulation of linear arrays and cycles in a parallel machine whose interconnection network is $Q_n^k$, even in the presence of a faulty processor.
    3. We define an interconnection network $AQ_n^k$, which we call the augmented $k$-ary $n$-cube, by extending a $k$-ary $n$-cube in a manner analogous to the existing extension of an $n$-dimensional hypercube to an $n$-dimensional augmented cube. We prove that the augmented $k$-ary $n$-cube $AQ_n^k$ has a number of attractive properties (in the context of parallel computing). For example, we show that the augmented $k$-ary $n$-cube $AQ_n^k$: is a Cayley graph (and so is vertex-symmetric); has connectivity $4n - 2$, and is such that we can build a set of $4n - 2$ mutually disjoint paths joining any two distinct vertices so that the path of maximal length has length at most $\max\{(n - 1)k - (n - 2), k + 7\}$; has diameter $\lfloor \frac{k}{3} \rfloor + \lceil \frac{k - 1}{3} \rceil$ when $n = 2$; and has diameter at most $\frac{k}{4}(n + 1)$ for $n \geq 3$ and $k$ even, and at most $\frac{k}{4}(n + 1) + \frac{n}{4}$ for $n \geq 3$ and $k$ odd.
    4. We present an algorithm which, given a source node and a set of $n - 1$ target nodes in the $(n, k)$-star graph $S_{n,k}$ where all nodes are distinct, builds a collection of $n - 1$ node-disjoint paths, one from each target node to the source. The collection of paths output from the algorithm is such that each path has length at most $6k - 7$, and the algorithm has time complexity $O(k^3 n^4)$.
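
The parity condition in the first result can be illustrated with a small Python sketch. It assumes the standard definition of the $k$-ary $n$-cube (nodes are $n$-tuples over $0, \ldots, k-1$, and two nodes are adjacent when they differ by $\pm 1$ modulo $k$ in exactly one coordinate) and uses the parity definition quoted in the abstract. The function names `kary_ncube`, `parity` and `is_parity_bipartite` are hypothetical and do not come from the thesis.

```python
from itertools import product


def kary_ncube(k, n):
    """Adjacency dict of the k-ary n-cube (standard definition, assumed here).

    Nodes are n-tuples over {0, ..., k-1}; two nodes are adjacent when they
    differ by +1 or -1 (mod k) in exactly one coordinate.
    """
    graph = {}
    for node in product(range(k), repeat=n):
        neighbours = set()
        for i in range(n):
            for step in (1, -1):
                other = list(node)
                other[i] = (other[i] + step) % k
                neighbours.add(tuple(other))
        neighbours.discard(node)  # guards the degenerate case k = 1
        graph[node] = neighbours
    return graph


def parity(node):
    """Parity as in the abstract: sum of the tuple's elements modulo 2."""
    return sum(node) % 2


def is_parity_bipartite(graph):
    """True when every edge joins two nodes of opposite parity."""
    return all(parity(u) != parity(v) for u in graph for v in graph[u])


if __name__ == "__main__":
    for k in (3, 4):
        q = kary_ncube(k, 2)
        print(k, len(q), is_parity_bipartite(q))
```

When $k$ is even, each edge changes the coordinate sum by an odd amount, so parity alternates along every path. That is why the two cases of the bound (endpoints of different or of the same parity) differ by one. For odd $k$ the wrap-around edge between $k-1$ and $0$ preserves parity, so this argument no longer applies.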

    Aspects of k-k-Routing in Meshes and OTIS Networks

    Efficient data transport in parallel computers built on sparse interconnection networks is crucial for their performance. A basic transport problem in such a computer is the k-k routing problem. In this thesis, aspects of the k-k routing problem on r-dimensional meshes and OTIS-G networks are discussed. The first oblivious routing algorithms for these networks are presented that solve the k-k routing problem in asymptotically optimal running time with constant buffer size. Furthermore, other aspects of the k-k routing problem for OTIS-G networks are analysed. In particular, lower bounds for the problem based on the diameter and bisection width of OTIS-G networks are given, and the k-k sorting problem on the OTIS-Mesh is considered, with an algorithm presented whose running time is close to the bisection and diameter bounds. Based on OTIS-G networks, a new class of networks, called Extended OTIS-G networks, is introduced, which have smaller diameters than OTIS-G networks.

    Discrete element modeling of dry granular material using a massively parallel supercomputer

    It is the state of the art within geotechnical engineering to model soils as systems of particles rather than using the traditional continuum approach. Simulating these systems of particles for geotechnical boundary value problems results in systems which are of necessity large, motivating the application of massively parallel supercomputers. This thesis pursues such an approach. The following work describes numerical experiments using a Discrete Element Method (DEM) paradigm for soils (Trubal) together with massively parallel computers with Single Instruction Multiple Data (SIMD) architecture. The discrete element method describes the behavior of granular assemblies using the classical mechanics of discrete bodies. The computational requirements of DEM algorithms introduce time complexities, which mandate a compatible topology for massively parallel machines in order to achieve optimal performance. This thesis demonstrates the compatibility of a SIMD topology in performing discrete element simulations for 3-d spherical dry granular media. The serial algorithm, Trubal, was first modified to run with a parallel data structure on a SIMD architecture. The modified version, known as Trubal for Parallel Machines (TPM), is the data-parallel version that was tested on the Connection Machines CM-2 and CM-5, consisting of 32,768 processors and 512 nodes, respectively. The first version of TPM was tested on the CM-2 machine before its use was discontinued. Because the architecture is synchronized at each instruction, elemental data movements reduce the performance of the machine's overall resources and increase the latency of the communication between processors. This issue is addressed within the design of the algorithm so that the SIMD vector processing capability can adapt to a dynamic memory data structure.
A second version of TPM was subsequently designed for the CM-5 machine using a more efficient parallel data structure to improve the performance of the simulations. TPM version 2.0 was able to obtain a speedup in performance by handling all possible contacts within each processor, thereby creating a homogeneous data structure. The overall efficiency is governed by the global communication, which is a function of the speed of the interconnection network within the architecture. TPM's improved performance is demonstrated using two different triaxial simulations. One of them involved a physical triaxial experiment with steel spheres performed by Rowe (1962) and later simulated by Cundall (1979). The remodeling of this numerical simulation validated the overall performance of TPM version 2.0, with a nine-fold speedup obtained. TPM's reproduction of these results and its improved speedup encourage further investigations using discrete models on parallel platforms. This thesis substantiates the use of parallel computing as a technique for geotechnical applications. It is further anticipated that developing and adapting heterogeneous platforms to DEM models will make the application of parallel computing more attractive in geotechnical engineering.

    Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

    Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified