507 research outputs found

    CLEX: Yet Another Supercomputer Architecture?

    We propose the CLEX supercomputer topology and routing scheme. We prove that CLEX can utilize a constant fraction of the total bandwidth for point-to-point communication, at delays proportional to the sum of the number of intermediate hops and the maximum physical distance between any two nodes. Moreover, by applying an asymmetric bandwidth assignment to the links, all-to-all communication can be realized $(1+o(1))$-optimally both with regard to bandwidth and delays. This is achieved at node degrees of $n^{\varepsilon}$, for an arbitrarily small constant $\varepsilon \in (0,1]$. In contrast, these results are impossible in any network featuring constant or polylogarithmic node degrees. Through simulation, we assess the benefits of an implementation of the proposed communication strategy. Our results indicate that, for a million processors, CLEX can increase bandwidth utilization and reduce average routing path length by at least factors of 10 and 5, respectively, in comparison to a torus network. Furthermore, the CLEX communication scheme features several other properties, such as deadlock-freedom, inherent fault-tolerance, and canonical partition into smaller subsystems.

    Optimal all-to-all personalized exchange in self-routable multistage networks

    Multi-stage switching networks for waveguide optical technology

    Multi-stage switching is well suited to implementing interconnection systems operating at different physical scales (from rack-to-rack to on-chip) and with several technologies (either photonics or electronics). Several multistage architectures have been proposed to design these systems in a highly modular and efficient way. Since these proposals are general and applicable to a vast range of technologies, optimizations are possible once a specific technology is considered. In this work, we aim at optimizing multi-stage banyan and EGS architectures for implementation in optical waveguide technology. We propose a method to decrease the number of waveguide crossovers while avoiding an excessive increase in waveguide bends.

    Symmetric rearrangeable networks and algorithms

    A class of symmetric rearrangeable nonblocking networks has been considered in this thesis. A particular focus of this thesis is on Benes networks built with 2 x 2 switching elements. Symmetric rearrangeable networks built with larger switching elements have also been considered. New applications of these networks are found in the areas of System on Chip (SoC) and Network on Chip (NoC). Deterministic routing algorithms used in NoC applications suffer from low scalability and slow execution time. On the other hand, faster algorithms are blocking and thus limit throughput. This will be an acceptable trade-off for many applications where achieving "wire speed" on the on-chip network would require extensive optimisation of the attached devices. In this thesis I designed an algorithm that has much lower blocking probabilities than other suboptimal algorithms but a much faster execution time than deterministic routing algorithms. The suboptimal method uses the looping algorithm in its outermost stages and then, in the two distinct subnetworks deeper in the switch, uses a fast but suboptimal path search method to find available paths. The worst-case time complexity of this new routing method is O(N log N) using a single processor, which matches the best known results reported in the literature. Disruption of the ongoing communications in this class of networks during rearrangements is an open issue. In this thesis I explored a modification of the topology of these networks which gives rise to what are termed repackable networks. A repackable topology allows rearrangements of paths without intermittently losing connectivity by breaking the existing communication paths momentarily. The repackable network structure proposed in this thesis is efficient in its use of hardware when compared to other proposals in the literature.
As most of the deterministic algorithms designed for Benes networks implement a permutation of all inputs to find the routing tags for the requested input-output pairs, I proposed a new algorithm that can work for partial permutations. If the network load is defined as ρ, the mean number of active inputs in a partial permutation is m = ρN, where N is the network size. This new method is based on mapping the network stages into a set of sub-matrices and then determining the routing tags for each pair of requests by populating the cells of the sub-matrices without creating a blocking state. Overall, the serial time complexity of this method is O(N log N) when all N inputs are active and O(m log N) with m < N active inputs. With minor modification to the serial algorithm, this method can be made to work in the parallel domain. The time complexity of this routing algorithm on a parallel machine with N completely connected processors is O(log^2 N). With m active requests the time complexity goes down to O(log m log N), which is better than the O(log^2 m + log N) reported in the literature for $2^{0.5\left((\log^2 N - 4\log N)^{0.5} - \log N\right)} \le \rho \le 1$. I also designed multistage symmetric rearrangeable networks using larger switching elements and implemented a new routing algorithm for these classes of networks. The network topology and routing algorithms presented in this thesis should allow large-scale networks of modest cost, with low setup times and moderate blocking rates, to be constructed. Such switching networks will be required to meet the bandwidth requirements of future communication networks.
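
The looping algorithm mentioned in this abstract can be illustrated for a single outermost stage. The sketch below is not the thesis's algorithm; it is a minimal version of the classic looping step, assuming a full permutation on an even number of ports, with inputs 2k and 2k+1 sharing an input switch and outputs 2k and 2k+1 sharing an output switch. The function names `split_by_looping` and `is_valid_split` are hypothetical.

```python
def split_by_looping(perm):
    """Assign each input to the upper (0) or lower (1) subnetwork.

    perm[i] is the output requested by input i. Assumed port layout:
    ports 2k and 2k+1 share a 2x2 switch on both the input and output side.
    """
    n = len(perm)
    if n % 2 or sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of an even number of ports")
    inv = [0] * n
    for inp, out in enumerate(perm):
        inv[out] = inp                # inv[o] is the input that requests output o
    sub = [None] * n
    for start in range(n):
        i = start
        while sub[i] is None:         # walk one loop of constraints
            sub[i] = 0                # route input i through the upper subnetwork
            j = inv[perm[i] ^ 1]      # input whose output shares i's output switch
            sub[j] = 1                # so j must use the lower subnetwork
            i = j ^ 1                 # j's input-switch neighbour must use the upper one
    return sub


def is_valid_split(perm, sub):
    """Check that no outer switch sends both of its ports to the same subnetwork."""
    n = len(perm)
    inv = [0] * n
    for inp, out in enumerate(perm):
        inv[out] = inp
    inputs_ok = all(sub[k] != sub[k + 1] for k in range(0, n, 2))
    outputs_ok = all(sub[inv[k]] != sub[inv[k + 1]] for k in range(0, n, 2))
    return inputs_ok and outputs_ok


perm = [3, 7, 4, 0, 2, 6, 1, 5]
print(split_by_looping(perm))
```

In a full Benes router this step would be applied recursively to the two half-size subnetworks; the abstract instead describes switching to a faster, suboptimal path search at that point.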

    On the design and implementation of broadcast and global combine operations using the postal model

    A number of models have been proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of λ and will arrive at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters and 3) creating efficient global combine algorithms for parallel machines in which λ is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves the known implementation by more than 20%.
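
The round-based rule in this abstract (a message sent in round r arrives in round r + λ - 1) can be simulated directly. The following is a minimal sketch, not the paper's algorithm: it assumes an integer λ, a greedy schedule in which every node that already holds the message sends it to a fresh node in every round, and that a node can start sending in the round after the message arrives. The function names are hypothetical.

```python
from collections import defaultdict


def informed_after(rounds: int, lam: int) -> int:
    """Nodes holding the message after `rounds` rounds of greedy broadcast."""
    if lam < 1 or rounds < 0:
        raise ValueError("need lam >= 1 and rounds >= 0")
    informed = 1                      # the root holds the message before round 1
    senders = 1                       # nodes able to send in the current round
    arrivals = defaultdict(int)       # arrivals[r] = messages arriving in round r
    for r in range(1, rounds + 1):
        arrivals[r + lam - 1] += senders   # each sender reaches one new node
        informed += arrivals.pop(r, 0)     # messages landing in this round
        senders = informed                 # new holders can send from the next round
    return informed


def rounds_to_broadcast(n: int, lam: int) -> int:
    """Smallest number of rounds after which at least n nodes are informed."""
    if n < 1:
        raise ValueError("need n >= 1")
    rounds = 0
    while informed_after(rounds, lam) < n:
        rounds += 1
    return rounds


print([informed_after(t, 2) for t in range(6)])
```

Under these assumptions λ = 1 gives the familiar doubling per round, and λ = 2 gives Fibonacci growth, which shows how strongly the best broadcast tree depends on the measured latency.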

    Building Fault Tolerance within Clouds at Network Level

    Cloud computing technologies and infrastructure facilities are growing rapidly, making it cost-effective for users to implement IT-based solutions to run their business. Many intricate issues, however, have cropped up and must be addressed before clouds can be used for the purpose for which they are designed and implemented. Among these, fault tolerance and the security of data stored on clouds are the most important. Continuous availability of services depends on many factors. Faults are bound to happen within the network, software, platform, or infrastructure used to establish the cloud. The network that connects the various servers, devices, peripherals, etc. has to be fault tolerant to begin with, so that the intended, uninterrupted services can be made available to the user. A novel network design method that achieves high availability of the network, and thereby of the cloud itself, is presented in this paper.

    The smart supply chain: a conceptual cyclic framework

    Purpose: The objective of this work is to analyze the characteristics of the smart supply chain (SSC) and to propose a conceptual framework for research. Given the pace of current technological change, there is a need to analyze the new features of the SSC related to digital technologies and the incorporation of services. Design/methodology/approach: A systematic review of the literature is conducted, analyzing the latest studies on the subject. This methodology makes it possible to propose a conceptualization of the SSC and incorporate new elements of analysis. Findings: The results show that much of the innovation and instrumentalization of supply chains involves incorporating digital services to expand their functionalities, especially in terms of agility and connectivity. The servitization of supply chains is therefore a key new feature. In relation to other characteristics identified in the literature, a conceptual cyclic framework is proposed for the SSC. Originality/value: This study contributes to strengthening the theoretical foundations of SSCs and serves as a guide for researchers and practitioners. Peer reviewed.

    Modeling Network Contention Effects on All-to-All Operations

    10 pages. One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient complete exchange algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g. the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper we present an integrated approach to model the performance of the All-to-All collective operation. Our approach consists of identifying a contention signature that characterizes a given network environment and using it to augment a contention-free communication model. This approach allows an accurate prediction of the performance of the All-to-All operation over different network architectures with a small overhead. The approach is assessed by experimental results using three different network architectures, namely Fast Ethernet, Gigabit Ethernet and Myrinet.
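
The abstract does not state the form of the model, so the following is only an illustrative sketch of the general idea of augmenting a contention-free model with a measured signature. It assumes a linear per-message cost (startup `alpha` plus `beta` per byte), n - 1 sequential exchanges per process, and a single multiplicative contention factor `gamma` fitted by least squares. All names, parameters, and the multiplicative form are assumptions, not the paper's definitions.

```python
def alltoall_contention_free(n, m, alpha, beta):
    """Assumed contention-free cost: n - 1 exchanges of m bytes each."""
    return (n - 1) * (alpha + m * beta)


def fit_contention_factor(samples, alpha, beta):
    """Least-squares gamma such that measured ~ gamma * contention-free cost.

    samples is a list of (n, m, measured_seconds) tuples taken on one network.
    """
    num = 0.0
    den = 0.0
    for n, m, measured in samples:
        predicted = alltoall_contention_free(n, m, alpha, beta)
        num += measured * predicted
        den += predicted * predicted
    return num / den


def predict_alltoall(n, m, alpha, beta, gamma):
    """Contention-aware prediction under the assumed multiplicative form."""
    return gamma * alltoall_contention_free(n, m, alpha, beta)


# Hypothetical link parameters and synthetic measurements, for illustration only.
alpha, beta = 1e-4, 1e-8
samples = [(n, m, 2.5 * alltoall_contention_free(n, m, alpha, beta))
           for n in (4, 8, 16) for m in (1_000, 100_000)]
gamma = fit_contention_factor(samples, alpha, beta)
print(gamma, predict_alltoall(32, 50_000, alpha, beta, gamma))
```

Fitting the factor once per network and reusing it across process counts and message sizes reflects the abstract's claim that the signature characterizes a network environment; whether one scalar is enough is an open assumption of this sketch.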

    Modelling Network Contention Effects on All-to-All Operations

    Extended version of the article published at CLUSTER 2006. One of the most important collective communication patterns used in scientific applications is the complete exchange, also called All-to-All. Although efficient complete exchange algorithms have been studied for specific networks, general solutions like those available in well-known MPI distributions (e.g. the MPI_Alltoall operation) are strongly influenced by the congestion of network resources. In this paper we present an integrated approach to model the performance of the All-to-All collective operation. Our approach consists of identifying a contention signature that characterizes a given network environment and using it to augment a contention-free communication model. This approach allows an accurate prediction of the performance of the All-to-All operation over different network architectures with a small overhead. The approach is assessed by experimental results using three different network architectures, namely Fast Ethernet, Gigabit Ethernet and Myrinet.