Search CORE

11 research outputs found

High Peformance and Low Power On-Die Interconnect Fabrics.

Author: Satpathy Sudhir Kumar
Publication venue
Publication date
Field of study

Increasing power density with technology scaling has caused stagnation in operating frequency of modern day microprocessors. This has led designers to prefer multicore architectures over complex monolithic processors to keep up with the demand for rising computing throughput. Although processing units are getting smaller and simpler, the dramatic rise of their count on a single die has made the fabric that connects these processing units increasingly complex. These interconnect fabrics have become a bottleneck in improving overall system effciency. As a result, the design paradigm for multi-core chips is gradually shifting from a core-centric architecture towards an interconnect-centric architecture, where system efficiency is limited by the fabric rather than the processing ability of any individual core. This dissertation introduces three novel and synergistic circuit techniques to improve scalability of switch fabrics to make on-die integration of hundreds to thousands of cores feasible. 1) A matrix topology is proposed for designing a fully connected switch fabric that re-uses output buses for programming, and stores shue congurations at cross points. This significantly reduces routing congestion, lowers area/power, and improves per- formance. Silicon measurements demonstrate 47% energy savings in a 64-lane SIMD processor fabricated in 65nm CMOS over a conventional implementation. 2) A novel approach to handle high radix arbitration along with data routing is proposed. It optimally uses existing cross-bar interconnect resources without requiring any additional overhead. Bandwidth exceeding 2Tb/s is recorded in a test prototype fabricated in 65nm. 3) Building on the later, a new circuit topology to manage and update priority adaptively within the switch fabric without incurring additional delay or area is then proposed. Several assist circuit techniques, such as a thyristor based sense amplifier and self regenerating bi-directional repeaters are proposed for high speed energy efficient signaling to and from the switch fabric to improve overall routing efficiency. Using these techniques a 64 x 64 switch fabric with 128b data bus fabricated in 45nm achieves a throughput of 4.5Tb/s at single cycle latency while operating at 559MHz.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91506/1/sudhirks_1.pd

Deep Blue Documents at the University of Michigan

EXPLORING PHOTONIC BENEA SWITCHING FABRICS FOR FUTURE HPC AND DATACENTRES

Author: Kynigos Markos
Publication venue
Publication date: 01/08/2022
Field of study

The University of Manchester - Institutional Repository

The connection machine

Author: Hillis William Daniel
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1988
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1988.Bibliography: leaves 134-157.by William Daniel Hillis.Ph.D

DSpace@MIT

Small-world interconnection networks for large parallel computer systems

Author: Rodríguez Salazar Fernando
Publication venue
Publication date: 01/01/2004
Field of study

The use of small-world graphs as interconnection networks of multicomputers is proposed and analysed in this work. Small-world interconnection networks are constructed by adding (or modifying) edges to an underlying local graph. Graphs with a rich local structure but with a large diameter are shown to be the most suitable candidates for the underlying graph. Generation models based on random and deterministic wiring processes are proposed and analysed. For the random case basic properties such as degree, diameter, average length and bisection width are analysed, and the results show that a fast transition from a large diameter to a small diameter is experienced when the number of new edges introduced is increased. Random traffic analysis on these networks is undertaken, and it is shown that although the average latency experiences a similar reduction, networks with a small number of shortcuts have a tendency to saturate as most of the traffic flows through a small number of links. An analysis of the congestion of the networks corroborates this result and provides away of estimating the minimum number of shortcuts required to avoid saturation. To overcome these problems deterministic wiring is proposed and analysed. A Linear Feedback Shift Register is used to introduce shortcuts in the LFSR graphs. A simple routing algorithm has been constructed for the LFSR and extended with a greedy local optimisation technique. It has been shown that a small search depth gives good results and is less costly to implement than a full shortest path algorithm. The Hilbert graph on the other hand provides some additional characteristics, such as support for incremental expansion, efficient layout in two dimensional space (using two layers), and a small fixed degree of four. Small-world hypergraphs have also been studied. In particular incomplete hypermeshes have been introduced and analysed and it has been shown that they outperform the complete traditional implementations under a constant pinout argument. Since it has been shown that complete hypermeshes outperform the mesh, the torus, low dimensional m-ary d-cubes (with and without bypass channels), and multi-stage interconnection networks (when realistic decision times are accounted for and with a constant pinout), it follows that incomplete hypermeshes outperform them as well

Glasgow Theses Service

OpenGrey Repository

Quantitative performance evaluation of SCI memory hierarchies

Author: Hexsel Roberto A.
Publication venue: The University of Edinburgh
Publication date: 01/01/1994
Field of study

Edinburgh Research Archive

NOC 2004 proceedings:NOC 2004 Proceedings : 9th European conference on networks & optical communications, Technical University of Eindhoven (TUE), Eindhoven, The Netherlands, June 29 - July 1, 2004

Author
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004
Field of study

Pure OAI Repository

NOC 2004 proceedings:NOC 2004 Proceedings : 9th European conference on networks & optical communications, Technical University of Eindhoven (TUE), Eindhoven, The Netherlands, June 29 - July 1, 2004

Author
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2004
Field of study

Pure OAI Repository

Modelling, Dimensioning and Optimization of 5G Communication Networks, Resources and Services

Author
Publication venue: 'MDPI AG'
Publication date: 02/02/2023
Field of study

This reprint aims to collect state-of-the-art research contributions that address challenges in the emerging 5G networks design, dimensioning and optimization. Designing, dimensioning and optimization of communication networks resources and services have been an inseparable part of telecom network development. The latter must convey a large volume of traffic, providing service to traffic streams with highly differentiated requirements in terms of bit-rate and service time, required quality of service and quality of experience parameters. Such a communication infrastructure presents many important challenges, such as the study of necessary multi-layer cooperation, new protocols, performance evaluation of different network parts, low layer network design, network management and security issues, and new technologies in general, which will be discussed in this book

Directory of Open Access Books (DOAB)

ATM virtual connection performance modeling

Author: Rijnsoever van, B.J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1997
Field of study

Pure OAI Repository

Simulation Modelling of Distributed-Shared Memory Multiprocessors

Author: Marurngsith Worawan
Publication venue: University of Edinburgh. College of Science and Engineering. School of Informatics
Publication date: 01/02/2006
Field of study

Institute for Computing Systems ArchitectureDistributed shared memory (DSM) systems have been recognised as a compelling platform for parallel computing due to the programming advantages and scalability. DSM systems allow applications to access data in a logically shared address space by abstracting away the distinction of physical memory location. As the location of data is transparent, the sources of overhead caused by accessing the distant memories are difficult to analyse. This memory locality problem has been identified as crucial to DSM performance. Many researchers have investigated the problem using simulation as a tool for conducting experiments resulting in the progressive evolution of DSM systems. Nevertheless, both the diversity of architectural configurations and the rapid advance of DSM implementations impose constraints on simulation model designs in two issues: the limitation of the simulation framework on model extensibility and the lack of verification applicability during a simulation run causing the delay in verification process. This thesis studies simulation modelling techniques for memory locality analysis of various DSM systems implemented on top of a cluster of symmetric multiprocessors. The thesis presents a simulation technique to promote model extensibility and proposes a technique for verification applicability, called a Specification-based Parameter Model Interaction (SPMI). The proposed techniques have been implemented in a new interpretation-driven simulation called DSiMCLUSTER on top of a discrete event simulation (DES) engine known as HASE. Experiments have been conducted to determine which factors are most influential on the degree of locality and to determine the possibility to maximise the stability of performance. DSiMCLUSTER has been validated against a SunFire 15K server and has achieved similarity of cache miss results, an average of +-6% with the worst case less than 15% of difference. These results confirm that the techniques used in developing the DSiMCLUSTER can contribute ways to achieve both (a) a highly extensible simulation framework to keep up with the ongoing innovation of the DSM architecture, and (b) the verification applicability resulting in an efficient framework for memory analysis experiments on DSM architecture

Edinburgh Research Archive