Search CORE

94 research outputs found

The efficiency of greedy routing in hypercubes and butterflies

Author
Publication venue: Massachusetts Institute of Technology, Laboratory for Information and Decision Systems
Publication date: 01/01/1990
Field of study

Includes bibliographical references (p. 24-26).Cover title. "October 1990".Research supported by the ARO. DAAL03-86-K-0171 Research supported by the NSF. ECS-8552419by George D. Stamoulis and John N. Tsitsiklis

DSpace@MIT

A Scalable & Energy Efficient Graphene-Based Interconnection Framework for Intra and Inter-Chip Wireless Communication in Terahertz Band

Author: Saxena Sagar
Publication venue: RIT Scholar Works
Publication date: 01/11/2017
Field of study

Network-on-Chips (NoCs) have emerged as a communication infrastructure for the multi-core System-on-Chips (SoCs). Despite its advantages, due to the multi-hop communication over the metal interconnects, traditional Mesh based NoC architectures are not scalable in terms of performance and energy consumption. Folded architectures such as Torus and Folded Torus were proposed to improve the performance of NoCs while retaining the regular tile-based structure for ease of manufacturing. Ultra-low-latency and low-power express channels between communicating cores have also been proposed to improve the performance of conventional NoCs. However, the performance gain of these approaches is limited due to metal/dielectric based interconnection. Many emerging interconnect technologies such as 3D integration, photonic, Radio Frequency (RF), and wireless interconnects have been envisioned to alleviate the issues of a metal/dielectric interconnect system. However, photonic and RF interconnects need the additional physically overlaid optical waveguides or micro-strip transmission lines to enable data transmission across the NoC. Several on-chip antennas have shown to improve energy efficiency and bandwidth of on-chip data communications. However, the date rates of the mm-wave wireless channels are limited by the state-of-the-art power-efficient transceiver design. Recent research has brought to light novel graphene based antennas operating at THz frequencies. Due to the higher operating frequencies compared to mm-wave transceivers, the data rate that can be supported by these antennas are significantly higher. Higher operating frequencies imply that graphene based antennas are just hundred micrometers in size compared to dimensions in the range of a millimeter of mm-wave antennas. Such reduced dimensions are suitable for integration of several such transceivers in a single NoC for relatively low overheads. In this work, to exploit the benefits of a regular NoC structure in conjunction with emerging Graphene-based wireless interconnect. We propose a toroidal folding based NoC architecture. The novelty of this folding based approach is that we are using low power, high bandwidth, single hop direct point to point wireless links instead of multihop communication that happens through metallic wires. We also propose a novel phased based communication protocol through which multiple wireless links can be made active at a time without having any interference among the transceiver. This offers huge gain in terms of performance as compared to token based mechanism where only a single wireless link can be made active at a time. We also propose to extend Graphene-based wireless links to enable energy-efficient, phase-based chip-to-chip communication to create a seamless, wireless interconnection fabric for multichip systems as well. Through cycle-accurate system-level simulations, we demonstrate that such designs with torus like folding based on THz links instead of global wires along with the proposed phase based multichip systems. We provide estimates that they are able to provide significant gains (about 3 to 4 times better in terms of achievable bandwidth, packet latency and average packet energy when compared to wired system) in performance and energy efficiency in data transfer in a NoC as well as multichip system. Thus, realization of these kind of interconnection framework that could support high data rate links in Tera-bits-per-second that will alleviate the capacity limitations of current interconnection framework

RIT Scholar Works

Devices and networks for optical switching

Author: Taylor Michael George
Publication venue: UCL (University College London)
Publication date: 01/01/1991
Field of study

This thesis is concerned with some aspects of the application of optics to switching and computing. Two areas are dealt with: the design of switching networks which use optical interconnects, and the development and application of the t-SEED optical logic device. The work on optical interconnects looks at the multistage interconnection network which has been proposed as a hybrid switch using both electronics and optics. It is shown that the architecture can be mapped from one dimensional to two dimensional format, so that the machine makes full use of the space available to the optics. Other mapping rules are described which allow the network to make optimum use of the optical interconnects, and the endpoint is a hybrid optical-electronic machine which should be able to outperform an all-electronic equivalent. The development of the t-SEED optical logic device is described, which is the integration of a phototransistor with a multiple quantum well optical modulator. It is found to be important to have the modulator underneath rather than on top of the transistor to avoid unwanted thyristor action. In order for the transistor to have a high gain the collector must have a low doping level, the exit window in the substrate must be etched all the way to the emitter layer, and the etch must not damage the emitter-base junction. A real optical gain of 1.6 has been obtained, which is higher than has ever been reached before but is not as high as should be possible. Improvements to the device are suggested. A new model of the Fabry-Perot cavity is introduced which helps considerably in the interpretation of experimental measurements made on the quantum well modulators. Also a method of improving the contrast of the multiple quantum well modulator by grading the well widths is proposed which may find application in long wavelength transmission modulators. Some systems which make use of the t-SEED are considered. It is shown that the t-SEED device has the right characteristics for use as a neuron element in the optical implementation of a neural network. A new image processing network for clutter removal in binary images is introduced which uses the t-SEED, and a brief performance analysis suggests that the network may be superior to an all-electronic machine

UCL Discovery

Broadcast-oriented wireless network-on-chip : fundamentals and feasibility

Author: Abadal Cavallé Sergi
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016
Field of study

Premi extraordinari doctorat UPC curs 2015-2016, àmbit Enginyeria de les TICRecent years have seen the emergence and ubiquitous adoption of Chip Multiprocessors (CMPs), which rely on the coordinated operation of multiple execution units or cores. Successive CMP generations integrate a larger number of cores seeking higher performance with a reasonable cost envelope. For this trend to continue, however, important scalability issues need to be solved at different levels of design. Scaling the interconnect fabric is a grand challenge by itself, as new Network-on-Chip (NoC) proposals need to overcome the performance hurdles found when dealing with the increasingly variable and heterogeneous communication demands of manycore processors. Fast and flexible NoC solutions are needed to prevent communication become a performance bottleneck, situation that would severely limit the design space at the architectural level and eventually lead to the use of software frameworks that are slow, inefficient, or less programmable. The emergence of novel interconnect technologies has opened the door to a plethora of new NoCs promising greater scalability and architectural flexibility. In particular, wireless on-chip communication has garnered considerable attention due to its inherent broadcast capabilities, low latency, and system-level simplicity. Most of the resulting Wireless Network-on-Chip (WNoC) proposals have set the focus on leveraging the latency advantage of this paradigm by creating multiple wireless channels to interconnect far-apart cores. This strategy is effective as the complement of wired NoCs at moderate scales, but is likely to be overshadowed at larger scales by technologies such as nanophotonics unless bandwidth is unrealistically improved. This dissertation presents the concept of Broadcast-Oriented Wireless Network-on-Chip (BoWNoC), a new approach that attempts to foster the inherent simplicity, flexibility, and broadcast capabilities of the wireless technology by integrating one on-chip antenna and transceiver per processor core. This paradigm is part of a broader hybrid vision where the BoWNoC serves latency-critical and broadcast traffic, tightly coupled to a wired plane oriented to large flows of data. By virtue of its scalable broadcast support, BoWNoC may become the key enabler of a wealth of unconventional hardware architectures and algorithmic approaches, eventually leading to a significant improvement of the performance, energy efficiency, scalability and programmability of manycore chips. The present work aims not only to lay the fundamentals of the BoWNoC paradigm, but also to demonstrate its viability from the electronic implementation, network design, and multiprocessor architecture perspectives. An exploration at the physical level of design validates the feasibility of the approach at millimeter-wave bands in the short term, and then suggests the use of graphene-based antennas in the terahertz band in the long term. At the link level, this thesis provides an insightful context analysis that is used, afterwards, to drive the design of a lightweight protocol that reliably serves broadcast traffic with substantial latency improvements over state-of-the-art NoCs. At the network level, our hybrid vision is evaluated putting emphasis on the flexibility provided at the network interface level, showing outstanding speedups for a wide set of traffic patterns. At the architecture level, the potential impact of the BoWNoC paradigm on the design of manycore chips is not only qualitatively discussed in general, but also quantitatively assessed in a particular architecture for fast synchronization. Results demonstrate that the impact of BoWNoC can go beyond simply improving the network performance, thereby representing a possible game changer in the manycore era.Avenços en el disseny de multiprocessadors han portat a una àmplia adopció dels Chip Multiprocessors (CMPs), que basen el seu potencial en la operació coordinada de múltiples nuclis de procés. Generacions successives han anat integrant més nuclis en la recerca d'alt rendiment amb un cost raonable. Per a que aquesta tendència continuï, però, cal resoldre importants problemes d'escalabilitat a diferents capes de disseny. Escalar la xarxa d'interconnexió és un gran repte en ell mateix, ja que les noves propostes de Networks-on-Chip (NoC) han de servir un tràfic eminentment variable i heterogeni dels processadors amb molts nuclis. Són necessàries solucions ràpides i flexibles per evitar que les comunicacions dins del xip es converteixin en el pròxim coll d'ampolla de rendiment, situació que limitaria en gran mesura l'espai de disseny a nivell d'arquitectura i portaria a l'ús d'arquitectures i models de programació lents, ineficients o poc programables. L'aparició de noves tecnologies d'interconnexió ha possibilitat la creació de NoCs més flexibles i escalables. En particular, la comunicació intra-xip sense fils ha despertat un interès considerable en virtut de les seva baixa latència, simplicitat, i bon rendiment amb tràfic broadcast. La majoria de les Wireless NoC (WNoC) proposades fins ara s'han centrat en aprofitar l'avantatge en termes de latència d'aquest nou paradigma creant múltiples canals sense fils per interconnectar nuclis allunyats entre sí. Aquesta estratègia és efectiva per complementar a NoCs clàssiques en escales mitjanes, però és probable que altres tecnologies com la nanofotònica puguin jugar millor aquest paper a escales més grans. Aquesta tesi presenta el concepte de Broadcast-Oriented WNoC (BoWNoC), un nou enfoc que intenta rendibilitzar al màxim la inherent simplicitat, flexibilitat, i capacitats broadcast de la tecnologia sense fils integrant una antena i transmissor/receptor per cada nucli del processador. Aquest paradigma forma part d'una visió més àmplia on un BoWNoC serviria tràfic broadcast i urgent, mentre que una xarxa convencional serviria fluxos de dades més pesats. En virtut de la escalabilitat i del seu suport broadcast, BoWNoC podria convertir-se en un element clau en una gran varietat d'arquitectures i algoritmes poc convencionals que milloressin considerablement el rendiment, l'eficiència, l'escalabilitat i la programabilitat de processadors amb molts nuclis. El present treball té com a objectius no només estudiar els aspectes fonamentals del paradigma BoWNoC, sinó també demostrar la seva viabilitat des dels punts de vista de la implementació, i del disseny de xarxa i arquitectura. Una exploració a la capa física valida la viabilitat de l'enfoc usant tecnologies longituds d'ona milimètriques en un futur proper, i suggereix l'ús d'antenes de grafè a la banda dels terahertz ja a més llarg termini. A capa d'enllaç, la tesi aporta una anàlisi del context de l'aplicació que és, més tard, utilitzada per al disseny d'un protocol d'accés al medi que permet servir tràfic broadcast a baixa latència i de forma fiable. A capa de xarxa, la nostra visió híbrida és avaluada posant èmfasi en la flexibilitat que aporta el fet de prendre les decisions a nivell de la interfície de xarxa, mostrant grans millores de rendiment per una àmplia selecció de patrons de tràfic. A nivell d'arquitectura, l'impacte que el concepte de BoWNoC pot tenir sobre el disseny de processadors amb molts nuclis no només és debatut de forma qualitativa i genèrica, sinó també avaluat quantitativament per una arquitectura concreta enfocada a la sincronització. Els resultats demostren que l'impacte de BoWNoC pot anar més enllà d'una millora en termes de rendiment de xarxa; representant, possiblement, un canvi radical a l'era dels molts nuclisAward-winningPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Broadcast-oriented wireless network-on-chip : fundamentals and feasibility

Author: Abadal Cavallé Sergi
Publication venue: Universitat Politècnica de Catalunya
Publication date: 15/07/2016
Field of study

UPCommons. Portal del coneixement obert de la UPC

Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

Author: Huck R. W.
Rafferty William
Reekie D. Hugh M.
Publication venue
Publication date
Field of study

Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression

NASA Technical Reports Server

Next-Gen Hybrid Memory and Interconnect System Architectures

Author: Wen Fei
Publication venue
Publication date: 02/05/2021
Field of study

This dissertation mainly addresses two problems that emerge along with the 'big data' trend: the increasing demands of memory capacity for mobile computing platform, and the needs for interconnection network with higher bandwidth/energy efficiency in the HPC/Data Center. The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts performance, consumes energy and deteriorates the write endurance of typical flash storage devices. Alternately, a larger DRAM has higher leakage power and drains the battery faster. Further, DRAM scaling trends make further growth of DRAM in the mobile space prohibitive due to cost. Emerging non-volatile memory (NVM) has the potential to alleviate these issues due to its higher capacity per cost than DRAM and minimal static power. Recently, a wide spectrum of NVM technologies, including phase-change memories (PCM), memristor, and 3D XPoint have emerged. Despite the mentioned advantages, NVM has longer access latency compared to DRAM and NVM writes can incur higher latencies and wear costs. Therefore integration of these new memory technologies in the memory hierarchy requires a fundamental rearchitecting of traditional system designs. In this work, we propose a hardware-accelerated memory manager (HMMU) that addresses both types of memory in a flat space address space. We design a set of data placement and data migration policies within this memory manager, such that we may exploit the advantages of each memory technology. By augmenting the system with this HMMU, we reduce the overall memory latency while also reducing energy consumption and writes to the NVM. Experimental results show that our design achieves a 39% reduction in energy consumption with only a 12% performance degradation versus an all-DRAM baseline that is likely untenable in the future. After developing the pure hardware memory management for the data migration between DRAM and NVM, we consider to integrate information from the software stack into our system. These software information, such as programmers' hints or application profiling results, reveals the longer-term memory access pattern and data object properties; but they come at the cost of high software latency. Hardware approaches can avoid the latencies of software kernel processes related to page migration, such as page fault handling. However, hardware's vision is limited to a short time window, as it can only monitor and analyze the recently received memory requests. Ideally, the execution time advantages of pure hardware approaches, should be combined with the data object properties in a global scope. Further, application programmer's hints could guide the data placement at the allocation time, thus data objects with similar property could be congregated to reduce unnecessary page migrations. In this work, we propose such a hardware-software cooperative approach. In particular, we built a heap memory manager that allows the programmer to choose the memory type for each data object allocation. Such denotations are relayed to the hardware memory manager as hints for the decisions on data placement and migration. Meanwhile the hardware memory manager is still capable of capturing the per-application phase changes and maintaining flexibility in its data redistribution. The integration of the two mechanisms leads to optimal results from both long-term and short-term aspects. Experiment results show that our design shortens the overall memory latency while also reducing energy consumption and writes to the NVM versus prior approaches. Our design achieves a 40% reduction in energy consumption with only a 16% performance degradation versus the all-DRAM memory system. As for the HPC/Data domain, a primary problem is how to scale up the interconnection network to service the ever-increasing number of nodes. Photonic-links, with its high bandwidth and low signal loss across long distance propagation, is a promising technology to solve this problem. The higher bandwidth allows the router to connect more nodes while the long-distance connection makes it possible to implement more advanced typologies, such as the flattened butterfly. Both factors help to reduce the average number of hops between nodes across the network. Such high-radix and short distance network is essential to provisioning low latency communications in massive scale systems. However, due to the different physical and device properties, interconnection network needs redesign to adopt the photonic links. We first listed the basic formulas and design flow for interconnection network, and introduced a highly efficient event-driven simulator. Then we conducted a series of experiments to explore the design space, and gave a quantitative comparison between interconnection networks made of pure electrical links and those with electronic/photonic hybrid design

Texas A&M Repository

Development of the readout electronics for the high luminosity upgrade of the CMS outer strip tracker

Author: Braga Davide
Publication venue: Physics, Imperial College London
Publication date: 01/06/2016
Field of study

The High-luminosity upgrade of the LHC will deliver the dramatic increase in luminosity required for precision measurements and to probe Beyond the Standard Model theories. At the same time, it will present unprecedented challenges in terms of pileup and radiation degradation. The CMS experiment is set for an extensive upgrade campaign, which includes the replacement of the current Tracker with another all-silicon detector with improved performance and reduced mass. One of the most ambitious aspects of the future Tracker will be the ability to identify high transverse momentum track candidates at every bunch crossing and with very low latency, in order to include tracking information at the L1 hardware trigger stage, a critical and effective step to achieve triggers with high purity and low threshold. This thesis presents the development and the testing of the CMS Binary Chip 2 (CBC2), a prototype Application Specific Integrated Circuit (ASIC) for the binary front-end readout of silicon strip detectors modules in the Outer Tracker, which also integrates the logic necessary to identify high transverse momentum candidates by correlating hits from two silicon strip detectors, separated by a few millimetres. The design exploits the relation between the transverse momentum and the curvature in the trajectory of charged particles subject to the large magnetic field of CMS. The logic which follows the analogue amplification and binary conversion rejects clusters wider than a programmable maximum number of adjacent strips, compensates for the geometrical offset in the alignment of the module, and correlates the hits between the two sensor layers. Data are stored in a memory buffer before being transferred to an additional buffer stage and being serially read-out upon receipt of a Level 1 trigger. The CBC2 has been subject to extensive testing since its production in January 2013: this work reports the results of electrical characterization, of the total ionizing dose irradiation tests, and the performance of a prototype module instrumented with CBC2 in realistic conditions in a beam test. The latter is the first experimental demonstration of the Pt-selection principle central to the future of CMS. Several total-ionizing-dose tests highlighted no functional issue, but observed significant excess static current for doses <1 Mrad. The source of the excess was traced to static leakage current in the memory pipeline, and is believed to be a consequence of the high instantaneous dose delivered by the x-ray setup. Nevertheless, a new SRAM layout aimed at removing the leakage path was proposed for the CBC3. The results of single event upset testing of the chip are also reported, two of the three distinct memory circuits used in the chip were proven to meet the expected robustness, while the third will be replaced in the next iteration of the chip. Finally, the next version of the ASIC is presented, highlighting the additional features of the final prototype, such as half-strip resolution, additional trigger logic functionality, longer trigger latency and higher rate, and fully synchronous stub readout.Open Acces

Spiral - Imperial College Digital Repository

Digital signal conditioning on multiprocessor systems

Author: Gould Lee
Publication venue
Publication date: 01/01/1992
Field of study

An important application area of modem computer systems is that of digital signal processing. This discipline is concerned with the analysis or modification of digitally represented signals, through the use of simple mathematical operations. A primary need of such systems is that of high data throughput. Although optimised programmable processors are available, system designers are now looking towards parallel processing to gain further performance increases. Such parallel systems may be easily constructed using the transputer family of processors. However, although these devices are comparatively easy to program, they possess a general von Neumann core and so are relatively inefficient at implementing digital signal processing algorithms. The power of the transputer lies in its ability to communicate effectively, not in its computational capability. The converse is true of specialised digital signal processors. These devices have been designed specifically to implement the type of small data intensive operations required by digital signal processing algorithms, but have not been designed to operate efficiently in a multiprocessor environment. This thesis examines the performance of both types of processors with reference to a common signal processing application, multichannel filtering. The transputer is examined in both uniprocessor and multiprocessor configurations, and its performance analysed. A theoretical model of program behaviour is developed, in order to assess the performance benefits of particular code structures and the effects of such parameters as data block size. The transputer implementation is contrasted with that of the Motorola DSP56001 digital signal processor. This device is found to be much more efficient at implementing such algorithms on a single device, but provides limited multiprocessor support. Using the conclusions of this assessment, a hybrid multiprocessor has been designed. This consists of a transputer controlling a number of signal processors, communicating through shared memory, separating tiie tasks of computation and communication. Forcing the transputer to communicate through shared memory causes problems, and these have been addressed. A theoretical performance model of the system has been produced. A small system has been constructed, and is currently running performance test software

Durham e-Theses

Recommended from our members

Silicon photonic switching: from building block design to intelligent control

Author: Huang Yishen
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2020
Field of study

The rapid growth in data communication technologies is at the heart of enriching the digital experiences for people around the world. Encoding high bandwidth data to the optical domain has drastically changed the bandwidth-distance trade-off imposed by electrical media. Silicon photonics, sharing the technological maturity of the semiconductor industry, is a platform poised to make optical interconnect components more robust, manufacturable, and ubiquitous. One of the most prominent device classes enabled by the silicon photonics platform is photonic switching, which describes the direct routing of optical signal carriers without the optical-electrical-optical conversions. While theoretical designs and prototypes of monolithic silicon photonic switch devices have been studied, realizing high-performance and feasible switch systems requires explorations of all design aspects from basic building blocks to control systems. This thesis provides a holistic collection of studies on silicon photonic switching in topics of novel switching element designs, multi-stage switch architectures, device calibration, topology scalability, smart routing strategies, and performance-aware control plane. First, component designs for assembling a silicon photonic switch device are presented. Structures that perform 2×2 optical switching functions are introduced. To realize switching granularities in both spatial and spectral domains, a resonator-assisted Mach-Zehnder interferometer design is demonstrated with high performance and design robustness. Next, multi-stage monolithic switching devices with microring resonator-based switching elements are investigated. An 8×8 switch device with dual-microring switching elements is presented with a well-balanced set of performance metrics in extinction ratio, crosstalk suppression, and optical bandwidth. Continued scaling in the switch port count requires both an economic increase in the number of switching elements integrated in a device and the preservation of signal quality through the switch fabric. A highly scalable switch architecture based on Clos network with microring switch-and-select sub-switches is presented as a solution to reach high switch radices while addressing key factors of insertion loss, crosstalk, and optical passband to ensure end-to-end switching performance. The thesis then explores calibration techniques to acquire and optimize system-wide control points for integrated silicon switch devices. Applicable to common rearrangeably non-blocking switch topologies, automated procedures are developed to calibrate entire switch devices without the need for built-in power monitors. Using Mach-Zehnder interferometer-based switching elements as a demonstration, calibration techniques for optimal control points are introduced to achieve balanced push-pull drive scheme and reduced crosstalk in switching operations. Furthermore, smart routing strategies are developed based on optical penalty estimations enabled by expedited lightpath characterization procedures. Leveraging configuration redundancies in the switch fabric, the routing strategies are capable of avoiding the worst penalty optical paths and effectively elevate the bottom-line performance of the switch device. Additional works are also presented on enhancing optical system control planes with machine learning techniques to accurately characterize complex systems and identify critical control parameters. Using flexgrid networks as a case study, light-weight machine learning workflows are tailored to devise control strategies for improving spectral power stability during wavelength assignment and defragmentation. This work affirms the efficacy of intelligent control planes to predict system dynamics and drive performance optimizations for optical interconnect systems

Columbia University Academic Commons