BPLight-CNN: A Photonics-based Backpropagation Accelerator for Deep Learning
Training deep learning networks involves continuous weight updates across the
various layers of the deep network while using a backpropagation algorithm
(BP). This results in expensive computation overheads during training.
Consequently, most deep learning accelerators today employ pre-trained weights
and focus only on improving the design of the inference phase. The recent trend
is to build a complete deep learning accelerator by incorporating the training
module. Such efforts require an ultra-fast chip architecture for executing the
BP algorithm. In this article, we propose a novel photonics-based
backpropagation accelerator for high performance deep learning training. We
present the design for a convolutional neural network, BPLight-CNN, which
incorporates the silicon photonics-based backpropagation accelerator.
BPLight-CNN is a first-of-its-kind photonic and memristor-based CNN
architecture for end-to-end training and prediction. We evaluate BPLight-CNN
using a photonic CAD framework (IPKISS) on deep learning benchmark models
including LeNet and VGG-Net. The proposed design achieves (i) at least 34x
speedup, 34x improvement in computational efficiency, and 38.5x energy savings,
during training; and (ii) 29x speedup, 31x improvement in computational
efficiency, and 38.7x improvement in energy savings, during inference compared
to the state-of-the-art designs. All comparisons are made at 16-bit
resolution, and BPLight-CNN achieves these improvements at a cost of
approximately 6% lower accuracy compared to the state-of-the-art.
COMET: A Cross-Layer Optimized Optical Phase Change Main Memory Architecture
Traditional DRAM-based main memory systems face several challenges with
memory refresh overhead, high latency, and low throughput as the industry moves
towards smaller DRAM cells. These issues have been exacerbated by the emergence
of data-intensive applications in recent years. Memories based on phase change
materials (PCMs) offer promising solutions to these challenges. PCMs store data
in the material's phase, which can shift between amorphous and crystalline
states when external thermal energy is supplied. This is often achieved using
electrical pulses. Alternatively, using laser pulses and integration with
silicon photonics offers a unique opportunity to realize high-bandwidth and
low-latency photonic memories. Such a memory system may in turn open the
possibility of realizing fully photonic computing systems. But to realize
photonic memories, several challenges that are unique to the photonic domain
such as crosstalk, optical loss management, and laser power overhead have to be
addressed. In this work, we present COMET, the first cross-layer optimized
optical main memory architecture that uses PCMs. In architecting COMET, we
explore how to use silicon photonics and PCMs together to design a large-scale
main memory system while addressing associated challenges. We explore
challenges and propose solutions at the PCM cell, photonic memory circuit, and
memory architecture levels. Based on our evaluations, COMET offers 7.1x better
bandwidth, 15.1x lower energy-per-bit (EPB), and 3x lower latencies than the
best-known prior work on photonic main memory architecture design.
Network-on-Chip
Bus-based interconnects are limited in scalability, latency, bandwidth, and power consumption, and using them to support a very large number of on-chip resources results in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers aspects of NoCs such as their potential, architecture, technical challenges, optimization, design exploration, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems.
Towards zero latency photonic switching in shared memory networks
Photonic networks-on-chip based on silicon photonics have been proposed to reduce latency and power consumption in future chip multi-core processors (CMPs). However, high-performance CMPs use a shared memory model which generates large numbers of short messages, creating high arbitration latency overhead for photonic switching networks. In this paper we explore techniques which intelligently use information from the memory hierarchy to predict communication in order to set up photonic circuits with reduced or eliminated arbitration latency. Firstly, we present a switch scheduling algorithm which arbitrates on a per-memory-transaction basis and holds open photonic circuits to exploit temporal locality. We show that this can reduce the average arbitration latency overhead by 60% and eliminate arbitration latency altogether for a significant proportion of memory transactions. We then show how this technique can be applied to multiple-socket shared memory systems with low latency and energy consumption penalties. Finally, we present ideas and initial results to demonstrate that cache miss prediction could be used to set up photonic circuits for more complex memory transactions and main memory accesses.
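The hold-open idea in the abstract above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's scheduling algorithm: it assumes a single source node, at most one open circuit at a time, and a fixed cost of one arbitration per circuit setup. The function name `count_arbitrations` and the example trace are hypothetical.

```python
def count_arbitrations(destinations, hold_open=True):
    """Count circuit setups needed to send one message per destination.

    Toy model (assumption): one source, one circuit open at a time.
    With hold_open=True the last circuit stays up, so a message to the
    same destination as the previous one needs no new arbitration.
    """
    open_circuit = None  # destination of the currently open circuit
    arbitrations = 0
    for dst in destinations:
        if hold_open and dst == open_circuit:
            continue  # temporal locality: reuse the circuit already open
        arbitrations += 1  # tear down and arbitrate for a new circuit
        open_circuit = dst
    return arbitrations


# Hypothetical trace of destination node IDs with some temporal locality.
trace = [3, 3, 5, 5, 5, 3, 7, 7]
with_hold = count_arbitrations(trace, hold_open=True)
without_hold = count_arbitrations(trace, hold_open=False)
print(f"arbitrations with hold-open: {with_hold}, without: {without_hold}")
```

The saving in this model depends entirely on how often consecutive messages share a destination, so the counts here say nothing about the figures reported in the abstract.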
On-Chip Optical Interconnection Networks for Multi/Manycore Architectures
The rapid development of multi/manycore technologies offers the opportunity for highly parallel architectures implemented on a single chip. While the first, low-parallelism multicore products have been based on simple interconnection structures (single bus, very simple crossbar), the emerging highly parallel architectures will require complex, limited-degree interconnection networks. This thesis studies this trend according to the general theory of interconnection structures for parallel machines, and investigates some solutions in terms of performance, cost, fault-tolerance, and run-time support for shared-memory and/or message-passing programming mechanisms.
Broadcast-oriented wireless network-on-chip : fundamentals and feasibility
UPC Extraordinary Doctoral Award, academic year 2015-2016, ICT Engineering field.
Recent years have seen the emergence and ubiquitous adoption of Chip Multiprocessors (CMPs), which rely on the coordinated operation of multiple execution units or cores. Successive CMP generations integrate a larger number of cores seeking higher performance with a reasonable cost envelope. For this trend to continue, however, important scalability issues need to be solved at different levels of design. Scaling the interconnect fabric is a grand challenge by itself, as new Network-on-Chip (NoC) proposals need to overcome the performance hurdles found when dealing with the increasingly variable and heterogeneous communication demands of manycore processors. Fast and flexible NoC solutions are needed to prevent communication from becoming a performance bottleneck, a situation that would severely limit the design space at the architectural level and eventually lead to the use of software frameworks that are slow, inefficient, or less programmable.
The emergence of novel interconnect technologies has opened the door to a plethora of new NoCs promising greater scalability and architectural flexibility. In particular, wireless on-chip communication has garnered considerable attention due to its inherent broadcast capabilities, low latency, and system-level simplicity. Most of the resulting Wireless Network-on-Chip (WNoC) proposals have set the focus on leveraging the latency advantage of this paradigm by creating multiple wireless channels to interconnect far-apart cores. This strategy is effective as the complement of wired NoCs at moderate scales, but is likely to be overshadowed at larger scales by technologies such as nanophotonics unless bandwidth is unrealistically improved.
This dissertation presents the concept of Broadcast-Oriented Wireless Network-on-Chip (BoWNoC), a new approach that attempts to foster the inherent simplicity, flexibility, and broadcast capabilities of the wireless technology by integrating one on-chip antenna and transceiver per processor core. This paradigm is part of a broader hybrid vision where the BoWNoC serves latency-critical and broadcast traffic, tightly coupled to a wired plane oriented to large flows of data. By virtue of its scalable broadcast support, BoWNoC may become the key enabler of a wealth of unconventional hardware architectures and algorithmic approaches, eventually leading to a significant improvement of the performance, energy efficiency, scalability and programmability of manycore chips.
The present work aims not only to lay the fundamentals of the BoWNoC paradigm, but also to demonstrate its viability from the electronic implementation, network design, and multiprocessor architecture perspectives. An exploration at the physical level of design validates the feasibility of the approach at millimeter-wave bands in the short term, and then suggests the use of graphene-based antennas in the terahertz band in the long term. At the link level, this thesis provides an insightful context analysis that is then used to drive the design of a lightweight protocol that reliably serves broadcast traffic with substantial latency improvements over state-of-the-art NoCs. At the network level, our hybrid vision is evaluated with emphasis on the flexibility provided at the network interface level, showing outstanding speedups for a wide set of traffic patterns. At the architecture level, the potential impact of the BoWNoC paradigm on the design of manycore chips is not only qualitatively discussed in general, but also quantitatively assessed in a particular architecture for fast synchronization. Results demonstrate that the impact of BoWNoC can go beyond simply improving the network performance, thereby representing a possible game changer in the manycore era.
Award-winning. Postprint (published version)
Variation-Aware Modeling and Design of Nanophotonic Interconnects
Optical interconnects have started to replace electrical interconnects in communications between racks and circuit boards, with potential benefits in bandwidth, delay, power efficiency, and crosstalk. Silicon photonics has emerged as a highly promising enabling technology for short-reach nanophotonic interconnects because it offers favorable CMOS compatibility and a high integration level. The fast-growing complexity of photonic integrated circuits (PICs) and close electro-optical integration call for computer-aided design (CAD) for integrated photonics, and for electronic-photonic design automation (EPDA), including accurate behavior models and efficient simulation methodologies for integrated electro-optical systems. Also, nanophotonic devices are highly sensitive to fabrication process variation and thermal variation effects, which requires proper modeling, optimization, and management schemes. To address these problems, this thesis is dedicated to the following two tasks: (1) compact modeling and circuit-level simulation of nanophotonic interconnects, and (2) power-efficient management of the variation effects in nanophotonic interconnects.
The first part of the thesis develops compact models for key components in nanophotonic interconnects, including silicon microring modulators, diode lasers, electro-absorption modulators (EAMs), and photodetectors. These compact models are developed based on the devices' electrical and optical properties, and are then extensively validated against measurement data. The model parameters are extracted from common electrical and optical tests. Implemented in Verilog-A, the models are used in SPICE simulations of optical links, whose results again agree well with measurement data. The compact model library and the simulation methodology enable electro-optical co-simulations and optical device design explorations at the circuit level.
In the second part of the thesis, we propose modeling methods and power-efficient management schemes for the process and thermal variations in optical interconnects. The proposed adaptive tuning technique performs on-chip self-tests and adaptively allocates just enough power for link operations. The technique saves a significant amount of power compared to worst-case-based conservative designs, and scales well with respect to variations and network size. We also design power-efficient pairing algorithms for microring-based optical interconnects. Our algorithms optimally mix-and-match microring-based devices to minimize the power consumption for tuning. The algorithms are tested on both measured and synthetic data sets, demonstrating promising results of power reduction and scalability for handling a large number of devices. Lastly, we decompose and analyze wafer-scale spatial patterns of process variations in microring modulators. We further investigate the correlations between the spatial patterns and fabrication process steps, which is valuable for understanding process variation sources and improving fabrication processes for uniformity.
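The mix-and-match pairing idea mentioned in the abstract above can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes tuning power is proportional to the absolute resonance mismatch between a paired transmitter ring and receiver ring, ignores the periodicity of ring resonances, and uses made-up wavelengths. Under that cost model, pairing the two lists in sorted order minimizes the total mismatch. The name `pair_rings` is hypothetical.

```python
def pair_rings(tx_nm, rx_nm):
    """Pair transmitter and receiver rings to minimize total mismatch.

    Assumed cost model: tuning cost of a pair is |tx - rx| in nm.
    For this one-dimensional absolute-difference cost, matching the
    i-th smallest tx value with the i-th smallest rx value is optimal.
    Returns (pairs, total), where pairs holds (tx_index, rx_index).
    """
    if len(tx_nm) != len(rx_nm):
        raise ValueError("need the same number of tx and rx rings")
    # Indices of each list ordered by resonance wavelength.
    tx_order = sorted(range(len(tx_nm)), key=tx_nm.__getitem__)
    rx_order = sorted(range(len(rx_nm)), key=rx_nm.__getitem__)
    pairs = list(zip(tx_order, rx_order))
    total = sum(abs(tx_nm[i] - rx_nm[j]) for i, j in pairs)
    return pairs, total


# Made-up resonance wavelengths in nm, for illustration only.
tx = [1550.3, 1549.8, 1550.9]
rx = [1550.0, 1551.0, 1549.9]
pairs, total = pair_rings(tx, rx)
print(pairs, round(total, 3))
```

A more realistic cost (for example, one-directional thermal tuning or wrap-around by the free spectral range) would generally need a general assignment solver rather than a sort.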