121 research outputs found
WiSync: an architecture for fast synchronization through on-chip wireless communication
In shared-memory multiprocessing, fine-grain synchronization is challenging because it requires frequent communication. As technology scaling delivers larger manycore chips, such pattern is expected to remain costly to support.; In this paper, we propose to address this challenge by using on-chip wireless communication. Each core has a transceiver and an antenna to communicate with all the other cores. This environment supports very low latency global communication. Our architecture, called WiSync, uses a per-core Broadcast Memory (BM). When a core writes to its BM, all the other 100+ BMs get updated in less than 10 processor cycles. We also use a second wireless channel with cheaper transfers to execute barriers efficiently. WiSync supports multiprogramming, virtual memory, and context switching. Our evaluation with simulations of 128-threaded kernels and 64-threaded applications shows that WiSync speeds-up synchronization substantially. Compared to using advanced conventional synchronization, WiSync attains an average speedup of nearly one order of magnitude for the kernels, and 1.12 for PARSEC and SPLASH-2.Peer ReviewedPostprint (author's final draft
High Peformance and Low Power On-Die Interconnect Fabrics.
Increasing power density with technology scaling has caused stagnation in operating frequency of modern day microprocessors. This has led designers to prefer multicore architectures over complex monolithic processors to keep up with the demand for rising computing throughput. Although processing units are getting smaller and simpler, the dramatic rise of their count on a single die has made the fabric that connects these processing units increasingly complex. These interconnect fabrics have become a bottleneck in improving overall system effciency. As a result, the design paradigm for multi-core chips is gradually shifting from a core-centric architecture towards an interconnect-centric architecture, where system efficiency is limited by the fabric rather than the processing ability of any individual core. This dissertation introduces three novel and synergistic circuit techniques to improve scalability of switch fabrics to make on-die integration of hundreds to thousands of cores feasible. 1) A matrix topology is proposed for designing a fully connected switch fabric that re-uses output buses for programming, and stores shue congurations at cross points. This significantly reduces routing congestion, lowers area/power, and improves per-
formance. Silicon measurements demonstrate 47% energy savings in a 64-lane SIMD processor fabricated in 65nm CMOS over a conventional implementation. 2) A novel approach to handle high radix arbitration along with data routing is proposed. It optimally uses existing cross-bar interconnect resources without requiring any additional overhead. Bandwidth exceeding 2Tb/s is recorded in a test prototype fabricated in 65nm. 3) Building on the later, a new circuit topology to manage and update priority adaptively within the switch fabric without incurring additional delay or area is then proposed. Several assist circuit techniques, such as a thyristor based sense amplifier and self regenerating bi-directional repeaters are proposed for high speed energy efficient signaling to and from the switch fabric to improve overall routing efficiency.
Using these techniques a 64 x 64 switch fabric with 128b data bus fabricated in 45nm achieves a throughput of 4.5Tb/s at single cycle latency while operating at 559MHz.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91506/1/sudhirks_1.pd
Design space exploration of photonic interconnects
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 109-113).As processors scale deep into the multi-core and many-core regimes, bandwidth and energy-efficiency of the on-die interconnect network have become paramount design issues. Recognizing potential limits of electrical interconnects, emerging nanophotonic integration has been recently proposed as a potential technology option for both on-chip and chip-to-chip applications. As optical links avoid the capacitive, resistive and signal integrity limits imposed upon electrical interconnects, the introduction of integrated photonics allows for efficient realization of physical connectivity that are costly to accomplish electrically. While many recent works have since cited the potential benefits of optics, inherent design tradeoffs of photonic datapath and backend components remain relatively unknown at the system-level. This thesis develops insights regarding the behavior of electrical and hybrid optoelectrical networks and systems. We present power and area models that capture the behavior of electrical interface circuits and their interactions with optical devices. To animate these models in the context of a full system, we contribute DSENT, a novel physical modeling framework capable of estimating the costs of generalized digital electronics, mixed-signal interface circuitry, and optical links. With DSENT, we enable fast power and area evaluation of entire networks to connect the dynamics of an underlying photonics interconnect to that of an otherwise electrical system. Using our methodolody, we perform a technology-driven design space exploration of intra-chip networks and highlight the importance of thermal tuning and parasitic receiver capacitances in network power consumption. We show that the performance gains enabled by photonics-inspired architectures can enable savings in total system energy even if the network is more costly. Finally, we propose a photonically interconnected DRAM system as a solution to the core-to-DRAM bandwidth bottleneck. By attacking energy consumption at the DRAM channel, chip, and bank level with integrated photoncis, we cut the power consumption of the DRAM system by 10x while remaining area neutral when compared to a projected electrical baseline.by Chen Sun.S.M
Recommended from our members
On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems in the range of 16 cores to thousands of cores on chip, integrating multi-billion transistors. However, with this ever increasing complexity, the traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active, do not require complex clock distribution, are robust to different forms of variability, and provide ease of composability for heterogeneous platforms. Networks-on-chip (NoCs) is an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems that interconnect multiple processing cores, operating at different clock speeds, using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. Once such pattern is multicast communication, where a source sends packets to arbitrary number of destinations. Multicast is not only common in parallel computing, such as for cache coherency, but also for emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs. This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques, and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as significantly advancing the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there has been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computeraided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches, and has been used in high-performance computing platforms. A novel local speculation technique is introduced, where a subset of the network’s switches are speculative that always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones that route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex with higher-radix switches and also is more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission. In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast using multiple distinct serial unicasts to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch
Architecture and algorithms for the implementation of digital wireless receivers in FPGA and ASIC: ISDB-T and DVB-S2 cases
[EN] The first generation of Terrestrial Digital Television(DTV) has been in service for over a decade. In 2013, several countries have already completed the transition from Analog to Digital TV Broadcasting, most of which in Europe. In South America, after several studies and trials, Brazil adopted the Japanese standard with some innovations. Japan and Brazil started Digital Terrestrial Television Broadcasting (DTTB) services in December 2003 and December 2007 respectively, using Integrated Services Digital Broadcasting - Terrestrial (ISDB-T), also known as ARIB STD-B31.
In June 2005 the Committee for the Information Technology Area (CATI) of Brazilian Ministry of Science and Technology and Innovation MCTI approved the incorporation of the IC-Brazil Program, in the National Program for Microelectronics (PNM) . The main goals of IC-Brazil are the formal qualification of IC designers, support to the creation of semiconductors companies focused on projects of ICs within Brazil, and the attraction of semiconductors companies focused on the design and development of ICs in Brazil.
The work presented in this thesis originated from the unique momentum created by the combination of the birth of Digital Television in Brazil and the creation of the IC-Brazil Program by the Brazilian government. Without this combination it would not have been possible to make these kind of projects in Brazil. These projects have been a long and costly journey, albeit scientifically and technologically worthy, towards a Brazilian DTV state-of-the-art low complexity Integrated Circuit, with good economy scale perspectives, due to the fact that at the beginning of this project ISDB-T standard was not adopted by several countries like DVB-T.
During the development of the ISDB-T receiver proposed in this thesis, it was realized that due to the continental dimensions of Brazil, the DTTB would not be enough to cover the entire country with open DTV signal, specially for the case of remote localizations far from the high urban density regions. Then, Eldorado Research Institute and Idea! Electronic Systems, foresaw that, in a near future, there would be an open distribution system for high definition DTV over satellite, in Brazil. Based on that, it was decided by Eldorado Research Institute, that would be necessary to create a new ASIC for broadcast satellite reception. At that time DVB-S2 standard was the strongest candidate for that, and this assumption still stands nowadays. Therefore, it was decided to apply to a new round of resources funding from the MCTI - that was granted - in order to start the new project.
This thesis discusses in details the Architecture and Algorithms proposed for the implementation of a low complexity Intermediate Frequency(IF) ISDB-T Receiver on Application Specific Integrated Circuit (ASIC) CMOS. The Architecture proposed here is highly based on the COordinate Rotation Digital Computer (CORDIC) Algorithm, that is a simple and efficient algorithm suitable for VLSI implementations. The receiver copes with the impairments inherent to wireless channels transmission and the receiver crystals. The thesis also discusses the Methodology adopted and presents the implementation results. The receiver performance is presented and compared to those obtained by means of simulations.
Furthermore, the thesis also presents the Architecture and Algorithms for a DVB-S2 receiver targeting its ASIC implementation. However, unlike the ISDB-T receiver, only preliminary ASIC implementation results are introduced. This was mainly done in order to have an early estimation of die area to prove that the project in ASIC is economically viable, as well as to verify possible bugs in early stage. As in the case of
ISDB-T receiver, this receiver is highly based on CORDIC algorithm and it was prototyped in FPGA. The Methodology used for the second receiver is derived from that used for the ISDB-T receiver, with minor additions given the project characteristics.[ES] La primera generación de Televisión Digital Terrestre(DTV) ha estado en servicio por más de una década. En 2013, varios países completaron la transición de transmisión analógica a televisión digital, la mayoría de ellas en Europa. En América del Sur, después de varios estudios y ensayos, Brasil adoptó el estándar japonés con algunas innovaciones. Japón y Brasil comenzaron a prestar el servicio de Difusión de Televisión Digital Terrestre (DTTB) en diciembre de 2003 y diciembre de 2007 respectivamente, utilizando Radiodifusión Digital de Servicios Integrados Terrestres (ISDB-T), también conocida como ARIB STD-B31.
En junio de 2005, el Comité del Área de Tecnología de la Información (CATI) del Ministerio de Ciencia, Tecnología e Innovación de Brasil - MCTI aprobó la incorporación del Programa CI-Brasil, en el Programa Nacional de Microelectrónica (PNM). Los principales objetivos de la CI-Brasil son la formación de diseñadores de CIs, apoyar la creación de empresas de semiconductores enfocadas en proyectos de circuitos integrados dentro de Brasil, y la atracción de empresas de semiconductores interesadas en el diseño y desarrollo de circuitos integrados.
El trabajo presentado en esta tesis se originó en el impulso único creado por la combinación del nacimiento de la televisión digital en Brasil y la creación del Programa de CI-Brasil por el gobierno brasileño. Sin esta combinación no hubiera sido posible realizar este tipo de proyectos en Brasil. Estos proyectos han sido un trayecto largo y costoso, aunque meritorio desde el punto de vista científico y tecnológico, hacia un Circuito Integrado brasileño de punta y de baja complejidad para DTV, con buenas perspectivas de economía de escala debido al hecho que al inicio de este proyecto, el estándar ISDB-T no fue adoptado por varios países como DVB-T.
Durante el desarrollo del receptor ISDB-T propuesto en esta tesis, se observó que debido a las dimensiones continentales de Brasil, la DTTB no sería suficiente para cubrir todo el país con la señal de televisión digital abierta, especialmente para el caso de localizaciones remotas, apartadas de las regiones de alta densidad urbana. En ese momento, el Instituto de Investigación Eldorado e Idea! Sistemas Electrónicos, previeron que en un futuro cercano habría un sistema de distribución abierto para DTV de alta definición por satélite en Brasil. Con base en eso, el Instituto de Investigación Eldorado decidió que sería necesario crear un nuevo ASIC para la recepción de radiodifusión por satélite, basada el estándar DVB-S2.
En esta tesis se analiza en detalle la Arquitectura y algoritmos propuestos para la implementación de un receptor ISDB-T de baja complejidad y frecuencia intermedia (IF) en un Circuito Integrado de Aplicación Específica (ASIC) CMOS. La arquitectura aquí propuesta se basa fuertemente en el algoritmo Computadora Digital para Rotación de Coordenadas (CORDIC), el cual es un algoritmo simple, eficiente y adecuado para implementaciones VLSI. El receptor hace frente a las deficiencias inherentes a las transmisiones por canales inalámbricos y los cristales del receptor. La tesis también analiza la metodología adoptada y presenta los resultados de la implementación.
Por otro lado, la tesis también presenta la arquitectura y los algoritmos para un receptor DVB-S2 dirigido a la implementación en ASIC. Sin embargo, a diferencia del receptor ISDB-T, se introducen sólo los resultados preliminares de implementación en ASIC. Esto se hizo principalmente con el fin de tener una estimación temprana del área del die para demostrar que el proyecto en ASIC es económicamente viable, así como para verificar posibles errores en etapa temprana. Como en el caso de receptor ISDB-T, este receptor se basa fuertemente en el algoritmo CORDIC y fue un prototipado en FPGA. La metodología utilizada para el segundo receptor se deriva de la utilizada para el re[CA] La primera generació de Televisió Digital Terrestre (TDT) ha estat en servici durant més d'una dècada. En 2013, diversos països ja van completar la transició de la radiodifusió de televisió analògica a la digital, i la majoria van ser a Europa. A Amèrica del Sud, després de diversos estudis i assajos, Brasil va adoptar l'estàndard japonés amb algunes innovacions. Japó i Brasil van començar els servicis de Radiodifusió de Televisió Terrestre Digital (DTTB) al desembre de 2003 i al desembre de 2007, respectivament, utilitzant la Radiodifusió Digital amb Servicis Integrats de (ISDB-T), coneguda com a ARIB STD-B31.
Al juny de 2005, el Comité de l'Àrea de Tecnologia de la Informació (CATI) del Ministeri de Ciència i Tecnologia i Innovació del Brasil (MCTI) va aprovar la incorporació del programa CI Brasil al Programa Nacional de Microelectrònica (PNM). Els principals objectius de CI Brasil són la qualificació formal dels dissenyadors de circuits integrats, el suport a la creació d'empreses de semiconductors centrades en projectes de circuits integrats dins del Brasil i l'atracció d'empreses de semiconductors centrades en el disseny i desenvolupament de circuits integrats.
El treball presentat en esta tesi es va originar en l'impuls únic creat per la combinació del naixement de la televisió digital al Brasil i la creació del programa Brasil CI pel govern brasiler. Sense esta combinació no hauria estat possible realitzar este tipus de projectes a Brasil. Estos projectes han suposat un viatge llarg i costós, tot i que digne científicament i tecnològica, cap a un circuit integrat punter de baixa complexitat per a la TDT brasilera, amb bones perspectives d'economia d'escala perquè a l'inici d'este projecte l'estàndard ISDB-T no va ser adoptat per diversos països, com el DVB-T.
Durant el desenvolupament del receptor de ISDB-T proposat en esta tesi, va resultar que, a causa de les dimensions continentals de Brasil, la DTTB no seria suficient per cobrir tot el país amb el senyal de TDT oberta, especialment pel que fa a les localitzacions remotes allunyades de les regions d'alta densitat urbana.. En este moment, l'Institut de Recerca Eldorado i Idea! Sistemes Electrònics van preveure que, en un futur pròxim, no hi hauria a Brasil un sistema de distribució oberta de TDT d'alta definició a través de satèl¿lit. D'acord amb això, l'Institut de Recerca Eldorado va decidir que seria necessari crear un nou ASIC per a la recepció de radiodifusió per satèl¿lit. basat en l'estàndard DVB-S2.
En esta tesi s'analitza en detall l'arquitectura i els algorismes proposats per l'execució d'un receptor ISDB-T de Freqüència Intermèdia (FI) de baixa complexitat sobre CMOS de Circuit Integrat d'Aplicacions Específiques (ASIC). L'arquitectura ací proposada es basa molt en l'algorisme de l'Ordinador Digital de Rotació de Coordenades (CORDIC), que és un algorisme simple i eficient adequat per implementacions VLSI. El receptor fa front a les deficiències inherents a la transmissió de canals sense fil i els cristalls del receptor. Esta tesi també analitza la metodologia adoptada i presenta els resultats de l'execució. Es presenta el rendiment del receptor i es compara amb els obtinguts per mitjà de simulacions.
D'altra banda, esta tesi també presenta l'arquitectura i els algorismes d'un receptor de DVB-S2 de cara a la seua implementació en ASIC. No obstant això, a diferència del receptor ISDB-T, només s'introdueixen resultats preliminars d'implementació en ASIC. Això es va fer principalment amb la finalitat de tenir una estimació primerenca de la zona de dau per demostrar que el projecte en ASIC és econòmicament viable, així com per verificar possibles errors en l'etapa primerenca. Com en el cas del receptor ISDB-T, este receptor es basa molt en l'algorisme CORDIC i va ser un prototip de FPGA. La metodologia utilitzada per al segon receptor es deriva de la utilitzada per al receptor IRodrigues De Lima, E. (2016). Architecture and algorithms for the implementation of digital wireless receivers in FPGA and ASIC: ISDB-T and DVB-S2 cases [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/61967TESI
Optics for AI and AI for Optics
Artificial intelligence is deeply involved in our daily lives via reinforcing the digital transformation of modern economies and infrastructure. It relies on powerful computing clusters, which face bottlenecks of power consumption for both data transmission and intensive computing. Meanwhile, optics (especially optical communications, which underpin today’s telecommunications) is penetrating short-reach connections down to the chip level, thus meeting with AI technology and creating numerous opportunities. This book is about the marriage of optics and AI and how each part can benefit from the other. Optics facilitates on-chip neural networks based on fast optical computing and energy-efficient interconnects and communications. On the other hand, AI enables efficient tools to address the challenges of today’s optical communication networks, which behave in an increasingly complex manner. The book collects contributions from pioneering researchers from both academy and industry to discuss the challenges and solutions in each of the respective fields
Digital signal processing waveform aggregation and its experimental demonstration for next generation mobile fronthaul
L'abstract è presente nell'allegato / the abstract is in the attachmen
Wavelength reconfigurability for next generation optical access networks
Next generation optical access networks should not only increase the capacity but also be able to redistribute the capacity on the fly in order to manage larger variations in traffic patterns. Wavelength reconfigurability is the instrument to enable such capability of network-wide bandwidth redistribution since it allows dynamic sharing of both wavelengths and timeslots in WDM-TDM optical access networks. However, reconfigurability typically requires tunable lasers and tunable filters at the user side, resulting in cost-prohibitive optical network units (ONU). In this dissertation, I propose a novel concept named cyclic-linked flexibility to address the cost-prohibitive problem. By using the cyclic-linked flexibility, the ONU needs to switch only within a subset of two pre-planned wavelengths, however, the cyclic-linked structure of wavelengths allows free bandwidth to be shifted to any wavelength by a rearrangement process. Rearrangement algorithm are developed to demonstrate that the cyclic-linked flexibility performs close to the fully flexible network in terms of blocking probability, packet delay, and packet loss. Furthermore, the evaluation shows that the rearrangement process has a minimum impact to in-service ONUs. To realize the cyclic-linked flexibility, a family of four physical architectures is proposed. PRO-Access architecture is suitable for new deployments and disruptive upgrades in which the network reach is not longer than 20 km. WCL-Access architecture is suitable for metro-access merger with the reach up to 100 km. PSB-Access architecture is suitable to implement directly on power-splitter-based PON deployments, which allows coexistence with current technologies. The cyclically-linked protection architecture can be used with current and future PON standards when network protection is required
- …