2 research outputs found

    WHYPE: A Scale-Out Architecture with Wireless Over-the-Air Majority for Scalable In-memory Hyperdimensional Computing

    Hyperdimensional computing (HDC) is an emerging computing paradigm that represents, manipulates, and communicates data using long random vectors known as hypervectors. Among the hardware platforms capable of executing HDC algorithms, in-memory computing (IMC) has shown promise because it is very efficient at performing the matrix-vector multiplications that are common in HDC algebra. Although HDC architectures based on IMC already exist, scaling them remains a key challenge due to the collective communication patterns that these architectures require and that traditional chip-scale networks were not designed for. To cope with this difficulty, we propose a scale-out HDC architecture called WHYPE, which uses wireless in-package communication technology to interconnect a large number of physically distributed IMC cores that either encode hypervectors or perform multiple similarity searches in parallel. In this context, the key enabler of WHYPE is the opportunistic use of the wireless network as a medium for over-the-air computation. WHYPE implements an optimized source coding that allows receivers to calculate the bit-wise majority of multiple hypervectors (a useful operation in HDC) transmitted concurrently over the wireless channel. By doing so, we achieve joint broadcast distribution and computation with a performance and efficiency unattainable with wired interconnects, which in turn enables massive parallelization of the architecture. Through evaluations at the on-chip network and complete architecture levels, we demonstrate that WHYPE can bundle and distribute hypervectors faster and more efficiently than a hypothetical wired implementation, and that it scales well to tens of receivers. We show that the average error rate of the majority computation is low enough to have a negligible impact on the accuracy of HDC classification tasks.

    Comment: Accepted at IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS). arXiv admin note: text overlap with arXiv:2205.1088
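
    The bit-wise majority described above is the standard HDC bundling operation; the following minimal NumPy sketch, not taken from the paper, shows it alongside the Hamming-distance similarity search that the IMC cores parallelize. The dimensionality, function names, and tie-breaking rule are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000  # hypervector dimensionality (illustrative; HDC commonly uses ~10k bits)

    def random_hypervector(d=D):
        """Draw a dense binary hypervector with i.i.d. fair bits."""
        return rng.integers(0, 2, size=d, dtype=np.uint8)

    def majority_bundle(hvs):
        """Bit-wise majority of several hypervectors (the bundling that WHYPE
        computes over the air). Even-count ties are broken randomly, a common
        HDC convention and an assumption here."""
        hvs = np.asarray(hvs)
        counts = hvs.sum(axis=0)
        out = (2 * counts > len(hvs)).astype(np.uint8)
        ties = 2 * counts == len(hvs)
        out[ties] = rng.integers(0, 2, size=int(ties.sum()), dtype=np.uint8)
        return out

    def similarity_search(query, prototypes):
        """Index of the prototype nearest to the query in Hamming distance,
        i.e., the associative-memory step that IMC maps to a matrix-vector
        multiplication."""
        return int(np.argmin((prototypes != query).sum(axis=1)))

    # Toy usage: bundle three encoded hypervectors, then search among six prototypes.
    encoded = [random_hypervector() for _ in range(3)]
    bundled = majority_bundle(encoded)
    prototypes = np.stack([random_hypervector() for _ in range(5)] + [bundled])
    print(similarity_search(bundled, prototypes))  # prints 5: the bundled vector itself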

    Wireless Chip-Scale Communications for Neural Network Accelerators

    Wireless on-chip communications have been proposed as a complement to conventional Network-on-Chip (NoC) paradigms in manycore processors. In massively parallel architectures, the fast broadcast and reconfigurability capabilities of the wireless plane open the door to new scalable and adaptive architectures with significant impact on a plethora of fields. This thesis aims to explore such impact in the all-pervasive field of AI accelerators, designing and evaluating new accelerators augmented with wireless on-chip communication.

    The last decade has witnessed explosive growth in the use of Deep Neural Networks in fields such as computer vision, natural language processing, medicine, and economics. Their accuracy across so many relevant and varied applications exhibits the enormous potential of this disruptive technology. However, this unprecedented performance is closely tied to the fact that newer designs contain much deeper and larger layer sets, forcing them to manage millions, and in some cases billions, of parameters. This comes at a high computational and communication cost at the processor level, which has prompted the development of new hardware aimed at handling such a large computing expense more efficiently: the so-called Deep Neural Network (DNN) accelerators. This work explores the potential of enhancing the performance of these accelerators by introducing Wireless Networks-on-Chip in their design, a novel interconnect paradigm proposed by the research community to overcome some of the communication challenges that manycore systems face. Specifically, both on-chip and off-chip wireless interconnect implementations have been studied and evaluated. In the off-chip case, a theoretical improvement of 13X in runtime has been achieved, at the expense of some area and power overheads.
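
    The broadcast advantage claimed above can be sanity-checked with a first-order toy model; the one below is an assumption-laden sketch, not a result from the thesis. It contrasts the worst-case hop latency of a broadcast on a wired 2D-mesh NoC with a single-hop wireless broadcast; all cycle counts are illustrative.

    import math

    def mesh_broadcast_latency(n_cores: int, cycles_per_hop: int = 3) -> int:
        """Worst-case latency of an XY-tree broadcast on a sqrt(n) x sqrt(n)
        mesh: the farthest core sits roughly 2*(sqrt(n)-1) hops away."""
        side = math.isqrt(n_cores)
        return 2 * (side - 1) * cycles_per_hop

    def wireless_broadcast_latency(packet_cycles: int = 10) -> int:
        """A wireless broadcast reaches every core in one transmission, so
        latency reduces to the (assumed) packet serialization time."""
        return packet_cycles

    for n in (16, 64, 256):
        print(n, mesh_broadcast_latency(n), wireless_broadcast_latency())

    Under these assumptions the wired broadcast cost grows with the mesh diameter while the wireless cost stays flat, which is the qualitative scaling argument behind wireless-augmented accelerators.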