115,841 research outputs found

    An Energy and Performance Exploration of Network-on-Chip Architectures

    Get PDF
    In this paper, we explore the designs of a circuit-switched router, a wormhole router, a quality-of-service (QoS) supporting virtual channel router and a speculative virtual channel router and accurately evaluate the energy-performance tradeoffs they offer. Power results from the designs placed and routed in a 90-nm CMOS process show that all the architectures dissipate significant idle state power. The additional energy required to route a packet through the router is then shown to be dominated by the data path. This leads to the key result that, if this trend continues, the use of more elaborate control can be justified and will not be immediately limited by the energy budget. A performance analysis also shows that dynamic resource allocation leads to the lowest network latencies, while static allocation may be used to meet QoS goals. Combining the power and performance figures then allows an energy-latency product to be calculated to judge the efficiency of each of the networks. The speculative virtual channel router was shown to have a very similar efficiency to the wormhole router, while providing a better performance, supporting its use for general purpose designs. Finally, area metrics are also presented to allow a comparison of implementation costs

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Full text link
    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure

    Modeling and Analysis of the Performance of Exascale Photonic Networks

    Full text link
    "This is the peer reviewed version of the following article: Duro, JosĂ©, Jose A. Pascual, Salvador Petit, Julio Sahuquillo, and MarĂ­a E. GĂłmez. 2018. Modeling and Analysis of the Performance of Exascale Photonic Networks. Concurrency and Computation: Practice and Experience 31 (21). Wiley. doi:10.1002/cpe.4773, which has been published in final form at https://doi.org/10.1002/cpe.4773. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving."[EN] Photonics technology has become a promising and viable alternative for both on-chip and off-chip interconnection networks of future Exascale systems. Nevertheless, this technology is not mature enough yet in this context, so research efforts focusing on photonic networks are still required to achieve realistic suitable network implementations. In this regard, system-level photonic network simulators can help guide designers to assess the multiple design choices. Most current research is done on electrical network simulators, whose components work widely different from photonics components. In this work, we summarize and compare the working behavior of both technologies which includes the use of optical routers, wavelength-division multiplexing and circuit switching among others. After implementing them into a well-known simulation framework, an extensive simulation study has been carried out using realistic photonic network configurations with synthetic and realistic traffic. Experimental results show that, compared to electrical networks, optical networks can reduce the execution time of the studied real workloads in almost one order of magnitude. Our study also reveals that the photonic configuration highly impacts on the network performance, being the bandwidth per channel and the message length the most important parameters.This work was supported by the ExaNeSt project, funded by the European Union's Horizon 2020 Research and Innovation Program under grant 671553, and by the Spanish Ministerio de EconomĂ­a y Competitividad (MINECO) and Plan E funds under grant TIN2015-66972-C5-1-R. Pascual was supported by a HiPEAC Collaboration Grant.Duro-GĂłmez, J.; Pascual PĂ©rez, JA.; Petit MartĂ­, SV.; Sahuquillo BorrĂĄs, J.; GĂłmez Requena, ME. (2019). Modeling and Analysis of the Performance of Exascale Photonic Networks. Concurrency and Computation Practice and Experience. 31(21):1-12. https://doi.org/10.1002/cpe.4773S1123121Top500 website. Accessed January2018.Kodi, A. K., Neel, B., & Brantley, W. C. (2014). Photonic Interconnects for Exascale and Datacenter Architectures. IEEE Micro, 34(5), 18-30. doi:10.1109/mm.2014.62Rumley, S., Nikolova, D., Hendry, R., Li, Q., Calhoun, D., & Bergman, K. (2015). Silicon Photonics for Exascale Systems. Journal of Lightwave Technology, 33(3), 547-562. doi:10.1109/jlt.2014.2363947Shacham, A., Bergman, K., & Carloni, L. P. (2008). Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors. IEEE Transactions on Computers, 57(9), 1246-1260. doi:10.1109/tc.2008.78Batten, C., Joshi, A., Orcutt, J., Khilo, A., Moss, B., Holzwarth, C. W., 
 Asanovic, K. (2009). Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics. IEEE Micro, 29(4), 8-21. doi:10.1109/mm.2009.60WernerS NavaridasJ LujĂĄnM.Designing low‐power low‐latency networks‐on‐chip by optimally combining electrical and optical links. Paper presented at: 23rd IEEE International Symposium on High Performance Computer Architecture (HPCA);2016;Austin TX.PucheJ LechagoS PetitS GĂłmezME SahuquilloJ.Accurately modeling a photonic NoC in a detailed CMP simulation framework. Paper presented at: 2016 International Conference on High Performance Computing & Simulation (HPCS);2016;Innsbruck Austria.ChenG ChenH HaurylauM et al.On‐chip copper‐based vs. optical interconnects: delay uncertainty latency power and bandwidth density comparative predictions. Paper presented at: 2006 International Interconnect Technology Conference;2006;Burlingame CA.KatevenisM ChrysosN MarazakisM et al.The ExaNeSt project: Interconnects storage and packaging for exascale systems. Paper presented at: 2016 Euromicro Conference on Digital System Design (DSD);2016;Limassol Cyprus.ConcattoC PascualJA NavaridasJ et al.A CAM-Free Exascalable HPC Router for Low-Energy Communications. Paper presented at: 31st International Conference on Architecture of Computing Systems (ARCS);2018.DuanG‐H FedeliJ‐M KeyvaniniaS ThomsonD et al.10 Gb/s integrated tunable hybrid III‐V/si laser and silicon Mach‐Zehnder modulator. Paper presented at: European Conference and Exhibition on Optical Communications;2012;Amsterdam The Netherlands.DuanGH JanyC Le LiepvreAL et al.Integrated hybrid III‐V/si laser and transmitter. Paper presented at: 2012 International Conference on Indium Phosphide and Related Materials;2012;Santa Barbara CA.Soref, R., & Bennett, B. (1987). Electrooptical effects in silicon. IEEE Journal of Quantum Electronics, 23(1), 123-129. doi:10.1109/jqe.1987.1073206Liu, A., Liao, L., Rubin, D., Nguyen, H., Ciftcioglu, B., Chetrit, Y., 
 Paniccia, M. (2007). High-speed optical modulation based on carrier depletion in a silicon waveguide. Optics Express, 15(2), 660. doi:10.1364/oe.15.000660Thomson, D. J., Gardes, F. Y., Hu, Y., Mashanovich, G., Fournier, M., Grosse, P., 
 Reed, G. T. (2011). High contrast 40Gbit/s optical modulation in silicon. Optics Express, 19(12), 11507. doi:10.1364/oe.19.011507Bergman, K., Carloni, L. P., Biberman, A., Chan, J., & Hendry, G. (2014). Photonic Network-on-Chip Design. Integrated Circuits and Systems. doi:10.1007/978-1-4419-9335-9Dong, P., Chen, L., Xie, C., Buhl, L. L., & Chen, Y.-K. (2012). 50-Gb/s silicon quadrature phase-shift keying modulator. Optics Express, 20(19), 21181. doi:10.1364/oe.20.021181DongP LiuX SethumadhavanC et al.224‐Gb/s PDM‐16‐QAM modulator and receiver based on silicon photonic integrated circuits. Paper presented at: Optical Fiber Communication Conference/National Fiber Optic Engineers Conference;2013;Anaheim CA.Navaridas, J., Miguel-Alonso, J., Pascual, J. A., & Ridruejo, F. J. (2011). Simulating and evaluating interconnection networks with INSEE. Simulation Modelling Practice and Theory, 19(1), 494-515. doi:10.1016/j.simpat.2010.08.008Lu, L., Zhao, S., Zhou, L., Li, D., Li, Z., Wang, M., 
 Chen, J. (2016). 16 × 16 non-blocking silicon optical switch based on electro-optic Mach-Zehnder interferometers. Optics Express, 24(9), 9295. doi:10.1364/oe.24.009295DuroJ PetitS SahuquilloJ GĂłmezME.Modeling a photonic network for exascale computing. Paper presented at: 2017 International Conference on High Performance Computing & Simulation (HPCS);2017;Genoa Italy.Xi, K., Kao, Y.-H., & Chao, H. J. (2012). A Petabit Bufferless Optical Switch for Data Center Networks. Optical Interconnects for Future Data Center Networks, 135-154. doi:10.1007/978-1-4614-4630-9_8KimJ DallyWJ ScottS AbtsD.Technology‐driven highly‐scalable dragonfly topology. Paper presented at: 35th International Symposium on Computer Architecture (ISCA);2008;Beijing China.Essiambre, R.-J., & Tkach, R. W. (2012). Capacity Trends and Limits of Optical Communication Networks. Proceedings of the IEEE, 100(5), 1035-1055. doi:10.1109/jproc.2012.2182970Temprana, E., Myslivets, E., Kuo, B. P.-P., Liu, L., Ataie, V., Alic, N., & Radic, S. (2015). Overcoming Kerr-induced capacity limit in optical fiber transmission. Science, 348(6242), 1445-1448. doi:10.1126/science.aab1781Springel, V. (2005). The cosmological simulation code gadget-2. Monthly Notices of the Royal Astronomical Society, 364(4), 1105-1134. doi:10.1111/j.1365-2966.2005.09655.xPlimpton, S. (1995). Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117(1), 1-19. doi:10.1006/jcph.1995.1039Ben‐ItzhakY ZahaviE CidonI KolodnyA.HNOCS: Modular open‐source simulator for heterogeneous NoCs. Paper presented at: 2012 International Conference on Embedded Computer Systems (SAMOS);2012;Samos Greece.HossainH AhmedM Al‐NayeemA IslamTZ AkbarMM.Gpnocsim‐a general purpose simulator for network‐on‐chip. Paper presented at: 2007 International Conference on Information and Communication Technology;2007;Dhaka Bangladesh.JainL Al‐HashimiB GaurMS LaxmiV NarayananA.NIRGAM: A simulator for NoC interconnect routing and application modeling. Paper presented at: Design Automation and Test in Europe Conference;2007;Nice France.ChanJ HendryG BibermanA BergmanK CarloniLP.PhoenixSim: A simulator for physical‐layer analysis of chip‐scale photonic interconnection networks. In: Proceedings of the Conference on Design Automation and Test in Europe;2010;Dresden Germany.RumleyS BahadoriM WenK NikolovaD BergmanK.PhoenixSim: crosslayer design and modeling of silicon photonic interconnects. In: Proceedings of the 1st International Workshop on Advanced Interconnect Solutions and Technologies for Emerging Computing Systems;2016;Prague Czech Republic.VargaA HornigR.An overview of the OMNeT++ simulation environment. In: Proceedings of the 1st International Conference on Simulation Tools and Techniques for Communications Networks and Systems & Workshops;2008;Marseille France.SunC ChenCHO KurianG et al.DSENT‐a tool connecting emerging photonics with electronics for opto‐electronic networks‐on‐chip modeling. Paper presented at: 2012 IEEE/ACM Sixth International Symposium on Networks‐on‐Chip;2012;Copenhagen Denmark.Ma, X., Yu, J., Hua, X., Wei, C., Huang, Y., Yang, L., 
 Yang, J. (2014). LioeSim: A Network Simulator for Hybrid Opto-Electronic Networks-on-Chip Analysis. Journal of Lightwave Technology, 32(22), 4301-4310. doi:10.1109/jlt.2014.2356515KahngAB LiB PehL‐S SamadiK.ORION 2.0: a fast and accurate NoC power and area model for early‐stage design space exploration. In: Proceedings of the Conference on Design Automation and Test in Europe;2009;Nice France.Chan, J., Hendry, G., Bergman, K., & Carloni, L. P. (2011). Physical-Layer Modeling and System-Level Design of Chip-Scale Photonic Interconnection Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(10), 1507-1520. doi:10.1109/tcad.2011.215715

    Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

    Get PDF
    Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute and memory intensive which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding I/O bandwidth and system-level efficiency that are crucial for deployment of accelerators in ultra-low power devices. We present Hyperdrive: a BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel binary-weight streaming approach, which can be used for arbitrarily sized convolutional neural network architecture and input resolution by exploiting the natural scalability of the compute units both at chip-level and system-level by arranging Hyperdrive chips systolically in a 2D mesh while processing the entire feature map together in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os)---3.1x higher than state-of-the-art BWN accelerators, even if its core uses resource-intensive FP16 arithmetic for increased robustness

    An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration

    Get PDF
    We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply voltage below the nominal level, to improve the power-efficiency of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing faults due to excessive circuit latency increase. We evaluate the reliability-power trade-off for such accelerators. Specifically, we experimentally study the reduced-voltage operation of multiple components of real FPGAs, characterize the corresponding reliability behavior of CNN accelerators, propose techniques to minimize the drawbacks of reduced-voltage operation, and combine undervolting with architectural CNN optimization techniques, i.e., quantization and pruning. We investigate the effect of environmental temperature on the reliability-power trade-off of such accelerators. We perform experiments on three identical samples of modern Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification CNN benchmarks. This approach allows us to study the effects of our undervolting technique for both software and hardware variability. We achieve more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain is the result of eliminating the voltage guardband region, i.e., the safe voltage region below the nominal level that is set by FPGA vendor to ensure correct functionality in worst-case environmental and circuit conditions. 43% of the power-efficiency gain is due to further undervolting below the guardband, which comes at the cost of accuracy loss in the CNN accelerator. We evaluate an effective frequency underscaling technique that prevents this accuracy loss, and find that it reduces the power-efficiency gain from 43% to 25%.Comment: To appear at the DSN 2020 conferenc
    • 

    corecore