Search CORE

13,598 research outputs found

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

Author: Bell Steven Emberton
Cao Kaidi
Gao Mingyu
Ha Heonjae
Horowitz Mark
Kozyrakis Christos
Liu Qiaoyi
Nayak Ankita
Pu Jing
Raina Priyanka
Setter Jeff Ou
Yang Xuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2020
Field of study

We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior accelerators. As long as proper loop blocking schemes are used, and the hardware can support mapping replicated loops, many different hardware dataflows yield similar energy efficiency with good performance. This is because the loop blocking can ensure that most data references stay on-chip with good locality and the processing units have high resource utilization. How resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.Comment: Published as a conference paper at ASPLOS 202

arXiv.org e-Print Archive

Crossref

Modeling high-performance wormhole NoCs for critical real-time embedded systems

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Hernández Carles
Panic Milos
Quiñones Eduardo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Manycore chips are a promising computing platform to cope with the increasing performance needs of critical real-time embedded systems (CRTES). However, manycores adoption by CRTES industry requires understanding task's timing behavior when their requests use manycore's network-on-chip (NoC) to access hardware shared resources. This paper analyzes the contention in wormhole-based NoC (wNoC) designs - widely implemented in the high-performance domain - for which we introduce a new metric: worst-contention delay (WCD) that captures wNoC impact on worst-case execution time (WCET) in a tighter manner than the existing metric, worst-case traversal time (WCTT). Moreover, we provide an analytical model of the WCD that requests can suffer in a wNoC and we validate it against wNoC designs resembling those in the Tilera-Gx36 and the Intel-SCC 48-core processors. Building on top of our WCD analytical model, we analyze the impact on WCD that different design parameters such as the number of virtual channels, and we make a set of recommendations on what wNoC setups to use in the context of CRTES.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

NoCo: ILP-based worst-case contention estimation for mesh real-time manycores

Author: Abella Ferrer Jaume
Cardona Nadal Jordi
Cazorla Almeida Francisco Javier
Hernández Luz Carles
Mezzetti Enrico
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/01/2019
Field of study

Manycores are capable of providing the computational demands required by functionally-advanced critical applications in domains such as automotive and avionics. In manycores a network-on-chip (NoC) provides access to shared caches and memories and hence concentrates most of the contention that tasks suffer, with effects on the worst-case contention delay (WCD) of packets and tasks' WCET. While several proposals minimize the impact of individual NoC parameters on WCD, e.g. mapping and routing, there are strong dependences among these NoC parameters. Hence, finding the optimal NoC configurations requires optimizing all parameters simultaneously, which represents a multidimensional optimization problem. In this paper we propose NoCo, a novel approach that combines ILP and stochastic optimization to find NoC configurations in terms of packet routing, application mapping, and arbitration weight allocation. Our results show that NoCo improves other techniques that optimize a subset of NoC parameters.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015- 65316-P and the HiPEAC Network of Excellence. It also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (agreement No. 772773). Carles Hernández is jointly supported by the MINECO and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the Spanish Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717. Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporaci´on postdoctoral fellowship number IJCI-2016-27396.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Scalability of broadcast performance in wireless network-on-chip

Author: Abadal Cavallé Sergi
Alarcón Cot Eduardo José
Cabellos Aparicio Alberto
González Colás Antonio María
Lee Heekwan
Mestres Sugrañes Albert
Nemirovsky Mario
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

Author: D Richie
D Richie
J Ross
JA Ross
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2016
Field of study

This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While fully capable of MPMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM.Comment: 14 pages, 9 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologie

arXiv.org e-Print Archive

Crossref