Search CORE

3,652 research outputs found

High Throughput Architecture for High Performance NoC

Author: Magdy A. El-Moursy
Mohamed A. Abd El Ghany
Mohammed Ismail
Publication venue: 'IntechOpen'
Publication date: 01/04/2010
Field of study

Congestion-aware wireless network-on-chip for high-speed communication

Author: M. Devanathan
P. Sivakumar
V. Ranganathan
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2020
Field of study

The design of system-on-chip (SoC) requires the complex integration between a multi-number of cores on a single chip. To establish the effective communication between multiple cores there aremore challenging issues on designing the network-on-chip (NoC) architectures. The proposed system deals with the utilization of on-chip antennas for the wireless communication between the long distance cores to minimize the latency and power. In this proposed work, we have designed high-speed wireless NoC (WiNoC) for on-chip communication. This high-speed WiNoC has been achieved by designing a congestion measure unit, which monitors and measures the congestion in the input data and establishes the effective wireless communication between the output channels and routers. The designed architecture is synthesized and implemented by using Altera Quartus II, where the SoC is designed using Qsys builder. The proposed WiNoC shows better performance parameters like throughput, latency and power than the conventional NoC

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

A Scalable Packet-Switch Based on Output-Queued NoCs for Data Centre Networks

Author: Hassen F
Mhamdi L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2016
Field of study

The switch fabric in a Data-Center Network (DCN) handles constantly variable loads. This is stressing the need for high-performance packet switches able to keep pace with climbing throughput while maintaining resiliency and scalability. Conventional multistage switches with their space-memory variants proved to be performance limited as they do not scale well with the proliferating DC requirements. Most proposals are either too complex to implement or not cost effective. In this paper, we present a highly scalable multistage switching architecture for DC switching fabrics. We describe a three-stage Clos packet-switch fabric with Output-Queued Unidirectional NoC (OQ-UDN) modules and Round-Robin packets dispatching scheme. The proposed OQ Clos-UDN architecture avoids the need for complex and costly input modules and simplifies the scheduling process. Thanks to a dynamic packets dispatching and the multi-hop nature of the UDN modules, the switch provides load balancing and path-diversity. We compared our proposed architecture to state-of-the art previous architectures under extensive uniform and non-uniform DC traffic settings. Simulations of various switch settings have shown that the proposed OQ Clos-UDN outperforms previous proposals and maintains high throughput and latency performance

Crossref

White Rose Research Online

A efficacy of different buffer size on latency of network on chip (NoC)

Author: Ehkan P.
N. M. Warip M.
Phing Ng. Yen.
Wahida Binti Zulkefli Farah
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/06/2019
Field of study

Moore's prediction has been used to set targets for research and development in semiconductor industry for years now. A burgeoning number of processing cores on a chip demand competent and scalable communication architecture such as network-on-chip (NoC). NoC technology applies networking theory and methods to on-chip communication and brings noteworthy improvements over conventional bus and crossbar interconnections. Calculated performances such as latency, throughput, and bandwidth are characterized at design time to assured the performance of NoC. However, if communication pattern or parameters set like buffer size need to be altered, there might result in large area and power consumption or increased latency. Routers with large input buffers improve the efficiency of NoC communication while routers with small buffers reduce power consumption but result in high latency. This paper intention is to validate that size of buffer exert influence to NoC performance in several different network topologies. It is concluded that the way in which routers are interrelated or arranged affect NoC’s performance (latency) where different buffer sizes were adapted. That is why buffering requirements for different routers may vary based on their location in the network and the tasks assigned to them

Bulletin of Electrical Engineering and Informatics

Bibliometric Review of NoC Router Optimization

Author: Pande Jayshree
Shahane Priti
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2020
Field of study

Network on chip (NoC) has been proposed as an emerging solution for scalability and performance demands of next generation System on Chip (SoC). NoC provides a solution for the bus based interconnection issue of SoC, where large numbers of Intellectual Property modules (IP) are integrated on a single chip for better performance. The NoC has several advantages such as scalability, low latency and low power consumption, high bandwidth over dedicated wires and buses. Interconnections between multiple chip cores have a significant impact on the communication and performance of the chip design in terms of region, latency, throughput and power. In the NoC architecture, the router is a dominant component that significantly affects the performance of the NoC. NoC router architectures evolved since the year 2002 and progress in the domain pertaining to the optimization in the NoC router architectures has been discussed. The key objective of this bibliometric review is to understand the extent of the existing literature in the domain of performance efficient NoC router architectures. The bibliometric analysis is primarily based on data extracted from Scopus. It reveals that major contributions are done by researchers from USA, China followed by India in the form of conference, journals and articles publications. The major contribution is by the subject areas of Computer Science and Engineering followed by Mathematics and Material Science. The geographical analysis is done by using the GPS visualize tool. The clusters were created using Gephi

DigitalCommons@University of Nebraska

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

Author: Chen Yu-Hsin
Emer Joel
Sze Vivienne
Yang Tien-Ju
Publication venue
Publication date: 20/05/2019
Field of study

A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models are different from the traditional large ones in that there is much more variation in their layer shapes and sizes, and often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.Comment: accepted for publication in IEEE Journal on Emerging and Selected Topics in Circuits and Systems. This extended version on arXiv also includes Eyexam in the appendi

arXiv.org e-Print Archive

DSpace@MIT

Castell: a heterogeneous cmp architecture scalable to hundreds of processors

Author: Cabarcas Jaramillo Felipe
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011
Field of study

Technology improvements and power constrains have taken multicore architectures to dominate microprocessor designs over uniprocessors. At the same time, accelerator based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but with a high-programming effort. We propose Castell a scalable chip multiprocessor architecture that can be programmed as uniprocessors, and provides the high throughput of accelerator-based architectures. Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds, schedules, and adds hardware-specific features to parallel tasks. One of these features is DMA transfers to overlap computation and data movement, which is known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system focusing on memory bandwidth. In addition to provide programmability and the design of the memory system, we have used a hierarchical NoC and added a synchronization module. The NoC design distributes memory traffic efficiently to allow the architecture to scale. The synchronization module is a consequence of the large performance degradation of application for large synchronization latencies. Castell is mainly an architecture framework that enables the definition of domain-specific implementations, fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and clustalW). It has also been used to explore a number of architecture optimizations such as enhanced DMA controllers, and architecture support for task-based programming models. ii

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura