1,391 research outputs found

    Scalability of broadcast performance in wireless network-on-chip

    Get PDF
    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version

    Design Approach to Implementation Of Arbitration Algorithm In Shared Bus Architectures (MPSoC)

    Get PDF
    The multiprocessor SoC designs have more than one processor and huge memory on the same chip. SoC consists of hardware cores and software cores ,multiple processors, embedded DRAM and connectors between cores .A wide range of MPSOC architectures have been developed over the past decade. This paper surveys the history of various On-Chip communication architectures present in the design of MPSoC. This acts as a primary factor of overall performance in complex SoC designs. Some of the various techniques that have driven the design of MpSoC has been discussed. Dynamically configurable communication architectures are found to improve the system performance. Currently On-chip interconnection networks are mostly implemented using shared buses which are the most common medium. The arbitration plays a crucial role in determining performance of bus-based system, as it assigns priorities, with which processor is granted the access to the shared communication resources. In the conventional arbitration algorithms there are some drawbacks such as bus starvation problem and low system performance. The bus should provide each component a flexible and utmost share of on-chip communication bandwidth and should improve the latency in access of the shared bus. The performance of SoC is improved using the probabilistic round robin algorithm with regard to the parameters, latency.Thus in this paper various issues related to bus arbitration related to design of MPSoC is analysed

    On the design of a high-performance adaptive router for CC-NUMA multiprocessors

    Get PDF
    Copyright © 2003 IEEEThis work presents the design and evaluation of an adaptive packet router aimed at supporting CC-NUMA traffic. We exploit a simple and efficient packet injection mechanism to avoid deadlock, which leads to a fully adaptive routing by employing only three virtual channels. In addition, we selectively use output buffers for implementing the most utilized virtual paths in order to reduce head-of-line blocking. The careful implementation of these features has resulted in a good trade off between network performance and hardware cost. The outcome of this research is a High-Performance Adaptive Router (HPAR), which adequately balances the needs of parallel applications: minimal network latency at low loads and high throughput at heavy loads. The paper includes an evaluation process in which HPAR is compared with other adaptive routers using FIFO input buffering, with or without additional virtual channels to reduce head-of-line blocking. This evaluation contemplates both the VLSI costs of each router and their performance under synthetic and real application workloads. To make the comparison fair, all the routers use the same efficient deadlock avoidance mechanism. In all the experiments, HPAR exhibited the best response among all the routers tested. The throughput gains ranged from 10 percent to 40 percent in respect to its most direct rival, which employs more hardware resources. Other results shown that HPAR achieves up to 83 percent of its theoretical maximum throughput under random traffic and up to 70 percent when running real applications. Moreover, the observed packet latencies were comparable to those exhibited by simpler routers. Therefore, HPAR can be considered as a suitable candidate to implement packet interchange in next generations of CC-NUMA multiprocessors.Valentín Puente, José-Ángel Gregorio, Ramón Beivide, and Cruz Iz

    A Study of Multiprocessor Systems using the Picoblaze 8-bit Microcontroller Implemented on Field Programmable Gate Arrays

    Get PDF
    As Field Programmable Gate Arrays (FPGAs) are becoming more capable of implementing complex logic circuits, designers are increasingly choosing them over traditional microprocessor-based systems for implementing digital controllers and digital signal processing applications. Indeed, as FPGAs are being built using state-of-the-art deep submicron CMOS processes, the increased amount of logic and memory resources allows such FPGA-based implementations to compete in terms of speed, complexity, and power dissipation with most custom-built chips, but at a fraction of the development costs. The modern FPGA is now capable of implementing multiple instances of configurable processors that are completely specified by a high-level descriptor language. Such arrays of soft processor cores have opened up new design possibilities that include complex embedded systems applications that were previously implemented by custom multiprocessor chips. As the FPGA-based multiprocessor system is completely configurable by the user, it can be optimized for speed and power dissipation to fit a given application. The goal of this thesis is to investigate design methods for implementing an array of soft processor cores using the Xilinx FPGA-based 8-bit microcontroller known as PicoBlaze. While development tools exist for the larger 32-bit processor from Xilinx known as MicroBlaze, no such resources are currently available for the PicoBlaze microcontroller. PicoBlaze benefits in applications that requires only less data bits (less than 8 bits). For example, consider the gene sequencing or DNA sequencing in which the processing requires only 2 to 5 bits. In such an application, PicoBlaze can be a simple processor to produce the results. Also, the PicoBlaze unit offers a finer level of granularity and hence consumes fewer resources than the larger 32-bit MicroBlaze processor. Hence, the former will find applications in embedded systems requiring a complex design to be partitioned over several processors but where only an 8-bit datapath is required

    PPMB: A Partial-Multiple-Bus Multiprocessor Architecture with Improved Cost-Effectiveness

    Get PDF
    This paper addresses the design and performance analysis of partial-multiple-bus interconnection networks. They are bus architectures that have evolved from multiple-bus structure by dividing buses into groups and reducing bus connections. Their effect is to reduce cost and alleviate arbitration and drive requirements without degrading performance significantly. One such structure, called processor-oriented partial-multiple-bus (or PPMB), is proposed. It serves as an alternative to the conventional structure called memory-oriented partial-multiple-bus (or MPMB) and is aimed at higher system performance at less or equal system cost. It has been shown, both analytically and by simulation, that a substantial increase in system bandwidth (up to 20%) is achieved by the PPMB structure over the MPMB structure. With very large systems, the results also imply a significantly improved cost-effectiveness over the conventional multiple-bus architecture
    corecore