27 research outputs found

    Arbitration Schemes for Multiprocessor Shared Bus

    Get PDF

    Design Approach to Implementation Of Arbitration Algorithm In Shared Bus Architectures (MPSoC)

    Get PDF
    The multiprocessor SoC designs have more than one processor and huge memory on the same chip. SoC consists of hardware cores and software cores ,multiple processors, embedded DRAM and connectors between cores .A wide range of MPSOC architectures have been developed over the past decade. This paper surveys the history of various On-Chip communication architectures present in the design of MPSoC. This acts as a primary factor of overall performance in complex SoC designs. Some of the various techniques that have driven the design of MpSoC has been discussed. Dynamically configurable communication architectures are found to improve the system performance. Currently On-chip interconnection networks are mostly implemented using shared buses which are the most common medium. The arbitration plays a crucial role in determining performance of bus-based system, as it assigns priorities, with which processor is granted the access to the shared communication resources. In the conventional arbitration algorithms there are some drawbacks such as bus starvation problem and low system performance. The bus should provide each component a flexible and utmost share of on-chip communication bandwidth and should improve the latency in access of the shared bus. The performance of SoC is improved using the probabilistic round robin algorithm with regard to the parameters, latency.Thus in this paper various issues related to bus arbitration related to design of MPSoC is analysed

    pTNoC: Probabilistically time-analyzable tree-based NoC for mixed-criticality systems

    Get PDF
    The use of networks-on-chip (NoC) in real-time safety-critical multicore systems challenges deriving tight worst-case execution time (WCET) estimates. This is due to the complexities in tightly upper-bounding the contention in the access to the NoC among running tasks. Probabilistic Timing Analysis (PTA) is a powerful approach to derive WCET estimates on relatively complex processors. However, so far it has only been tested on small multicores comprising an on-chip bus as communication means, which intrinsically does not scale to high core counts. In this paper we propose pTNoC, a new tree-based NoC design compatible with PTA requirements and delivering scalability towards medium/large core counts. pTNoC provides tight WCET estimates by means of asymmetric bandwidth guarantees for mixed-criticality systems with negligible impact on average performance. Finally, our implementation results show the reduced area and power costs of the pTNoC.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Mladen Slijepcevic is funded by the Obra Social Fundación la Caixa under grant Doctorado “la Caixa” - Severo Ochoa. Carles Hern´andez is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    On the suitability of time-randomized processors for secure and reliable high-performance computing

    Get PDF
    Time-randomized processor (TRP) architectures have been shown as one of the most promising approaches to deal with the overwhelming complexity of the timing analysis of high complex processor architectures for safety-related real-time systems. With TRPs the timing analysis step mainly relies on collecting measurements of the task under analysis rather than on complex timing models of the processor. Additionally, randomization techniques applied in TRPs provide increased reliability and security features. In this thesis, we elaborate on the reliability and security properties of TRPs and the suitability of extending this processor architecture design paradigm to the high-performance computing domain

    Design and Implementation of Shared Bus based Heterogeneous MPSoC

    Get PDF
    An MPSoC architecture is proposed with shared bus interconnect and its components mainly comprising of soft IPs. The proposed MPSoC architecture has four masters and four slaves communicated over a shared bus interconnect. Each master deals with two 16-bit inputs and process among an output of 32-bit. The slaves are four independent RAM soft IPs to be designed to handle 32-bit data. The main theme is to make the four masters and four slaves to get their tasks accessed through a 32-bit shared bus interconnect. Initially the soft IPs of processors and RAM memory elements are to be designed and to be verified using Modelsim simulation software. Before developing the proposed architecture, a prototype of one master to four slaves (1:4) with a simple address decoding scheme has to be developed and simulated in Modelsim simulation software. The prototype model architecture should be synthesized under target device Altera Cyclone II using Quartus synthesizing tool. The proposed architecture of 4 masters and 4 slaves with a common shared bus interconnect should be achieved and implement the entire architecture over Altera FPGA board and verify its functionality

    Implementation of Bus Arbiter Using Round Robin Scheme

    Get PDF
    ABSTRACT :In System on Chip (SoC) buses, intellectual properties (IPs) need to communicate with each other to access the required functionality. When the SoC bus is connected with more IPs, contentions occur while multiple IPs requests the bus at the same time. This makes on-chip bus based communication a major challenge for the system designer in the current SoC technology. The communication architectures must be able to adapt themselves according to the real-time requirements of the IPs. Hence, bus arbiters are proposed.The arbiter block plays important role in the SoC shared bus communication. The masters on a SoC bus may issue requests simultaneously and hence an arbiter is required to decide which master is granted for bus access. Bus Arbiter plays a vital role in handling the requests from the master and responses from slave (like Acknowledgement signal, Retry, etc). The main objective of arbitration algorithms is to ensure that only one master has access to the bus at any given time, all the other masters are forced to remain in the idle state until they are granted the use of the bus

    Design and implementation of a fair credit-based bandwidth sharing scheme for buses

    Get PDF
    Fair arbitration in the access to hardware shared resources is fundamental to obtain low worst-case execution time (WCET) estimates in the context of critical real-time systems, for which performance guarantees are essential. Several hardware mechanisms exist for managing arbitration in those resources (buses, memory controllers, etc.). They typically attain fairness in terms of the number of slots each contender (e.g., core) gets granted access to the shared resource. However, those policies may lead to unfair bandwidth allocations for workloads with contenders issuing short requests and contenders issuing long requests. We propose a Credit-Based Arbitration (CBA) mechanism that achieves fairness in the cycles each core is granted access to the resource rather than in the number of granted slots. Furthermore, we implement CBA as part of a LEON3 4-core processor for the Space domain in an FPGA proving the feasibility and good performance characteristics of the design by comparing it against other arbitration schemes.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Mladen Slijepcevic is funded by the Obra Social Fundaci´on la Caixa under grant Doctorado “la Caixa” - Severo Ochoa. Carles Hernández is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    A Bandwidth Control Arbitration for SoC Interconnections Performing Applications With Task Dependencies

    Get PDF
    Current System-on-Chips (SoCs) execute applications with task dependency that compete for shared resources such as buses, memories, and accelerators. In such a structure, the arbitration policy becomes a critical part of the system to guarantee access and bandwidth suitable for the competing applications. Some strategies proposed in the literature to cope with these issues are Round-Robin, Weighted Round-Robin, Lottery, Time Division Access Multiplexing (TDMA), and combinations. However, a fine-grained bandwidth control arbitration policy is missing from the literature. We propose an innovative arbitration policy based on opportunistic access and a supervised utilization of the bus in terms of transmitted flits (transmission units) that settle the access and fine-grained control. In our proposal, every competing element has a budget. Opportunistic access grants the bus to request even if the component has spent all its flits. Supervised debt accounts a record for every transmitted flit when it has no flits to spend. Our proposal applies to interconnection systems such as buses, switches, and routers. The presented approach achieves deadlock-free behavior even with task dependency applications in the scenarios analyzed through cycle-accurate simulation models. The synergy between opportunistic and supervised debt techniques outperforms Lottery, TDMA, and Weighted Round-Robin in terms of bandwidth control in the experimental studies performed

    Time-Randomized Wormhole NoCs for Critical Applications

    Get PDF
    Wormhole-based NoCs (wNoCs) are widely accepted in high-performance domains as the most appropriate solution to interconnect an increasing number of cores in the chip. However, wNoCs suitability in the context of critical real-time applications has not been demonstrated yet. In this article, in the context of probabilistic timing analysis (PTA), we propose a PTA-compatible wNoC design that provides tight time-composable contention bounds. The proposed wNoC design builds on PTA ability to reason in probabilistic terms about hardware events impacting execution time (e.g., wNoC contention), discarding those sequences of events occurring with a negligible low probability. This allows our wNoC design to deliver improved guaranteed performance w.r.t. conventional time-deterministic setups. Our results show that performance guarantees of applications running on top of probabilistic wNoC designs improve by 40% and 93% on average for 4 × 4 and 6 × 6 wNoC setups, respectively.The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Mladen Slijepcevic is funded by the Obra Social Fundación la Caixa under grant Doctorado \la Caixa" - Severo Ochoa. Carles Hernández is jointly funded by the Spanish Ministry of Economy and Competitiveness (MINECO) and FEDER funds through grant TIN2014-60404-JIN. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    Scaling High-Performance Interconnect Architectures to Many-Core Systems.

    Full text link
    The ever-increasing demand for performance scaling has made multi-core (2-8 cores) chips prevalent in today’s computing systems and foreshadows the shift toward many-core (10s- 100s of cores) chips in the near future. Although the potential performance gains from many-core systems remain appealing, the widespread adoption of these systems hinges on their ability to scale performance while simultaneously satisfying Quality-of-Service (QoS) and energy-efficiency constraints. This work makes the case that the interconnect for these many-core systems has a significant impact on the aforementioned scalability issues. The impact of interconnects on many-core systems is illustrated by observing that the degree of the interconnect has a signicant effect on system scalability and demonstrating that the architecture of high-radix, many-core systems are feasible, energy-efficient, and high-performance. The feasibility of high-radix crossbars for many-core systems is first shown through a new circuit-level building block called the Swizzle-Switch which can operate at frequencies up to 1.5GHz for 128-bit, radix-64 crossbars. This work then shows how a many-core system called the Swizzle-Switch Network (SSN) can use the Swizzle-Switch as the central building block for a flat crossbar interconnect. The SSN is shown to be advantageous to traditional Network-on-Chip (NoC) for systems up to 64 cores. The SSN performance by 21% relative to a Mesh while also providing a 25% energy savings over the Mesh. The Swizzle-Switch is also leveraged as a building block for high-radix NoC topologies that can support many-core architectures. The Swizzle-Switch-based Flattened Butterfly topology is demonstrated to provide a 15% speedup and 10% energy savings over the Mesh. Finally, the impact that 3D stacking technology has on many-core scalability is evaluated for bus and crossbar interconnects. A 3D-optimized Swizzle-Switch Network is able to leverage frequency gains to achieve a 15-28% speedup over a 2D-Swizzle-Switch Network when using memory- intensive benchmarks. Additionally, a bus-based 64-core architecture is shown to provide an average speedup of 49× over a baseline uniprocessor system when using 3D technology.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/93980/1/ksewell_1.pd
    corecore