
    Fast simulation of networks-on-chip with priority-preemptive arbitration

    An increasingly time-consuming part of the design flow of on-chip multiprocessors is the simulation of the interconnect architecture. The accurate simulation of state-of-the-art network-on-chip interconnects can take hours, and this process is repeated for each design iteration because it provides valuable insights into communication latencies that can greatly affect the overall performance of the system. In this article, we identify a time-predictable network-on-chip architecture and show that its timing behaviour can be predicted using models which are far less complex than the architecture itself. We then exploit this property to produce simplified and lightweight simulation models that deliver latency figures with more than 90% accuracy and simulate more than 1,000 times faster than a cycle-accurate model of the same interconnect.
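    A minimal sketch of the kind of lightweight latency model such a time-predictable NoC permits, assuming the classic response-time iteration for priority-preemptive wormhole flows (worst-case latency = no-contention latency plus interference from higher-priority flows sharing links). The flow parameters and helper names below are illustrative assumptions, not taken from the article.

```python
# Sketch: response-time iteration for priority-preemptive wormhole flows.
from dataclasses import dataclass
from math import ceil

@dataclass
class Flow:
    name: str
    priority: int          # lower number = higher priority
    basic_latency: float   # no-contention traversal time (cycles)
    period: float          # minimum inter-arrival time (cycles)
    links: frozenset       # identifiers of the links the flow traverses

def response_time(flow, flows, max_iter=100):
    """Iterate R = C + sum over higher-priority, link-sharing flows of ceil(R/T_j)*C_j."""
    interferers = [f for f in flows
                   if f.priority < flow.priority and f.links & flow.links]
    r = flow.basic_latency
    for _ in range(max_iter):
        r_next = flow.basic_latency + sum(ceil(r / f.period) * f.basic_latency
                                          for f in interferers)
        if r_next == r:
            return r
        r = r_next
    return r  # did not converge within max_iter; return last estimate

flows = [
    Flow("f0", 0, 20, 100, frozenset({"l1", "l2"})),
    Flow("f1", 1, 30, 200, frozenset({"l2", "l3"})),
]
for f in flows:
    print(f.name, response_time(f, flows))
```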

    Comparative performance evaluation of latency and link dynamic power consumption modelling algorithms in wormhole switching networks on chip

    The simulation of interconnect architectures can be a time-consuming part of the design flow of on-chip multiprocessors. Accurate simulation of state-of-the-art network-on-chip interconnects can take several hours for realistic application examples, and this process must be repeated for each design iteration because the interactions between design choices can greatly affect the overall throughput and latency performance of the system. This paper presents a series of network-on-chip transaction-level model (TLM) algorithms that provide a highly abstracted view of the process of data transmission in priority-preemptive and non-preemptive networks-on-chip, permitting a major reduction in simulation event count. These simulation models are tested using two realistic application case studies and with synthetic traffic. The results demonstrate that these lightweight TLM simulation models can produce latency figures accurate to within a few flits for the majority of flows, and link dynamic power consumption figures that are more than 93% accurate, while simulating 2.5 to 3 orders of magnitude faster than a cycle-accurate model of the same interconnect.
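    A sketch of how transaction-level link dynamic power accounting might look: rather than toggling wires cycle by cycle, energy is accumulated per link from the number of flits each transaction pushes across it, using the standard dynamic-power relation E ≈ α·C·V². The constants and names below are assumptions for illustration, not values or code from the paper.

```python
# Sketch: transaction-level accumulation of link dynamic energy.
LINK_CAPACITANCE = 1.0e-12   # effective switched capacitance per flit (F), assumed
V_DD = 1.0                   # supply voltage (V), assumed
ALPHA = 0.5                  # average switching activity factor, assumed

def flit_energy():
    """Dynamic energy to move one flit across one link: E ~ alpha * C * Vdd^2."""
    return ALPHA * LINK_CAPACITANCE * V_DD ** 2

def transaction_link_energy(num_flits, route_links, energy_per_link):
    """Charge every link on the route with the flits of one transaction."""
    e = flit_energy()
    for link in route_links:
        energy_per_link[link] = energy_per_link.get(link, 0.0) + num_flits * e

energy = {}
transaction_link_energy(num_flits=16, route_links=["r0-r1", "r1-r2"], energy_per_link=energy)
print(energy)
```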

    Performance evaluation of HEVC RCL applications mapped onto NoC-based embedded platforms

    Today, many applications running on embedded systems have to fulfil soft or hard timing constraints. Video applications, such as the modern High Efficiency Video Coding (HEVC), most often have soft real-time constraints. However, in specific scenarios, such as robotic surgery or satellite docking, harder timing constraints arise and become a major challenge. Although implementing such applications on Networks-on-Chip (NoCs) is an alternative for coping with their algorithmic complexity and meeting real-time constraints, a performance evaluation of the mapped NoC and a schedulability analysis for the given application are mandatory. In this work we evaluate the performance of the HEVC Residual Coding Loop (RCL) mapped onto a NoC-based embedded platform, considering the encoding of a single 1920x1080-pixel frame. A set of analyses exploring combinations of different NoC sizes and task mapping strategies was performed, showing, for typical and upper-bound workload scenarios, when the application is schedulable and meets its real-time constraints.
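    A minimal sketch of the kind of schedulability check such an evaluation implies: a mapping is accepted only if the end-to-end worst-case latency of every task chain (computation plus NoC communication) fits within the per-frame deadline. The deadline, numbers, and names are illustrative assumptions, not figures from the paper.

```python
# Sketch: end-to-end deadline check for a mapped video pipeline.
FRAME_DEADLINE_MS = 33.3   # e.g. one 1920x1080 frame at 30 fps (assumption)

def chain_latency_ms(task_wcets_ms, flow_latencies_ms):
    """End-to-end latency of one pipeline: sum of task WCETs and NoC flow latencies."""
    return sum(task_wcets_ms) + sum(flow_latencies_ms)

def schedulable(chains):
    """A mapping is accepted only if every chain meets the frame deadline."""
    return all(chain_latency_ms(t, f) <= FRAME_DEADLINE_MS for t, f in chains)

# Typical vs. upper-bound workload for one illustrative mapping.
typical     = [([8.0, 6.5], [1.2, 0.9])]
upper_bound = [([14.0, 11.0], [4.5, 5.2])]
print("typical schedulable:", schedulable(typical))
print("upper bound schedulable:", schedulable(upper_bound))
```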

    SARA: Self-Aware Resource Allocation for Heterogeneous MPSoCs

    In modern heterogeneous MPSoCs, the management of shared memory resources is crucial in delivering end-to-end QoS. Previous frameworks have either focused on singular QoS targets or on the allocation of partitionable resources among CPU applications at relatively slow timescales. However, heterogeneous MPSoCs typically require an instant response from the memory system, where most resources cannot be partitioned. Moreover, the health of different cores in a heterogeneous MPSoC is often measured by diverse performance objectives. In this work, we propose a Self-Aware Resource Allocation (SARA) framework for heterogeneous MPSoCs. Priority-based adaptation allows cores to use different performance targets and to self-monitor their own intrinsic health. In response, the system allocates non-partitionable resources based on priorities. The proposed framework meets a diverse range of QoS demands from heterogeneous cores. Comment: Accepted by the 55th Annual Design Automation Conference 2018 (DAC'18).
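    A minimal sketch of the priority-based adaptation idea: each core compares its self-monitored metric against its own target, derives a priority from how far it is from that target, and the shared memory system serves the neediest cores first. The metrics, names, and control policy below are assumptions for illustration, not the SARA implementation.

```python
# Sketch: health-driven priorities for non-partitionable memory service.
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    target: float       # per-core QoS target (e.g. IPC or frame rate)
    measured: float     # self-monitored current value of the same metric

    def health(self):
        """>= 1.0 means the core is meeting its target."""
        return self.measured / self.target

    def priority(self):
        """Lower health -> higher priority for the shared memory system."""
        return max(0.0, 1.0 - self.health())

def allocate(cores, service_slots):
    """Hand out non-partitionable service slots to the neediest cores first."""
    order = sorted(cores, key=lambda c: c.priority(), reverse=True)
    return [c.name for c in order[:service_slots]]

cores = [Core("cpu0", 1.0, 0.6), Core("gpu", 30.0, 29.0), Core("dsp", 2.0, 2.1)]
print(allocate(cores, service_slots=2))   # the two cores furthest from target
```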

    Priority Based Switch Allocator in Adaptive Physical Channel Regulator for On Chip Interconnects

    Chip multiprocessors (CMPs), in which a number of relatively simple cores are integrated on a single die, are now a popular design paradigm for microprocessors due to their power, performance and complexity advantages. The on-chip interconnection network (NoC) is an architectural paradigm that offers a stable and generalised communication platform for large-scale chip multiprocessors. The existing APCR model provides three regulation schemes at the switch allocation stage of the NoC router pipeline: monopolizing, fair-sharing and channel-stealing. Its aim is to fairly allocate physical bandwidth in the form of flit-level transmission units while breaking the conventional assumption that the flit size equals the phit size. The channel-stealing scheme was implemented with the existing round-robin scheduler, a well-known scheduling algorithm for providing fairness, which is not an optimal solution. In this thesis, we extend the APCR model and propose three efficient scheduling policies for the channel-stealing scheme in order to provide better quality of service (QoS). Our work can be divided into three parts. In the first part, we implement a ratio-based scheduling technique in which we keep track of the average number of flits sent from each input in every cycle. It not only provides fairness among virtual channels (VCs), but also increases the saturation throughput of the network. In the second part, we implement an age-based scheduling technique in which we prioritise a VC based on the age of its requesting flits; the age of each request is calculated as the difference between the injection time and the current simulation time. The age-based scheduler minimises packet latency. In the last part, we implement a static-priority-based scheduler, in which packets are assigned random priorities at the time of their injection into the network: high-priority packets can be forwarded to any of the VCs, whereas low-priority packets can be forwarded to only a limited number of VCs. The static-priority-based scheduler therefore limits access to the VCs depending on packet priority. We study performance metrics such as average packet latency and saturation throughput for all three new scheduling techniques. We present simulation results for the three scheduling policies under bit-complement, transpose and uniform-random synthetic traffic, with injection rates ranging from very low (no load) to high load. We evaluate the performance improvement of our proposed scheduling techniques in APCR against the baseline NoC design, and also compare with the results obtained with the monopolizing, fair-sharing and round-robin channel-stealing schemes of APCR. Simulation results from our detailed cycle-accurate simulator show that the new scheduling policies implemented in the APCR model improve network throughput by 10% under synthetic workloads compared with the existing round-robin scheme, and that our scheduling policy in the APCR model outperforms the baseline router by 28X under synthetic workloads.
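    A sketch of the age-based arbitration described above: among the virtual channels competing for the output in a given cycle, grant the one whose head flit has waited longest (age = current cycle minus injection cycle). The data structures and names are illustrative assumptions, not the thesis implementation.

```python
# Sketch: age-based grant among competing virtual channels.
def age_based_grant(requests, current_cycle):
    """requests: list of (vc_id, injection_cycle) for VCs competing this cycle.
    Returns the vc_id with the oldest head flit, or None if nothing requests."""
    if not requests:
        return None
    vc_id, _ = max(requests, key=lambda r: current_cycle - r[1])
    return vc_id

# VC 2 was injected earliest, so it wins the switch allocation this cycle.
print(age_based_grant([(0, 120), (1, 95), (2, 40)], current_cycle=130))  # -> 2
```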

    An analysis and Simulation Tool of Real-Time Communications in On-Chip Networks: A Comparative Study

    This paper presents the Real-Time Network-on-Chip-based architecture Analysis and Simulation tool (ReTiNAS), with a special focus on real-time communications. It allows fast and precise exploration of real-time design choices on NoC architectures. ReTiNAS is an event-based simulator written in Python. It implements different real-time communication protocols and tracks communications within the NoC at cycle level. Its modularity allows activating and deactivating different NoC components and easily extending the implemented protocols for more customised simulations and analyses. Further, we use ReTiNAS to perform a comparative study of analysis and simulation for different communication protocols using a wide set of synthetic experiments.
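    A minimal skeleton of an event-driven NoC simulator in the spirit described above: a priority queue of timestamped events, where each handler may schedule follow-up events (e.g. a flit arriving at the next router). The event types and handlers here are illustrative placeholders, not the ReTiNAS code.

```python
# Sketch: event-driven simulation loop with a time-ordered event queue.
import heapq

class EventSimulator:
    def __init__(self):
        self.now = 0
        self._queue = []      # heap of (time, sequence, handler, payload)
        self._seq = 0         # tie-breaker so simultaneous events stay ordered

    def schedule(self, delay, handler, payload):
        heapq.heappush(self._queue, (self.now + delay, self._seq, handler, payload))
        self._seq += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)

def flit_hop(sim, flit):
    """Move a flit one hop; re-schedule until it reaches its destination router."""
    flit["hop"] += 1
    print(f"cycle {sim.now}: flit {flit['id']} at hop {flit['hop']}")
    if flit["hop"] < flit["hops_total"]:
        sim.schedule(delay=1, handler=flit_hop, payload=flit)

sim = EventSimulator()
sim.schedule(0, flit_hop, {"id": 0, "hop": 0, "hops_total": 3})
sim.run(until=100)
```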

    Side-Channel Protected MPSoC through Secure Real-Time Networks-on-Chip

    The integration of Multi-Processor Systems-on-Chip (MPSoCs) into the Internet-of-Things (IoT) context brings new opportunities, but also new risks. Tight real-time constraints and security requirements should be considered simultaneously when designing MPSoCs. Networks-on-Chip (NoCs) are especially critical when trying to meet these two conflicting characteristics; for instance, the NoC design has a huge influence on the security of the system. A vital threat to system security are so-called side-channel attacks based on observations of NoC communication. To this end, we propose a NoC security mechanism suitable for hard real-time systems, in which schedulability is a vital design requirement. We present three contributions. First, we show the impact of NoC routing on the security of the system. Second, we propose a packet route randomisation mechanism to increase NoC resilience against side-channel attacks. Third, using an evolutionary optimisation approach, we effectively apply route randomisation while controlling its impact on hard real-time performance guarantees. Extensive experimental evidence based on analytical and simulation models supports our findings.
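    A minimal sketch of what packet route randomisation can look like on a 2D mesh: instead of deterministic XY routing, a packet picks uniformly among the minimal routes, so an observer of link activity sees a less predictable traffic pattern. The mesh model and function names are assumptions for illustration, not the proposed mechanism's code.

```python
# Sketch: choose a random shortest path on a 2D mesh per packet.
import random

def random_minimal_route(src, dst, rng=random):
    """Return a random shortest path (as a list of hops 'X'/'Y') on a 2D mesh."""
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    hops = ["X"] * abs(dx) + ["Y"] * abs(dy)
    rng.shuffle(hops)          # random interleaving of the X and Y moves
    return hops

# Two packets between the same pair of cores may now take different routes.
print(random_minimal_route((0, 0), (2, 3)))
print(random_minimal_route((0, 0), (2, 3)))
```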

    Improving Packet Predictability of Scalable Network-on-Chip Designs without Priority Pre-emptive Arbitration

    The quest for improving processing power and efficiency is spawning research into many-core systems with hundreds or thousands of cores. With communication forecast to be the foremost performance bottleneck, Networks-on-Chip are the favoured communication infrastructure in this context, mainly for reasons of scalability and power efficiency. However, contention between non-preemptive NoC packets can cause variation in packet latencies, potentially limiting the overall utilisation of the many-core system. Typical latency predictability enhancement techniques, such as Virtual Channels or Time Division Multiplexing, are usually hardware-expensive, non-scalable, or both. This research explores the use of dynamic and scalable techniques in Network-on-Chip routers to improve packet predictability by countering head-of-line blocking (a blocked low-priority packet blocking a high-priority packet) and tailbacking (a low-priority packet occupying a link required by a high-priority packet) of non-preemptive packets. The priority forwarding and tunnelling technique introduced here detects head-of-line blocking situations so that internal arbitration parameters can be altered (by forwarding packet parameters down the line) to resolve them. The selective packet splitting technique presented resolves tailbacking by emulating the effect of packet preemption through a low-overhead alternative that splits packets in flight. Finally, the thesis presents an architecture that gives routers a notion of timeliness in data packets, enabling packet arbitration based on application-supplied priority and timeliness, and thus improving the quality of service given to lower-priority packets. Furthermore, the techniques presented in the thesis do not require additional hardware as the NoC grows: neither the size of the NoC nor the number of packet priorities it has to handle affects their functionality and operation, which makes them scalable.
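    A minimal sketch of the selective packet splitting idea: when a higher-priority packet requests a link currently held by a lower-priority packet, the in-flight packet is split at a flit boundary so the link can be released, emulating preemption without per-priority virtual channels. The data structures and names are illustrative assumptions, not the thesis implementation.

```python
# Sketch: split an in-flight low-priority packet to emulate preemption.
def maybe_split(in_flight, requester_priority, flits_sent):
    """Split the occupying packet if a higher-priority packet needs the link.

    in_flight: dict with 'priority' and 'flits' of the packet holding the link.
    Returns (head_part, tail_part) if a split happens, otherwise None."""
    if requester_priority >= in_flight["priority"]:
        return None                           # lower number = higher priority
    head = {"priority": in_flight["priority"], "flits": flits_sent, "part": "head"}
    tail = {"priority": in_flight["priority"],
            "flits": in_flight["flits"] - flits_sent, "part": "tail"}
    return head, tail                         # the tail re-arbitrates for the link later

low = {"priority": 5, "flits": 16}
print(maybe_split(low, requester_priority=1, flits_sent=6))
```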