6,623 research outputs found

    Evaluation of temperature-performance trade-offs in wireless network-on-chip architectures

    Get PDF
    Continued scaling of device geometries according to Moore\u27s Law is enabling complete end-user systems on a single chip. Massive multicore processors are enablers for many information and communication technology (ICT) innovations spanning various domains, including healthcare, defense, and entertainment. In the design of high-performance massive multicore chips, power and heat are dominant constraints. Temperature hotspots witnessed in multicore systems exacerbate the problem of reliability in deep submicron technologies. Hence, there is a great need to explore holistic power and thermal optimization and management strategies for the massive multicore chips. High power consumption not only raises chip temperature and cooling cost, but also decreases chip reliability and performance. Thus, addressing thermal concerns at different stages of the design and operation is critical to the success of future generation systems. The performance of a multicore chip is also influenced by its overall communication infrastructure, which is predominantly a Network-on-Chip (NoC). The existing method of implementing a NoC with planar metal interconnects is deficient due to high latency, significant power consumption, and temperature hotspots arising out of long, multi-hop wireline links used in data exchange. On-chip wireless networks are envisioned as an enabling technology to design low power and high bandwidth massive multicore architectures. However, optimizing wireless NoCs for best performance does not necessarily guarantee a thermally optimal interconnection architecture. The wireless links being highly efficient attract very high traffic densities which in turn results in temperature hotspots. Therefore, while the wireless links result in better performance and energy-efficiency, they can also cause temperature hotspots and undermine the reliability of the system. Consequently, the location and utilization of the wireless links is an important factor in thermal optimization of high performance wireless Networks-on-Chip. Architectural innovation in conjunction with suitable power and thermal management strategies is the key for designing high performance yet energy-efficient massive multicore chips. This work contributes to exploration of various the design methodologies for establishing wireless NoC architectures that achieve the best trade-offs between temperature, performance and energy-efficiency. It further demonstrates that incorporating Dynamic Thermal Management (DTM) on a multicore chip designed with such temperature and performance optimized Wireless Network-on-Chip architectures improves thermal profile while simultaneously providing lower latency and reduced network energy dissipation compared to its conventional counterparts

    Design and implementation of the Quarc network on-chip

    Get PDF
    Networks-on-Chip (NoC) have emerged as alternative to buses to provide a packet-switched communication medium for modular development of large Systems-on-Chip. However, to successfully replace its predecessor, the NoC has to be able to efficiently exchange all types of traffic including collective communications. The latter is especially important for e.g. cache updates in multicore systems. The Quarc NoC architecture has been introduced as a Networks-on-Chip which is highly efficient in exchanging all types of traffic including broadcast and multicast. In this paper we present the hardware implementation of the switch architecture and the network adapter (transceiver) of the Quarc NoC. Moreover, the paper presents an analysis and comparison of the cost and performance between the Quarc and the Spidergon NoCs implemented in Verilog targeting the Xilinx Virtex FPGA family. We demonstrate a dramatic improvement in performance over the Spidergon especially for broadcast traffic, at no additional hardware cost

    Shortest path routing algorithm for hierarchical interconnection network-on-chip

    Get PDF
    Interconnection networks play a significant role in efficient on-chip communication for multicore systems. This paper introduces a new interconnection topology called the Hierarchical Cross Connected Recursive network (HCCR) and a shortest path routing algorithm for the HCCR. Proposed topology offers a high degree of regularity, scalability, and symmetry with a reduced number of links and node degree. A unique address encoding scheme is proposed for hierarchical graphical representation of HCCR networks, and based on this scheme a shortest path routing algorithm is devised. The algorithm requires 5(k-1) time where k=logn4-2 and k>0, in worst case to determine the next node along the shortest path

    DVFS using clock scheduling for Multicore Systems-on-Chip and Networks-on-Chip

    Get PDF
    A modern System-on-Chip (SoC) contains processor cores, application-specific process- ing elements, memory, peripherals, all connected with a high-bandwidth and low-latency Network-on-Chip (NoC). The downside of such very high level of integration and con- nectivity is the high power consumption. In CMOS technology this is made of a dynamic and a static component. To reduce the dynamic component, Dynamic voltage and Fre- quency Scaling (DVFS) has been adopted. Although DVFS is very effective chip-wide, the power optimization of complex SoCs calls for a finer grain application of DVFS. Ideally all the main components of an SoC should be provided with a DVFS controller. An SoC with a DVFS controller per component with individual DC-DC converters and PLL/DLL circuits cannot scale in size to hundreds of components, which are in the research agenda. We present an alternative that will permit such scaling. It is possible to achieve results close to an optimum DVFS by hopping between few voltage levels and by an innovative application of clock-gating that we term as clock scheduling. We obtain an effective clock frequency by periodically killing some clock cycles of a master clock. We can apply voltage scaling for some of the periodic clock schedules which yield effective clock 1/2, 1/3, . . . By dithering between few voltages we obtain results close to an ideal DVFS system in simple pipelined circuits and in a complex example, a NoC’s switch. Again in the context of a NoC, we show how clock scheduling and voltage scaling can be automatically determined by means of a proportional-integral loop controller that keeps track of the network load. We describe in detail its implementation and all the circuit-level issues that we found. For a single switch, result shows an advantage of up to 2X over simple frequency scaling without voltage scaling. By providing each NoC’s switch with our simple DVFS controller, power saving at network level can be significantly more than what a a global DVFS controller can get. In a realistic scenario represented by network traces generated by video applications (MPEG, PIP, MWD, VoPD), we obtain an average power saving of 33%. To reduce static power, the Power-Gating (PG) technique is used and consists in switching- off power supply of unused blocks via pMOS headers or nMOS footers in series with such blocks. Even though research has been done in this field, the application of PG to NoCs has not been fully investigated. We show that it is possible to apply PG to the input buffers of a NoC switch. Their leakage power contributes about 40-50% of total NoC power, hence reducing such contribution is worthwhile. We partitioned buffers in banks and apply PG only to inactive banks. With our technique, it is possible to save about 40% in leakage power, without impact on performance

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    Get PDF
    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application\u27s behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but, it will help the reader interested in employing prediction in optimization of multicore processor systems

    Space division multiplexing chip-to-chip quantum key distribution

    Get PDF
    Quantum cryptography is set to become a key technology for future secure communications. However, to get maximum benefit in communication networks, transmission links will need to be shared among several quantum keys for several independent users. Such links will enable switching in quantum network nodes of the quantum keys to their respective destinations. In this paper we present an experimental demonstration of a photonic integrated silicon chip quantum key distribution protocols based on space division multiplexing (SDM), through multicore fiber technology. Parallel and independent quantum keys are obtained, which are useful in crypto-systems and future quantum network
    • …
    corecore