8,250 research outputs found

    The impact of traffic localisation on the performance of NoCs for very large manycore systems

    Get PDF
    The scaling of semiconductor technologies is leading to processors with increasing numbers of cores. The adoption of Networks-on-Chip (NoC) in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. In large manycore systems, performance is predicated on the locality of communication. In this work, we investigate the performance of three NoC topologies for systems with thousands of processor cores under two types of localised traffic. We present latency and throughput results comparing fat quadtree, concentrated mesh and mesh topologies under different degrees of localisation. Our results, based on the ITRS physical data for 2023, show that the type and degree of localisation of traffic significantly affects the NoC performance, and that scale-invariant topologies perform worse than flat topologies

    Tree-structured small-world connected wireless network-on-chip with adaptive routing

    Get PDF
    Traditional Network-on-Chip (NoC) systems comprised of many cores suffer from debilitating bottlenecks of latency and significant power dissipation due to the overhead inherent in multi-hop communication. In addition, these systems remain vulnerable to malicious circuitry incorporated into the design by untrustworthy vendors in a world where complex multi-stage design and manufacturing processes require the collective specialized services of a variety of contractors. This thesis proposes a novel small-world tree-based network-on-chip (SWTNoC) structure designed for high throughput, acceptable energy consumption, and resiliency to attacks and node failures resulting from the insertion of hardware Trojans. This tree-based implementation was devised as a means of reducing average network hop count, providing a large degree of local connectivity, and effective long-range connectivity by means of a novel wireless link approach based on carbon nanotube (CNT) antenna design. Network resiliency is achieved by means of a devised adaptive routing algorithm implemented to work with TRAIN (Tree-based Routing Architecture for Irregular Networks). Comparisons are drawn with benchmark architectures with optimized wireless link placement by means of the simulated annealing (SA) metaheuristic. Experimental results demonstrate a 21% throughput improvement and a 23% reduction in dissipated energy per packet over the closest competing architecture. Similar trends are observed at increasing system sizes. In addition, the SWTNoC maintains this throughput and energy advantage in the presence of a fault introduced into the system. By designing a hierarchical topology and designating a higher level of importance on a subset of the nodes, much higher network throughput can be attained while simultaneously guaranteeing deadlock freedom as well as a high degree of resiliency and fault-tolerance

    Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Well understanding the access behavior of hot data is significant for NAND flash memory due to its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of SSD. Generally, both GC and WL rely greatly on the recognition accuracy of hot data identification (HDI). However, in this paper, the first time we propose a novel concept of hot data prediction (HDP), where the conventional HDI becomes unnecessary. First, we develop a hybrid optimized echo state network (HOESN), where sufficiently unbiased and continuously shrunk output weights are learnt by a sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) for further improving prediction accuracy and reliability. Third, in the test on a chaotic benchmark (Rossler), the HOESN performs better than those of six recent state-of-the-art methods. Finally, simulation results about six typical metrics tested on five real disk workloads and on-chip experiment outcomes verified from an actual SSD prototype indicate that our HOESN-based HDP can reliably promote the access performance and endurance of NAND flash memories.Peer reviewe

    A low-energy rate-adaptive bit-interleaved passive optical network

    Get PDF
    Energy consumption of customer premises equipment (CPE) has become a serious issue in the new generations of time-division multiplexing passive optical networks, which operate at 10 Gb/s or higher. It is becoming a major factor in global network energy consumption, and it poses problems during emergencies when CPE is battery-operated. In this paper, a low-energy passive optical network (PON) that uses a novel bit-interleaving downstream protocol is proposed. The details about the network architecture, protocol, and the key enabling implementation aspects, including dynamic traffic interleaving, rate-adaptive descrambling of decimated traffic, and the design and implementation of a downsampling clock and data recovery circuit, are described. The proposed concept is shown to reduce the energy consumption for protocol processing by a factor of 30. A detailed analysis of the energy consumption in the CPE shows that the interleaving protocol reduces the total energy consumption of the CPE significantly in comparison to the standard 10 Gb/s PON CPE. Experimental results obtained from measurements on the implemented CPE prototype confirm that the CPE consumes significantly less energy than the standard 10 Gb/s PON CPE

    Temperature Evaluation of NoC Architectures and Dynamically Reconfigurable NoC

    Get PDF
    Advancements in the field of chip fabrication led to the integration of a large number of transistors in a small area, giving rise to the multi–core processor era. Massive multi–core processors facilitate innovation and research in the field of healthcare, defense, entertainment, meteorology and many others. Reduction in chip area and increase in the number of on–chip cores is accompanied by power and temperature issues. In high performance multi–core chips, power and heat are predominant constraints. High performance massive multicore systems suffer from thermal hotspots, exacerbating the problem of reliability in deep submicron technologies. High power consumption not only increases the chip temperature but also jeopardizes the integrity of the system. Hence, there is a need to explore holistic power and thermal optimization and management strategies for massive on–chip multi–core environments. In multi–core environments, the communication fabric plays a major role in deciding the efficiency of the system. In multi–core processor chips this communication infrastructure is predominantly a Network–on–Chip (NoC). Tradition NoC designs incorporate planar interconnects as a result these NoCs have long, multi–hop wireline links for data exchange. Due to the presence of multi–hop planar links such NoC architectures fall prey to high latency, significant power dissipation and temperature hotspots. Networks inspired from nature are envisioned as an enabling technology to achieve highly efficient and low power NoC designs. Adopting wireless technology in such architectures enhance their performance. Placement of wireless interconnects (WIs) alters the behavior of the network and hence a random deployment of WIs may not result in a thermally optimal solution. In such scenarios, the WIs being highly efficient would attract high traffic densities resulting in thermal hotspots. Hence, the location and utilization of the wireless links is a key factor in obtaining a thermal optimal highly efficient Network–on–chip. Optimization of the NoC framework alone is incapable of addressing the effects due to the runtime dynamics of the system. Minimal paths solely optimized for performance in the network may lead to excessive utilization of certain NoC components leading to thermal hotspots. Hence, architectural innovation in conjunction with suitable power and thermal management strategies is the key for designing high performance and energy–efficient multicore systems. This work contributes at exploring various wired and wireless NoC architectures that achieve best trade–offs between temperature, performance and energy–efficiency. It further proposes an adaptive routing scheme which factors in the thermal profile of the chip. The proposed routing mechanism dynamically reacts to the thermal profile of the chip and takes measures to avoid thermal hotspots, achieving a thermally efficient dynamically reconfigurable network on chip architecture

    An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform

    Get PDF
    Continuous improvement in silicon process technologies has made possible the integration of hundreds of cores on a single chip. However, power and heat have become dominant constraints in designing these massive multicore chips causing issues with reliability, timing variations and reduced lifetime of the chips. Dynamic Thermal Management (DTM) is a solution to avoid high temperatures on the die. Typical DTM schemes only address core level thermal issues. However, the Network-on-chip (NoC) paradigm, which has emerged as an enabling methodology for integrating hundreds to thousands of cores on the same die can contribute significantly to the thermal issues. Moreover, the typical DTM is triggered reactively based on temperature measurements from on-chip thermal sensor requiring long reaction times whereas predictive DTM method estimates future temperature in advance, eliminating the chance of temperature overshoot. Artificial Neural Networks (ANNs) have been used in various domains for modeling and prediction with high accuracy due to its ability to learn and adapt. This thesis concentrates on designing an ANN prediction engine to predict the thermal profile of the cores and Network-on-Chip elements of the chip. This thermal profile of the chip is then used by the predictive DTM that combines both core level and network level DTM techniques. On-chip wireless interconnect which is recently envisioned to enable energy-efficient data exchange between cores in a multicore environment, will be used to provide a broadcast-capable medium to efficiently distribute thermal control messages to trigger and manage the DTM schemes

    Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011)

    Get PDF
    Proceedings of the Second International Workshop on HyperTransport Research and Applications (WHTRA2011) which was held Feb. 9th 2011 in Mannheim, Germany. The Second International Workshop for Research on HyperTransport is an international high quality forum for scientists, researches and developers working in the area of HyperTransport. This includes not only developments and research in HyperTransport itself, but also work which is based on or enabled by HyperTransport. HyperTransport (HT) is an interconnection technology which is typically used as system interconnect in modern computer systems, connecting the CPUs among each other and with the I/O bridges. Primarily designed as interconnect between high performance CPUs it provides an extremely low latency, high bandwidth and excellent scalability. The definition of the HTX connector allows the use of HT even for add-in cards. In opposition to other peripheral interconnect technologies like PCI-Express no protocol conversion or intermediate bridging is necessary. HT is a direct connection between device and CPU with minimal latency. Another advantage is the possibility of cache coherent devices. Because of these properties HT is of high interest for high performance I/O like networking and storage, but also for co-processing and acceleration based on ASIC or FPGA technologies. In particular acceleration sees a resurgence of interest today. One reason is the possibility to reduce power consumption by the use of accelerators. In the area of parallel computing the low latency communication allows for fine grain communication schemes and is perfectly suited for scalable systems. Summing up, HT technology offers key advantages and great performance to any research aspect related to or based on interconnects. For more information please consult the workshop website (http://whtra.uni-hd.de)

    Venice: Exploring Server Architectures for Effective Resource Sharing

    Get PDF
    Consolidated server racks are quickly becoming the backbone of IT infrastructure for science, engineering, and business, alike. These servers are still largely built and organized as when they were distributed, individual entities. Given that many fields increasingly rely on analytics of huge datasets, it makes sense to support flexible resource utilization across servers to improve cost-effectiveness and performance. We introduce Venice, a family of data-center server architectures that builds a strong communication substrate as a first-class resource for server chips. Venice provides a diverse set of resource-joining mechanisms that enables user programs to efficiently leverage non-local resources. To better understand the implications of design decisions about system support for resource sharing we have constructed a hardware prototype that allows us to more accurately measure end-to-end performance of at-scale applications and to explore tradeoffs among performance, power, and resource-sharing transparency. We present results from our initial studies analyzing these tradeoffs when sharing memory, accelerators, or NICs. We find that it is particularly important to reduce or hide latency, that data-sharing access patterns should match the features of the communication channels employed, and that inter-channel collaboration can be exploited for better performance
    • …
    corecore