11 research outputs found
Towards Compelling Cases for the Viability of Silicon-Nanophotonic Technology in Future Many-core Systems
Many crossbenchmarking results reported in the open literature raise optimistic expectations on the use of optical networks-on-chip (ONoCs) for high-performance and low-power on-chip communications in future Manycore Systems. However, these works ultimately fail to make a compelling case for the viability of silicon-nanophotonic technology for two fundamental reasons:
(1)Lack of aggressive electrical baselines (ENoCs).
(2) Inaccuracy in physical- and architecture-layer analysis of the ONoC.
This thesis aims at providing the guidelines and minimum requirements so that nanophotonic emerging technology may become of practical relevance. The key enabler for this study is a cross-layer design methodology of the optical transport medium, ranging from the consideration of the predictability gap between ONoC logic schemes and their physical implementations, up to architecture-level design issues such as the network interface and its co-design requirements with the memory hierarchy. In order to increase the practical relevance of the study, we consider a consolidated electrical NoC counterpart with an optimized architecture from a performance and power viewpoint. The quality metrics of this latter are derived from synthesis and place&route on an industrial 40nm low-power technology library. Building on this methodology, we are able to provide a realistic energy efficiency comparison between ONoC and ENoC both at the level of the system interconnect and of the system as a whole, pointing out the sensitivity of the results to the maturity of the underlying silicon nanophotonic technology, and at the same time paving the way towards compelling cases for the viability of such technology in next generation many-cores systems
Thermal Aware Design Method for VCSEL-Based On-Chip Optical Interconnect
Optical Network-on-Chip (ONoC) is an emerging technology considered as one of
the key solutions for future generation on-chip interconnects. However, silicon
photonic devices in ONoC are highly sensitive to temperature variation, which
leads to a lower efficiency of Vertical-Cavity Surface-Emitting Lasers
(VCSELs), a resonant wavelength shift of Microring Resonators (MR), and results
in a lower Signal to Noise Ratio (SNR). In this paper, we propose a methodology
enabling thermal-aware design for optical interconnects relying on
CMOS-compatible VCSEL. Thermal simulations allow designing ONoC interfaces with
low gradient temperature and analytical models allow evaluating the SNR.Comment: IEEE International Conference on Design Automation and Test in Europe
(DATE 2015), Mar 2015, Grenoble, France. 201
Layout aware router design and optimization for Wavelength-Routed Optical NoCs
Optical Networks-on-Chip are a promising solution for high-performance multi-core integration with better latency and bandwidth than traditional Electrical NoCs. Wavelength-routed ONoCs offer yet additional performance guarantees. However, WRONoC design presents new EDA challenges which have not yet been fully addressed. So far, most topology analysis is abstract, i.e., overlooks layout concerns, while for layout the tools available perform P&R but no topology optimization. Thus, a need arises for a novel optimization method combining both aspects of WRONoC design. In this thesis such a method is proposed and compared to the state of the art design procedure. Results available so far show a remarkable 50% reduction in maximum insertion loss with this new approach
Security of Electrical, Optical and Wireless On-Chip Interconnects: A Survey
The advancement of manufacturing technologies has enabled the integration of
more intellectual property (IP) cores on the same system-on-chip (SoC).
Scalable and high throughput on-chip communication architecture has become a
vital component in today's SoCs. Diverse technologies such as electrical,
wireless, optical, and hybrid are available for on-chip communication with
different architectures supporting them. Security of the on-chip communication
is crucial because exploiting any vulnerability would be a goldmine for an
attacker. In this survey, we provide a comprehensive review of threat models,
attacks, and countermeasures over diverse on-chip communication technologies as
well as sophisticated architectures.Comment: 41 pages, 24 figures, 4 table
High-Performance and Wavelength-Reused Optical Network on Chip (ONoC) Architectures and Communication Schemes for Manycore Processor
Optical Network on Chip (ONoC) is an emerging chip-scale optical interconnection technology to realize the high-performance and power-efficient inter-core communication for many-core processors. By utilizing the silicon photonic interconnects to transmit data packets with optical signals, it can achieve ultra low communication delay, high bandwidth capacity, and low power dissipation. With the benefits of Wavelength Division Multiplexing (WDM), multiple optical signals can simultaneously be transmitted in the same optical interconnect through different wavelengths. Thus, the WDM-based ONoC is becoming a hot research topic recently. However, the maximal number of available wavelengths is restricted for the reliable and power-efficient optical communication in ONoC. Hence, with a limited number of wavelengths, the design of high-performance and power-efficient ONoC architecture is an important and challenging problem.
In this thesis, the design methodology of wavelength-reused ONoC architecture is explored. With the wavelength reuse scheme in optical routing paths, high-performance and power-efficient communication is realized for many-core processors only using a small number of available wavelengths. Three wavelength-reused ONoC architectures and communication schemes are proposed to fulfil different communication requirements, i.e., network scalability, multicast communication, and dark silicon.
Firstly, WRH-ONoC, a wavelength-reused hierarchical Optical Network on Chip architecture, is proposed to achieve high network scalability, namely obtaining low communication delay and high throughput capacity for hundreds of thousands of cores by reusing the limited number of available wavelengths with the modest hardware cost and energy overhead. WRH-ONoC combines the advantages of non-blocking communication in each lambda-router and wavelength reuse in all lambda-routers through the hierarchical networking. Both theoretical analysis and simulation results indicate that WRH-ONoC can achieve prominent improvement on the communication performance and scalability (e.g., 46.0% of reduction on the zero-load packet delay and 72.7% of improvement on the network throughput for 400 cores with small hardware cost and energy overhead) in comparison with existing schemes.
Secondly, DWRMR, a dynamical wavelength-reused multicast scheme based on the optical multicast ring, is proposed for widely existing multicast communications in many-core processors. In DWRMR, an optical multicast ring is dynamically constructed for each multicast group and the multicast packets are transmitted in a single-send-multi-receive manner requiring only one wavelength. All the cores in the same multicast group can reuse the established multicast ring through an optical token arbitration scheme for the interactive multicast communications, thereby avoiding the frequent construction of multicast routing paths dedicatedly for each core. Simulation results indicate that DWRMR can reduce more than 50% of end-to-end packet delay with slight hardware cost, or require only half number of wavelengths to achieve the same performance compared with existing schemes.
Thirdly, Dark-ONoC, a dynamically configurable ONoC architecture, is proposed for the many-core processor with dark silicon. Dark silicon is an inevitable phenomenon that only a small number of cores can be activated simultaneously while the other cores must stay in dark state (power-gated) due to the restricted power budget. Dark-ONoC periodically allocates non-blocking optical routing paths only between the active cores with as less wavelengths as possible. Thus, it can obtain high-performance communication and low power consumption at the same time. Extensive simulations are conducted with the dark silicon patterns from both synthetic distribution and real data traces. The simulation results indicate that the number of wavelengths is reduced by around 15% and the overall power consumption is reduced by 23.4% compared to existing schemes.
Finally, this thesis concludes several important principles on the design of wavelength-reused ONoC architecture, and summarizes some perspective issues for the future research
Recommended from our members
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In additional to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technology as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects
Design Space Exploration and Resource Management of Multi/Many-Core Systems
The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends
Monolithic electronic-photonic integration in state-of-the-art CMOS processes
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student submitted PDF version of thesis.Includes bibliographical references (p. 388-407).As silicon CMOS transistors have scaled, increasing the density and energy efficiency of computation on a single chip, the off-chip communication link to memory has emerged as the major bottleneck within modern processors. Photonic devices promise to break this bottleneck with superior bandwidth-density and energy-efficiency. Initial work by many research groups to adapt photonic device designs to a silicon-based material platform demonstrated suitable independent performance for such links. However, electronic-photonic integration attempts to date have been limited by the high cost and complexity associated with modifying CMOS platforms suitable for modern high-performance computing applications. In this work, we instead utilize existing state-of-the-art electronic CMOS processes to fabricate integrated photonics by: modifying designs to match the existing process; preparing a design-rule compliant layout within industry-standard CAD tools; and locally-removing the handle silicon substrate in the photonic region through post-processing. This effort has resulted in the fabrication of seven test chips from two major foundries in 28, 45, 65 and 90 nm CMOS processes. Of these efforts, a single die fabricated through a widely available 45nm SOI-CMOS mask-share foundry with integrated waveguides with 3.7 dB/cm propagation loss alongside unmodified electronics with less than 5 ps inverter stage delay serves as a proof-of-concept for this approach. Demonstrated photonic devices include high-extinction carrier-injection modulators, 8-channel wavelength division multiplexing filter banks and low-efficiency silicon germanium photodetectors. Simultaneous electronic-photonic functionality is verified by recording a 600 Mb/s eye diagram from a resonant modulator driven by integrated digital circuits. Initial work towards photonic device integration within the peripheral CMOS flow of a memory process that has resulted in polysilicon waveguide propagation losses of 6.4 dB/cm will also be presented.by Jason S. Orcutt.Ph.D
On the simulation and design of manycore CMPs
The progression of Moore’s Law has resulted in both embedded and performance
computing systems which use an ever increasing number of processing cores integrated
in a single chip. Commercial systems are now available which provide hundreds
of cores, and academics have proposed architectures for up to 1024 cores. Embedded
multicores are increasingly popular as it is easier to guarantee hard-realtime constraints
using individual cores dedicated for tasks, than to use traditional time-multiplexed processing.
However, finding the optimal hardware configuration to meet these requirements
at minimum cost requires extensive trial and error approaches to investigate the
design space.
This thesis tackles the problems encountered in the design of these large scale multicore
systems by first addressing the problem of fast, detailed micro-architectural simulation.
Initially addressing embedded systems, this work exploits the lack of hardware
cache-coherence support in many deeply embedded systems to increase the available
parallelism in the simulation. Then, through partitioning the NoC and using packet
counting and cycle skipping reduces the amount of computation required to accurately
model the NoC interconnect. In combination, this enables simulation speeds significantly
higher than the state of the art, while maintaining less error, when compared
to real hardware, than any similar simulator. Simulation speeds reach up to 370MIPS
(Million (target) Instructions Per Second), or 110MHz, which is better than typical
FPGA prototypes, and approaching final ASIC production speeds. This is achieved
while maintaining an error of only 2.1%, significantly lower than other similar simulators.
The thesis continues by scaling the simulator past large embedded systems up to
64-1024 core processors, adding support for coherent architectures using the same
packet counting techniques along with low overhead context switching to enable the
simulation of such large systems with stricter synchronisation requirements. The new
interconnect model was partitioned to enable parallel simulation to further improve
simulation speeds in a manner which did not sacrifice any accuracy.
These innovations were leveraged to investigate significant novel energy saving optimisations
to the coherency protocol, processor ISA, and processor micro-architecture.
By introducing a new instruction, with the name wait-on-address, the energy spent during
spin-wait style synchronisation events can be significantly reduced. This functions
by putting the core into a low-power idle state while the cache line of the indicated
address is monitored for coherency action. Upon an update or invalidation (or traditional
timer or external interrupts) the core will resume execution, but the active
energy of running the core pipeline and repeatedly accessing the data and instruction
caches is effectively reduced to static idle power. The thesis also shows that existing
combined software-hardware schemes to track data regions which do not require coherency
can adequately address the directory-associativity problem, and introduces a
new coherency sharer encoding which reduces the energy consumed by sharer invalidations
when sharers are grouped closely together, such as would be the case with a
system running many tasks with a small degree of parallelism in each.
The research concludes by using the extremely fast simulation speeds developed to
produce a large set of training data, collecting various runtime and energy statistics for
a wide range of embedded applications on a huge diverse range of potential MPSoC
designs. This data was used to train a series of machine learning based models which
were then evaluated on their capacity to predict performance characteristics of unseen
workload combinations across the explored MPSoC design space, using only two sample
simulations, with promising results from some of the machine learning techniques.
The models were then used to produce a ranking of predicted performance across the
design space, and on average Random Forest was able to predict the best design within
89% of the runtime performance of the actual best tested design, and better than 93%
of the alternative design space. When predicting for a weighted metric of energy, delay
and area, Random Forest on average produced results within 93% of the optimum
result.
In summary this thesis improves upon the state of the art for cycle accurate multicore
simulation, introduces novel energy saving changes the the ISA and microarchitecture
of future multicore processors, and demonstrates the viability of machine
learning techniques to significantly accelerate the design space exploration required to
bring a new manycore design to market
Contrasting wavelength-routed optical NoC topologies for power-efficient 3D-stacked multicore processors using physical-layer analysis
Optical networks-on-chip (ONoCs) are currently still in the concept stage, and would benefit from explorative studies capable of bridging the gap between abstract analysis frameworks and the constraints and challenges posed by the physical layer. This paper aims to go beyond the traditional comparison of wavelength-routed ONoC topologies based only on their abstract properties, and for the first time assesses their physical implementation efficiency in an homogeneous experimental setting of practical relevance. As a result, the paper can demonstrate the significant and different deviation of topology layouts from their logic schemes under the effect of placement constraints on the target system. This becomes then the preliminary step for the accurate characterization of technology-specific metrics such as the insertion loss critical path, and to derive the ultimate impact on power efficiency and feasibility of each design